
MCPToolset "Attempted to exit cancel scope in a different task" under to_a2a() multi-agent (A2A): scales with concurrent McpToolset count; SSE transport is race-free #5729

@LiuYuWei


🔴 Required Information

Describe the Bug:
When an LlmAgent is served via to_a2a() and holds one or more McpToolsets using StreamableHTTPServerParams, MCP session creation during tool discovery intermittently fails with:

Failed to create MCP session: ... unhandled errors in a TaskGroup (1 sub-exception) whose inner exception is RuntimeError: Attempted to exit cancel scope in a different task than it was entered in.

ADK retries get_tools, but under load the agent proceeds with an incomplete tool list: the agent's own MCP tools silently disappear from the spec, the LLM then calls a now-missing tool, and the A2A request hard-fails (ValueError: Tool '<name>' not found).
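As a diagnostic for the silent truncation described above, a fail-fast check on the discovered tool names can make the problem visible before the LLM is called. This is a minimal sketch, not part of ADK: `missing_tool_names` is a hypothetical helper, and the expected name is simply the tool that appears in the logs of this report.

```python
def missing_tool_names(expected: set[str], discovered: list[str]) -> set[str]:
    """Return the expected tool names absent from a discovery result."""
    return expected - set(discovered)


# Hypothetical values: the expected name comes from the logs in this report;
# the empty list simulates a discovery pass whose MCP session creation failed.
expected = {"run_python_code"}
discovered: list[str] = []

missing = missing_tool_names(expected, discovered)
if missing:
    print(f"tool discovery incomplete, missing: {sorted(missing)}")
```

A check like this only surfaces the symptom earlier; it does not address the session-creation race itself.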

This is the same root cause as #4454, but on a different and very common surface: not adk web + Cloud Run, but to_a2a() ASGI agents in a hub-and-spoke multi-agent system (a manager dispatches to A2A sub-agents; each sub-agent owns its MCP server). We also add new quantified findings: the failure rate grows roughly linearly with the number of concurrent McpToolset sessions per process, and the SSE transport does not exhibit the race at all when there is exactly one session.

Steps to Reproduce:

  1. Stand up an MCP server with Streamable HTTP transport.
  2. Build an LlmAgent whose tools include N McpToolset(connection_params=StreamableHTTPServerParams(url=...)) (N ≥ 1; the rate rises sharply with N ≥ 2).
  3. Serve it with to_a2a(agent) (uvicorn ASGI).
  4. Drive it with repeated A2A requests from a separate manager agent (each request triggers tool discovery / get_tools).
  5. Observe intermittent MCP-session-creation failures; with N ≥ 2 the agent regularly proceeds with a truncated tool list and the request fails with Tool '<own-tool>' not found.

Expected Behavior:
McpToolset consistently creates its MCP session and returns the full tool list per A2A request, regardless of how many McpToolsets the agent holds or how it is served.

Observed Behavior:
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in, surfaced as unhandled errors in a TaskGroup (1 sub-exception), in a tight Retrying get_tools loop. Occurrence count per single A2A task, swept by concurrent session count (real production measurements, identical workload):

concurrent McpToolset sessions in the process | race occurrences / task | task outcome
3 (streamable-http)                           | 84                      | survived only via ADK retry
2 (streamable-http)                           | 40                      | HARD FAIL: own tools dropped, Tool 'run_python_code' not found
1 (SSE transport)                             | 0                       | clean, reproduced 0 across many runs
1 (streamable-http)                           | still races             | -
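To make the scaling claim easier to check, the arithmetic on the two quantified streamable-http rows above can be sketched as follows. The numbers are copied from those rows; the variable names are illustrative only, and two data points are too few to fit a curve.

```python
# Race occurrences per A2A task, keyed by concurrent streamable-http sessions.
# Values are copied from the measurements above.
occurrences_by_sessions = {2: 40, 3: 84}

# Occurrences normalized by session count.
per_session = {n: count / n for n, count in occurrences_by_sessions.items()}
for n in sorted(per_session):
    print(f"{n} sessions: {occurrences_by_sessions[n]} occurrences, "
          f"{per_session[n]:.1f} per session")

# How much the occurrence count grows when going from 2 to 3 sessions,
# compared with the growth in session count itself.
growth = occurrences_by_sessions[3] / occurrences_by_sessions[2]
session_ratio = 3 / 2
print(f"occurrences grew {growth:.2f}x for a {session_ratio:.2f}x increase in sessions")
```

The per-session figures come out at 20 and 28, so occurrences grow at least in proportion to the session count across these two points.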

Environment Details:

  • ADK Library Version (pip show google-adk): 1.33.0
  • Desktop OS: Linux (Debian 13 / Docker container; host macOS)
  • Python Version (python -V): 3.12.13

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-2.5-flash (model-independent: this is a tool-discovery transport bug that reproduces regardless of model)

🟡 Optional Information

Regression:
Per #4454, worked on ADK 1.14.0; broken from 1.24.x onward. We confirm it is still present on 1.33.0. Downgrading to ADK 1.31.1 did not help.

Logs:

INFO  mcp_session_manager: Retrying get_tools due to error: Failed to
      create MCP session: Failed to create MCP session: unhandled errors
      in a TaskGroup (1 sub-exception)
WARN  session_context: Error on session runner task: unhandled errors in
      a TaskGroup (1 sub-exception)
WARN  llm_agent: Failed to get tools from toolset McpToolset: Failed to
      create MCP session: ... (1 sub-exception)
ERROR a2a_agent_executor_impl: Error handling A2A request:
      Tool 'run_python_code' not found.
ValueError: Tool 'run_python_code' not found.

(The inner RuntimeError: Attempted to exit cancel scope in a different task is collapsed by the task-group exception-squashing; the message string matches #4454 exactly.)

Additional Context:

Minimal Reproduction Code:

from google.adk.agents import LlmAgent
from google.adk.a2a.utils.agent_to_a2a import to_a2a
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import (
    StreamableHTTPServerParams,
)

agent = LlmAgent(
    name="leaf",
    model="gemini-2.5-flash",
    tools=[
        McpToolset(connection_params=StreamableHTTPServerParams(
            url="http://mcp-a:8082/mcp")),
        McpToolset(connection_params=StreamableHTTPServerParams(
            url="http://mcp-b:8081/mcp")),  # 2+ sessions -> high rate
    ],
    disallow_transfer_to_parent=True,
    disallow_transfer_to_peers=True,
)

a2a_app = to_a2a(agent)  # serve with uvicorn; drive with repeated A2A requests

How often has this issue occurred?:

  • Intermittently per attempt, but effectively Always under realistic multi-agent concurrency (84 occurrences in a single task at 3 concurrent sessions).
