π΄ Required Information
Describe the Bug:
When an LlmAgent is served via to_a2a() and holds one or more McpToolsets using StreamableHTTPServerParams, MCP session creation during tool discovery intermittently fails with:
Failed to create MCP session: ... unhandled errors in a TaskGroup (1 sub-exception) whose inner exception is RuntimeError: Attempted to exit cancel scope in a different task than it was entered in.
ADK retries get_tools, but under load the agent proceeds with an incomplete tool list β the agent's own MCP tools silently disappear from the spec, the LLM then calls a now-missing tool and the A2A request hard-fails (ValueError: Tool '<name>' not found).
This is the same root cause as #4454, but on a different and very common surface: not adk web + Cloud Run, but to_a2a() ASGI agents in a hub-and-spoke multi-agent system (a manager dispatches to A2A sub-agents; each sub-agent owns its MCP server). We also add quantified new findings: the failure rate is ~linear in the number of concurrent McpToolset sessions per process, and the SSE transport does not exhibit the race at all when there is exactly one session.
Steps to Reproduce:
- Stand up an MCP server with Streamable HTTP transport.
- Build an
LlmAgent whose tools include N McpToolset(connection_params=StreamableHTTPServerParams(url=...)) (N β₯ 1; the rate rises sharply with N β₯ 2).
- Serve it with
to_a2a(agent) (uvicorn ASGI).
- Drive it with repeated A2A requests from a separate manager agent (each request triggers tool discovery /
get_tools).
- Observe intermittent MCP-session-creation failures; with N β₯ 2 the agent regularly proceeds with a truncated tool list and the request fails with
Tool '<own-tool>' not found.
Expected Behavior:
McpToolset consistently creates its MCP session and returns the full tool list per A2A request, regardless of how many McpToolsets the agent holds or how it is served.
Observed Behavior:
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in, surfaced as unhandled errors in a TaskGroup (1 sub-exception), in a tight Retrying get_tools loop. Occurrence count per single A2A task, swept by concurrent session count (real production measurements, identical workload):
| concurrent McpToolset sessions in the process |
race occurrences / task |
task outcome |
| 3 (streamable-http) |
84 |
survived only via ADK retry |
| 2 (streamable-http) |
40 |
HARD FAIL β own tools dropped, Tool 'run_python_code' not found |
| 1 (SSE transport) |
0 |
clean, reproduced 0 across many runs |
| 1 (streamable-http) |
still races |
β |
Environment Details:
- ADK Library Version (pip show google-adk): 1.33.0
- Desktop OS: Linux (Debian 13 / Docker container; host macOS)
- Python Version (python -V): 3.12.13
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.5-flash (model-independent β this is a tool-discovery transport bug; it reproduces regardless of model)
π‘ Optional Information
Regression:
Per #4454, worked on ADK 1.14.0; broken from 1.24.x onward. We confirm it is still present on 1.33.0. Bisecting to ADK 1.31.1 did not help.
Logs:
INFO mcp_session_manager: Retrying get_tools due to error: Failed to
create MCP session: Failed to create MCP session: unhandled errors
in a TaskGroup (1 sub-exception)
WARN session_context: Error on session runner task: unhandled errors in
a TaskGroup (1 sub-exception)
WARN llm_agent: Failed to get tools from toolset McpToolset: Failed to
create MCP session: ... (1 sub-exception)
ERROR a2a_agent_executor_impl: Error handling A2A request:
Tool 'run_python_code' not found.
ValueError: Tool 'run_python_code' not found.
(The inner RuntimeError: Attempted to exit cancel scope in a different task is collapsed by the task-group exception-squashing; the message string matches #4454 exactly.)
Additional Context:
Minimal Reproduction Code:
from google.adk.agents import LlmAgent
from google.adk.a2a.utils.agent_to_a2a import to_a2a
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import (
StreamableHTTPServerParams,
)
agent = LlmAgent(
name="leaf",
model="gemini-2.5-flash",
tools=[
McpToolset(connection_params=StreamableHTTPServerParams(
url="http://mcp-a:8082/mcp")),
McpToolset(connection_params=StreamableHTTPServerParams(
url="http://mcp-b:8081/mcp")), # 2+ sessions -> high rate
],
disallow_transfer_to_parent=True,
disallow_transfer_to_peers=True,
)
a2a_app = to_a2a(agent) # serve with uvicorn; drive with repeated A2A requests
How often has this issue occurred?:
- Intermittently per attempt, but effectively Always under realistic multi-agent concurrency (84 occurrences in a single task at 3 concurrent sessions).
π΄ Required Information
Describe the Bug:
When an
LlmAgentis served viato_a2a()and holds one or moreMcpToolsets usingStreamableHTTPServerParams, MCP session creation during tool discovery intermittently fails with:Failed to create MCP session: ... unhandled errors in a TaskGroup (1 sub-exception)whose inner exception isRuntimeError: Attempted to exit cancel scope in a different task than it was entered in.ADK retries
get_tools, but under load the agent proceeds with an incomplete tool list β the agent's own MCP tools silently disappear from the spec, the LLM then calls a now-missing tool and the A2A request hard-fails (ValueError: Tool '<name>' not found).This is the same root cause as #4454, but on a different and very common surface: not
adk web+ Cloud Run, butto_a2a()ASGI agents in a hub-and-spoke multi-agent system (a manager dispatches to A2A sub-agents; each sub-agent owns its MCP server). We also add quantified new findings: the failure rate is ~linear in the number of concurrentMcpToolsetsessions per process, and the SSE transport does not exhibit the race at all when there is exactly one session.Steps to Reproduce:
LlmAgentwhosetoolsinclude NMcpToolset(connection_params=StreamableHTTPServerParams(url=...))(N β₯ 1; the rate rises sharply with N β₯ 2).to_a2a(agent)(uvicorn ASGI).get_tools).Tool '<own-tool>' not found.Expected Behavior:
McpToolsetconsistently creates its MCP session and returns the full tool list per A2A request, regardless of how manyMcpToolsets the agent holds or how it is served.Observed Behavior:
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in, surfaced asunhandled errors in a TaskGroup (1 sub-exception), in a tightRetrying get_toolsloop. Occurrence count per single A2A task, swept by concurrent session count (real production measurements, identical workload):Tool 'run_python_code' not foundEnvironment Details:
Model Information:
π‘ Optional Information
Regression:
Per #4454, worked on ADK 1.14.0; broken from 1.24.x onward. We confirm it is still present on 1.33.0. Bisecting to ADK 1.31.1 did not help.
Logs:
(The inner
RuntimeError: Attempted to exit cancel scope in a different taskis collapsed by the task-group exception-squashing; the message string matches #4454 exactly.)Additional Context:
force_new_version, PR fix(mcp): add use_isolated_event_loop to McpToolset for Vertex AI Agent Engine compatibility Β #5509 (use_isolated_event_loop, would not apply cleanly to 1.33), ADK 1.31.1.McpToolsetto the SSE transport (SseConnectionParams) and keep exactly oneMcpToolsetper agent process (move any shared/cross-agent tools to a plain blocking HTTP client instead of a 2ndMcpToolset). With that, race occurrences went 84 β 0 and stayed 0 across many production runs.streamablehttp_clienttask group entered/exited across asyncio tasks). Cross-reference: ADK 1.24.x MCP StreamableHTTP cancel scope error with adk web + Cloud RunΒ #4454, PR fix(mcp): add use_isolated_event_loop to McpToolset for Vertex AI Agent Engine compatibility Β #5509.Minimal Reproduction Code:
How often has this issue occurred?: