[LiteLLM] _is_thinking_blocks_format drops Gemini thinking_blocks (only matches Anthropic 'signature' key) #5712

@ThibaultCoudertSephora

Summary

google.adk.models.lite_llm._is_thinking_blocks_format, introduced in 1.28.0 via
fc45fa6 (PR closing #4801), gates Anthropic thinking_blocks parsing on the presence of a
per-block signature key:

# src/google/adk/models/lite_llm.py (main)
def _is_thinking_blocks_format(reasoning_value: Any) -> bool:
    """Returns True if reasoning_value is Anthropic thinking_blocks format."""
    if not isinstance(reasoning_value, list) or not reasoning_value:
        return False
    first = reasoning_value[0]
    return isinstance(first, dict) and "signature" in first

LiteLLM's Gemini integration also emits thinking_blocks when thinking is enabled on Gemini 2.5 / 3 models, but the per-block dicts do not carry a signature. The thought signatures are instead returned outside the blocks, under the message's provider_specific_fields.thought_signatures, as a parallel array. The detector therefore returns False and control falls through to _iter_reasoning_texts, which only matches the dict keys ("text", "content", "reasoning", "reasoning_content"). Gemini blocks carry "type" and "thinking", so nothing is yielded and the response surfaces zero thought Parts to the agent layer.

Net effect: a regression from <1.28.0 for any agent built on LiteLlm + a Gemini thinking model.

Affected versions

  • google-adk >= 1.28.0 (still present on main, 2026-05-15)

Environment

  • Python 3.12
  • google-adk 1.28.0+
  • litellm latest
  • Models reproduced on: gemini-3-flash-preview, gemini-2.5-pro (via LiteLLM proxy)

Actual LiteLLM response payload

Captured directly from LiteLLM with thought output enabled. Note the shape of choices[0].message.thinking_blocks and the separate provider_specific_fields.thought_signatures field on the same message:

{
  "model": "gemini-3-flash-preview",
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "I am a large language model, trained by Google.",
      "reasoning_content": "**Understanding the User's Query and My Identity** ...",
      "thinking_blocks": [
        {
          "type": "thinking",
          "thinking": "**Understanding the User's Query and My Identity** ..."
        }
      ],
      "provider_specific_fields": {
        "thought_signatures": [
          "AY89a1/RGkcaRoJvGVOsj0pMpznJpT6OZESRZQF8ZYxB1+YHABJ+NjzLIb0fk8FOFQ..."
        ]
      }
    }
  }],
  "usage": {
    "completion_tokens": 73,
    "prompt_tokens": 5,
    "total_tokens": 78,
    "completion_tokens_details": {"reasoning_tokens": 62, "text_tokens": 11}
  }
}

Call trace through main

  1. _extract_reasoning_value(message) prefers thinking_blocks over reasoning_content — returns the Gemini list.
  2. _convert_reasoning_value_to_parts(reasoning_value) calls _is_thinking_blocks_format(...) → False (no per-block signature).
  3. Falls back to _iter_reasoning_texts, which for each dict only yields under keys ("text", "content", "reasoning", "reasoning_content") — none present → yields nothing.
  4. Returned thought parts: []. The thought is lost.
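
Step 3 can be illustrated with a simplified stand-in for the fallback. This is not the ADK source of _iter_reasoning_texts; iter_reasoning_texts_sketch is a hypothetical helper that models only the key matching described in this issue.

```python
from typing import Any, Iterator

# Keys the fallback is described as checking on dict values.
_FALLBACK_KEYS = ("text", "content", "reasoning", "reasoning_content")


def iter_reasoning_texts_sketch(value: Any) -> Iterator[str]:
    """Simplified, hypothetical stand-in for _iter_reasoning_texts."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_reasoning_texts_sketch(item)
    elif isinstance(value, dict):
        for key in _FALLBACK_KEYS:
            text = value.get(key)
            if isinstance(text, str) and text:
                yield text


# Gemini-shaped block: the thought text lives under "thinking",
# which is not one of the keys the fallback looks at.
gemini_blocks = [{"type": "thinking", "thinking": "Understanding the query ..."}]
fallback_texts = list(iter_reasoning_texts_sketch(gemini_blocks))
print(fallback_texts)
```

Under this model the fallback yields nothing for the Gemini block, which matches the empty list of thought parts in step 4.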

Expected behavior

Gemini-shaped thinking_blocks should be recognized as a thinking-blocks payload and surfaced as Part(thought=True, text=...). The parallel signatures from
provider_specific_fields.thought_signatures should be attached to the corresponding thought parts so they can be relayed back to the model on subsequent turns.
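
A rough sketch of the expected outcome, assuming signatures pair with blocks by position. ThoughtPartSketch is a hypothetical stand-in for the real Part type and expected_thought_parts is an illustrative helper, neither is ADK code. The UTF-8 encoding of the signature mirrors how the Anthropic path is described further down and is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThoughtPartSketch:
    """Hypothetical stand-in for a thought Part (not the google.genai type)."""

    text: str
    thought: bool = True
    thought_signature: Optional[bytes] = None


def expected_thought_parts(message: dict) -> list[ThoughtPartSketch]:
    """Pairs each thinking block with the signature at the same index."""
    blocks = message.get("thinking_blocks") or []
    fields = message.get("provider_specific_fields") or {}
    signatures = fields.get("thought_signatures") or []
    parts = []
    for index, block in enumerate(blocks):
        signature = signatures[index] if index < len(signatures) else None
        parts.append(
            ThoughtPartSketch(
                text=block.get("thinking", ""),
                # Assumption: store the signature as UTF-8 bytes.
                thought_signature=signature.encode("utf-8") if signature else None,
            )
        )
    return parts


sample_message = {
    "thinking_blocks": [{"type": "thinking", "thinking": "Identity check ..."}],
    "provider_specific_fields": {"thought_signatures": ["AY89a1/RGkc"]},
}
sample_parts = expected_thought_parts(sample_message)
```

Blocks beyond the end of the signature array simply come back with thought_signature left as None rather than being dropped.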

Suggested fix

Normalize Gemini-shaped thinking_blocks into the Anthropic shape inside _extract_reasoning_value by zipping the message-level thought_signatures onto each block. The existing Anthropic codepath in _convert_reasoning_value_to_parts then handles both providers unchanged. Blocks that arrive without any thought_signatures still pass through unsigned, so they would additionally need a broadened detector.

PR / unit tests below. Happy to open the PR if it looks right.

Related


PR diff

src/google/adk/models/lite_llm.py:

  @@ def _extract_reasoning_value(message: Message | Delta | None) -> Any:
     if message is None:
       return None
     # Anthropic models return thinking_blocks with type/thinking/signature fields.
     # This must be preserved to maintain thinking across tool call boundaries.
     thinking_blocks = message.get("thinking_blocks")
     if thinking_blocks is not None:
  +    # Gemini also emits thinking_blocks, but each block lacks a per-block
  +    # `signature`; signatures arrive in parallel under
  +    # `provider_specific_fields.thought_signatures`. Zip them in so the
  +    # downstream Anthropic codepath handles both providers uniformly.
  +    if (
  +        isinstance(thinking_blocks, list)
  +        and thinking_blocks
  +        and isinstance(thinking_blocks[0], dict)
  +        and "signature" not in thinking_blocks[0]
  +    ):
  +      provider_fields = message.get("provider_specific_fields") or {}
  +      signatures = provider_fields.get("thought_signatures") or []
  +      if signatures:
  +        merged: list[dict] = []
  +        for index, block in enumerate(thinking_blocks):
  +          if (
  +              isinstance(block, dict)
  +              and index < len(signatures)
  +              and signatures[index]
  +          ):
  +            merged.append({**block, "signature": signatures[index]})
  +          else:
  +            merged.append(block)
  +        thinking_blocks = merged
       return thinking_blocks
     reasoning_content = message.get("reasoning_content")
     if reasoning_content is not None:
       return reasoning_content
     return message.get("reasoning")

A note for maintainers (worth adding to the PR description, not the code): Anthropic's per-block signature is treated as an opaque token and stored on Part.thought_signature via signature.encode("utf-8"). Gemini signatures are base64-encoded bytes. If Part.thought_signature is expected to hold the decoded bytes (matching the outbound b64encode(...) path in _extract_thought_signature_from_tool_call's counterpart), _convert_reasoning_value_to_parts should call base64.b64decode(signature) when the source is Gemini. I left this out of the PR to keep the diff surgical and am happy to address it as a follow-up once you confirm the desired semantics.
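
The difference between the two semantics is easy to see in isolation. In this small sketch, raw_signature is a made-up byte string standing in for a real Gemini signature; only the standard-library base64 calls are real.

```python
import base64

# Hypothetical raw signature bytes, for illustration only.
raw_signature = b"\x01\x8f=k-opaque-signature"

# What arrives over the wire: the base64 text form of those bytes.
wire_signature = base64.b64encode(raw_signature).decode("ascii")

# Option A: treat the wire string as an opaque token (current Anthropic path).
as_utf8 = wire_signature.encode("utf-8")

# Option B: recover the original bytes.
as_decoded = base64.b64decode(wire_signature)

print(as_utf8 == as_decoded)
```

The two options produce different byte strings, so whichever is chosen has to match what the outbound path does when it re-encodes the signature for the next turn.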


Unit tests

Append to tests/unittests/models/test_litellm.py:

  def test_extract_reasoning_value_gemini_thinking_blocks_zips_signatures():
    """Gemini emits thinking_blocks without per-block signatures; signatures
    arrive in parallel under provider_specific_fields.thought_signatures.
    _extract_reasoning_value should normalize them into the Anthropic shape."""
    message = {
        "role": "assistant",
        "content": "I am a large language model.",
        "thinking_blocks": [
            {"type": "thinking", "thinking": "Step one ..."},
            {"type": "thinking", "thinking": "Step two ..."},
        ],
        "provider_specific_fields": {
            "thought_signatures": ["sig-1", "sig-2"],
        },
    }
    result = _extract_reasoning_value(message)
    assert result == [
        {"type": "thinking", "thinking": "Step one ...", "signature": "sig-1"},
        {"type": "thinking", "thinking": "Step two ...", "signature": "sig-2"},
    ]


  def test_extract_reasoning_value_gemini_thinking_blocks_without_signatures():
    """If provider_specific_fields is absent, Gemini thinking_blocks pass
    through unchanged. Downstream detector should still accept them once
    broadened — covered separately."""
    message = {
        "role": "assistant",
        "content": "Answer",
        "thinking_blocks": [
            {"type": "thinking", "thinking": "Inner monologue"},
        ],
    }
    result = _extract_reasoning_value(message)
    assert result == [{"type": "thinking", "thinking": "Inner monologue"}]


  def test_extract_reasoning_value_anthropic_thinking_blocks_unchanged():
    """Regression guard: Anthropic-shaped blocks (already carrying signature)
    must not be re-zipped or otherwise modified."""
    blocks = [
        {"type": "thinking", "thinking": "Anthropic thought", "signature": "abc"},
    ]
    message = {
        "role": "assistant",
        "content": "Answer",
        "thinking_blocks": blocks,
        "provider_specific_fields": {"thought_signatures": ["should-be-ignored"]},
    }
    result = _extract_reasoning_value(message)
    assert result == blocks
  

  def test_message_to_generate_content_response_gemini_thinking_blocks():
    """End-to-end: a Gemini-shaped message should surface a thought Part and
    the visible text Part, with the thought signature attached as bytes."""
    message = {
        "role": "assistant",
        "content": "I am a large language model.",
        "thinking_blocks": [
            {"type": "thinking", "thinking": "Identity check ..."},
        ],
        "provider_specific_fields": {
            "thought_signatures": ["AY89a1/RGkc"],
        },
    }
    response = _message_to_generate_content_response(message)
    assert len(response.content.parts) == 2
    thought_part = response.content.parts[0]
    text_part = response.content.parts[1]
    assert thought_part.thought is True
    assert thought_part.text == "Identity check ..."
    assert thought_part.thought_signature == b"AY89a1/RGkc"
    assert text_part.text == "I am a large language model."

Labels

models: [Component] Issues related to model support
request clarification: [Status] The maintainer need clarification or more information from the author
