fix(eval): handle unevaluated final response v2 results #5728

Open
pragnyanramtha wants to merge 1 commit into google:main from pragnyanramtha:pragnyan/final-response-v2-no-eval-guard
@pragnyanramtha
Summary

Fixes a small aggregation edge case in FinalResponseMatchV2Evaluator: when every per-invocation result is skipped or not evaluated, the evaluator currently divides by zero while computing the overall score.

Root Cause

aggregate_invocation_results() filters out results whose score is None or whose eval_status is NOT_EVALUATED, but it unconditionally computes:

overall_score = num_valid / num_evaluated

If all judge samples fail to produce a usable score, num_evaluated remains 0 and evaluation crashes instead of returning a not-evaluated aggregate result. Other ADK evaluators handle this condition by returning overall_score=None and overall_eval_status=NOT_EVALUATED.
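
The failure mode described above can be reproduced in isolation. The sketch below is not the ADK source: EvalStatus, PerInvocationResult, and unguarded_overall_score are simplified, hypothetical stand-ins that only mirror the filtering and the unconditional division named in this description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class EvalStatus(Enum):
    # Stand-in for ADK's status enum; only the members needed here.
    PASSED = 1
    FAILED = 2
    NOT_EVALUATED = 3


@dataclass
class PerInvocationResult:
    # Hypothetical, simplified stand-in for a per-invocation result.
    score: Optional[float]
    eval_status: EvalStatus


def unguarded_overall_score(results: Sequence[PerInvocationResult]) -> float:
    """Mirror the pre-fix arithmetic: filter, then divide unconditionally."""
    num_valid = 0.0
    num_evaluated = 0
    for r in results:
        if r.score is None or r.eval_status == EvalStatus.NOT_EVALUATED:
            continue  # skipped results contribute to neither counter
        num_evaluated += 1
        num_valid += r.score
    # When every result was skipped, num_evaluated is still 0 here.
    return num_valid / num_evaluated
```

Passing a list in which every entry has score=None or eval_status=NOT_EVALUATED leaves num_evaluated at 0, so the final line raises ZeroDivisionError instead of producing an aggregate.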

Change

  • Return an EvaluationResult with overall_score=None and overall_eval_status=NOT_EVALUATED when no FinalResponseMatchV2 invocation results are evaluable.
  • Add a focused regression test for all-skipped/all-not-evaluated invocation results.
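
A minimal sketch of the guard, under stated assumptions: the types below are hypothetical stand-ins rather than ADK's EvaluationResult, and the pass/fail threshold of 0.5 is an illustrative assumption, not a value taken from this patch. Only the early return for the no-evaluable-results case reflects the change described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class EvalStatus(Enum):
    # Stand-in for ADK's status enum; only the members needed here.
    PASSED = 1
    FAILED = 2
    NOT_EVALUATED = 3


@dataclass
class PerInvocationResult:
    # Hypothetical, simplified stand-in for a per-invocation result.
    score: Optional[float]
    eval_status: EvalStatus


@dataclass
class AggregateResult:
    # Hypothetical stand-in for the aggregate fields named in this PR.
    overall_score: Optional[float]
    overall_eval_status: EvalStatus


def aggregate(
    results: Sequence[PerInvocationResult],
    threshold: float = 0.5,  # assumed threshold, for illustration only
) -> AggregateResult:
    """Average the evaluable scores, guarding the empty case."""
    evaluated = [
        r
        for r in results
        if r.score is not None and r.eval_status != EvalStatus.NOT_EVALUATED
    ]
    if not evaluated:
        # Nothing evaluable: report NOT_EVALUATED instead of dividing by zero.
        return AggregateResult(None, EvalStatus.NOT_EVALUATED)
    overall = sum(r.score for r in evaluated) / len(evaluated)
    status = EvalStatus.PASSED if overall >= threshold else EvalStatus.FAILED
    return AggregateResult(overall, status)
```

A regression test in the spirit of the second bullet would pass only skipped or not-evaluated results and assert that overall_score is None and overall_eval_status is NOT_EVALUATED.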

Validation

uv sync --extra test
uv run pytest tests/unittests/evaluation/test_final_response_match_v2.py

Result: 18 passed, 20 warnings.

The full unit suite was not run; this patch is limited to FinalResponseMatchV2 aggregation and its targeted unit test file.

@pragnyanramtha pragnyanramtha marked this pull request as ready for review May 17, 2026 00:15