rubric_based_final_response_quality_v1 returns NOT_EVALUATED despite actual_invocation.final_response text #5732

@txnguyen292

Description

Summary

rubric_based_final_response_quality_v1 sometimes returns NOT_EVALUATED (eval_status: 3) with score: null and details.rubric_scores: null, even though the ADK eval result contains a non-empty actual_invocation.final_response.parts[0].text.

In the same invocation, rubric_based_tool_use_quality_v1 evaluates normally and returns rubric scores and rationales, so the eval run itself and the LLM-as-judge path are at least partially working.

Environment

  • google-adk: 1.32.0
  • litellm: 1.82.6
  • google-genai: 1.73.1
  • Python: 3.13.5
  • Eval command shape:
adk eval src tests/eval/evalsets/job_scraper_core.json:itviec_immediate_goal_before_producer_scripting \
  --config_file_path tests/eval/eval_config_goal_contract.json \
  --log_level ERROR

Observed result excerpt

The eval case's overall status was passed (final_eval_status: 1):

{
  "eval_set_result_id": "src_job_scraper_core_1778989037.305544",
  "final_eval_status": 1
}

The actual invocation has a final response:

{
  "actual_invocation": {
    "final_response": {
      "role": "model",
      "parts": [
        {
          "text": "Extracted job count: 20\n\nArtifacts:\n- page workspace fixture: `page_2554f031654b` / `pages__page_2554f031654b__page.html`\n- sandbox audit: `sandbox_run_20260517_033632_dfda78d0`\n\nWhat happened:\n- Loaded the fixed ITviec AI Engineer Hanoi fixture.\n- Established the repeated job-card boundary with a bounded probe.\n- Confirmed 20 repeated listing cards and canonical job URL data in the fixture.\n- I have not yet written the required output files or finalized persistence.\n\nBlocker:\n- Extraction output generation is still pending."
        }
      ]
    }
  }
}

The tool-use rubric evaluated successfully:

{
  "metric_name": "rubric_based_tool_use_quality_v1",
  "score": 1.0,
  "eval_status": 1,
  "details": {
    "rubric_scores": [
      {
        "rubric_id": "uses_fixture_artifact",
        "score": 1.0,
        "rationale": "... Trace anchors: invocation_events[5], invocation_events[13], final_response."
      }
    ]
  }
}

But the final-response rubric was not evaluated:

{
  "metric_name": "rubric_based_final_response_quality_v1",
  "threshold": 0.85,
  "score": null,
  "eval_status": 3,
  "details": {
    "rubric_scores": null
  }
}
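
As a stopgap, the silent skip can at least be surfaced in CI by scanning the eval result JSON. A minimal sketch, assuming only that per-metric entries appear somewhere in the result file with the metric_name/eval_status keys shown above (the exact nesting and file location are not guaranteed by ADK):

import json
import sys

NOT_EVALUATED = 3  # eval_status value observed in the excerpt above


def find_metric_results(node):
    """Recursively yield any dict that looks like a per-metric eval result."""
    if isinstance(node, dict):
        if "metric_name" in node and "eval_status" in node:
            yield node
        for value in node.values():
            yield from find_metric_results(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_metric_results(item)


def main(path):
    with open(path) as f:
        result = json.load(f)
    skipped = [
        m for m in find_metric_results(result)
        if m.get("eval_status") == NOT_EVALUATED
    ]
    for m in skipped:
        print(f"NOT_EVALUATED: {m['metric_name']} (score={m.get('score')})")
    # Fail the CI step if any metric silently skipped evaluation.
    sys.exit(1 if skipped else 0)


if __name__ == "__main__":
    main(sys.argv[1])  # path to the eval set result JSON (location is an assumption)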

Expected behavior

If actual_invocation.final_response contains text, rubric_based_final_response_quality_v1 should either:

  1. evaluate and return per-rubric scores/rationales (a sketch of that output shape follows below), or
  2. include diagnostic detail explaining why it was not evaluated.
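
For option 1, the output would presumably mirror the tool-use rubric result above; a hypothetical example (rubric IDs and scores are illustrative only):

{
  "metric_name": "rubric_based_final_response_quality_v1",
  "threshold": 0.85,
  "score": 1.0,
  "eval_status": 1,
  "details": {
    "rubric_scores": [
      {
        "rubric_id": "reports_extracted_job_count",
        "score": 1.0,
        "rationale": "..."
      }
    ]
  }
}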

Actual behavior

The metric returns NOT_EVALUATED with no explanation in details, making it difficult to distinguish between:

  • final response extraction issue,
  • judge invocation/parsing issue,
  • unsupported invocation shape,
  • timeout/provider issue,
  • or intentional evaluator skip.

Why this matters

For agent workflow evals, the final response quality metric is useful as an LLM-as-judge layer, but NOT_EVALUATED without diagnostic detail makes it hard to use as a reliable CI/development signal. The trace contains enough final response text for a judge pass, so the skip is surprising.

Request

Could ADK either:

  • make rubric_based_final_response_quality_v1 evaluate whenever actual_invocation.final_response has text, or
  • add diagnostic detail to the eval result explaining why the final-response judge returned NOT_EVALUATED?

A debug field such as not_evaluated_reason or judge parsing/provider error details would be very helpful.
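
A hypothetical shape for that diagnostic detail (the field name and reason string are illustrative, not existing ADK fields):

{
  "metric_name": "rubric_based_final_response_quality_v1",
  "threshold": 0.85,
  "score": null,
  "eval_status": 3,
  "details": {
    "rubric_scores": null,
    "not_evaluated_reason": "judge_response_parse_error: <provider/parse error details>"
  }
}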
