Skip to content

fix: address PR #60 review comments on clawloop run --dry-run + taubench#62

Merged
dantp-ai merged 4 commits into
aganthos:mainfrom
dantp-ai:feat/pr60-followups
Jun 12, 2026
Merged

fix: address PR #60 review comments on clawloop run --dry-run + taubench#62
dantp-ai merged 4 commits into
aganthos:mainfrom
dantp-ai:feat/pr60-followups

Conversation

@dantp-ai

Copy link
Copy Markdown
Collaborator

Summary

Follow-up to PR #60. The gemini-code-assist review flagged four issues; this PR fixes all four with tests.

Comment Severity Fix
1. _install_dry_run_clients only patched _make_llm_client, so envs that bypass it (entropic, openclaw, taubench) could still hit real APIs under --dry-run. high Also swap the env's builder in ENV_BUILDERS for a no-I/O _StubAdapter whose run_episode / run_batch return canned episodes. Math keeps its real-adapter path since MathAdapter routes through _make_llm_client.
2. Role lookup used id(cfg), fragile to Pydantic copies. medium Marker baked into LLMClientConfig.model; survives model_copy(). Regression test covers a copy round-trip.
3. _build_taubench returned bare tuple. medium Now tuple[Any, list[str]].
4. validate_config had no taubench entry. medium env_config required; num_tasks / max_steps / max_concurrency, when set, must be positive ints (0 is no longer silently accepted as falsy).

Test plan

  • pytest tests/test_cli.py tests/test_train_config.py — 49 pass (7 new validation tests + 3 new bypass-env dry-run tests + 1 Pydantic-copy regression test)
  • Full suite: 1036 passed, 49 skipped, 0 failed
  • clawloop run examples/configs/math_harness.json --dry-run → exit 0 (real MathAdapter + mock LLMs)
  • clawloop run examples/configs/taubench_harness.json --dry-run with all API key env vars unset → exit 0 (stub adapter; no tau2 / litellm calls)
  • Bad config (num_tasks=0) → exit 1 with clear ValueError: taubench env_config.num_tasks must be a positive int (got 0) instead of a deep failure

🤖 Generated with Claude Code

…n` and taubench

PR aganthos#60 review (gemini-code-assist) flagged four issues; this commit
addresses all four with tests.

1. `_install_dry_run_clients` only patched `_make_llm_client`, so envs
   that bypass it (entropic, openclaw, taubench) could still make real
   API calls under `--dry-run`. The function now also swaps the env's
   builder in `ENV_BUILDERS` for a stub adapter whose `run_episode` /
   `run_batch` return canned episodes — no I/O, no LLM calls. Math
   keeps the real adapter path since `MathAdapter` routes through
   `_make_llm_client`.

2. Role identification used `id(cfg)`, which breaks if Pydantic copies
   or revalidates the `LLMClientConfig`. Replaced with a marker baked
   into `cfg.model`; the marker travels in field data so it survives
   `model_copy()`. New regression test covers a `model_copy()` round-trip.

3. `_build_taubench` return type widened from bare `tuple` to
   `tuple[Any, list[str]]` to match the rest of the builder family
   and enable static analysis.

4. `validate_config` now enforces taubench env_config: `env_config`
   required; `num_tasks` / `max_steps` / `max_concurrency` (if set)
   must be positive ints — `0` is no longer silently accepted via
   falsiness. Six new validation tests.

Verification: 1036 + 49 = full suite passes (24s); manual smoke runs
on math, taubench (no API keys set), and a bad-config rejection all
behave as expected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dantp-ai dantp-ai self-assigned this Jun 10, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the --dry-run functionality to guarantee no real LLM calls are made across different environment types (such as taubench, entropic, and openclaw) by introducing stub adapters and a model-field role marker that survives Pydantic copies. It also adds configuration validation for taubench and corresponding tests. The reviewer feedback suggests avoiding mutating the model field in LLMClientConfig to store the dry-run role as it is fragile, recommending a dedicated dry_run_role field instead. Additionally, the reviewer recommends tightening the positive integer validation for taubench configurations to explicitly reject booleans and floats.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread clawloop/cli.py
Comment thread clawloop/train.py
dantp-ai and others added 2 commits June 10, 2026 23:08
Collapses a short `raise ValueError(...)` to a single line.
Fix for the lint CI on PR aganthos#62; no behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ct int check)

Follow-up review on PR aganthos#62 raised two concerns; both are now fixed.

1. Storing the dry-run role by overloading `LLMClientConfig.model` could
   pollute downstream consumers that read the field as a real model name
   (e.g. `_build_entropic` propagates `tc.model` into `entropic_cfg`).
   Replaced the marker-in-model trick with a dedicated `dry_run_role:
   str | None = None` field on `LLMClientConfig`. The field travels with
   the data, so it survives `model_copy()` like the marker did, but
   leaves `model` untouched. Test now asserts both the field round-trip
   AND that `model` is unmodified after `_install_dry_run_clients`.

2. `_positive_int` in `validate_config` accepted `bool` and `float`
   silently because `int(True) == 1` and `int(3.5) == 3`. Added an
   explicit `isinstance(v, (bool, float))` reject before the `int(v)`
   coercion. Six new tests cover bool / float rejection on `num_tasks`,
   `max_steps`, and `max_concurrency`.

Verification: pre-commit clean; 56 passes in test_cli.py + test_train_config.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-review follow-ups on the dry-run code added by the prior two
commits. No behavior change for any existing test (56 still pass).

- Move ENVS_USING_MAKE_LLM_CLIENT into clawloop/train.py next to
  ENV_BUILDERS, where the knowledge actually lives. Add a maintenance
  note: when registering a new builder, decide whether it routes
  through _make_llm_client and add the env_type if so; otherwise
  --dry-run falls back to _StubAdapter. Eliminates cross-module
  information leakage (cli.py asserting a property of train.py).
- Collapse _make_stub_env_builder + _stub_episode into _StubAdapter
  methods. The factory was used in exactly one place; the helper had
  one caller. Two entities instead of three; depth unchanged.
- Stub task count now tracks max(1, config.episodes_per_iter) instead
  of a magic 3, so --dry-run matches what the learning loop wants.
- Drop default= args from validate_config._positive_int calls;
  TauBenchAdapter.setup already owns those defaults. Validation now
  only checks values the user supplied — removes duplicated knowledge
  and simplifies the helper signature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dantp-ai dantp-ai requested a review from bordeauxred June 10, 2026 21:34
@dantp-ai dantp-ai merged commit 83227a5 into aganthos:main Jun 12, 2026
5 checks passed
dantp-ai added a commit to dantp-ai/clawloop that referenced this pull request Jun 12, 2026
…hos#63

Replaces the public ``dry_run_role: str | None = None`` field on
``LLMClientConfig`` (introduced in PR aganthos#62) with a private subclass
``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``.

Why a subclass instead of the field:
  - The public ``LLMClientConfig`` schema no longer carries any dry-run
    testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a
    user's TrainConfig do not emit ``dry_run_role: null`` anymore. New
    test ``test_public_llm_client_config_schema_excludes_dry_run_vocab``
    locks this in.
  - ``model`` is still left untouched (the gain from PR aganthos#62 stands).
  - Pydantic ``model_copy()`` preserves the runtime class, so the role
    tag survives copies just like a field would. The regression test
    is updated to assert this end-to-end.

Implementation note: the issue originally proposed a ``WeakKeyDictionary``
side-channel. Empirically, Pydantic v2 BaseModel instances are not
hashable by default, so that approach is infeasible without making
``LLMClientConfig`` ``frozen=True`` (which would be a more invasive
public-API change). The subclass approach satisfies the same spirit
("keep dry-run vocabulary out of the public schema") via a standard
Pydantic mechanism.

Verification: pre-commit clean; 57 tests pass; smoke runs for math and
taubench dry-run still complete cleanly with no API keys set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dantp-ai added a commit that referenced this pull request Jun 12, 2026
)

* refactor: move dry_run_role off public LLMClientConfig — closes #63

Replaces the public ``dry_run_role: str | None = None`` field on
``LLMClientConfig`` (introduced in PR #62) with a private subclass
``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``.

Why a subclass instead of the field:
  - The public ``LLMClientConfig`` schema no longer carries any dry-run
    testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a
    user's TrainConfig do not emit ``dry_run_role: null`` anymore. New
    test ``test_public_llm_client_config_schema_excludes_dry_run_vocab``
    locks this in.
  - ``model`` is still left untouched (the gain from PR #62 stands).
  - Pydantic ``model_copy()`` preserves the runtime class, so the role
    tag survives copies just like a field would. The regression test
    is updated to assert this end-to-end.

Implementation note: the issue originally proposed a ``WeakKeyDictionary``
side-channel. Empirically, Pydantic v2 BaseModel instances are not
hashable by default, so that approach is infeasible without making
``LLMClientConfig`` ``frozen=True`` (which would be a more invasive
public-API change). The subclass approach satisfies the same spirit
("keep dry-run vocabulary out of the public schema") via a standard
Pydantic mechanism.

Verification: pre-commit clean; 57 tests pass; smoke runs for math and
taubench dry-run still complete cleanly with no API keys set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: tighten exception in _StubAdapter.run_episode per PR #65 review

`except Exception:` swallowed any error from `agent_state.state_id().combined_hash`.
For the dry-run stub the only legitimate failure modes are AttributeError
(missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow
the catch so real bugs propagate instead of being silently masked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
dantp-ai added a commit to dantp-ai/clawloop that referenced this pull request Jun 12, 2026
…ganthos#64

`train()` now accepts keyword-only `make_llm_client` and `env_builders`
parameters; the CLI dry-run path builds them as closures over the user's
config and passes them in directly. This removes the module-level mutation
of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR aganthos#62
introduced, and the `_DryRunLLMClientConfig` subclass that PR aganthos#65 added
to identify roles for the monkey-patched factory.

Changes
-------
- `clawloop/train.py`:
  - `train()` signature gains `*, make_llm_client=None, env_builders=None`;
    defaults preserve current behavior.
  - All six `_build_*` helpers gain `make_llm_client` as a third
    positional parameter (uniform signature). Only `_build_math`
    actually uses it today; others accept it for the future refactor
    tracked by the out-of-scope notes in the plan.
  - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]`
    for readable type hints.
- `clawloop/cli.py`:
  - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that
    only existed to subclass it.
  - Replace `_install_dry_run_clients` (which mutated module state) with
    `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`).
  - `cmd_run` passes the overrides into `train(...)` directly.
  - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged.
- Tests:
  - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`:
    drop `with patch("clawloop.train._make_llm_client")` and pass
    `make_llm_client=_pick_client` to `train()` directly.
  - `tests/test_cli.py`:
    - Remove the subclass round-trip test (subclass is gone).
    - Replace with `test_dry_run_overrides_map_roles_correctly` and
      `test_dry_run_overrides_stub_bypassing_env` exercising
      `_build_dry_run_overrides` directly.
    - Add `test_dry_run_does_not_mutate_module_state` — invokes
      `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client`
      and `ENV_BUILDERS` are unchanged after the call. Locks in the
      no-global-mutation contract.
    - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`.
  - `tests/unit/test_train.py`: builder invocations now pass `None` as
    the third positional arg (openspiel doesn't use it).

Verification
------------
- pre-commit clean (ruff + ruff-format)
- pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass
- Full suite: 1046 passed, 49 skipped, 0 failed
- `clawloop run examples/configs/math_harness.json --dry-run` exits 0
- `clawloop run examples/configs/taubench_harness.json --dry-run` with
  all API key env vars unset exits 0
- `clawloop eval` still exits 2 with the redirect

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dantp-ai added a commit that referenced this pull request Jun 12, 2026
…) — closes #64 (#66)

* refactor: replace monkey-patching with dependency injection — closes #64

`train()` now accepts keyword-only `make_llm_client` and `env_builders`
parameters; the CLI dry-run path builds them as closures over the user's
config and passes them in directly. This removes the module-level mutation
of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR #62
introduced, and the `_DryRunLLMClientConfig` subclass that PR #65 added
to identify roles for the monkey-patched factory.

Changes
-------
- `clawloop/train.py`:
  - `train()` signature gains `*, make_llm_client=None, env_builders=None`;
    defaults preserve current behavior.
  - All six `_build_*` helpers gain `make_llm_client` as a third
    positional parameter (uniform signature). Only `_build_math`
    actually uses it today; others accept it for the future refactor
    tracked by the out-of-scope notes in the plan.
  - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]`
    for readable type hints.
- `clawloop/cli.py`:
  - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that
    only existed to subclass it.
  - Replace `_install_dry_run_clients` (which mutated module state) with
    `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`).
  - `cmd_run` passes the overrides into `train(...)` directly.
  - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged.
- Tests:
  - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`:
    drop `with patch("clawloop.train._make_llm_client")` and pass
    `make_llm_client=_pick_client` to `train()` directly.
  - `tests/test_cli.py`:
    - Remove the subclass round-trip test (subclass is gone).
    - Replace with `test_dry_run_overrides_map_roles_correctly` and
      `test_dry_run_overrides_stub_bypassing_env` exercising
      `_build_dry_run_overrides` directly.
    - Add `test_dry_run_does_not_mutate_module_state` — invokes
      `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client`
      and `ENV_BUILDERS` are unchanged after the call. Locks in the
      no-global-mutation contract.
    - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`.
  - `tests/unit/test_train.py`: builder invocations now pass `None` as
    the third positional arg (openspiel doesn't use it).

Verification
------------
- pre-commit clean (ruff + ruff-format)
- pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass
- Full suite: 1046 passed, 49 skipped, 0 failed
- `clawloop run examples/configs/math_harness.json --dry-run` exits 0
- `clawloop run examples/configs/taubench_harness.json --dry-run` with
  all API key env vars unset exits 0
- `clawloop eval` still exits 2 with the redirect

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: replace None with lambda dummies for builder make_llm_client arg

Per gemini-code-assist review on PR #66: passing `None` for the
`make_llm_client` parameter violates the declared type signature
`LLMClientFactory = Callable[[LLMClientConfig], Any]`. The builders we
invoke (openspiel, taubench stub) never call the factory, but the type
contract still applies.

Replace each `None` with `lambda _: None` — a callable that ignores its
arg and returns None. Satisfies the type signature; behavior is
unchanged because the callees don't invoke it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant