fix: address PR #60 review comments on clawloop run --dry-run + taubench#62
Conversation
…n` and taubench PR aganthos#60 review (gemini-code-assist) flagged four issues; this commit addresses all four with tests. 1. `_install_dry_run_clients` only patched `_make_llm_client`, so envs that bypass it (entropic, openclaw, taubench) could still make real API calls under `--dry-run`. The function now also swaps the env's builder in `ENV_BUILDERS` for a stub adapter whose `run_episode` / `run_batch` return canned episodes — no I/O, no LLM calls. Math keeps the real adapter path since `MathAdapter` routes through `_make_llm_client`. 2. Role identification used `id(cfg)`, which breaks if Pydantic copies or revalidates the `LLMClientConfig`. Replaced with a marker baked into `cfg.model`; the marker travels in field data so it survives `model_copy()`. New regression test covers a `model_copy()` round-trip. 3. `_build_taubench` return type widened from bare `tuple` to `tuple[Any, list[str]]` to match the rest of the builder family and enable static analysis. 4. `validate_config` now enforces taubench env_config: `env_config` required; `num_tasks` / `max_steps` / `max_concurrency` (if set) must be positive ints — `0` is no longer silently accepted via falsiness. Six new validation tests. Verification: 1036 + 49 = full suite passes (24s); manual smoke runs on math, taubench (no API keys set), and a bad-config rejection all behave as expected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request improves the --dry-run functionality to guarantee no real LLM calls are made across different environment types (such as taubench, entropic, and openclaw) by introducing stub adapters and a model-field role marker that survives Pydantic copies. It also adds configuration validation for taubench and corresponding tests. The reviewer feedback suggests avoiding mutating the model field in LLMClientConfig to store the dry-run role as it is fragile, recommending a dedicated dry_run_role field instead. Additionally, the reviewer recommends tightening the positive integer validation for taubench configurations to explicitly reject booleans and floats.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Collapses a short `raise ValueError(...)` to a single line. Fix for the lint CI on PR aganthos#62; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ct int check) Follow-up review on PR aganthos#62 raised two concerns; both are now fixed. 1. Storing the dry-run role by overloading `LLMClientConfig.model` could pollute downstream consumers that read the field as a real model name (e.g. `_build_entropic` propagates `tc.model` into `entropic_cfg`). Replaced the marker-in-model trick with a dedicated `dry_run_role: str | None = None` field on `LLMClientConfig`. The field travels with the data, so it survives `model_copy()` like the marker did, but leaves `model` untouched. Test now asserts both the field round-trip AND that `model` is unmodified after `_install_dry_run_clients`. 2. `_positive_int` in `validate_config` accepted `bool` and `float` silently because `int(True) == 1` and `int(3.5) == 3`. Added an explicit `isinstance(v, (bool, float))` reject before the `int(v)` coercion. Six new tests cover bool / float rejection on `num_tasks`, `max_steps`, and `max_concurrency`. Verification: pre-commit clean; 56 passes in test_cli.py + test_train_config.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-review follow-ups on the dry-run code added by the prior two commits. No behavior change for any existing test (56 still pass). - Move ENVS_USING_MAKE_LLM_CLIENT into clawloop/train.py next to ENV_BUILDERS, where the knowledge actually lives. Add a maintenance note: when registering a new builder, decide whether it routes through _make_llm_client and add the env_type if so; otherwise --dry-run falls back to _StubAdapter. Eliminates cross-module information leakage (cli.py asserting a property of train.py). - Collapse _make_stub_env_builder + _stub_episode into _StubAdapter methods. The factory was used in exactly one place; the helper had one caller. Two entities instead of three; depth unchanged. - Stub task count now tracks max(1, config.episodes_per_iter) instead of a magic 3, so --dry-run matches what the learning loop wants. - Drop default= args from validate_config._positive_int calls; TauBenchAdapter.setup already owns those defaults. Validation now only checks values the user supplied — removes duplicated knowledge and simplifies the helper signature. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hos#63 Replaces the public ``dry_run_role: str | None = None`` field on ``LLMClientConfig`` (introduced in PR aganthos#62) with a private subclass ``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``. Why a subclass instead of the field: - The public ``LLMClientConfig`` schema no longer carries any dry-run testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a user's TrainConfig do not emit ``dry_run_role: null`` anymore. New test ``test_public_llm_client_config_schema_excludes_dry_run_vocab`` locks this in. - ``model`` is still left untouched (the gain from PR aganthos#62 stands). - Pydantic ``model_copy()`` preserves the runtime class, so the role tag survives copies just like a field would. The regression test is updated to assert this end-to-end. Implementation note: the issue originally proposed a ``WeakKeyDictionary`` side-channel. Empirically, Pydantic v2 BaseModel instances are not hashable by default, so that approach is infeasible without making ``LLMClientConfig`` ``frozen=True`` (which would be a more invasive public-API change). The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism. Verification: pre-commit clean; 57 tests pass; smoke runs for math and taubench dry-run still complete cleanly with no API keys set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
) * refactor: move dry_run_role off public LLMClientConfig — closes #63 Replaces the public ``dry_run_role: str | None = None`` field on ``LLMClientConfig`` (introduced in PR #62) with a private subclass ``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``. Why a subclass instead of the field: - The public ``LLMClientConfig`` schema no longer carries any dry-run testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a user's TrainConfig do not emit ``dry_run_role: null`` anymore. New test ``test_public_llm_client_config_schema_excludes_dry_run_vocab`` locks this in. - ``model`` is still left untouched (the gain from PR #62 stands). - Pydantic ``model_copy()`` preserves the runtime class, so the role tag survives copies just like a field would. The regression test is updated to assert this end-to-end. Implementation note: the issue originally proposed a ``WeakKeyDictionary`` side-channel. Empirically, Pydantic v2 BaseModel instances are not hashable by default, so that approach is infeasible without making ``LLMClientConfig`` ``frozen=True`` (which would be a more invasive public-API change). The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism. Verification: pre-commit clean; 57 tests pass; smoke runs for math and taubench dry-run still complete cleanly with no API keys set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: tighten exception in _StubAdapter.run_episode per PR #65 review `except Exception:` swallowed any error from `agent_state.state_id().combined_hash`. For the dry-run stub the only legitimate failure modes are AttributeError (missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow the catch so real bugs propagate instead of being silently masked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…ganthos#64 `train()` now accepts keyword-only `make_llm_client` and `env_builders` parameters; the CLI dry-run path builds them as closures over the user's config and passes them in directly. This removes the module-level mutation of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR aganthos#62 introduced, and the `_DryRunLLMClientConfig` subclass that PR aganthos#65 added to identify roles for the monkey-patched factory. Changes ------- - `clawloop/train.py`: - `train()` signature gains `*, make_llm_client=None, env_builders=None`; defaults preserve current behavior. - All six `_build_*` helpers gain `make_llm_client` as a third positional parameter (uniform signature). Only `_build_math` actually uses it today; others accept it for the future refactor tracked by the out-of-scope notes in the plan. - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]` for readable type hints. - `clawloop/cli.py`: - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that only existed to subclass it. - Replace `_install_dry_run_clients` (which mutated module state) with `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`). - `cmd_run` passes the overrides into `train(...)` directly. - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged. - Tests: - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`: drop `with patch("clawloop.train._make_llm_client")` and pass `make_llm_client=_pick_client` to `train()` directly. - `tests/test_cli.py`: - Remove the subclass round-trip test (subclass is gone). - Replace with `test_dry_run_overrides_map_roles_correctly` and `test_dry_run_overrides_stub_bypassing_env` exercising `_build_dry_run_overrides` directly. - Add `test_dry_run_does_not_mutate_module_state` — invokes `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client` and `ENV_BUILDERS` are unchanged after the call. Locks in the no-global-mutation contract. - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`. - `tests/unit/test_train.py`: builder invocations now pass `None` as the third positional arg (openspiel doesn't use it). Verification ------------ - pre-commit clean (ruff + ruff-format) - pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass - Full suite: 1046 passed, 49 skipped, 0 failed - `clawloop run examples/configs/math_harness.json --dry-run` exits 0 - `clawloop run examples/configs/taubench_harness.json --dry-run` with all API key env vars unset exits 0 - `clawloop eval` still exits 2 with the redirect Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…) — closes #64 (#66) * refactor: replace monkey-patching with dependency injection — closes #64 `train()` now accepts keyword-only `make_llm_client` and `env_builders` parameters; the CLI dry-run path builds them as closures over the user's config and passes them in directly. This removes the module-level mutation of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR #62 introduced, and the `_DryRunLLMClientConfig` subclass that PR #65 added to identify roles for the monkey-patched factory. Changes ------- - `clawloop/train.py`: - `train()` signature gains `*, make_llm_client=None, env_builders=None`; defaults preserve current behavior. - All six `_build_*` helpers gain `make_llm_client` as a third positional parameter (uniform signature). Only `_build_math` actually uses it today; others accept it for the future refactor tracked by the out-of-scope notes in the plan. - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]` for readable type hints. - `clawloop/cli.py`: - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that only existed to subclass it. - Replace `_install_dry_run_clients` (which mutated module state) with `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`). - `cmd_run` passes the overrides into `train(...)` directly. - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged. - Tests: - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`: drop `with patch("clawloop.train._make_llm_client")` and pass `make_llm_client=_pick_client` to `train()` directly. - `tests/test_cli.py`: - Remove the subclass round-trip test (subclass is gone). - Replace with `test_dry_run_overrides_map_roles_correctly` and `test_dry_run_overrides_stub_bypassing_env` exercising `_build_dry_run_overrides` directly. - Add `test_dry_run_does_not_mutate_module_state` — invokes `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client` and `ENV_BUILDERS` are unchanged after the call. Locks in the no-global-mutation contract. - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`. - `tests/unit/test_train.py`: builder invocations now pass `None` as the third positional arg (openspiel doesn't use it). Verification ------------ - pre-commit clean (ruff + ruff-format) - pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass - Full suite: 1046 passed, 49 skipped, 0 failed - `clawloop run examples/configs/math_harness.json --dry-run` exits 0 - `clawloop run examples/configs/taubench_harness.json --dry-run` with all API key env vars unset exits 0 - `clawloop eval` still exits 2 with the redirect Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: replace None with lambda dummies for builder make_llm_client arg Per gemini-code-assist review on PR #66: passing `None` for the `make_llm_client` parameter violates the declared type signature `LLMClientFactory = Callable[[LLMClientConfig], Any]`. The builders we invoke (openspiel, taubench stub) never call the factory, but the type contract still applies. Replace each `None` with `lambda _: None` — a callable that ignores its arg and returns None. Satisfies the type signature; behavior is unchanged because the callees don't invoke it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Follow-up to PR #60. The gemini-code-assist review flagged four issues; this PR fixes all four with tests.
_install_dry_run_clientsonly patched_make_llm_client, so envs that bypass it (entropic, openclaw, taubench) could still hit real APIs under--dry-run.ENV_BUILDERSfor a no-I/O_StubAdapterwhoserun_episode/run_batchreturn canned episodes. Math keeps its real-adapter path sinceMathAdapterroutes through_make_llm_client.id(cfg), fragile to Pydantic copies.LLMClientConfig.model; survivesmodel_copy(). Regression test covers a copy round-trip._build_taubenchreturned baretuple.tuple[Any, list[str]].validate_confighad no taubench entry.env_configrequired;num_tasks/max_steps/max_concurrency, when set, must be positive ints (0is no longer silently accepted as falsy).Test plan
pytest tests/test_cli.py tests/test_train_config.py— 49 pass (7 new validation tests + 3 new bypass-env dry-run tests + 1 Pydantic-copy regression test)clawloop run examples/configs/math_harness.json --dry-run→ exit 0 (realMathAdapter+ mock LLMs)clawloop run examples/configs/taubench_harness.json --dry-runwith all API key env vars unset → exit 0 (stub adapter; no tau2 / litellm calls)num_tasks=0) → exit 1 with clearValueError: taubench env_config.num_tasks must be a positive int (got 0)instead of a deep failure🤖 Generated with Claude Code