refactor: move dry_run_role off public LLMClientConfig — closes #63#65
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the --dry-run mode to guarantee no real LLM or network calls are made across all environment types, including those that bypass _make_llm_client (such as taubench, entropic, and openclaw), by introducing a private _DryRunLLMClientConfig subclass and a _StubAdapter. It also adds up-front configuration validation for taubench and comprehensive tests. The feedback suggests refining the exception handling in _StubAdapter.run_episode to catch specific exceptions like AttributeError and TypeError instead of using a broad except Exception: block.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
…review `except Exception:` swallowed any error from `agent_state.state_id().combined_hash`. For the dry-run stub the only legitimate failure modes are AttributeError (missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow the catch so real bugs propagate instead of being silently masked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hos#63 Replaces the public ``dry_run_role: str | None = None`` field on ``LLMClientConfig`` (introduced in PR aganthos#62) with a private subclass ``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``. Why a subclass instead of the field: - The public ``LLMClientConfig`` schema no longer carries any dry-run testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a user's TrainConfig do not emit ``dry_run_role: null`` anymore. New test ``test_public_llm_client_config_schema_excludes_dry_run_vocab`` locks this in. - ``model`` is still left untouched (the gain from PR aganthos#62 stands). - Pydantic ``model_copy()`` preserves the runtime class, so the role tag survives copies just like a field would. The regression test is updated to assert this end-to-end. Implementation note: the issue originally proposed a ``WeakKeyDictionary`` side-channel. Empirically, Pydantic v2 BaseModel instances are not hashable by default, so that approach is infeasible without making ``LLMClientConfig`` ``frozen=True`` (which would be a more invasive public-API change). The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism. Verification: pre-commit clean; 57 tests pass; smoke runs for math and taubench dry-run still complete cleanly with no API keys set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…review `except Exception:` swallowed any error from `agent_state.state_id().combined_hash`. For the dry-run stub the only legitimate failure modes are AttributeError (missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow the catch so real bugs propagate instead of being silently masked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
50baf0c to
f5e7544
Compare
…ganthos#64 `train()` now accepts keyword-only `make_llm_client` and `env_builders` parameters; the CLI dry-run path builds them as closures over the user's config and passes them in directly. This removes the module-level mutation of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR aganthos#62 introduced, and the `_DryRunLLMClientConfig` subclass that PR aganthos#65 added to identify roles for the monkey-patched factory. Changes ------- - `clawloop/train.py`: - `train()` signature gains `*, make_llm_client=None, env_builders=None`; defaults preserve current behavior. - All six `_build_*` helpers gain `make_llm_client` as a third positional parameter (uniform signature). Only `_build_math` actually uses it today; others accept it for the future refactor tracked by the out-of-scope notes in the plan. - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]` for readable type hints. - `clawloop/cli.py`: - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that only existed to subclass it. - Replace `_install_dry_run_clients` (which mutated module state) with `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`). - `cmd_run` passes the overrides into `train(...)` directly. - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged. - Tests: - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`: drop `with patch("clawloop.train._make_llm_client")` and pass `make_llm_client=_pick_client` to `train()` directly. - `tests/test_cli.py`: - Remove the subclass round-trip test (subclass is gone). - Replace with `test_dry_run_overrides_map_roles_correctly` and `test_dry_run_overrides_stub_bypassing_env` exercising `_build_dry_run_overrides` directly. - Add `test_dry_run_does_not_mutate_module_state` — invokes `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client` and `ENV_BUILDERS` are unchanged after the call. Locks in the no-global-mutation contract. - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`. - `tests/unit/test_train.py`: builder invocations now pass `None` as the third positional arg (openspiel doesn't use it). Verification ------------ - pre-commit clean (ruff + ruff-format) - pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass - Full suite: 1046 passed, 49 skipped, 0 failed - `clawloop run examples/configs/math_harness.json --dry-run` exits 0 - `clawloop run examples/configs/taubench_harness.json --dry-run` with all API key env vars unset exits 0 - `clawloop eval` still exits 2 with the redirect Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…) — closes #64 (#66) * refactor: replace monkey-patching with dependency injection — closes #64 `train()` now accepts keyword-only `make_llm_client` and `env_builders` parameters; the CLI dry-run path builds them as closures over the user's config and passes them in directly. This removes the module-level mutation of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR #62 introduced, and the `_DryRunLLMClientConfig` subclass that PR #65 added to identify roles for the monkey-patched factory. Changes ------- - `clawloop/train.py`: - `train()` signature gains `*, make_llm_client=None, env_builders=None`; defaults preserve current behavior. - All six `_build_*` helpers gain `make_llm_client` as a third positional parameter (uniform signature). Only `_build_math` actually uses it today; others accept it for the future refactor tracked by the out-of-scope notes in the plan. - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]` for readable type hints. - `clawloop/cli.py`: - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that only existed to subclass it. - Replace `_install_dry_run_clients` (which mutated module state) with `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`). - `cmd_run` passes the overrides into `train(...)` directly. - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged. - Tests: - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`: drop `with patch("clawloop.train._make_llm_client")` and pass `make_llm_client=_pick_client` to `train()` directly. - `tests/test_cli.py`: - Remove the subclass round-trip test (subclass is gone). - Replace with `test_dry_run_overrides_map_roles_correctly` and `test_dry_run_overrides_stub_bypassing_env` exercising `_build_dry_run_overrides` directly. - Add `test_dry_run_does_not_mutate_module_state` — invokes `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client` and `ENV_BUILDERS` are unchanged after the call. Locks in the no-global-mutation contract. - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`. - `tests/unit/test_train.py`: builder invocations now pass `None` as the third positional arg (openspiel doesn't use it). Verification ------------ - pre-commit clean (ruff + ruff-format) - pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass - Full suite: 1046 passed, 49 skipped, 0 failed - `clawloop run examples/configs/math_harness.json --dry-run` exits 0 - `clawloop run examples/configs/taubench_harness.json --dry-run` with all API key env vars unset exits 0 - `clawloop eval` still exits 2 with the redirect Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: replace None with lambda dummies for builder make_llm_client arg Per gemini-code-assist review on PR #66: passing `None` for the `make_llm_client` parameter violates the declared type signature `LLMClientFactory = Callable[[LLMClientConfig], Any]`. The builders we invoke (openspiel, taubench stub) never call the factory, but the type contract still applies. Replace each `None` with `lambda _: None` — a callable that ignores its arg and returns None. Satisfies the type signature; behavior is unchanged because the callees don't invoke it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Closes #63. Replaces the public
dry_run_role: str | None = Nonefield onLLMClientConfig(introduced in PR #62) with a private subclass_DryRunLLMClientConfigdefined inclawloop/cli.py.Why a subclass instead of the field
LLMClientConfig.model_dump()andLLMClientConfig.model_json_schema()no longer carry any dry-run testing vocabulary. A new testtest_public_llm_client_config_schema_excludes_dry_run_vocablocks this in.modelfield still untouched. The original PR fix: address PR #60 review comments on clawloop run --dry-run + taubench #62 win (avoiding the marker-in-modelpattern from PR chore: reintroduceclawloop runas a TrainConfig wrapper #60) is preserved — downstream code like_build_entropicthat readstc.modelverbatim is unaffected.model_copy(). Pydantic v2 preserves the runtime class on copy, so the role tag travels with the data exactly like a field would. The regression test asserts this end-to-end (subclass identity,modelpreservation,dry_run_rolesurvival, and correct mock dispatch viaisinstance).Note on the issue's original proposal
Issue #63 proposed a
WeakKeyDictionary[LLMClientConfig, str]side-channel. Empirically (verified before implementing), Pydantic v2BaseModelinstances are not hashable by default —hash(cfg)raisesTypeError: unhashable type. Making them hashable would requiremodel_config = {\"frozen\": True}onLLMClientConfig, a more invasive public-API change that breaks any code that mutates a client config after construction.The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism without that trade-off.
Note on PR layering
This PR builds on the work in #62 (still open at time of writing). Until #62 merges, the diff here against
mainwill include both PRs' commits. Once #62 merges, this becomes a clean ~80-line delta.Test plan
pytest tests/test_cli.py tests/test_train_config.py— 57 passes (was 56; new test for schema-cleanliness)pre-commit run --all-files— cleanclawloop run examples/configs/math_harness.json --dry-run— completes (real MathAdapter + mock LLMs via subclass detection)clawloop run examples/configs/taubench_harness.json --dry-runwith all API key env vars unset — completes (stub adapter)LLMClientConfig.model_json_schema()properties no longer includedry_run_roleLLMClientConfig(model=\"x\").model_dump()no longer includesdry_run_rolekey🤖 Generated with Claude Code