fix: address PR #60 review comments on clawloop run --dry-run + taubench by dantp-ai · Pull Request #62 · aganthos/clawloop

dantp-ai · 2026-06-10T21:05:29Z

Summary

Follow-up to PR #60. The gemini-code-assist review flagged four issues; this PR fixes all four with tests.

Comment	Severity	Fix
1. `_install_dry_run_clients` only patched `_make_llm_client`, so envs that bypass it (entropic, openclaw, taubench) could still hit real APIs under `--dry-run`.	high	Also swap the env's builder in `ENV_BUILDERS` for a no-I/O `_StubAdapter` whose `run_episode` / `run_batch` return canned episodes. Math keeps its real-adapter path since `MathAdapter` routes through `_make_llm_client`.
2. Role lookup used `id(cfg)`, fragile to Pydantic copies.	medium	Marker baked into `LLMClientConfig.model`; survives `model_copy()`. Regression test covers a copy round-trip.
3. `_build_taubench` returned bare `tuple`.	medium	Now `tuple[Any, list[str]]`.
4. `validate_config` had no taubench entry.	medium	`env_config` required; `num_tasks` / `max_steps` / `max_concurrency`, when set, must be positive ints (`0` is no longer silently accepted as falsy).

Test plan

pytest tests/test_cli.py tests/test_train_config.py — 49 pass (7 new validation tests + 3 new bypass-env dry-run tests + 1 Pydantic-copy regression test)
Full suite: 1036 passed, 49 skipped, 0 failed
clawloop run examples/configs/math_harness.json --dry-run → exit 0 (real MathAdapter + mock LLMs)
clawloop run examples/configs/taubench_harness.json --dry-run with all API key env vars unset → exit 0 (stub adapter; no tau2 / litellm calls)
Bad config (num_tasks=0) → exit 1 with clear ValueError: taubench env_config.num_tasks must be a positive int (got 0) instead of a deep failure

🤖 Generated with Claude Code

…n` and taubench PR aganthos#60 review (gemini-code-assist) flagged four issues; this commit addresses all four with tests. 1. `_install_dry_run_clients` only patched `_make_llm_client`, so envs that bypass it (entropic, openclaw, taubench) could still make real API calls under `--dry-run`. The function now also swaps the env's builder in `ENV_BUILDERS` for a stub adapter whose `run_episode` / `run_batch` return canned episodes — no I/O, no LLM calls. Math keeps the real adapter path since `MathAdapter` routes through `_make_llm_client`. 2. Role identification used `id(cfg)`, which breaks if Pydantic copies or revalidates the `LLMClientConfig`. Replaced with a marker baked into `cfg.model`; the marker travels in field data so it survives `model_copy()`. New regression test covers a `model_copy()` round-trip. 3. `_build_taubench` return type widened from bare `tuple` to `tuple[Any, list[str]]` to match the rest of the builder family and enable static analysis. 4. `validate_config` now enforces taubench env_config: `env_config` required; `num_tasks` / `max_steps` / `max_concurrency` (if set) must be positive ints — `0` is no longer silently accepted via falsiness. Six new validation tests. Verification: 1036 + 49 = full suite passes (24s); manual smoke runs on math, taubench (no API keys set), and a bad-config rejection all behave as expected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request improves the --dry-run functionality to guarantee no real LLM calls are made across different environment types (such as taubench, entropic, and openclaw) by introducing stub adapters and a model-field role marker that survives Pydantic copies. It also adds configuration validation for taubench and corresponding tests. The reviewer feedback suggests avoiding mutating the model field in LLMClientConfig to store the dry-run role as it is fragile, recommending a dedicated dry_run_role field instead. Additionally, the reviewer recommends tightening the positive integer validation for taubench configurations to explicitly reject booleans and floats.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Collapses a short `raise ValueError(...)` to a single line. Fix for the lint CI on PR aganthos#62; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ct int check) Follow-up review on PR aganthos#62 raised two concerns; both are now fixed. 1. Storing the dry-run role by overloading `LLMClientConfig.model` could pollute downstream consumers that read the field as a real model name (e.g. `_build_entropic` propagates `tc.model` into `entropic_cfg`). Replaced the marker-in-model trick with a dedicated `dry_run_role: str | None = None` field on `LLMClientConfig`. The field travels with the data, so it survives `model_copy()` like the marker did, but leaves `model` untouched. Test now asserts both the field round-trip AND that `model` is unmodified after `_install_dry_run_clients`. 2. `_positive_int` in `validate_config` accepted `bool` and `float` silently because `int(True) == 1` and `int(3.5) == 3`. Added an explicit `isinstance(v, (bool, float))` reject before the `int(v)` coercion. Six new tests cover bool / float rejection on `num_tasks`, `max_steps`, and `max_concurrency`. Verification: pre-commit clean; 56 passes in test_cli.py + test_train_config.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Self-review follow-ups on the dry-run code added by the prior two commits. No behavior change for any existing test (56 still pass). - Move ENVS_USING_MAKE_LLM_CLIENT into clawloop/train.py next to ENV_BUILDERS, where the knowledge actually lives. Add a maintenance note: when registering a new builder, decide whether it routes through _make_llm_client and add the env_type if so; otherwise --dry-run falls back to _StubAdapter. Eliminates cross-module information leakage (cli.py asserting a property of train.py). - Collapse _make_stub_env_builder + _stub_episode into _StubAdapter methods. The factory was used in exactly one place; the helper had one caller. Two entities instead of three; depth unchanged. - Stub task count now tracks max(1, config.episodes_per_iter) instead of a magic 3, so --dry-run matches what the learning loop wants. - Drop default= args from validate_config._positive_int calls; TauBenchAdapter.setup already owns those defaults. Validation now only checks values the user supplied — removes duplicated knowledge and simplifies the helper signature. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…hos#63 Replaces the public ``dry_run_role: str | None = None`` field on ``LLMClientConfig`` (introduced in PR aganthos#62) with a private subclass ``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``. Why a subclass instead of the field: - The public ``LLMClientConfig`` schema no longer carries any dry-run testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a user's TrainConfig do not emit ``dry_run_role: null`` anymore. New test ``test_public_llm_client_config_schema_excludes_dry_run_vocab`` locks this in. - ``model`` is still left untouched (the gain from PR aganthos#62 stands). - Pydantic ``model_copy()`` preserves the runtime class, so the role tag survives copies just like a field would. The regression test is updated to assert this end-to-end. Implementation note: the issue originally proposed a ``WeakKeyDictionary`` side-channel. Empirically, Pydantic v2 BaseModel instances are not hashable by default, so that approach is infeasible without making ``LLMClientConfig`` ``frozen=True`` (which would be a more invasive public-API change). The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism. Verification: pre-commit clean; 57 tests pass; smoke runs for math and taubench dry-run still complete cleanly with no API keys set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

) * refactor: move dry_run_role off public LLMClientConfig — closes #63 Replaces the public ``dry_run_role: str | None = None`` field on ``LLMClientConfig`` (introduced in PR #62) with a private subclass ``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``. Why a subclass instead of the field: - The public ``LLMClientConfig`` schema no longer carries any dry-run testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a user's TrainConfig do not emit ``dry_run_role: null`` anymore. New test ``test_public_llm_client_config_schema_excludes_dry_run_vocab`` locks this in. - ``model`` is still left untouched (the gain from PR #62 stands). - Pydantic ``model_copy()`` preserves the runtime class, so the role tag survives copies just like a field would. The regression test is updated to assert this end-to-end. Implementation note: the issue originally proposed a ``WeakKeyDictionary`` side-channel. Empirically, Pydantic v2 BaseModel instances are not hashable by default, so that approach is infeasible without making ``LLMClientConfig`` ``frozen=True`` (which would be a more invasive public-API change). The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism. Verification: pre-commit clean; 57 tests pass; smoke runs for math and taubench dry-run still complete cleanly with no API keys set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: tighten exception in _StubAdapter.run_episode per PR #65 review `except Exception:` swallowed any error from `agent_state.state_id().combined_hash`. For the dry-run stub the only legitimate failure modes are AttributeError (missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow the catch so real bugs propagate instead of being silently masked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ganthos#64 `train()` now accepts keyword-only `make_llm_client` and `env_builders` parameters; the CLI dry-run path builds them as closures over the user's config and passes them in directly. This removes the module-level mutation of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR aganthos#62 introduced, and the `_DryRunLLMClientConfig` subclass that PR aganthos#65 added to identify roles for the monkey-patched factory. Changes ------- - `clawloop/train.py`: - `train()` signature gains `*, make_llm_client=None, env_builders=None`; defaults preserve current behavior. - All six `_build_*` helpers gain `make_llm_client` as a third positional parameter (uniform signature). Only `_build_math` actually uses it today; others accept it for the future refactor tracked by the out-of-scope notes in the plan. - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]` for readable type hints. - `clawloop/cli.py`: - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that only existed to subclass it. - Replace `_install_dry_run_clients` (which mutated module state) with `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`). - `cmd_run` passes the overrides into `train(...)` directly. - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged. - Tests: - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`: drop `with patch("clawloop.train._make_llm_client")` and pass `make_llm_client=_pick_client` to `train()` directly. - `tests/test_cli.py`: - Remove the subclass round-trip test (subclass is gone). - Replace with `test_dry_run_overrides_map_roles_correctly` and `test_dry_run_overrides_stub_bypassing_env` exercising `_build_dry_run_overrides` directly. - Add `test_dry_run_does_not_mutate_module_state` — invokes `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client` and `ENV_BUILDERS` are unchanged after the call. Locks in the no-global-mutation contract. - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`. - `tests/unit/test_train.py`: builder invocations now pass `None` as the third positional arg (openspiel doesn't use it). Verification ------------ - pre-commit clean (ruff + ruff-format) - pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass - Full suite: 1046 passed, 49 skipped, 0 failed - `clawloop run examples/configs/math_harness.json --dry-run` exits 0 - `clawloop run examples/configs/taubench_harness.json --dry-run` with all API key env vars unset exits 0 - `clawloop eval` still exits 2 with the redirect Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…) — closes #64 (#66) * refactor: replace monkey-patching with dependency injection — closes #64 `train()` now accepts keyword-only `make_llm_client` and `env_builders` parameters; the CLI dry-run path builds them as closures over the user's config and passes them in directly. This removes the module-level mutation of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR #62 introduced, and the `_DryRunLLMClientConfig` subclass that PR #65 added to identify roles for the monkey-patched factory. Changes ------- - `clawloop/train.py`: - `train()` signature gains `*, make_llm_client=None, env_builders=None`; defaults preserve current behavior. - All six `_build_*` helpers gain `make_llm_client` as a third positional parameter (uniform signature). Only `_build_math` actually uses it today; others accept it for the future refactor tracked by the out-of-scope notes in the plan. - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]` for readable type hints. - `clawloop/cli.py`: - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that only existed to subclass it. - Replace `_install_dry_run_clients` (which mutated module state) with `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`). - `cmd_run` passes the overrides into `train(...)` directly. - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged. - Tests: - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`: drop `with patch("clawloop.train._make_llm_client")` and pass `make_llm_client=_pick_client` to `train()` directly. - `tests/test_cli.py`: - Remove the subclass round-trip test (subclass is gone). - Replace with `test_dry_run_overrides_map_roles_correctly` and `test_dry_run_overrides_stub_bypassing_env` exercising `_build_dry_run_overrides` directly. - Add `test_dry_run_does_not_mutate_module_state` — invokes `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client` and `ENV_BUILDERS` are unchanged after the call. Locks in the no-global-mutation contract. - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`. - `tests/unit/test_train.py`: builder invocations now pass `None` as the third positional arg (openspiel doesn't use it). Verification ------------ - pre-commit clean (ruff + ruff-format) - pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass - Full suite: 1046 passed, 49 skipped, 0 failed - `clawloop run examples/configs/math_harness.json --dry-run` exits 0 - `clawloop run examples/configs/taubench_harness.json --dry-run` with all API key env vars unset exits 0 - `clawloop eval` still exits 2 with the redirect Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: replace None with lambda dummies for builder make_llm_client arg Per gemini-code-assist review on PR #66: passing `None` for the `make_llm_client` parameter violates the declared type signature `LLMClientFactory = Callable[[LLMClientConfig], Any]`. The builders we invoke (openspiel, taubench stub) never call the factory, but the type contract still applies. Replace each `None` with `lambda _: None` — a callable that ignores its arg and returns None. Satisfies the type signature; behavior is unchanged because the callees don't invoke it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

dantp-ai self-assigned this Jun 10, 2026

gemini-code-assist Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread clawloop/cli.py

Comment thread clawloop/train.py

dantp-ai and others added 2 commits June 10, 2026 23:08

style: ruff-format clawloop/train.py

b81403f

Collapses a short `raise ValueError(...)` to a single line. Fix for the lint CI on PR aganthos#62; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

dantp-ai mentioned this pull request Jun 10, 2026

chore: reintroduce clawloop run as a TrainConfig wrapper #60

Merged

2 tasks

This was referenced Jun 10, 2026

Move dry_run_role off public LLMClientConfig schema #63

Closed

Replace monkey-patching of train._make_llm_client with dependency injection #64

Closed

dantp-ai requested a review from bordeauxred June 10, 2026 21:34

This was referenced Jun 10, 2026

refactor: move dry_run_role off public LLMClientConfig — closes #63 #65

Merged

refactor: replace monkey-patching with dependency injection in train() — closes #64 #66

Merged

dantp-ai merged commit 83227a5 into aganthos:main Jun 12, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: address PR #60 review comments on clawloop run --dry-run + taubench#62

fix: address PR #60 review comments on clawloop run --dry-run + taubench#62
dantp-ai merged 4 commits into
aganthos:mainfrom
dantp-ai:feat/pr60-followups

dantp-ai commented Jun 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dantp-ai commented Jun 10, 2026

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant