Skip to content

refactor: move dry_run_role off public LLMClientConfig — closes #63#65

Merged
dantp-ai merged 2 commits into
aganthos:mainfrom
dantp-ai:feat/issue-63-dry-run-role-subclass
Jun 12, 2026
Merged

refactor: move dry_run_role off public LLMClientConfig — closes #63#65
dantp-ai merged 2 commits into
aganthos:mainfrom
dantp-ai:feat/issue-63-dry-run-role-subclass

Conversation

@dantp-ai

Copy link
Copy Markdown
Collaborator

Summary

Closes #63. Replaces the public dry_run_role: str | None = None field on LLMClientConfig (introduced in PR #62) with a private subclass _DryRunLLMClientConfig defined in clawloop/cli.py.

Why a subclass instead of the field

  • Public schema stays clean. LLMClientConfig.model_dump() and LLMClientConfig.model_json_schema() no longer carry any dry-run testing vocabulary. A new test test_public_llm_client_config_schema_excludes_dry_run_vocab locks this in.
  • model field still untouched. The original PR fix: address PR #60 review comments on clawloop run --dry-run + taubench #62 win (avoiding the marker-in-model pattern from PR chore: reintroduce clawloop run as a TrainConfig wrapper #60) is preserved — downstream code like _build_entropic that reads tc.model verbatim is unaffected.
  • Survives model_copy(). Pydantic v2 preserves the runtime class on copy, so the role tag travels with the data exactly like a field would. The regression test asserts this end-to-end (subclass identity, model preservation, dry_run_role survival, and correct mock dispatch via isinstance).

Note on the issue's original proposal

Issue #63 proposed a WeakKeyDictionary[LLMClientConfig, str] side-channel. Empirically (verified before implementing), Pydantic v2 BaseModel instances are not hashable by defaulthash(cfg) raises TypeError: unhashable type. Making them hashable would require model_config = {\"frozen\": True} on LLMClientConfig, a more invasive public-API change that breaks any code that mutates a client config after construction.

The subclass approach satisfies the same spirit ("keep dry-run vocabulary out of the public schema") via a standard Pydantic mechanism without that trade-off.

Note on PR layering

This PR builds on the work in #62 (still open at time of writing). Until #62 merges, the diff here against main will include both PRs' commits. Once #62 merges, this becomes a clean ~80-line delta.

Test plan

  • pytest tests/test_cli.py tests/test_train_config.py — 57 passes (was 56; new test for schema-cleanliness)
  • pre-commit run --all-files — clean
  • clawloop run examples/configs/math_harness.json --dry-run — completes (real MathAdapter + mock LLMs via subclass detection)
  • clawloop run examples/configs/taubench_harness.json --dry-run with all API key env vars unset — completes (stub adapter)
  • LLMClientConfig.model_json_schema() properties no longer include dry_run_role
  • LLMClientConfig(model=\"x\").model_dump() no longer includes dry_run_role key

🤖 Generated with Claude Code

@dantp-ai dantp-ai self-assigned this Jun 10, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the --dry-run mode to guarantee no real LLM or network calls are made across all environment types, including those that bypass _make_llm_client (such as taubench, entropic, and openclaw), by introducing a private _DryRunLLMClientConfig subclass and a _StubAdapter. It also adds up-front configuration validation for taubench and comprehensive tests. The feedback suggests refining the exception handling in _StubAdapter.run_episode to catch specific exceptions like AttributeError and TypeError instead of using a broad except Exception: block.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread clawloop/cli.py Outdated
dantp-ai added a commit to dantp-ai/clawloop that referenced this pull request Jun 10, 2026
…review

`except Exception:` swallowed any error from `agent_state.state_id().combined_hash`.
For the dry-run stub the only legitimate failure modes are AttributeError
(missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow
the catch so real bugs propagate instead of being silently masked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dantp-ai and others added 2 commits June 12, 2026 02:35
…hos#63

Replaces the public ``dry_run_role: str | None = None`` field on
``LLMClientConfig`` (introduced in PR aganthos#62) with a private subclass
``_DryRunLLMClientConfig`` defined in ``clawloop/cli.py``.

Why a subclass instead of the field:
  - The public ``LLMClientConfig`` schema no longer carries any dry-run
    testing vocabulary. ``model_dump()`` / ``model_json_schema()`` of a
    user's TrainConfig do not emit ``dry_run_role: null`` anymore. New
    test ``test_public_llm_client_config_schema_excludes_dry_run_vocab``
    locks this in.
  - ``model`` is still left untouched (the gain from PR aganthos#62 stands).
  - Pydantic ``model_copy()`` preserves the runtime class, so the role
    tag survives copies just like a field would. The regression test
    is updated to assert this end-to-end.

Implementation note: the issue originally proposed a ``WeakKeyDictionary``
side-channel. Empirically, Pydantic v2 BaseModel instances are not
hashable by default, so that approach is infeasible without making
``LLMClientConfig`` ``frozen=True`` (which would be a more invasive
public-API change). The subclass approach satisfies the same spirit
("keep dry-run vocabulary out of the public schema") via a standard
Pydantic mechanism.

Verification: pre-commit clean; 57 tests pass; smoke runs for math and
taubench dry-run still complete cleanly with no API keys set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…review

`except Exception:` swallowed any error from `agent_state.state_id().combined_hash`.
For the dry-run stub the only legitimate failure modes are AttributeError
(missing `state_id` / `combined_hash`) and TypeError (non-callable). Narrow
the catch so real bugs propagate instead of being silently masked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dantp-ai dantp-ai force-pushed the feat/issue-63-dry-run-role-subclass branch from 50baf0c to f5e7544 Compare June 12, 2026 00:39
@dantp-ai dantp-ai merged commit c5faf2f into aganthos:main Jun 12, 2026
5 checks passed
dantp-ai added a commit to dantp-ai/clawloop that referenced this pull request Jun 12, 2026
…ganthos#64

`train()` now accepts keyword-only `make_llm_client` and `env_builders`
parameters; the CLI dry-run path builds them as closures over the user's
config and passes them in directly. This removes the module-level mutation
of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR aganthos#62
introduced, and the `_DryRunLLMClientConfig` subclass that PR aganthos#65 added
to identify roles for the monkey-patched factory.

Changes
-------
- `clawloop/train.py`:
  - `train()` signature gains `*, make_llm_client=None, env_builders=None`;
    defaults preserve current behavior.
  - All six `_build_*` helpers gain `make_llm_client` as a third
    positional parameter (uniform signature). Only `_build_math`
    actually uses it today; others accept it for the future refactor
    tracked by the out-of-scope notes in the plan.
  - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]`
    for readable type hints.
- `clawloop/cli.py`:
  - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that
    only existed to subclass it.
  - Replace `_install_dry_run_clients` (which mutated module state) with
    `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`).
  - `cmd_run` passes the overrides into `train(...)` directly.
  - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged.
- Tests:
  - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`:
    drop `with patch("clawloop.train._make_llm_client")` and pass
    `make_llm_client=_pick_client` to `train()` directly.
  - `tests/test_cli.py`:
    - Remove the subclass round-trip test (subclass is gone).
    - Replace with `test_dry_run_overrides_map_roles_correctly` and
      `test_dry_run_overrides_stub_bypassing_env` exercising
      `_build_dry_run_overrides` directly.
    - Add `test_dry_run_does_not_mutate_module_state` — invokes
      `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client`
      and `ENV_BUILDERS` are unchanged after the call. Locks in the
      no-global-mutation contract.
    - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`.
  - `tests/unit/test_train.py`: builder invocations now pass `None` as
    the third positional arg (openspiel doesn't use it).

Verification
------------
- pre-commit clean (ruff + ruff-format)
- pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass
- Full suite: 1046 passed, 49 skipped, 0 failed
- `clawloop run examples/configs/math_harness.json --dry-run` exits 0
- `clawloop run examples/configs/taubench_harness.json --dry-run` with
  all API key env vars unset exits 0
- `clawloop eval` still exits 2 with the redirect

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dantp-ai added a commit that referenced this pull request Jun 12, 2026
…) — closes #64 (#66)

* refactor: replace monkey-patching with dependency injection — closes #64

`train()` now accepts keyword-only `make_llm_client` and `env_builders`
parameters; the CLI dry-run path builds them as closures over the user's
config and passes them in directly. This removes the module-level mutation
of `clawloop.train._make_llm_client` and `ENV_BUILDERS` that PR #62
introduced, and the `_DryRunLLMClientConfig` subclass that PR #65 added
to identify roles for the monkey-patched factory.

Changes
-------
- `clawloop/train.py`:
  - `train()` signature gains `*, make_llm_client=None, env_builders=None`;
    defaults preserve current behavior.
  - All six `_build_*` helpers gain `make_llm_client` as a third
    positional parameter (uniform signature). Only `_build_math`
    actually uses it today; others accept it for the future refactor
    tracked by the out-of-scope notes in the plan.
  - New module-level alias `LLMClientFactory = Callable[[LLMClientConfig], Any]`
    for readable type hints.
- `clawloop/cli.py`:
  - Delete `_DryRunLLMClientConfig` and the `LLMClientConfig` import that
    only existed to subclass it.
  - Replace `_install_dry_run_clients` (which mutated module state) with
    `_build_dry_run_overrides(config)` (pure: returns `(factory, builders)`).
  - `cmd_run` passes the overrides into `train(...)` directly.
  - `_StubAdapter` and `ENVS_USING_MAKE_LLM_CLIENT` are unchanged.
- Tests:
  - `tests/test_train_config.py::TestTrainEndToEnd::test_harness_learning_math`:
    drop `with patch("clawloop.train._make_llm_client")` and pass
    `make_llm_client=_pick_client` to `train()` directly.
  - `tests/test_cli.py`:
    - Remove the subclass round-trip test (subclass is gone).
    - Replace with `test_dry_run_overrides_map_roles_correctly` and
      `test_dry_run_overrides_stub_bypassing_env` exercising
      `_build_dry_run_overrides` directly.
    - Add `test_dry_run_does_not_mutate_module_state` — invokes
      `cmd_run` with `dry_run=True` and asserts `clawloop.train._make_llm_client`
      and `ENV_BUILDERS` are unchanged after the call. Locks in the
      no-global-mutation contract.
    - Keep `test_public_llm_client_config_schema_excludes_dry_run_vocab`.
  - `tests/unit/test_train.py`: builder invocations now pass `None` as
    the third positional arg (openspiel doesn't use it).

Verification
------------
- pre-commit clean (ruff + ruff-format)
- pytest tests/test_cli.py tests/test_train_config.py tests/unit/test_train.py → 68 pass
- Full suite: 1046 passed, 49 skipped, 0 failed
- `clawloop run examples/configs/math_harness.json --dry-run` exits 0
- `clawloop run examples/configs/taubench_harness.json --dry-run` with
  all API key env vars unset exits 0
- `clawloop eval` still exits 2 with the redirect

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: replace None with lambda dummies for builder make_llm_client arg

Per gemini-code-assist review on PR #66: passing `None` for the
`make_llm_client` parameter violates the declared type signature
`LLMClientFactory = Callable[[LLMClientConfig], Any]`. The builders we
invoke (openspiel, taubench stub) never call the factory, but the type
contract still applies.

Replace each `None` with `lambda _: None` — a callable that ignores its
arg and returns None. Satisfies the type signature; behavior is
unchanged because the callees don't invoke it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move dry_run_role off public LLMClientConfig schema

1 participant