Add agent-discoverability contract test (#461) #467
New `tests/test_agent_discoverability.py` pins the agent-facing surface introduced by PR #464 against future regression. It uses snapshot/static assertions only: no live API calls, no subprocess, and it runs in the default pytest suite.

What's locked:

1. `__all__` membership of `agent_workflow` / `profile_panel` / `get_llm_guide` / `practitioner_next_steps` / `BusinessReport` (catches export pruning).
2. `dir(diff_diff)` head-first ordering matches `_AGENT_FACING_ORDER` (catches drift in `_OrderedName.__lt__` or a `__dir__()` regression).
3. The `dir()` tail stays alphabetic when keyed by `str` (recovery key for downstream tooling that re-sorts).
4. `dir()` returns the FULL module namespace, not just `__all__` (preserves `__doc__` / `__name__` / `__file__` for `inspect.getmembers` consumers).
5. `_OrderedName` invariants: `isinstance(_, str)` holds, and str behavior works (`upper`, `==`, `hash`, `in`, f-string).
6. The first non-blank paragraph of the top-level `__doc__` names `agent_workflow`; the full doc text names the 4 downstream primitives.
7. The script output by `agent_workflow()` references each canonical helper by name; every `fit_candidates` entry resolves on the `diff_diff` namespace.
8. Canonical estimator class names (CallawaySantAnna, ChaisemartinDHaultfoeuille, ContinuousDiD, DifferenceInDifferences, HeterogeneousAdoptionDiD, HonestDiD, ImputationDiD, PreTrendsPower, SunAbraham, TwoWayFixedEffects, WooldridgeDiD) remain importable.
9. Each agent-facing entrypoint stays callable.

17 tests (12 standalone + 5 parametrize cells over the agent-facing entrypoint names).

Closes #461 (snapshot variant). The live-agent regression test remains a follow-up that depends on causal-llm-eval packaging its harness module. Also closes the `__dir__()` contract-test row from PR #464's TODO.md (deferred there, landed here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
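The head-first `dir()` ordering (items 2 to 5 above) works because `builtins.dir()` sorts whatever a module-level `__dir__` (PEP 562) returns, and that sort calls `__lt__` on the elements. A minimal sketch of how a `str` subclass can steer the sort, assuming a hypothetical `OrderedName` class, a three-name `AGENT_FACING_ORDER` tuple, and a throwaway `demo` module as stand-ins (these are illustrations, not diff_diff's actual definitions):

```python
import types

# Hypothetical stand-ins for diff_diff's _AGENT_FACING_ORDER and _OrderedName;
# the real definitions may differ.
AGENT_FACING_ORDER = ("agent_workflow", "profile_panel", "get_llm_guide")


class OrderedName(str):
    """A str whose `<` ranks agent-facing names first, in declared order."""

    def _rank(self):
        name = str(self)
        if name in AGENT_FACING_ORDER:
            # Head names sort by their position in the tuple.
            return (0, AGENT_FACING_ORDER.index(name), name)
        # Everything else sorts alphabetically after the head.
        return (1, 0, name)

    def __lt__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return self._rank() < OrderedName(other)._rank()

    # __eq__ and __hash__ are inherited from str, so dict keys, `==`,
    # `in`, and f-strings behave exactly like a plain string.


demo = types.ModuleType("demo", "Start with agent_workflow().")
demo.zeta = 1
demo.alpha = 2
demo.agent_workflow = lambda: "script"
demo.profile_panel = lambda df: None
demo.get_llm_guide = lambda: "guide"


def _module_dir():
    # Return the FULL namespace (not just __all__); builtins.dir() sorts
    # this list, and that sort calls OrderedName.__lt__.
    return [OrderedName(n) for n in vars(demo)]


demo.__dir__ = _module_dir  # module-level __dir__ hook (PEP 562)

names = dir(demo)
head = [str(n) for n in names[: len(AGENT_FACING_ORDER)]]
tail = names[len(AGENT_FACING_ORDER):]

assert head == list(AGENT_FACING_ORDER)         # head-first ordering
assert tail == sorted(tail, key=str)            # tail stays alphabetic
assert set(map(str, names)) == set(vars(demo))  # full namespace
```

Because only `__lt__` is overridden, `sorted(dir(demo), key=str)` still recovers a fully alphabetic list for tooling that re-sorts, and each name keeps behaving as an ordinary string.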
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. This PR only adds a contract test plus …

Executive Summary

Methodology
No findings. The diff only touches …

Code Quality
No findings.

Performance
No findings.

Maintainability
No findings beyond the test-coverage gaps noted below.

Tech Debt

Security
No findings.

Documentation/Tests

Verification note: static review only; I could not run the new test file in this environment because …
…overage)

CI codex flagged 2 actionable items on the initial push:

P2 (Tech Debt): `test_getmembers_parity_with_default_module_dir()` was tautological. CPython's `inspect.getmembers()` derives its name list from `dir(obj)`, so the `dir_names == gm_names` check was always true regardless of how narrow `__dir__()` became. The "full namespace" contract advertised in the CHANGELOG was therefore not actually pinned. Fix: split it into two tests with independent assertions:

- `test_dir_returns_full_module_namespace` now compares `set(map(str, dir(diff_diff)))` against `set(vars(diff_diff))`. `vars()` reads the real underlying module dict, and `__dir__` does NOT derive from it automatically, so this catches a regression where `__dir__` is reduced to `__all__`.
- `test_getmembers_returns_accessible_values` retains the `getattr` + `__doc__` accessibility check as a secondary contract (every reported name must actually resolve to a value).

P3 (Docs/Tests): `_AGENT_FACING_ORDER` includes `DiagnosticReport`, but the explicit `__all__` and callability tests covered only 5 of the 6 names. A regression that left `DiagnosticReport` in `dir()` but de-exported it would not have failed. Fix: anchor both assertions to `_AGENT_FACING_ORDER` itself rather than duplicating a hard-coded list:

- `test_agent_facing_names_in_all` now derives `required` from `_AGENT_FACING_ORDER`.
- `test_agent_facing_entrypoint_callable` parametrizes over `sorted(_AGENT_FACING_ORDER)` directly.

Future head-name additions automatically extend coverage instead of needing two sources of truth.

Tests: 17 → 18 (DiagnosticReport added to the callable parametrize).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
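The P2 finding can be reproduced directly. The sketch below uses a hypothetical throwaway module (not `diff_diff`) whose `__dir__` has been narrowed to `__all__`: the `dir`/`getmembers` comparison still passes, while a comparison against `vars()` exposes the regression. The last two lines mirror the P3 fix by deriving the required names from a hypothetical `HEAD` tuple that stands in for `_AGENT_FACING_ORDER`.

```python
import inspect
import types

# Hypothetical stand-in for _AGENT_FACING_ORDER; names are illustrative.
HEAD = ("agent_workflow", "DiagnosticReport")

mod = types.ModuleType("narrow_demo", "Module docstring.")
mod.agent_workflow = lambda: "script"
mod.DiagnosticReport = type("DiagnosticReport", (), {})
mod._private_helper = lambda: None
mod.__all__ = list(HEAD)
mod.__dir__ = lambda: list(mod.__all__)  # simulated regression: __dir__ == __all__

dir_names = set(map(str, dir(mod)))
gm_names = {name for name, _ in inspect.getmembers(mod)}

# Tautological: getmembers() iterates over dir(mod), so this holds
# no matter how narrow __dir__ becomes.
tautology_holds = dir_names == gm_names

# Independent: vars() reads the module's real __dict__, so the
# narrowing shows up as names missing from dir().
missing_from_dir = set(vars(mod)) - dir_names

# P3-style anchoring: derive required names from the ordering tuple,
# so a new head name automatically gets __all__ and callability checks.
not_exported = [n for n in HEAD if n not in mod.__all__]
not_callable = [n for n in HEAD if not callable(getattr(mod, n))]
```

Here `tautology_holds` is `True` even though `__dir__` is broken, while `missing_from_dir` contains `__doc__`, `__name__`, and `_private_helper`, which is exactly the signal the original parity test could never produce.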
🔁 AI review rerun (requested by @igerber)
Head SHA:

Overall Assessment
✅ Looks good

Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- `tests/test_agent_discoverability.py`: static snapshot test that pins the agent-facing surface introduced by PR #464 against future regression. 17 tests (12 standalone + 5 parametrize cells over the agent-facing entrypoint names). Snapshot-only: no live API calls, no subprocess, no live agents; runs in the default pytest suite.
- Also closes the `__dir__()` contract-test row from `TODO.md` that PR #464 deferred here.

What's locked by the new test file:
- `__all__` membership of the 5 agent-facing primitives (`agent_workflow`, `profile_panel`, `get_llm_guide`, `practitioner_next_steps`, `BusinessReport`)
- `dir(diff_diff)` head matches `_AGENT_FACING_ORDER` in declared order; the check anchors to the tuple rather than a slice length, so trims or expansions of the head are tracked
- `dir()` tail stays alphabetic when keyed by `str` (recovery key for tooling that re-sorts)
- `dir()` returns the full module namespace, not just `__all__`, preserving `__doc__` / `__name__` / `__file__` for `inspect.getmembers` consumers
- `_OrderedName` invariants: `isinstance(_, str)`, str behavior (`.upper`, `==`, hash for dict keys, `in`, f-string)
- `inspect.getmembers(diff_diff)` parity with `dir(diff_diff)`
- Top-level `__doc__`: the first non-blank paragraph names `agent_workflow`; the full doc names all 4 downstream primitives
- `agent_workflow()` output script references each canonical helper by name; every `fit_candidates` entry resolves on the `diff_diff` namespace

Closes #461 (snapshot variant). The live-agent regression test (spawning a cold-start Claude subprocess against a staggered-DiD task) remains a follow-up that depends on `causal-llm-eval` packaging its harness module; that path will be tracked as its own follow-up issue if it is not already.

Methodology references (required if estimator / math changes)
Validation
- `tests/test_agent_discoverability.py` (new, 17 tests). All pass locally; the existing `tests/test_agent_workflow.py` (23) and `tests/test_guides.py` (41) remain green, for a combined 81/81.

Security / privacy
Generated with Claude Code