feat(pack): codehub replay — decision-equivalence structural check (Move 6) #270

Merged
theagenticguy merged 3 commits into main from feat/replay-decision-equivalence
Jun 30, 2026

Conversation

@theagenticguy

Owner

What

Implements spec 011 / ADR 0020 — the structural half of Move 6. codehub replay --compare <pack-a> <pack-b> asserts two code-packs are decision-equivalent: the same files + byte ranges selected under the same budget, regardless of incidental drift (tokenCount, pins, chunk text bytes, fileHash). It's the structural counterpart to the Move 2 variance probe — the probe shows the pack helps behaviorally; replay shows the pack is what we claim structurally. Together they're the data-backed "how well does OCH do" story.

This PR carries spec 011 + ADR 0020 + the implementation in one diff (per the approved plan).

The contract pivot (ADR 0020)

Byte-identity (packHash) was the contract (ROADMAP U1). It's brittle: the packHash preimage binds pins.chonkieVersion, pins.grammarCommits, and per-file fileHash, so a toolchain bump flips the hash even when the same bytes were selected. ADR 0020 makes decision-equivalence the contract of record and byte-identity a sufficient witness — the existing graphHash/packHash gates stay unchanged as the cheap fast path (no gate relaxed). The ADR also corrects the embedder-swap framing: embeddings aren't in the pack and graphHash is embedder-neutral by design, so the #252 swap hits the index, not the pack/graph hash.

@opencodehub/pack — decision-set projection

  • decisionSetFromChunks / decisionSetFromByteRanges — project ast-chunks (path,startByte,endByte) or context-bom byteRanges to a normalized, incidental-free (path, mergedByteRanges, budgetTokens) set.
  • decisionHash = sha256(canonicalJson(decisionSet)) — same RFC 8785 machinery as packHash. A tokenCount-only drift hashes identically (proven in tests).
  • diffDecisionSets — structured diff (onlyInA / onlyInB / rangeDeltas) for the actionable DIVERGED output.
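The projection and hash described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in @opencodehub/pack: the `Chunk` and `DecisionSet` shapes, the exclusive `endByte`, the choice to merge touching ranges, and the simplified `canonicalJson` (sorted keys and no whitespace, not a full RFC 8785 implementation) are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; the real @opencodehub/pack types may differ.
type ByteRange = [start: number, end: number]; // end assumed exclusive
interface Chunk {
  path: string;
  startByte: number;
  endByte: number;
  tokenCount?: number; // incidental: must not influence the hash
}
interface DecisionSet {
  budgetTokens: number;
  files: { path: string; ranges: ByteRange[] }[];
}

// Merge overlapping or touching ranges so chunk boundaries do not matter.
function mergeRanges(ranges: ByteRange[]): ByteRange[] {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const merged: ByteRange[] = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && start <= last[1]) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

// Keep only (path, byte ranges, budget); every other chunk field is dropped.
function decisionSetFromChunks(chunks: Chunk[], budgetTokens: number): DecisionSet {
  const byPath = new Map<string, ByteRange[]>();
  for (const c of chunks) {
    const list = byPath.get(c.path) ?? [];
    list.push([c.startByte, c.endByte]);
    byPath.set(c.path, list);
  }
  const files = [...byPath.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([path, ranges]) => ({ path, ranges: mergeRanges(ranges) }));
  return { budgetTokens, files };
}

// Simplified canonical JSON: object keys sorted, no whitespace.
// Enough for integers and plain strings; not a complete RFC 8785 serializer.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function decisionHash(set: DecisionSet): string {
  return createHash("sha256").update(canonicalJson(set)).digest("hex");
}

// Two chunkings of the same bytes with different tokenCounts.
const a = decisionSetFromChunks(
  [
    { path: "src/a.ts", startByte: 0, endByte: 40, tokenCount: 12 },
    { path: "src/a.ts", startByte: 40, endByte: 90, tokenCount: 15 },
  ],
  4000,
);
const b = decisionSetFromChunks(
  [{ path: "src/a.ts", startByte: 0, endByte: 90, tokenCount: 31 }],
  4000,
);
console.log(decisionHash(a) === decisionHash(b)); // true
```

Because `tokenCount` never enters the projected set, a drift in that field alone cannot change the hash; only a change in the selected paths, byte ranges, or budget can.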

CLI — codehub replay --compare A B [--json] [--budget-strict]

Tiered (R8), reusing the byte-witness design from the unmerged e6a81c2 replay with the comparator swapped to decision-set:

  1. Integrity: re-hash every BOM body vs its attested fileHash; a tampered pack is CORRUPT (refuse to compare).
  2. packHash fast path (R3): equal packHash ⇒ EQUIVALENT without projecting.
  3. Decision-equivalence: project + compare; different budgets ⇒ BUDGET_MISMATCH (R5).
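As a rough sketch of how the three tiers might compose, assuming a hypothetical in-memory `LoadedPack` shape and using a sorted list of `path:start-end` keys as a stand-in for the real decision-set projection (the field names and the order of the budget check relative to the range comparison are assumptions, not taken from the implementation):

```typescript
import { createHash } from "node:crypto";

type Verdict = "EQUIVALENT" | "DIVERGED" | "BUDGET_MISMATCH" | "CORRUPT";

// Hypothetical in-memory view of a loaded pack; field names are illustrative.
interface LoadedPack {
  packHash: string;
  budgetTokens: number;
  bom: { path: string; fileHash: string; body: string }[];
  selection: { path: string; startByte: number; endByte: number }[];
}

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Tier 1: every BOM body must still hash to its attested fileHash.
function isIntact(pack: LoadedPack): boolean {
  return pack.bom.every((entry) => sha256(entry.body) === entry.fileHash);
}

// Stand-in for the decision-set projection (no range merging in this sketch).
function selectionKey(pack: LoadedPack): string {
  return pack.selection
    .map((s) => `${s.path}:${s.startByte}-${s.endByte}`)
    .sort()
    .join("\n");
}

function comparePacks(a: LoadedPack, b: LoadedPack): Verdict {
  if (!isIntact(a) || !isIntact(b)) return "CORRUPT"; // tier 1: refuse to compare
  if (a.packHash === b.packHash) return "EQUIVALENT"; // tier 2: byte-identity witness
  if (a.budgetTokens !== b.budgetTokens) return "BUDGET_MISMATCH"; // tier 3
  return selectionKey(a) === selectionKey(b) ? "EQUIVALENT" : "DIVERGED";
}

const body = "export const x = 1;\n";
const base: LoadedPack = {
  packHash: "hash-built-with-toolchain-1",
  budgetTokens: 4000,
  bom: [{ path: "src/x.ts", fileHash: sha256(body), body }],
  selection: [{ path: "src/x.ts", startByte: 0, endByte: 20 }],
};
// Same selection, different packHash (for example after a toolchain bump).
const rebuilt: LoadedPack = { ...base, packHash: "hash-built-with-toolchain-2" };
// Body edited after attestation, so the fileHash no longer matches.
const tampered: LoadedPack = {
  ...base,
  bom: [{ path: "src/x.ts", fileHash: sha256(body), body: body + "// edited" }],
};

console.log(comparePacks(base, rebuilt)); // EQUIVALENT
console.log(comparePacks(base, tampered)); // CORRUPT
console.log(comparePacks(base, { ...rebuilt, budgetTokens: 8000 })); // BUDGET_MISMATCH
```

Running the cheap checks first means the projection is only computed when the byte-identity witness fails, which is the case the decision-equivalence contract exists to handle.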

Verdicts EQUIVALENT / DIVERGED / BUDGET_MISMATCH / CORRUPT with exit codes; --budget-strict promotes a budget mismatch to failure. The manifest parser is corrected for schema 2 (ADR 0019): no duckdb_version pin, reads budget_tokens. The --json record is a pure function of the inputs (no clock/run-id, R6).
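To make the verdict handling concrete, here is a small sketch of a verdict-to-exit-code mapping and a deterministic JSON record. The numeric exit codes, the `ReplayRecord` fields, and the function names are hypothetical; the description above only states that verdicts carry exit codes, that `--budget-strict` promotes a budget mismatch to failure, and that the record contains no clock or run id.

```typescript
type Verdict = "EQUIVALENT" | "DIVERGED" | "BUDGET_MISMATCH" | "CORRUPT";

// Exit-code numbers are illustrative assumptions, not the CLI's actual values.
function exitCodeFor(verdict: Verdict, budgetStrict: boolean): number {
  switch (verdict) {
    case "EQUIVALENT":
      return 0;
    case "BUDGET_MISMATCH":
      return budgetStrict ? 1 : 0; // --budget-strict promotes the mismatch to failure
    case "DIVERGED":
      return 1;
    case "CORRUPT":
      return 2;
  }
}

// Hypothetical record shape, built only from the inputs:
// no Date.now(), no random run id, fixed key order.
interface ReplayRecord {
  packA: string;
  packB: string;
  verdict: Verdict;
}

function renderRecord(record: ReplayRecord): string {
  const { packA, packB, verdict } = record;
  return JSON.stringify({ packA, packB, verdict });
}

console.log(exitCodeFor("BUDGET_MISMATCH", false)); // 0
console.log(exitCodeFor("BUDGET_MISMATCH", true)); // 1
console.log(renderRecord({ verdict: "DIVERGED", packB: "b", packA: "a" }));
// {"packA":"a","packB":"b","verdict":"DIVERGED"}
```

Keeping the record free of timestamps and run ids means two runs over the same pair of packs produce byte-identical output, so the output itself can be diffed or hashed.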

Scope / deferrals

  • replay <hash> --repack self-check → v2 behind the same machinery (needs a checkout + re-pack RepackDriver). Two-pack --compare is the v1 unit that proves the projection.
  • Not a CI gate in v1 — on-demand structural check (spec 011 Q2).

Validation

  • biome ci . ✓ (713 files, 0 errors)
  • tsc -b full workspace ✓
  • full build ✓ (@opencodehub/pack, incl. decision-set, inlined into the CLI bundle; 0 surviving external imports)
  • full test suite ✓ (all 18 packages pass with 0 failures; +29 new tests: 14 pack decision-set, 15 CLI replay)
  • pre-commit (banned-strings, commitlint) + pre-push (verdict, typecheck, test) hooks ✓

🤖 Generated with Claude Code

Drafts the structural half of Move 6 for review (no code yet).

Spec 011 (.erpaval/specs/011-replay-decision-equivalence/spec.md):
`codehub replay` asserts decision-equivalence — same inputs ⇒ same
retrieval decision set (same files + byte ranges selected under the same
budget) — via a `decisionHash` that projects ast-chunks + context-bom
byteRanges and excludes incidental fields (tokenCount, pins, chunk text,
fileHash). Byte-identity becomes the cheap sufficient witness, not the
contract. Supersedes the byte-identity comparator in the unmerged
e6a81c2 replay, reusing its integrity/recompute tiers. 5 open questions.

ADR 0020: decision-equivalence is the contract of record; the existing
graphHash/packHash byte-identity gates stay as the witness fast path
(no gate relaxed here). Corrects the embedder-swap framing — embeddings
aren't in the pack and graphHash is embedder-neutral; the swap hits the
index, not packHash/graphHash. Pairs with the Move 2 variance probe as
the data-backed "how well does OCH do" story.
…ove 6)

Implements spec 011 / ADR 0020 (the structural half of Move 6). `codehub
replay --compare <pack-a> <pack-b>` asserts two packs are decision-
equivalent: same files + byte ranges selected under the same budget,
regardless of incidental drift (tokenCount, pins, chunk text bytes,
fileHash). Byte-identity (packHash) stays the cheap sufficient witness;
a decisionHash projection is the contract of record.

@opencodehub/pack — new decision-set module:
  - decisionSetFromChunks / decisionSetFromByteRanges: project ast-chunks
    (path,startByte,endByte) or context-bom byteRanges to a normalized,
    incidental-free (path, mergedByteRanges, budget) set.
  - decisionHash = sha256(canonicalJson(decisionSet)) — same RFC 8785
    machinery as packHash; tokenCount-only drift is decision-equivalent.
  - diffDecisionSets: structured diff (onlyInA / onlyInB / rangeDeltas)
    for the actionable DIVERGED output.

CLI — codehub replay --compare A B [--json] [--budget-strict]:
  - Tiers (R8): integrity (re-hash BOM bodies vs attested fileHash) →
    packHash fast path (R3) → decision-equivalence projection.
  - Verdict: EQUIVALENT / DIVERGED / BUDGET_MISMATCH / CORRUPT, with exit
    codes; --budget-strict promotes BUDGET_MISMATCH to failure.
  - Manifest parser corrected for schema 2 (ADR 0019): no duckdb_version
    pin, reads budget_tokens. Reuses the byte-witness tier design from the
    unmerged e6a81c2 replay, swapping the comparator to decision-set.
  - --json record is a pure function of the inputs (no clock/run-id, R6).

omnigent-style self-check (replay <hash> --repack) deferred to v2 per the
approved spec; two-pack compare is the v1 unit that proves the projection.

Spec 011 + ADR 0020 carried on this branch. +29 tests (14 pack, 15 CLI).
`runReplayCompare` calls `resolve(dir)` before the injected `_loadPack`,
so the resolved path is platform-dependent — on Windows the POSIX
`/fake/hashA` fixture key became `C:\fake\hashA` and the map lookup
missed, throwing in all five seamed comparator tests. The loads are
sequential (A then B), so the fake now serves packs in call order
instead of keying on the unstable resolved path. Real cross-platform
bug in the test harness, not a flake.
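The harness fix described in this commit message can be illustrated with a call-order fake. This is a sketch under assumptions: `compareWithLoader` is a hypothetical stand-in for `runReplayCompare`, and the `Pack` and `LoadPack` types are invented for the example; only the idea (resolve the directory, then load A before B through an injected loader) comes from the message above.

```typescript
import { resolve } from "node:path";

// Hypothetical minimal types for the example.
interface Pack {
  packHash: string;
}
type LoadPack = (dir: string) => Promise<Pack>;

// Stand-in for runReplayCompare: resolves each dir, then loads A before B.
async function compareWithLoader(
  dirA: string,
  dirB: string,
  loadPack: LoadPack,
): Promise<boolean> {
  const a = await loadPack(resolve(dirA));
  const b = await loadPack(resolve(dirB));
  return a.packHash === b.packHash;
}

// Fake loader that ignores the (platform-dependent) resolved path
// and serves packs in call order instead.
function sequentialLoader(packs: Pack[]): LoadPack {
  let next = 0;
  return async (_dir) => {
    const pack = packs[next++];
    if (pack === undefined) {
      throw new Error("fake loader called more times than packs supplied");
    }
    return pack;
  };
}

const load = sequentialLoader([{ packHash: "aaa" }, { packHash: "aaa" }]);
compareWithLoader("/fake/hashA", "/fake/hashB", load).then((same) => {
  console.log(same); // true
});
```

A fake keyed on the resolved path would need its fixture keys to match whatever `resolve` produces on the host platform; serving by call order removes that dependency at the cost of relying on the A-then-B load sequence.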
@theagenticguy theagenticguy merged commit f97b417 into main Jun 30, 2026
38 checks passed
@theagenticguy theagenticguy deleted the feat/replay-decision-equivalence branch June 30, 2026 13:01
@github-actions github-actions Bot mentioned this pull request Jun 30, 2026
theagenticguy pushed a commit that referenced this pull request Jun 30, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.10.5</summary>

## [0.10.5](root-v0.10.4...root-v0.10.5) (2026-06-30)


### Features

* **eval:** pack --variance-probe — measure the variance an OCH pack
removes (Move 2)
([#269](#269))
([278702a](278702a))
* **frameworks:** wire stage-5 import/SCIP detection into the profile
phase ([#267](#267))
([6b4d122](6b4d122))
* **pack:** codehub replay — decision-equivalence structural check (Move
6) ([#270](#270))
([f97b417](f97b417))
</details>

<details><summary>cli: 0.10.5</summary>

## [0.10.5](cli-v0.10.4...cli-v0.10.5) (2026-06-30)


### Features

* **eval:** pack --variance-probe — measure the variance an OCH pack
removes (Move 2)
([#269](#269))
([278702a](278702a))
* **pack:** codehub replay — decision-equivalence structural check (Move
6) ([#270](#270))
([f97b417](f97b417))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>