feat(pack): codehub replay — decision-equivalence structural check (Move 6) #270
Merged
Drafts the structural half of Move 6 for review (no code yet).

Spec 011 (.erpaval/specs/011-replay-decision-equivalence/spec.md): `codehub replay` asserts decision-equivalence, meaning same inputs ⇒ same retrieval decision set (same files + byte ranges selected under the same budget), via a `decisionHash` that projects ast-chunks + context-bom byteRanges and excludes incidental fields (tokenCount, pins, chunk text, fileHash). Byte-identity becomes the cheap sufficient witness, not the contract. Supersedes the byte-identity comparator in the unmerged e6a81c2 replay, reusing its integrity/recompute tiers. 5 open questions.

ADR 0020: decision-equivalence is the contract of record; the existing graphHash/packHash byte-identity gates stay as the witness fast path (no gate relaxed here). Corrects the embedder-swap framing: embeddings aren't in the pack and graphHash is embedder-neutral; the swap hits the index, not packHash/graphHash.

Pairs with the Move 2 variance probe as the data-backed "how well does OCH do" story.
…ove 6)
Implements spec 011 / ADR 0020 (the structural half of Move 6). `codehub
replay --compare <pack-a> <pack-b>` asserts two packs are decision-
equivalent: same files + byte ranges selected under the same budget,
regardless of incidental drift (tokenCount, pins, chunk text bytes,
fileHash). Byte-identity (packHash) stays the cheap sufficient witness;
a decisionHash projection is the contract of record.
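
The difference between the witness and the contract can be sketched in a few lines of TypeScript. This is a minimal illustration, not the `@opencodehub/pack` implementation: the `Pack` and `Chunk` shapes, `toyPackHash`, `toyDecisionHash`, the pin values, and the sorted-key `canonicalJson` (a simplified stand-in for RFC 8785) are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real pack schema may differ.
interface Chunk {
  path: string;
  startByte: number;
  endByte: number;
  tokenCount: number; // incidental: depends on the tokenizer
}
interface Pack {
  budgetTokens: number;
  pins: Record<string, string>; // incidental: toolchain versions
  chunks: Chunk[];
}

// Simplified canonical JSON (sorted object keys). A stand-in for RFC 8785,
// which additionally fixes number and string serialization.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const members = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonicalJson(obj[key]));
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value);
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Byte-identity witness: every field participates, incidental ones included.
function toyPackHash(pack: Pack): string {
  return sha256Hex(canonicalJson(pack));
}

// Decision projection: only path, byte range, and budget participate.
function toyDecisionHash(pack: Pack): string {
  const selections = pack.chunks
    .map((c) => ({ path: c.path, startByte: c.startByte, endByte: c.endByte }))
    .sort((a, b) =>
      a.path < b.path ? -1 : a.path > b.path ? 1 : a.startByte - b.startByte
    );
  return sha256Hex(canonicalJson({ budgetTokens: pack.budgetTokens, selections }));
}

const before: Pack = {
  budgetTokens: 8000,
  pins: { chonkieVersion: "1.0.0" },
  chunks: [{ path: "src/a.ts", startByte: 0, endByte: 120, tokenCount: 31 }],
};
// Same selection after a hypothetical toolchain bump: pins and tokenCount drift.
const after: Pack = {
  budgetTokens: 8000,
  pins: { chonkieVersion: "1.1.0" },
  chunks: [{ path: "src/a.ts", startByte: 0, endByte: 120, tokenCount: 33 }],
};

console.log(toyPackHash(before) === toyPackHash(after)); // false
console.log(toyDecisionHash(before) === toyDecisionHash(after)); // true
```

Matching pack hashes imply matching decision hashes, which is why byte-identity can remain a sufficient witness; the reverse does not hold.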
@opencodehub/pack — new decision-set module:
- decisionSetFromChunks / decisionSetFromByteRanges: project ast-chunks
(path,startByte,endByte) or context-bom byteRanges to a normalized,
incidental-free (path, mergedByteRanges, budget) set.
- decisionHash = sha256(canonicalJson(decisionSet)) — same RFC 8785
machinery as packHash; tokenCount-only drift is decision-equivalent.
- diffDecisionSets: structured diff (onlyInA / onlyInB / rangeDeltas)
for the actionable DIVERGED output.
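
The projection and diff described in this list could look roughly like the following. It is a sketch under stated assumptions, not the module's code: `ChunkRef`, `FileRanges`, `mergeRanges`, `projectChunks`, and `diffFileRanges` are hypothetical names, ranges are assumed to be half-open `[start, end)`, and touching ranges are assumed to merge.

```typescript
// Hypothetical types; ranges are assumed half-open [start, end).
type ByteRange = [number, number];
type FileRanges = Record<string, ByteRange[]>;
interface ChunkRef {
  path: string;
  startByte: number;
  endByte: number;
}

// Merge overlapping or touching ranges so chunk boundaries do not matter.
function mergeRanges(ranges: ByteRange[]): ByteRange[] {
  const sorted = ranges.slice().sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const merged: ByteRange[] = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && start <= last[1]) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

// Group chunk ranges by path, then normalize each file's ranges.
function projectChunks(chunks: ChunkRef[]): FileRanges {
  const grouped: FileRanges = {};
  for (const c of chunks) {
    const list = grouped[c.path] ?? [];
    list.push([c.startByte, c.endByte]);
    grouped[c.path] = list;
  }
  const result: FileRanges = {};
  for (const path of Object.keys(grouped).sort()) {
    result[path] = mergeRanges(grouped[path] ?? []);
  }
  return result;
}

interface ToyDiff {
  onlyInA: string[];
  onlyInB: string[];
  rangeDeltas: { path: string; a: ByteRange[]; b: ByteRange[] }[];
}

// Structured diff: files unique to each side, plus shared files whose ranges differ.
function diffFileRanges(a: FileRanges, b: FileRanges): ToyDiff {
  const pathsA = Object.keys(a).sort();
  const pathsB = Object.keys(b).sort();
  const onlyInA = pathsA.filter((p) => !(p in b));
  const onlyInB = pathsB.filter((p) => !(p in a));
  const rangeDeltas = pathsA
    .filter((p) => p in b && JSON.stringify(a[p]) !== JSON.stringify(b[p]))
    .map((p) => ({ path: p, a: a[p] ?? [], b: b[p] ?? [] }));
  return { onlyInA, onlyInB, rangeDeltas };
}

const packA = projectChunks([
  { path: "src/a.ts", startByte: 0, endByte: 100 },
  { path: "src/a.ts", startByte: 100, endByte: 250 },
  { path: "src/b.ts", startByte: 10, endByte: 20 },
]);
const packB = projectChunks([
  { path: "src/a.ts", startByte: 0, endByte: 250 },
  { path: "src/c.ts", startByte: 0, endByte: 5 },
]);

console.log(JSON.stringify(diffFileRanges(packA, packB)));
// {"onlyInA":["src/b.ts"],"onlyInB":["src/c.ts"],"rangeDeltas":[]}
```

In this example `src/a.ts` was chunked differently in the two packs, but the merged ranges agree, so it produces no range delta.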
CLI — codehub replay --compare A B [--json] [--budget-strict]:
- Tiers (R8): integrity (re-hash BOM bodies vs attested fileHash) →
packHash fast path (R3) → decision-equivalence projection.
- Verdict: EQUIVALENT / DIVERGED / BUDGET_MISMATCH / CORRUPT, with exit
codes; --budget-strict promotes BUDGET_MISMATCH to failure.
- Manifest parser corrected for schema 2 (ADR 0019): no duckdb_version
pin, reads budget_tokens. Reuses the byte-witness tier design from the
unmerged e6a81c2 replay, swapping the comparator to decision-set.
- --json record is a pure function of the inputs (no clock/run-id, R6).
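
The tier ordering above can be sketched as a pure function. This is an illustration only: `PackFacts`, `ReplayResult`, and `compareTiers` are hypothetical names, the facts are assumed to be computed beforehand, and checking the budget before comparing decision hashes is an assumed ordering. Concrete exit-code numbers are not stated in this message, so the sketch reports a `failed` flag instead.

```typescript
type Verdict = "EQUIVALENT" | "DIVERGED" | "BUDGET_MISMATCH" | "CORRUPT";

// Hypothetical pre-computed facts about one pack.
interface PackFacts {
  integrityOk: boolean; // re-hashed BOM bodies match the attested fileHash
  packHash: string;
  budgetTokens: number;
  decisionHash: string;
}

interface ReplayResult {
  verdict: Verdict;
  failed: boolean; // stands in for a non-zero exit code
}

// Pure function of its inputs: no clock, no run id.
function compareTiers(a: PackFacts, b: PackFacts, budgetStrict = false): ReplayResult {
  // Tier 1: integrity. A tampered pack is never compared.
  if (!a.integrityOk || !b.integrityOk) {
    return { verdict: "CORRUPT", failed: true };
  }
  // Tier 2: packHash fast path. Byte-identity is a sufficient witness.
  if (a.packHash === b.packHash) {
    return { verdict: "EQUIVALENT", failed: false };
  }
  // Tier 3: decision-equivalence projection.
  // Assumption: differing budgets are reported before hashes are compared.
  if (a.budgetTokens !== b.budgetTokens) {
    return { verdict: "BUDGET_MISMATCH", failed: budgetStrict };
  }
  return a.decisionHash === b.decisionHash
    ? { verdict: "EQUIVALENT", failed: false }
    : { verdict: "DIVERGED", failed: true };
}

const base: PackFacts = { integrityOk: true, packHash: "p1", budgetTokens: 8000, decisionHash: "d1" };
const bumped: PackFacts = { ...base, packHash: "p2" }; // toolchain bump, same decisions
const smaller: PackFacts = { ...base, packHash: "p3", budgetTokens: 4000, decisionHash: "d3" };

console.log(compareTiers(base, bumped).verdict); // EQUIVALENT
console.log(compareTiers(base, smaller).verdict); // BUDGET_MISMATCH
console.log(compareTiers(base, smaller, true).failed); // true
```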
omnigent-style self-check (replay <hash> --repack) deferred to v2 per the
approved spec; two-pack compare is the v1 unit that proves the projection.
Spec 011 + ADR 0020 carried on this branch. +29 tests (14 pack, 15 CLI).
`runReplayCompare` calls `resolve(dir)` before the injected `_loadPack`, so the resolved path is platform-dependent — on Windows the POSIX `/fake/hashA` fixture key became `C:\fake\hashA` and the map lookup missed, throwing in all five seamed comparator tests. The loads are sequential (A then B), so the fake now serves packs in call order instead of keying on the unstable resolved path. Real cross-platform bug in the test harness, not a flake.
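
A call-order fake of the kind described here might look like the sketch below. The `FakePack` shape, the synchronous `LoadPack` signature, and `sequentialLoader` are assumptions for illustration; the actual `_loadPack` seam may be asynchronous and take different arguments.

```typescript
// Hypothetical pack shape and loader signature, for illustration only.
interface FakePack {
  packHash: string;
}
type LoadPack = (dir: string) => FakePack;

// Serves packs in call order and ignores the directory argument, so the
// fake does not depend on how the platform resolves the path.
function sequentialLoader(packs: FakePack[]): LoadPack {
  let calls = 0;
  return (_dir: string): FakePack => {
    const pack = packs[calls];
    calls += 1;
    if (pack === undefined) {
      throw new Error("unexpected load #" + calls);
    }
    return pack;
  };
}

const load = sequentialLoader([{ packHash: "aaa" }, { packHash: "bbb" }]);
const first = load("/fake/hashA"); // POSIX-style path
const second = load("C:\\fake\\hashB"); // Windows-style path, same fake

console.log(first.packHash, second.packHash); // aaa bbb
```

The trade-off is that the fake now depends on the loads staying sequential (A then B); if they were ever issued in parallel, the fixture would need a stable key again.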
theagenticguy pushed a commit that referenced this pull request on Jun 30, 2026
🤖 Automated release via release-please

---

<details><summary>root: 0.10.5</summary>

## [0.10.5](root-v0.10.4...root-v0.10.5) (2026-06-30)

### Features

* **eval:** pack --variance-probe — measure the variance an OCH pack removes (Move 2) ([#269](#269)) ([278702a](278702a))
* **frameworks:** wire stage-5 import/SCIP detection into the profile phase ([#267](#267)) ([6b4d122](6b4d122))
* **pack:** codehub replay — decision-equivalence structural check (Move 6) ([#270](#270)) ([f97b417](f97b417))

</details>

<details><summary>cli: 0.10.5</summary>

## [0.10.5](cli-v0.10.4...cli-v0.10.5) (2026-06-30)

### Features

* **eval:** pack --variance-probe — measure the variance an OCH pack removes (Move 2) ([#269](#269)) ([278702a](278702a))
* **pack:** codehub replay — decision-equivalence structural check (Move 6) ([#270](#270)) ([f97b417](f97b417))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
What

Implements spec 011 / ADR 0020, the structural half of Move 6.

`codehub replay --compare <pack-a> <pack-b>` asserts two code-packs are decision-equivalent: the same files + byte ranges selected under the same budget, regardless of incidental drift (`tokenCount`, `pins`, chunk text bytes, `fileHash`). It's the structural counterpart to the Move 2 variance probe: the probe shows the pack helps behaviorally; `replay` shows the pack is what we claim structurally. Together they're the data-backed "how well does OCH do" story.

This PR carries spec 011 + ADR 0020 + the implementation in one diff (per the approved plan).
The contract pivot (ADR 0020)

Byte-identity (`packHash`) was the contract (ROADMAP U1). It's brittle: the `packHash` preimage binds `pins.chonkieVersion`, `pins.grammarCommits`, and per-file `fileHash`, so a toolchain bump flips the hash even when the same bytes were selected. ADR 0020 makes decision-equivalence the contract of record and byte-identity a sufficient witness; the existing `graphHash`/`packHash` gates stay unchanged as the cheap fast path (no gate relaxed). The ADR also corrects the embedder-swap framing: embeddings aren't in the pack and `graphHash` is embedder-neutral by design, so the #252 swap hits the index, not the pack/graph hash.

`@opencodehub/pack`: decision-set projection

- `decisionSetFromChunks` / `decisionSetFromByteRanges`: project `ast-chunks` `(path,startByte,endByte)` or context-bom `byteRanges` to a normalized, incidental-free `(path, mergedByteRanges, budgetTokens)` set.
- `decisionHash = sha256(canonicalJson(decisionSet))`: same RFC 8785 machinery as `packHash`. A `tokenCount`-only drift hashes identically (proven in tests).
- `diffDecisionSets`: structured diff (`onlyInA` / `onlyInB` / `rangeDeltas`) for the actionable `DIVERGED` output.

CLI: `codehub replay --compare A B [--json] [--budget-strict]`

Tiered (R8), reusing the byte-witness design from the unmerged `e6a81c2` replay with the comparator swapped to decision-set:

1. Integrity: re-hash BOM bodies against the attested `fileHash`; a tampered pack is `CORRUPT` (refuse to compare).
2. `packHash` fast path (R3): identical `packHash` ⇒ `EQUIVALENT` without projecting.
3. Decision-equivalence projection, which can also yield `BUDGET_MISMATCH` (R5).

Verdicts `EQUIVALENT` / `DIVERGED` / `BUDGET_MISMATCH` / `CORRUPT` with exit codes; `--budget-strict` promotes a budget mismatch to failure. The manifest parser is corrected for schema 2 (ADR 0019): no `duckdb_version` pin, reads `budget_tokens`. The `--json` record is a pure function of the inputs (no clock/run-id, R6).

Scope / deferrals

`replay <hash> --repack` self-check → v2 behind the same machinery (needs a checkout + re-pack `RepackDriver`). Two-pack `--compare` is the v1 unit that proves the projection.

Validation

- `biome ci .` ✓ (713 files, 0 errors)
- `tsc -b` full workspace ✓
- `@opencodehub/pack` (incl. decision-set) inlined into the CLI bundle, 0 surviving external imports
- `fail 0`; +29 new tests (14 pack decision-set, 15 CLI replay)

🤖 Generated with Claude Code