Merged
126 changes: 126 additions & 0 deletions .erpaval/specs/011-replay-decision-equivalence/spec.md

Large diffs are not rendered by default.

131 changes: 131 additions & 0 deletions docs/adr/0020-decision-equivalence-supersedes-byte-identity.md
@@ -0,0 +1,131 @@
# ADR 0020 — Decision-equivalence is the pack contract; byte-identity is a witness, not the contract

- Status: **Proposed** — 2026-06-30 (awaiting Laith's review; pairs with spec 011).
- Authors: Laith Al-Saadoon + Bonk.
- Branch: `spec/011-replay-decision-equivalence`.
- Amends (does not supersede): the byte-identity invariant asserted in
[ADR 0011 — Graph database backend](./0011-graph-db-backend.md) (the `graphHash`
invariant) and [ADR 0019 — Single-file SQLite storage](./0019-single-file-sqlite-storage.md)
(the "graphHash byte-identity (the go/no-go)" gate), and the ROADMAP U1/U2
determinism constraints. Those gates **stay** — this ADR reframes what they
are *for*. It also supersedes the byte-identity comparator in the unmerged
`codehub replay` (`feat/v1-distribution-breadth`, `e6a81c2`).

## Context

The pack's reproducibility promise has been **byte-identity**: same inputs ⇒
byte-identical artifact, witnessed by a hash. The chain:

- **ROADMAP U1/U2** name "graphHash byte-identity per commit" and "deterministic
code-pack (same commit + tokenizer + budget → same bytes)" as the one
breaking-change budget OCH must preserve (`.erpaval/ROADMAP.md:201-202,219`).
- **ADR 0011** defines `graphHash` as the SHA-256 of the canonical-JSON
`{edges, nodes}` projection and gates it in CI; **ADR 0019** makes
byte-identical rebuild the migration go/no-go.
- The pack inherits it: `packHash = sha256(canonicalJson(manifest))`
(`packages/pack/src/manifest.ts:52`), and `pack-determinism.test.ts` asserts
two runs produce byte-identical BOM files.
- The user-facing promise (`packages/pack/src/readme.ts:73`): *"same
`(commit, tokenizer_id, budget_tokens, chonkie_version, grammar_commits)`
produces a byte-identical pack and the same `pack_hash`."*

Byte-identity is a good *witness* but the wrong *contract*, because the bytes
bind things the auditor does not care about:

1. **The `packHash` preimage includes incidental fields.** `pins.chonkieVersion`,
`pins.grammarCommits`, and every BOM file's `fileHash` are in the hash
(`manifest.ts:82-101`). A chonkie bump, a grammar-pin refresh, or a
`tokenCount` recompute flips `packHash` — even when the same byte ranges of
the same files were selected under the same budget. `readme.ts:73` literally
lists `chonkie_version` and `grammar_commits` as pack inputs, conceding that
a toolchain bump yields a "different" pack, even though the retrieval
decision is identical.

2. **The embedder-swap precedent, stated precisely.** The #252 embedder swap
(gte-modernbert → F2LLM-v2-80M, 320-dim) is the canonical decision-irrelevant
change. Precision matters because the motivating prose (spec 010 §0)
over-stated the mechanism: embeddings are **not** in the pack — the Parquet
sidecar was dropped in ADR 0019, the BOM is **8 items**, and `graphHash` is
embedder-neutral by construction (ADR 0014: it hashes only `{nodes, edges}`,
never `store_meta`). So the swap breaks **neither** `packHash` **nor**
`graphHash` today; it invalidates the `embeddings` table and the `store_meta`
embedder fingerprint, forcing a re-index. The general lesson holds regardless:
a legitimate change to *how* OCH builds the index — a better embedder, a newer
grammar, a re-tokenizer — is exactly what a naive "did the bytes change?"
check misreads as "the pack changed," when which files/ranges the agent saw is
identical.

3. **An auditor cares about the decision, not the bytes.** They want: did the
agent's context come from the right places? Byte-identity over-promises
(asserts more than the contract needs) and under-delivers (breaks on changes
the contract should tolerate).
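
The gap between "the bytes changed" and "the decision changed" can be illustrated with a minimal sketch. Everything below is hypothetical: the trimmed-down manifest shape, the version strings, and the `canonicalJsonSketch` helper (a simplified stand-in for an RFC 8785 canonicalizer, not the real `canonicalJson`). It only shows that a hash whose preimage includes `pins` moves on a pins bump while a hash over the selection alone does not.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Simplified stand-in for an RFC 8785 canonicalizer: sorts object keys
// recursively. Good enough for this illustration; it ignores number and
// `undefined` edge cases that a real canonicalizer must handle.
function canonicalJsonSketch(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJsonSketch).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJsonSketch(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical pack builder: only the toolchain pin varies between builds.
function buildPackSketch(chonkieVersion: string) {
  return {
    // Assumed manifest shape, far smaller than the real one.
    manifest: {
      budgetTokens: 4000,
      pins: { chonkieVersion },
      files: [{ path: "ast-chunks.jsonl", fileHash: "aaaa" }],
    },
    // What the agent actually saw: paths and byte ranges.
    selection: [{ path: "src/a.ts", ranges: [[0, 120]] }],
  };
}

const beforeBump = buildPackSketch("1.0.0");
const afterBump = buildPackSketch("1.1.0");

const packHashBefore = sha256(canonicalJsonSketch(beforeBump.manifest));
const packHashAfter = sha256(canonicalJsonSketch(afterBump.manifest));
const selectionHashBefore = sha256(canonicalJsonSketch(beforeBump.selection));
const selectionHashAfter = sha256(canonicalJsonSketch(afterBump.selection));

console.log(packHashBefore === packHashAfter); // false: the pin is in the preimage
console.log(selectionHashBefore === selectionHashAfter); // true: same files, same ranges
```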

## Decision

**The pack contract is decision-equivalence. Byte-identity is one sufficient
witness of it, not the contract itself.**

- **Contract of record (decision-equivalence):** two packs built from the same
inputs are equivalent iff they have the same **decision set** — the same
`(path, mergedByteRanges)` selections under the same `budgetTokens` —
regardless of `tokenCount`, `pins`, chunk text bytes, or serialization.
- **Witness (byte-identity):** `packHash` equality ⇒ decision-equivalence
(matching bytes trivially match the decision). The existing `graphHash` /
`packHash` byte-identity gates **stay** as the cheap fast-path witness — they
are valuable and almost-free. They are reframed from "the contract" to "a
sufficient condition for satisfying the contract."
- **The decision set is a projection of existing artifacts**, not a new shape.
It is computed from `ast-chunks.jsonl` (`{path, startByte, endByte}` per chunk,
`ast-chunker.ts:68`) with `context-bom.json`'s merged `byteRanges`
(`context-bom.ts:170`) as the fallback/cross-check.
- **`decisionHash`** is `sha256(canonicalJson(decisionSet))`, using the same
RFC 8785 `canonicalJson` helper as `packHash`. It deliberately **excludes**
`tokenCount`, `pins`, chunk text bytes, and per-file `fileHash`; it
**includes** `path`, merged byte ranges, and `budgetTokens`.
- **`codehub replay`** is the structural assertion tool (spec 011): it compares
two packs' decision sets (or re-packs and compares against a stored pack),
reporting `EQUIVALENT` / `DIVERGED` / `BUDGET_MISMATCH` with a structured diff.
It supersedes the byte-identity comparator in the unmerged `e6a81c2` `replay`,
reusing that branch's integrity + recompute tiers as the byte-witness fast
path and swapping only the re-pack comparator.
- **No gate is relaxed in this ADR.** The byte-identity CI gates continue to run
unchanged. Decision-equivalence is *added* as the contract they serve. Whether
to later let a pins-only delta pass the determinism gate (treating it as
decision-equivalent) is an explicit follow-up, not decided here (spec 011 Q3).
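
As a rough sketch of the projection and hash described in the list above: the type names, `mergeRanges`, `decisionSetOf`, and `decisionHashOf` are all hypothetical, and the rule that touching ranges merge is an assumption. `JSON.stringify` with keys written in sorted order stands in for the real `canonicalJson` helper, so the digests here would not match the ones OCH produces.

```typescript
import { createHash } from "node:crypto";

interface Chunk { path: string; startByte: number; endByte: number; tokenCount?: number }
interface Range { start: number; end: number }
interface Selection { path: string; ranges: Range[] }
interface DecisionSet { budgetTokens: number; selections: Selection[] }

// Merge overlapping or touching half-open ranges (assumed merge rule).
function mergeRanges(ranges: Range[]): Range[] {
  const sorted = [...ranges].sort((a, b) => a.start - b.start || a.end - b.end);
  const out: Range[] = [];
  for (const r of sorted) {
    const last = out[out.length - 1];
    if (last !== undefined && r.start <= last.end) last.end = Math.max(last.end, r.end);
    else out.push({ start: r.start, end: r.end });
  }
  return out;
}

// Project chunks onto (path, mergedByteRanges); tokenCount is dropped on purpose.
function decisionSetOf(budgetTokens: number, chunks: Chunk[]): DecisionSet {
  const byPath = new Map<string, Range[]>();
  for (const c of chunks) {
    const list = byPath.get(c.path) ?? [];
    list.push({ start: c.startByte, end: c.endByte });
    byPath.set(c.path, list);
  }
  const selections = [...byPath.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([path, ranges]) => ({ path, ranges: mergeRanges(ranges) }));
  return { budgetTokens, selections };
}

// Keys are emitted in sorted order so the serialization is stable.
function decisionHashOf(set: DecisionSet): string {
  const canonical = JSON.stringify({
    budgetTokens: set.budgetTokens,
    selections: set.selections.map((s) => ({
      path: s.path,
      ranges: s.ranges.map((r) => ({ end: r.end, start: r.start })),
    })),
  });
  return createHash("sha256").update(canonical).digest("hex");
}

// Two chunkings of the same bytes, with different token counts.
const a = decisionSetOf(100, [
  { path: "a.ts", startByte: 0, endByte: 10, tokenCount: 3 },
  { path: "a.ts", startByte: 10, endByte: 20, tokenCount: 2 },
]);
const b = decisionSetOf(100, [{ path: "a.ts", startByte: 0, endByte: 20, tokenCount: 7 }]);
const sameDecision = decisionHashOf(a) === decisionHashOf(b);
console.log(sameDecision); // true
```

Because `budgetTokens` is part of the preimage in this sketch, the same selection under a different budget hashes differently, which is consistent with `replay` reporting `BUDGET_MISMATCH` as its own verdict.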

## Consequences

**Positive.**

- The reproducibility claim becomes one OCH can defend against legitimate
toolchain evolution: "upgrade the chunker, swap the embedder, bump a grammar —
the pack's *decision* is provably unchanged," with `codehub replay` as the
receipt. This is the data-backed "how well does OCH do" story paired with the
Move 2 variance probe.
- The contract stops over-promising. A grammar-pin bump no longer counts as "the
pack changed" to an auditor reading a hash.
- `replay`'s diff output is actionable in a way a hash inequality never was: it
names *which files/ranges the agent would have seen differently*.
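
To make "actionable diff" concrete, here is a minimal sketch of a structured diff between two decision sets. The field names `equivalent`, `onlyInA`, `onlyInB`, and `rangeDeltas` follow the shape used in `replay.test.ts`; the inner shape of a range delta (`a` and `b` range lists) and the `diffSelections` function itself are assumptions made for illustration, not the actual `replay` implementation.

```typescript
interface Range { start: number; end: number }
type Selections = Map<string, Range[]>;

// Hypothetical delta shape: the ranges each side selected for a shared path.
interface RangeDelta { path: string; a: Range[]; b: Range[] }
interface DecisionDiff {
  equivalent: boolean;
  onlyInA: string[];
  onlyInB: string[];
  rangeDeltas: RangeDelta[];
}

function sameRanges(x: Range[], y: Range[]): boolean {
  return x.length === y.length && x.every((r, i) => r.start === y[i]?.start && r.end === y[i]?.end);
}

// Inputs are assumed to hold already-merged, sorted ranges per path.
function diffSelections(a: Selections, b: Selections): DecisionDiff {
  const onlyInA = [...a.keys()].filter((p) => !b.has(p)).sort();
  const onlyInB = [...b.keys()].filter((p) => !a.has(p)).sort();
  const rangeDeltas: RangeDelta[] = [];
  for (const path of [...a.keys()].sort()) {
    const rb = b.get(path);
    if (rb === undefined) continue; // already reported in onlyInA
    const ra = a.get(path) ?? [];
    if (!sameRanges(ra, rb)) rangeDeltas.push({ path, a: ra, b: rb });
  }
  const equivalent = onlyInA.length === 0 && onlyInB.length === 0 && rangeDeltas.length === 0;
  return { equivalent, onlyInA, onlyInB, rangeDeltas };
}

const packA: Selections = new Map([
  ["a.ts", [{ start: 0, end: 10 }]],
  ["x.ts", [{ start: 0, end: 5 }]],
]);
const packB: Selections = new Map([
  ["a.ts", [{ start: 0, end: 20 }]],
  ["y.ts", [{ start: 0, end: 5 }]],
]);
const diff = diffSelections(packA, packB);
console.log(diff.onlyInA, diff.onlyInB, diff.rangeDeltas.map((d) => d.path));
```

A hash inequality would only say "different"; this output says `x.ts` was dropped, `y.ts` was added, and `a.ts` grew from 10 to 20 code units.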

**Negative / costs.**

- A second hash (`decisionHash`) and a projection to maintain alongside
`packHash`. Mitigated: the projection is pure and small, lives in
`@opencodehub/pack` beside the builders, and reuses `canonicalJson`.
- Operators must learn that there are now two notions of "same pack"
(byte-identical vs. decision-equivalent). Mitigated: `packHash` stays the default identity in
paths/UX; `decisionHash` surfaces only through `replay`.
- The `ast-chunks` offsets are UTF-16 code-unit indices today
(`ast-chunker.ts:30`), not true UTF-8 byte offsets (the two coincide only
while the preceding text is pure ASCII).
Decision-equivalence is well-defined as long as both packs use the same
convention (they do); a future promotion to true byte offsets is a
cross-cutting change tracked separately.
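
The UTF-16 versus UTF-8 distinction is easy to see in a few lines. This is a standalone sketch, not code from `ast-chunker.ts`; the helper name `utf16IndexToUtf8Offset` is hypothetical, and it assumes the index does not fall inside a surrogate pair.

```typescript
// Convert a UTF-16 code-unit index into the UTF-8 byte offset of the same
// position by encoding the prefix. Assumes the index is not mid-surrogate.
function utf16IndexToUtf8Offset(text: string, utf16Index: number): number {
  return new TextEncoder().encode(text.slice(0, utf16Index)).length;
}

// "\u00e9" is one UTF-16 code unit but two UTF-8 bytes.
const source = 'const s = "\u00e9";\nfoo();';
const utf16Index = source.indexOf("foo");
const utf8Offset = utf16IndexToUtf8Offset(source, utf16Index);
console.log(utf16Index, utf8Offset); // 15 16

// For pure-ASCII text the two conventions agree.
const asciiOffset = utf16IndexToUtf8Offset("abc def", 4);
console.log(asciiOffset); // 4
```

Two packs that both record code-unit indices still compare correctly against each other; the mismatch only matters when offsets are interpreted as bytes by another tool.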

**Follow-ups (not decided here).**

- Whether to relax the byte-identity CI gates to accept decision-equivalent
packs (e.g. a pins-only delta) — spec 011 Q3.
- Whether `replay` becomes an `analyze`-time or CI assertion vs. staying
on-demand — spec 011 Q2.
- Doc-drift cleanup: the ROADMAP and `code-pack` CLI description still say
"9-item BOM"; it has been 8 since ADR 0019.
255 changes: 255 additions & 0 deletions packages/cli/src/commands/replay.test.ts
@@ -0,0 +1,255 @@
/**
 * Tests for `codehub replay --compare` (decision-equivalence).
 *
 * Strategy: the comparator (`runReplayCompare`) is exercised via the
 * `_loadPack` seam with hand-built `LoadedPack`s — no filesystem. `loadPack`
 * itself is tested against a real on-disk pack directory (manifest +
 * ast-chunks + context-bom) so the snake_case parsing + integrity tier + the
 * JSONL/CycloneDX parsers are covered end-to-end.
 */

import { strict as assert } from "node:assert";
import { createHash } from "node:crypto";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { after, before, describe, it } from "node:test";
import {
  type LoadedPack,
  loadPack,
  packDecisionSet,
  replayVerdictLine,
  runReplayCompare,
  serializeReplayRecord,
} from "./replay.js";

const sha = (s: string) => createHash("sha256").update(s).digest("hex");

/** Build a LoadedPack with given chunks; manifest packHash defaults distinct. */
function pack(over: Partial<LoadedPack> & { packHash: string; budget: number }): LoadedPack {
  return {
    dir: `/fake/${over.packHash}`,
    manifest: {
      packHash: over.packHash,
      budgetTokens: over.budget,
      commit: "c0ffee",
      files: [],
    },
    chunks: over.chunks ?? [],
    byteRangesByPath: over.byteRangesByPath ?? new Map(),
    integrityDrift: over.integrityDrift ?? [],
  };
}

const chunk = (path: string, startByte: number, endByte: number) => ({ path, startByte, endByte });

describe("runReplayCompare (seamed)", () => {
  async function compare(a: LoadedPack, b: LoadedPack) {
    // `runReplayCompare` calls `resolve(dir)` before the loader, so the
    // resolved path is platform-dependent (POSIX vs Windows). It always loads
    // A then B sequentially, so the fake serves packs in call order rather than
    // keying on the (unstable) resolved path.
    const queue = [a, b];
    return runReplayCompare(a.dir, b.dir, {
      _loadPack: async () => {
        const p = queue.shift();
        if (p === undefined) throw new Error("fake loader called more than twice");
        return p;
      },
    });
  }

  it("EQUIVALENT via packHash fast path when hashes match (no projection needed)", async () => {
    const a = pack({ packHash: "same", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const b = pack({ packHash: "same", budget: 100, chunks: [chunk("a.ts", 0, 99)] });
    const r = await compare(a, b);
    assert.equal(r.verdict, "EQUIVALENT");
    assert.equal(r.decisionHashA, undefined, "fast path skips the projection");
  });

  it("EQUIVALENT when packHashes differ but the decision set matches (the contract)", async () => {
    // Same selection, different incidental bytes → different packHash, same decision.
    const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const r = await compare(a, b);
    assert.equal(r.verdict, "EQUIVALENT");
    assert.equal(r.decisionHashA, r.decisionHashB, "decision hashes match");
  });

  it("DIVERGED with a structured diff when selections differ", async () => {
    const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 20)] });
    const r = await compare(a, b);
    assert.equal(r.verdict, "DIVERGED");
    assert.ok(r.diff !== undefined);
    assert.equal(r.diff?.rangeDeltas[0]?.path, "a.ts");
    assert.notEqual(r.decisionHashA, r.decisionHashB);
  });

  it("BUDGET_MISMATCH when budgets differ (reported distinctly, not DIVERGED)", async () => {
    const a = pack({ packHash: "hashA", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const b = pack({ packHash: "hashB", budget: 200, chunks: [chunk("a.ts", 0, 10)] });
    const r = await compare(a, b);
    assert.equal(r.verdict, "BUDGET_MISMATCH");
    assert.equal(r.budgetA, 100);
    assert.equal(r.budgetB, 200);
  });

  it("CORRUPT when either pack has integrity drift (refuses to compare)", async () => {
    const a = pack({ packHash: "hashA", budget: 100, integrityDrift: ["ast-chunks.jsonl"] });
    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const r = await compare(a, b);
    assert.equal(r.verdict, "CORRUPT");
    assert.deepEqual(r.corruptItems, ["ast-chunks.jsonl"]);
  });

  it("falls back to context-bom byteRanges when ast-chunks is empty (R7)", async () => {
    const a = pack({
      packHash: "hashA",
      budget: 100,
      byteRangesByPath: new Map([["a.ts", [{ start: 0, end: 10 }]]]),
    });
    const b = pack({ packHash: "hashB", budget: 100, chunks: [chunk("a.ts", 0, 10)] });
    const r = await compare(a, b);
    assert.equal(r.verdict, "EQUIVALENT", "byteRanges fallback == equivalent chunks");
  });
});

describe("replayVerdictLine exit codes", () => {
  const base = { packHashA: "a", packHashB: "b", budgetA: 100, budgetB: 100 } as const;

  it("EQUIVALENT → exit 0", () => {
    assert.equal(replayVerdictLine({ verdict: "EQUIVALENT", ...base }, false).exitCode, 0);
  });
  it("DIVERGED → exit 1", () => {
    assert.equal(replayVerdictLine({ verdict: "DIVERGED", ...base }, false).exitCode, 1);
  });
  it("CORRUPT → exit 1", () => {
    assert.equal(
      replayVerdictLine({ verdict: "CORRUPT", ...base, corruptItems: ["x"] }, false).exitCode,
      1,
    );
  });
  it("BUDGET_MISMATCH → exit 0 by default, 1 under --budget-strict", () => {
    const r = { verdict: "BUDGET_MISMATCH", ...base, budgetB: 200 } as const;
    assert.equal(replayVerdictLine(r, false).exitCode, 0);
    assert.equal(replayVerdictLine(r, true).exitCode, 1);
  });
});

describe("serializeReplayRecord (R6 determinism)", () => {
  it("is byte-identical across calls and carries no clock/run-id", () => {
    const r = {
      verdict: "DIVERGED" as const,
      packHashA: "a",
      packHashB: "b",
      decisionHashA: "da",
      decisionHashB: "db",
      budgetA: 100,
      budgetB: 100,
      diff: { equivalent: false, onlyInA: ["x.ts"], onlyInB: [], rangeDeltas: [] },
    };
    const j1 = serializeReplayRecord(r);
    const j2 = serializeReplayRecord(r);
    assert.equal(j1, j2);
    assert.ok(!j1.includes("timestamp") && !j1.includes("Date"));
  });
});

describe("packDecisionSet (projection precedence)", () => {
  it("prefers ast-chunks over context-bom byteRanges", () => {
    const p = pack({
      packHash: "h",
      budget: 100,
      chunks: [chunk("a.ts", 0, 10)],
      byteRangesByPath: new Map([["zzz.ts", [{ start: 0, end: 999 }]]]),
    });
    const set = packDecisionSet(p);
    assert.deepEqual(
      set.selections.map((s) => s.path),
      ["a.ts"],
      "ast-chunks wins; context-bom ignored when chunks present",
    );
  });
});

describe("loadPack (real on-disk)", () => {
  let dir: string;
  before(async () => {
    dir = await mkdtemp(join(tmpdir(), "och-replay-pack-"));
    // ast-chunks.jsonl — one canonical-JSON AstChunk per line.
    const astChunks = [
      JSON.stringify({ path: "a.ts", startByte: 0, endByte: 10, tokenCount: 3 }),
      JSON.stringify({ path: "a.ts", startByte: 10, endByte: 20, tokenCount: 2 }),
      "",
    ].join("\n");
    await writeFile(join(dir, "ast-chunks.jsonl"), astChunks, "utf8");
    // context-bom.json — CycloneDX with an opencodehub:byteRanges property.
    const contextBom = JSON.stringify({
      bomFormat: "CycloneDX",
      specVersion: "1.6",
      components: [
        {
          type: "file",
          name: "a.ts",
          properties: [{ name: "opencodehub:byteRanges", value: JSON.stringify([[0, 20]]) }],
        },
      ],
    });
    await writeFile(join(dir, "context-bom.json"), contextBom, "utf8");
    // manifest.json — snake_case wire form. fileHashes match the bodies above.
    const manifest = JSON.stringify({
      budget_tokens: 100,
      commit: "c0ffee",
      determinism_class: "strict",
      files: [
        { kind: "ast-chunks", path: "ast-chunks.jsonl", file_hash: sha(astChunks) },
        { kind: "context-bom", path: "context-bom.json", file_hash: sha(contextBom) },
      ],
      pack_hash: "deadbeef",
      schema_version: 2,
    });
    await writeFile(join(dir, "manifest.json"), manifest, "utf8");
  });
  after(async () => {
    await rm(dir, { recursive: true, force: true });
  });

  it("parses manifest (schema 2, no duckdb pin), ast-chunks, and context-bom ranges", async () => {
    const loaded = await loadPack(dir);
    assert.equal(loaded.manifest.packHash, "deadbeef");
    assert.equal(loaded.manifest.budgetTokens, 100);
    assert.equal(loaded.chunks.length, 2, "two ast-chunk rows parsed (blank line skipped)");
    assert.equal(loaded.byteRangesByPath.get("a.ts")?.[0]?.end, 20);
    assert.equal(loaded.integrityDrift.length, 0, "on-disk bytes match attested fileHashes");
  });

  it("flags integrity drift when a body's bytes don't match its attested hash", async () => {
    // Rewrite the manifest with a wrong fileHash for ast-chunks.
    const badManifest = JSON.stringify({
      budget_tokens: 100,
      commit: "c0ffee",
      determinism_class: "strict",
      files: [{ kind: "ast-chunks", path: "ast-chunks.jsonl", file_hash: "0".repeat(64) }],
      pack_hash: "deadbeef",
      schema_version: 2,
    });
    const badDir = await mkdtemp(join(tmpdir(), "och-replay-bad-"));
    try {
      await writeFile(
        join(badDir, "ast-chunks.jsonl"),
        '{"path":"a.ts","startByte":0,"endByte":1}',
        "utf8",
      );
      await writeFile(join(badDir, "manifest.json"), badManifest, "utf8");
      const loaded = await loadPack(badDir);
      assert.deepEqual(loaded.integrityDrift, ["ast-chunks.jsonl"]);
    } finally {
      await rm(badDir, { recursive: true, force: true });
    }
  });

  it("throws a clear error when the pack dir has no manifest", async () => {
    await assert.rejects(() => loadPack(join(tmpdir(), "no-such-pack-dir")), /no pack at/);
  });
});