feat(pack): context-bom read-receipt (9th BOM item) + real production provenance #261
Merged
Emit context-bom.json, a CycloneDX 1.6 document recording the source files the pack indexed: one `file` component per File node, sorted by path, each with its SHA-256 contentHash, lineCount, and language. Byte ranges attach as a best-effort property when the chunker supplied them.

Bind the receipt to the pack: the manifest gains `contextBomHash` (the SHA-256 of the canonical context-bom.json) and the item joins `files[]`, so packHash covers it transitively. Manifest schemaVersion bumps from 1 to 2.

Add `codehub code-pack --explain-context [--json]`, a read-only summary (files indexed, SHA-256 coverage, total lines, per-language breakdown) read from the on-disk receipt without re-running the pack.

The receipt anchors on File nodes, not AstChunk data: production never wires chunkerFiles into generatePack, so ast-chunks (and thus byte ranges) are empty in real packs today. File nodes are populated by analyze, so the receipt's file list is complete in production even without byte ranges. The empty-chunker case is a flagged latent bug, out of scope here.

Verified end-to-end: after analyze and two pack runs on a real repo, context-bom.json is byte-identical across runs. Build, typecheck, the full test suite (0 failures), banned-strings, licenses, and sarif:validate all pass.
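A minimal TypeScript sketch of how such a receipt could be assembled and hashed. The `FileNode` shape and the `codehub:*` property names are assumptions for illustration, not taken from the codebase, and `canonicalJson` is a simple sorted-key stand-in rather than RFC 8785. It shows the two properties the commit relies on: sorting by path makes the output independent of input order, and hashing a canonical serialization yields a stable `contextBomHash`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical File-node shape; field names follow the commit message.
interface FileNode {
  path: string;
  contentHash: string; // hex SHA-256 of the indexed bytes
  lineCount: number;
  language: string;
  byteRanges?: Array<[number, number]>; // best-effort, only when the chunker ran
}

// Deterministic JSON: object keys sorted, undefined fields dropped, array order kept.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function buildContextBom(files: FileNode[]): Record<string, unknown> {
  // Sort by path so input order never changes the output bytes.
  const sorted = [...files].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return {
    bomFormat: "CycloneDX",
    specVersion: "1.6",
    version: 1,
    components: sorted.map((f) => ({
      type: "file",
      name: f.path,
      hashes: [{ alg: "SHA-256", content: f.contentHash }],
      // Hypothetical property names; CycloneDX allows free-form name/value pairs.
      properties: [
        { name: "codehub:lineCount", value: String(f.lineCount) },
        { name: "codehub:language", value: f.language },
        ...(f.byteRanges?.length
          ? [{ name: "codehub:byteRanges", value: JSON.stringify(f.byteRanges) }]
          : []),
      ],
    })),
  };
}

// contextBomHash: SHA-256 over the canonical serialization of the receipt.
function contextBomHash(bom: Record<string, unknown>): string {
  return createHash("sha256").update(canonicalJson(bom), "utf8").digest("hex");
}
```

This sketch emits no timestamp or random `serialNumber` (both optional in CycloneDX), which is what keeps two builds from the same index byte-identical.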
… pins)
runPackEngine called generatePack with only repoPath/outDir/budget/tokenizer
and none of the `internal` provenance inputs, so every production pack shipped
commit="", repo_origin_url=null, an empty ast-chunks.jsonl, grammar_commits={},
and chonkie_version="unknown". The determinism receipt was hollow in real packs
even though the determinism TEST fixture (which injects chunkerFiles) looked
complete. The spec 009 context-bom inherited the gap: its byte ranges were
always empty in production.
Derive the inputs in the CLI (the documented integration layer) and thread them
through generatePack's existing `internal` seam:
- commit / repoOriginUrl: read from the singleton Repo node (commitSha /
originUrl) — a pure read of the indexed state, no git spawn at pack time.
- chunkerFiles: every File node's bytes, read from disk and hash-verified
against FileNode.contentHash. A file whose working-tree bytes drifted from
the index is skipped, so the pack never chunks content that disagrees with
what was analyzed — byte-identity stays a function of the indexed commit.
- grammarCommits: the new ingestion `parse.grammarVersions()` export, which
reads the vendored wasms manifest via the shared walk-up resolver.
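The drift-skip guard from the chunkerFiles item can be sketched as follows. This assumes `contentHash` is the lowercase hex SHA-256 of the raw file bytes and that paths are relative to the repo root; `deriveChunkerFiles` and `ChunkerFile` are hypothetical names, and skipping deleted files (not just edited ones) is an assumption of the sketch.

```typescript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical minimal File-node shape (path relative to the repo root).
interface IndexedFile {
  path: string;
  contentHash: string; // hex SHA-256 recorded at analyze time
}

// Hypothetical shape of one chunker input.
interface ChunkerFile {
  path: string;
  bytes: Buffer;
}

// Keep a file only if its working-tree bytes still hash to the indexed
// contentHash; drifted (and, in this sketch, missing) files are skipped.
async function deriveChunkerFiles(
  repoPath: string,
  files: IndexedFile[],
): Promise<{ kept: ChunkerFile[]; skipped: string[] }> {
  const kept: ChunkerFile[] = [];
  const skipped: string[] = [];
  for (const f of files) {
    let bytes: Buffer;
    try {
      bytes = await readFile(join(repoPath, f.path));
    } catch {
      skipped.push(f.path); // deleted or unreadable since analyze
      continue;
    }
    const actual = createHash("sha256").update(bytes).digest("hex");
    if (actual === f.contentHash) {
      kept.push({ path: f.path, bytes });
    } else {
      skipped.push(f.path); // working tree drifted from the index
    }
  }
  return { kept, skipped };
}
```

Comparing against the indexed hash rather than re-hashing and re-indexing is what keeps the pack a function of the analyzed commit: an uncommitted edit can only shrink the chunk set, never change its contents.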
Derivation is the unset-path fallback, so pack unit fixtures that inject these
keep their behavior; the resolver is defensive (a stubbed or Repo-less graph
yields safe empties, never a throw). New CLI test covers commit/origin
derivation, the drift-skip guard, and grammar-pin population.
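A sketch of the defensive Repo-node read and the unset-path fallback. It assumes a generic `GraphNode` with a `kind` discriminator; that shape and the function names are hypothetical, since the real graph API is not shown here. The safe empties mirror the values production packs shipped before this fix (`commit=""`, `repo_origin_url=null`), and injected values, including an explicit `null`, win over derived ones.

```typescript
// Hypothetical graph-node shape; the commit names only the Repo node and its
// commitSha / originUrl fields.
interface GraphNode {
  kind: string;
  [field: string]: unknown;
}

interface Provenance {
  commit: string;
  repoOriginUrl: string | null;
}

// Safe empties: the same values production packs shipped before the fix.
const EMPTY: Provenance = { commit: "", repoOriginUrl: null };

// Pure read of indexed state (no git process). Never throws: a missing,
// stubbed, or Repo-less graph yields EMPTY.
function deriveProvenance(nodes: Iterable<GraphNode> | undefined): Provenance {
  if (!nodes) return { ...EMPTY };
  let repo: GraphNode | undefined;
  try {
    for (const node of nodes) {
      if (node?.kind === "Repo") {
        repo = node;
        break;
      }
    }
  } catch {
    return { ...EMPTY }; // e.g. a test stub whose iterator throws
  }
  if (!repo) return { ...EMPTY };
  const sha = repo["commitSha"];
  const origin = repo["originUrl"];
  return {
    commit: typeof sha === "string" ? sha : "",
    repoOriginUrl: typeof origin === "string" && origin !== "" ? origin : null,
  };
}

// Unset-path fallback: derive only for fields the caller did not inject,
// so fixtures that pass explicit values keep their behavior.
function resolveProvenance(injected: Partial<Provenance>, nodes?: Iterable<GraphNode>): Provenance {
  if (injected.commit !== undefined && injected.repoOriginUrl !== undefined) {
    return { commit: injected.commit, repoOriginUrl: injected.repoOriginUrl };
  }
  const derived = deriveProvenance(nodes);
  return {
    commit: injected.commit ?? derived.commit,
    repoOriginUrl: injected.repoOriginUrl !== undefined ? injected.repoOriginUrl : derived.repoOriginUrl,
  };
}
```

Checking `!== undefined` instead of using `??` for `repoOriginUrl` matters here: a fixture that deliberately injects `null` must not be overwritten by a derived origin.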
Verified end-to-end: analyze + double-pack a real repo now records the HEAD
commit, origin, non-empty ast-chunks, 15 grammar pins, and context-bom byte
ranges — and the pack is still byte-identical across two runs. Full suite,
banned-strings, sarif:validate green.
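One way to reproduce the byte-identity check is to hash every file in the output directories of two pack runs and diff the results. This is a hypothetical helper, not the project's verification script: `diffRuns` returns the relative paths whose bytes differ or that exist in only one run, so an empty array means the two runs matched.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join, relative } from "node:path";

// Map each regular file under `root` (recursively) to the SHA-256 of its bytes.
function hashTree(root: string): Map<string, string> {
  const hashes = new Map<string, string>();
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        hashes.set(relative(root, full), createHash("sha256").update(readFileSync(full)).digest("hex"));
      }
    }
  };
  walk(root);
  return hashes;
}

// Relative paths whose bytes differ between runs or that exist in only one run.
function diffRuns(runA: string, runB: string): string[] {
  const a = hashTree(runA);
  const b = hashTree(runB);
  const paths = new Set([...a.keys(), ...b.keys()]);
  return [...paths].filter((p) => a.get(p) !== b.get(p)).sort();
}
```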
Known follow-up: pins.chonkie_version still reports "unknown" (chonkie loads
and emits real strict chunks, but its version probe returns undefined in the
bundled CLI) — a cosmetic label gap in ast-chunker, orthogonal to this wiring.
… module

CI lint caught two things the local typecheck/test path missed (the mise lint precheck is killed with SIGTERM in the sandbox, so Biome never ran locally):
- noConsole: `console.log` is off in packages/cli/src/commands/** but warns in index.ts. Moved the --explain-context JSON emit into a `printContextSummary` helper in code-pack.ts (where stdout console.log is sanctioned), so index.ts only calls the helper.
- formatter: line-wrapping nits across the touched pack/cli/ingestion files, applied via `biome check --write`. No behavior change.

Build, full biome check, and the pack (110) and cli (346) test suites are green.
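A sketch of the shape such a helper might take. The receipt layout, the `codehub:*` property names, `summarizeContext`, and the helper's parameters are all assumptions; the real `printContextSummary` signature is not shown in this PR. Keeping the computation pure and routing every line through one `write` callback (defaulting to `console.log`) keeps stdout writes in a single module and makes the summary testable.

```typescript
// Hypothetical receipt component shape (simplified CycloneDX).
interface ReceiptComponent {
  name: string;
  hashes?: Array<{ alg: string; content: string }>;
  properties?: Array<{ name: string; value: string }>;
}

interface ContextSummary {
  filesIndexed: number;
  sha256Coverage: number; // fraction of components carrying a SHA-256 hash
  totalLines: number;
  byLanguage: Record<string, number>;
}

const prop = (c: ReceiptComponent, name: string): string | undefined =>
  c.properties?.find((p) => p.name === name)?.value;

// Pure: takes the already-parsed receipt, never re-runs the pack.
function summarizeContext(bom: { components?: ReceiptComponent[] }): ContextSummary {
  const components = bom.components ?? [];
  let withSha = 0;
  let totalLines = 0;
  const byLanguage: Record<string, number> = {};
  for (const c of components) {
    if (c.hashes?.some((h) => h.alg === "SHA-256" && /^[0-9a-f]{64}$/i.test(h.content))) withSha++;
    totalLines += Number(prop(c, "codehub:lineCount") ?? 0) || 0;
    const lang = prop(c, "codehub:language") ?? "unknown";
    byLanguage[lang] = (byLanguage[lang] ?? 0) + 1;
  }
  const n = components.length;
  return { filesIndexed: n, sha256Coverage: n > 0 ? withSha / n : 0, totalLines, byLanguage };
}

// All output goes through one writer, so only this helper touches stdout.
function printContextSummary(
  summary: ContextSummary,
  json: boolean,
  write: (line: string) => void = console.log,
): void {
  if (json) {
    write(JSON.stringify(summary, null, 2));
    return;
  }
  write(`files indexed: ${summary.filesIndexed}`);
  write(`SHA-256 coverage: ${(summary.sha256Coverage * 100).toFixed(1)}%`);
  write(`total lines: ${summary.totalLines}`);
  for (const [lang, count] of Object.entries(summary.byLanguage).sort(([a], [b]) => a.localeCompare(b))) {
    write(`  ${lang}: ${count}`);
  }
}
```

In a CLI handler, the input would come from parsing the on-disk `context-bom.json` (for example with `JSON.parse(readFileSync(path, "utf8"))`), so the command stays read-only.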
theagenticguy added a commit that referenced this pull request on Jun 29, 2026
🤖 Automated release via release-please

---

<details><summary>root: 0.10.3</summary>

## [0.10.3](root-v0.10.2...root-v0.10.3) (2026-06-29)

### Features

* **frameworks:** wire stage-3 config-AST evidence into detection ([#264](#264)) ([18e08b2](18e08b2))
* **pack:** context-bom read-receipt (9th BOM item) + real production provenance ([#261](#261)) ([b936af2](b936af2))

### Bug Fixes

* wire four dropped injection seams (F1–F4 from the latent-bug sweep) ([#263](#263)) ([dde590e](dde590e))
</details>

<details><summary>cli: 0.10.3</summary>

## [0.10.3](cli-v0.10.2...cli-v0.10.3) (2026-06-29)

### Features

* **pack:** context-bom read-receipt (9th BOM item) + real production provenance ([#261](#261)) ([b936af2](b936af2))

### Bug Fixes

* wire four dropped injection seams (F1–F4 from the latent-bug sweep) ([#263](#263)) ([dde590e](dde590e))
</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com>
Summary
Ships the Context BOM read-receipt, the 9th code-pack BOM item, and fixes a latent bug that left every production pack's provenance empty. Spec: `.erpaval/specs/009-context-bom-read-receipt/spec.md`. Roadmap origin: M-W-F run 2026-06-29, Move 1 (pursue-first): sign what the agent read, the lane the most-starred rival (CBM) left open when it shipped SLSA/cosign over its binary this weekend.

Two commits:
- **`feat(pack)`: the read-receipt**
  - `context-bom.json` (CycloneDX 1.6): one `file` component per indexed `File` node, sorted by path, each with its SHA-256 `contentHash`, line count, and language. Byte ranges attach as a best-effort property.
  - The manifest gains `contextBomHash` and the item joins `files[]`, so `packHash` covers it transitively: tamper with one read and the pack hash changes. Schema `1 → 2`.
  - `codehub code-pack --explain-context [--json]`: a read-only receipt summary (files, SHA-256 coverage, lines, per-language), read from the on-disk receipt without re-running the pack.
- **`fix(cli)`: real production provenance**
  - `runPackEngine` passed `generatePack` none of its `internal` provenance inputs, so every real pack shipped `commit:""`, `repo_origin_url:null`, an empty `ast-chunks.jsonl`, and `grammar_commits:{}` (the test fixture injects `chunkerFiles`, which hid it). The receipt inherited the gap: byte ranges were always empty in production. The inputs are now derived in the CLI:
    - commit/origin: from the singleton `Repo` graph node (a pure read of indexed state, no `git` spawn at pack time).
    - `chunkerFiles`: every `File` node's bytes, hash-verified against `FileNode.contentHash`; a drifted working-tree file is skipped, so the pack never chunks content that disagrees with what `analyze` saw.
    - grammar pins: the new `ingestion` export `parse.grammarVersions()` (vendored wasm manifest via the shared walk-up resolver).
  - Derivation is the unset-path fallback, so pack fixtures keep their behavior; the resolver is defensive against a stubbed or Repo-less graph.
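To see why tampering with one read propagates to the pack hash, here is a sketch of the transitive binding. It assumes a manifest that lists each pack item with its SHA-256 and a `packHash` computed as the SHA-256 of a fixed-order serialization of the manifest; apart from `schemaVersion`, `files`, and `contextBomHash`, the field names and the exact hashing scheme are hypothetical.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: string): string => createHash("sha256").update(data, "utf8").digest("hex");

// Hypothetical manifest shape; only schemaVersion, files[], and contextBomHash are named in the PR.
interface Manifest {
  schemaVersion: 2;
  files: Array<{ path: string; sha256: string }>;
  contextBomHash: string;
}

// Every pack item, including context-bom.json itself, gets an entry in files[].
function buildManifest(items: Record<string, string>): Manifest {
  const bom = items["context-bom.json"];
  if (bom === undefined) throw new Error("context-bom.json missing from pack");
  const files = Object.keys(items)
    .sort()
    .map((path) => ({ path, sha256: sha256(items[path] ?? "") }));
  return { schemaVersion: 2, files, contextBomHash: sha256(bom) };
}

// A fixed key order stands in for canonical JSON here.
function packHash(m: Manifest): string {
  return sha256(
    JSON.stringify({ schemaVersion: m.schemaVersion, files: m.files, contextBomHash: m.contextBomHash }),
  );
}
```

Editing a single component in `context-bom.json` changes its `files[]` digest and `contextBomHash`, and therefore `packHash`; the same inputs always reproduce the same hash.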
Determinism
The contract holds:
After `analyze` and a double pack of a real repo, `context-bom.json`, `ast-chunks.jsonl`, and `manifest.json` are all byte-identical across the two runs, with commit, origin, 15 grammar pins, and byte ranges now populated.

Tests / gates
`context-bom.test.ts` (R2/R3/R4/R6/R7), CLI `--explain-context` tests, and provenance-derivation tests (including the drift-skip guard).

Known follow-up (flagged, not in scope)
`pins.chonkie_version` still reports `"unknown"`: chonkie loads and emits real strict chunks, but its version probe (`createRequire("@chonkiejs/core/package.json")`) returns undefined in the bundled CLI. This is a cosmetic label gap in `ast-chunker`, orthogonal to this wiring.

🤖 Generated with Claude Code