
feat(pack): context-bom read-receipt (9th BOM item) + real production provenance#261

Merged
theagenticguy merged 3 commits into main from spec/009-context-bom-read-receipt on Jun 29, 2026

@theagenticguy

Owner

Summary

Ships the Context BOM read-receipt (the 9th code-pack BOM item) and fixes a latent bug that left every production pack's provenance empty. Spec: .erpaval/specs/009-context-bom-read-receipt/spec.md. Roadmap origin: M-W-F run 2026-06-29, Move 1 (pursue-first): sign what the agent read. That is the lane the most-starred rival (CBM) left open when it shipped SLSA/cosign signing for its binary this weekend.

Two functional commits (plus a follow-up lint/format commit):

feat(pack) — the read-receipt

  • New context-bom.json (CycloneDX 1.6): one file component per indexed File node, sorted by path, each with its SHA-256 contentHash, line count, and language. Byte ranges attach as a best-effort property when the chunker supplies them.
  • The manifest gains contextBomHash and the item joins files[], so packHash covers it transitively: tamper with one recorded read and the pack hash changes. Manifest schemaVersion 1 → 2.
  • codehub code-pack --explain-context [--json] — read-only receipt summary (files, SHA-256 coverage, lines, per-language), read from the on-disk receipt without re-running the pack.

fix(cli) — real production provenance

runPackEngine passed generatePack none of its internal provenance inputs, so every real pack shipped commit:"", repo_origin_url:null, an empty ast-chunks.jsonl, and grammar_commits:{}. The test fixture injects chunkerFiles, which masked the bug. The receipt inherited the gap: byte ranges were always empty in production. These inputs are now derived in the CLI:

  • commit / repoOriginUrl — from the Repo graph node (pure read of indexed state, no git spawn at pack time).
  • chunkerFiles — File-node bytes read from disk, hash-verified against FileNode.contentHash; a drifted working-tree file is skipped, so the pack never chunks content that disagrees with what analyze saw.
  • grammarCommits — new ingestion export parse.grammarVersions() (vendored wasm manifest via the shared walk-up resolver).
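Here is a sketch of the walk-up resolver idea. Starting from a directory, it checks each ancestor for the manifest until it reaches the filesystem root. `findUp`, `parseGrammarPins`, the `vendor/wasms/manifest.json` path, and the `{ grammars: { <lang>: { commit } } }` manifest shape are all hypothetical. The existence check is injectable, so the walk can be exercised without touching disk.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Walk from startDir toward the root; return the first ancestor path at which
// relPath exists, or undefined once the root has been checked.
function findUp(
  startDir: string,
  relPath: string,
  exists: (p: string) => boolean = existsSync,
): string | undefined {
  let dir = resolve(startDir);
  for (;;) {
    const candidate = join(dir, relPath);
    if (exists(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) return undefined; // dirname of the root is the root
    dir = parent;
  }
}

// Hypothetical manifest shape: { "grammars": { "<lang>": { "commit": "<sha>" } } }.
// Languages are visited in sorted order so the resulting object is stable.
function parseGrammarPins(manifestText: string): Record<string, string> {
  const manifest = JSON.parse(manifestText) as {
    grammars?: Record<string, { commit?: unknown }>;
  };
  const grammars = manifest.grammars ?? {};
  const pins: Record<string, string> = {};
  for (const lang of Object.keys(grammars).sort()) {
    const commit = grammars[lang]?.commit;
    if (typeof commit === "string" && commit.length > 0) pins[lang] = commit;
  }
  return pins;
}

// A missing manifest yields an empty pin set instead of throwing.
function readGrammarPins(
  startDir: string,
  manifestRel = "vendor/wasms/manifest.json",
): Record<string, string> {
  const path = findUp(startDir, manifestRel);
  return path === undefined ? {} : parseGrammarPins(readFileSync(path, "utf8"));
}
```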

Derivation is the unset-path fallback, so pack fixtures keep their behavior; the resolver is defensive against a stubbed/Repo-less graph.

Determinism

The contract holds: analyze + double-pack a real repo → context-bom.json, ast-chunks.jsonl, and manifest.json are all byte-identical across two runs, with the commit, origin URL, 15 grammar pins, and byte ranges now populated.
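The double-pack check can be phrased as code. This sketch assumes each run's outputs have already been read into memory as name → bytes. `diffPackRuns` is a hypothetical helper, not part of the codebase. It reports which artifacts differ, so an empty result is the byte-identity claim above.

```typescript
import { createHash } from "node:crypto";

// File name -> raw bytes for one pack run (e.g. read from its outDir).
type PackOutput = Record<string, Uint8Array>;

const digest = (bytes: Uint8Array): string =>
  createHash("sha256").update(bytes).digest("hex");

// Names of artifacts that differ between two runs, or that exist in only one.
// An empty array means every listed artifact is byte-identical.
function diffPackRuns(a: PackOutput, b: PackOutput, names: readonly string[]): string[] {
  const drifted: string[] = [];
  for (const name of names) {
    const x: Uint8Array | undefined = a[name];
    const y: Uint8Array | undefined = b[name];
    if (x === undefined || y === undefined || digest(x) !== digest(y)) drifted.push(name);
  }
  return drifted;
}
```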

Tests / gates

  • New context-bom.test.ts (R2/R3/R4/R6/R7) + CLI --explain-context + provenance-derivation tests (incl. the drift-skip guard).
  • Full workspace test: 0 failures. Typecheck, banned-strings, licenses, sarif:validate all green locally.

Known follow-up (flagged, not in scope)

pins.chonkie_version still reports "unknown": chonkie loads and emits real strict chunks, but its version probe (createRequire("@chonkiejs/core/package.json")) returns undefined in the bundled CLI. Cosmetic label gap in ast-chunker, orthogonal to this wiring.

🤖 Generated with Claude Code

Emit context-bom.json, a CycloneDX 1.6 document recording the source files
the pack indexed: one `file` component per File node, sorted by path, each
with its SHA-256 contentHash, lineCount, and language. Byte ranges attach
as a best-effort property when the chunker supplied them.
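A minimal sketch of the component shape described above. It assumes a simplified File-node record. `FileNodeLike` is hypothetical, as are the `codehub:*` property names (the PR does not show the real keys). The sketch follows the CycloneDX 1.6 JSON layout for `file` components, with a `hashes` entry and name/value `properties`.

```typescript
// Hypothetical input: the per-file fields the PR says each File node carries.
interface FileNodeLike {
  path: string;
  contentHash: string; // hex SHA-256 of the file bytes
  lineCount: number;
  language: string;
  byteRanges?: Array<[number, number]>; // best-effort, only when the chunker supplied them
}

interface CdxProperty {
  name: string;
  value: string;
}

interface CdxComponent {
  type: "file";
  name: string;
  hashes: Array<{ alg: "SHA-256"; content: string }>;
  properties: CdxProperty[];
}

interface ContextBom {
  bomFormat: "CycloneDX";
  specVersion: "1.6";
  version: number;
  components: CdxComponent[];
}

// Plain UTF-16 code-unit comparison: unlike localeCompare, same order on every host.
const byPath = (a: FileNodeLike, b: FileNodeLike): number =>
  a.path < b.path ? -1 : a.path > b.path ? 1 : 0;

function buildContextBom(files: readonly FileNodeLike[]): ContextBom {
  const components = [...files].sort(byPath).map((f): CdxComponent => {
    // The "codehub:*" names are placeholders, not the pack's real property keys.
    const properties: CdxProperty[] = [
      { name: "codehub:lineCount", value: String(f.lineCount) },
      { name: "codehub:language", value: f.language },
    ];
    // Byte ranges are optional: attach them only when some were supplied.
    if (f.byteRanges && f.byteRanges.length > 0) {
      properties.push({ name: "codehub:byteRanges", value: JSON.stringify(f.byteRanges) });
    }
    return {
      type: "file",
      name: f.path,
      hashes: [{ alg: "SHA-256", content: f.contentHash }],
      properties,
    };
  });
  return { bomFormat: "CycloneDX", specVersion: "1.6", version: 1, components };
}
```

Sorting by raw code units rather than `localeCompare` is one way to keep component order independent of the host locale. That independence matters for the byte-identical guarantee.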

Bind the receipt to the pack: the manifest gains `contextBomHash` (sha256
of the canonical context-bom.json) and the item joins `files[]`, so packHash
covers it transitively. Manifest schemaVersion bumps 1 -> 2.
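Here is how the binding makes tampering visible, sketched under assumptions. The canonical form here (keys sorted at every level, no whitespace, plain JSON data only) is an illustrative stand-in, not the pack's actual encoding. So is the `packHashOf` digest over `path<TAB>sha256` lines. The point is the chain: a changed component changes `contextBomHash`, which changes the `files[]` entry, which changes `packHash`.

```typescript
import { createHash } from "node:crypto";

// Serialize plain JSON data with object keys sorted at every level, so the
// same logical document always yields the same bytes.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => (v === undefined ? "null" : canonicalJson(v))).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .filter((k) => obj[k] !== undefined) // JSON.stringify drops these keys too
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

const sha256Hex = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

interface ManifestFileEntry {
  path: string;
  sha256: string;
}

// Hypothetical packHash: one digest over the files[] list, sorted by path so
// the result does not depend on listing order.
function packHashOf(files: readonly ManifestFileEntry[]): string {
  const lines = [...files]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((f) => `${f.path}\t${f.sha256}`);
  return sha256Hex(lines.join("\n"));
}
```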

Add `codehub code-pack --explain-context [--json]`, a read-only summary
(files indexed, SHA-256 coverage, total lines, per-language breakdown) read
from the on-disk receipt without re-running the pack.
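This sketch shows the kind of summary `--explain-context` reports, computed purely from the receipt's JSON, so no pack re-run is needed. Several details are assumptions for illustration: the function names, the `codehub:*` property keys, and counting files (rather than lines) per language.

```typescript
interface BomComponent {
  name: string;
  hashes?: Array<{ alg: string; content: string }>;
  properties?: Array<{ name: string; value: string }>;
}

interface ContextSummary {
  files: number;
  sha256Covered: number; // components carrying a non-empty SHA-256 hash
  totalLines: number;
  byLanguage: Record<string, number>; // file count per language
}

const prop = (c: BomComponent, name: string): string | undefined =>
  c.properties?.find((p) => p.name === name)?.value;

// Read-only: derive the summary from the on-disk receipt's text.
function summarizeContextBom(bomJson: string): ContextSummary {
  const bom = JSON.parse(bomJson) as { components?: BomComponent[] };
  const components = bom.components ?? [];
  const byLanguage: Record<string, number> = {};
  let sha256Covered = 0;
  let totalLines = 0;
  for (const c of components) {
    if (c.hashes?.some((h) => h.alg === "SHA-256" && Boolean(h.content))) sha256Covered++;
    const lines = Number(prop(c, "codehub:lineCount") ?? 0);
    if (Number.isFinite(lines)) totalLines += lines; // ignore malformed counts
    const lang = prop(c, "codehub:language") ?? "unknown";
    byLanguage[lang] = (byLanguage[lang] ?? 0) + 1;
  }
  return { files: components.length, sha256Covered, totalLines, byLanguage };
}

// Text for humans, or a single JSON line when json is true.
function formatContextSummary(s: ContextSummary, json: boolean): string {
  if (json) return JSON.stringify(s);
  const pct = s.files === 0 ? 0 : Math.round((100 * s.sha256Covered) / s.files);
  const langs = Object.keys(s.byLanguage)
    .sort()
    .map((l) => `  ${l}: ${s.byLanguage[l]}`);
  return [
    `files indexed: ${s.files}`,
    `SHA-256 coverage: ${s.sha256Covered}/${s.files} (${pct}%)`,
    `total lines: ${s.totalLines}`,
    "per language:",
    ...langs,
  ].join("\n");
}
```

With `json = true` the helper returns a single JSON line, mirroring the `--json` flag. Otherwise it returns human-readable text.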

The receipt anchors on File nodes, not AstChunk data: production never wires
chunkerFiles into generatePack, so ast-chunks (and thus byte ranges) are
empty in real packs today. File nodes are populated by analyze, so the
receipt is complete in production. The empty-chunker case is a flagged
latent bug, out of scope here.

Verified end-to-end: analyze + double-pack a real repo, context-bom.json is
byte-identical across runs; build, typecheck, full test suite (0 fail),
banned-strings, licenses, and sarif:validate all pass.
… pins)

runPackEngine called generatePack with only repoPath/outDir/budget/tokenizer
and none of the `internal` provenance inputs, so every production pack shipped
commit="", repo_origin_url=null, an empty ast-chunks.jsonl, grammar_commits={},
and chonkie_version="unknown". The determinism receipt was hollow in real packs
even though the determinism TEST fixture (which injects chunkerFiles) looked
complete. The spec 009 context-bom inherited the gap: its byte ranges were
always empty in production.

Derive the inputs in the CLI (the documented integration layer) and thread them
through generatePack's existing `internal` seam:

- commit / repoOriginUrl: read from the singleton Repo node (commitSha /
  originUrl) — a pure read of the indexed state, no git spawn at pack time.
- chunkerFiles: every File node's bytes, read from disk and hash-verified
  against FileNode.contentHash. A file whose working-tree bytes drifted from
  the index is skipped, so the pack never chunks content that disagrees with
  what was analyzed — byte-identity stays a function of the indexed commit.
- grammarCommits: the new ingestion `parse.grammarVersions()` export, which
  reads the vendored wasms manifest via the shared walk-up resolver.
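Here is the drift-skip guard, sketched under assumptions. `collectChunkerFiles` and its types are hypothetical. The file reader is injected; in practice it would be something like `readFileSync` joined onto the repo path. Treating an unreadable or deleted file as drift is this sketch's choice, not something the PR states.

```typescript
import { createHash } from "node:crypto";

interface IndexedFile {
  path: string;
  contentHash: string; // hex SHA-256 recorded by analyze
}

interface ChunkerFile {
  path: string;
  content: Uint8Array;
}

// Keep only files whose current bytes still hash to the indexed contentHash,
// so chunking never sees content that disagrees with what analyze saw.
function collectChunkerFiles(
  files: readonly IndexedFile[],
  readBytes: (path: string) => Uint8Array | undefined,
): { kept: ChunkerFile[]; skipped: string[] } {
  const kept: ChunkerFile[] = [];
  const skipped: string[] = [];
  for (const f of files) {
    let bytes: Uint8Array | undefined;
    try {
      bytes = readBytes(f.path);
    } catch {
      bytes = undefined; // unreadable counts as drift in this sketch
    }
    if (bytes === undefined) {
      skipped.push(f.path);
      continue;
    }
    const actual = createHash("sha256").update(bytes).digest("hex");
    if (actual !== f.contentHash) {
      skipped.push(f.path); // working tree drifted from the index
      continue;
    }
    kept.push({ path: f.path, content: bytes });
  }
  return { kept, skipped };
}
```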

Derivation is the unset-path fallback, so pack unit fixtures that inject these
keep their behavior; the resolver is defensive (a stubbed or Repo-less graph
yields safe empties, never a throw). New CLI test covers commit/origin
derivation, the drift-skip guard, and grammar-pin population.
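A sketch of a defensive Repo-node read. It assumes a minimal graph interface with a label lookup; `GraphLike` and `nodesByLabel` are invented for illustration, since the real graph API is not shown in the PR. Every failure mode collapses to the same empties that production packs used to ship (`commit: ""`, `repoOriginUrl: null`) rather than throwing.

```typescript
// Hypothetical minimal graph view: only the lookup this sketch needs.
interface GraphLike {
  nodesByLabel?(label: string): Iterable<Record<string, unknown>>;
}

interface RepoProvenance {
  commit: string;
  repoOriginUrl: string | null;
}

function resolveRepoProvenance(graph: GraphLike | undefined): RepoProvenance {
  const empty: RepoProvenance = { commit: "", repoOriginUrl: null };
  try {
    const nodes = graph?.nodesByLabel?.("Repo");
    if (nodes === undefined) return empty; // stubbed graph without a lookup
    const first = nodes[Symbol.iterator]().next();
    if (first.done) return empty; // Repo-less graph
    const repo = first.value; // singleton: the first Repo node wins
    const commitSha = repo["commitSha"];
    const originUrl = repo["originUrl"];
    return {
      commit: typeof commitSha === "string" ? commitSha : "",
      repoOriginUrl: typeof originUrl === "string" && originUrl.length > 0 ? originUrl : null,
    };
  } catch {
    return empty; // a throwing stub still yields safe empties
  }
}
```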

Verified end-to-end: analyze + double-pack a real repo now records the HEAD
commit, origin, non-empty ast-chunks, 15 grammar pins, and context-bom byte
ranges — and the pack is still byte-identical across two runs. Full suite,
banned-strings, sarif:validate green.

Known follow-up: pins.chonkie_version still reports "unknown" (chonkie loads
and emits real strict chunks, but its version probe returns undefined in the
bundled CLI) — a cosmetic label gap in ast-chunker, orthogonal to this wiring.
… module

CI lint caught two things the local typecheck/test path missed (the mise lint
precheck SIGTERMs in-sandbox, so Biome never ran locally):

- noConsole: `console.log` is off in packages/cli/src/commands/** but warns in
  index.ts. Moved the --explain-context JSON emit into a `printContextSummary`
  helper in code-pack.ts (where stdout console.log is sanctioned), so index.ts
  only calls the helper.
- formatter: line-wrapping nits across the touched pack/cli/ingestion files,
  applied via `biome check --write`.

No behavior change. Build, full biome check, pack (110) + cli (346) tests green.
@theagenticguy theagenticguy merged commit b936af2 into main Jun 29, 2026
38 checks passed
@theagenticguy theagenticguy deleted the spec/009-context-bom-read-receipt branch June 29, 2026 15:23
@github-actions github-actions Bot mentioned this pull request Jun 29, 2026
theagenticguy added a commit that referenced this pull request Jun 29, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.10.3</summary>

## [0.10.3](root-v0.10.2...root-v0.10.3) (2026-06-29)


### Features

* **frameworks:** wire stage-3 config-AST evidence into detection
([#264](#264))
([18e08b2](18e08b2))
* **pack:** context-bom read-receipt (9th BOM item) + real production
provenance
([#261](#261))
([b936af2](b936af2))


### Bug Fixes

* wire four dropped injection seams (F1–F4 from the latent-bug sweep)
([#263](#263))
([dde590e](dde590e))
</details>

<details><summary>cli: 0.10.3</summary>

## [0.10.3](cli-v0.10.2...cli-v0.10.3) (2026-06-29)


### Features

* **pack:** context-bom read-receipt (9th BOM item) + real production
provenance
([#261](#261))
([b936af2](b936af2))


### Bug Fixes

* wire four dropped injection seams (F1–F4 from the latent-bug sweep)
([#263](#263))
([dde590e](dde590e))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com>