docs: finding 0001 — OCH pack cuts agent token usage 2–4× (live Move 2 data) by theagenticguy · Pull Request #273 · theagenticguy/opencodehub

theagenticguy · 2026-06-30T17:32:45Z

What

The first live measurement from the Move 2 variance probe, written up as a findings doc. On real tasks, an OCH pack cut a coding agent's total token usage 2.18×–4.08× and cost 1.9×–3.3× — by replacing the agent's repo exploration with the pack's prebuilt structure map.

| Task | Without pack (tokens / cost) | With pack (tokens / cost) | Reduction (tokens / cost) |
| --- | --- | --- | --- |
| open-ended ("where to add a rule type") | 658,318 tok / $0.64 | 161,285 tok / $0.20 | 4.08× / 3.26× |
| enumeration ("list exports") | 623,098 tok / $0.70 | 286,379 tok / $0.36 | 2.18× / 1.91× |
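As a sanity check on the table, the reduction factors can be recomputed from the reported totals. This is a minimal sketch, not the probe's own code: the `RUNS` dictionary and the `reduction` and `summarize` helpers are hypothetical names, and only the numbers come from the table. The token ratios reproduce 4.08× and 2.18×. The cost ratios come out near 3.20× and 1.94× rather than 3.26× and 1.91×, presumably because the table shows dollar amounts rounded to the cent while the published ratios were computed from unrounded costs (an assumption).

```python
# Figures copied from the table: (total tokens, cost in USD) for each arm.
# The structure and names here are illustrative, not the probe's schema.
RUNS = {
    "open-ended": {"without": (658_318, 0.64), "with": (161_285, 0.20)},
    "enumeration": {"without": (623_098, 0.70), "with": (286_379, 0.36)},
}


def reduction(without: float, with_pack: float) -> float:
    """Return how many times smaller the with-pack figure is."""
    return without / with_pack


def summarize(runs: dict) -> dict:
    """Compute token and cost reduction factors per task, rounded to 2 places."""
    out = {}
    for task, arms in runs.items():
        tok_without, cost_without = arms["without"]
        tok_with, cost_with = arms["with"]
        out[task] = {
            "token_reduction": round(reduction(tok_without, tok_with), 2),
            # Cost inputs are already rounded to cents, so this is approximate.
            "cost_reduction": round(reduction(cost_without, cost_with), 2),
        }
    return out


if __name__ == "__main__":
    for task, r in summarize(RUNS).items():
        print(f"{task}: {r['token_reduction']}x tokens, {r['cost_reduction']}x cost")
```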

The saving is almost entirely cache tokens — the file-reads and tool-runs the agent does to reconstruct structure. Output tokens barely moved, so it's exploration-avoided, not shorter answers. This is the opposite of the arXiv "+10% tokens" framing: OCH's pack replaces exploration rather than adding to it.

Why this reframes the headline

Move 2 was specced to measure answer variance. On these tasks, output_hash dispersion came back null because it compares answer text, and a frontier model rephrases its prose on every run, so the metric saturates. Measuring decision convergence on prose needs the judge oracle, which the CLI does not wire up yet. Token efficiency, by contrast, is a directly measured resource number with no saturation, and it replicated across both task regimes. It's the more defensible claim.
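The saturation argument can be illustrated with a small sketch. It assumes a text-hash metric of the simplest kind: hash each answer, then report the fraction of distinct hashes across runs. The `output_hash` and `hash_dispersion` functions below are hypothetical stand-ins, not the probe's implementation, and the sample answers are invented for illustration. The sketch does not reproduce the probe's null result. It only shows why any metric that hashes raw prose carries no signal once every run is worded differently.

```python
import hashlib


def output_hash(answer: str) -> str:
    """Hash the answer text (hypothetical stand-in for the probe's output_hash)."""
    return hashlib.sha256(answer.strip().encode("utf-8")).hexdigest()


def hash_dispersion(answers: list[str]) -> float:
    """Fraction of distinct hashes across runs: 1/N means identical, 1.0 means all differ."""
    distinct = {output_hash(a) for a in answers}
    return len(distinct) / len(answers)


# Five invented answers that reach the same decision in different words.
paraphrased = [
    "Add the new rule type in the rules registry.",
    "The new rule type belongs in the rules registry.",
    "Register the new rule type in the rules registry.",
    "You should add it to the rules registry.",
    "Put the new rule type into the rules registry.",
]

# Five byte-identical answers, for contrast.
identical = ["Add the new rule type in the rules registry."] * 5

if __name__ == "__main__":
    print("paraphrased:", hash_dispersion(paraphrased))
    print("identical:  ", hash_dispersion(identical))
```

All five paraphrases agree on the decision, yet the text hash treats them as five unrelated outputs. That is the gap a judge oracle would be needed to close.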

Honest scope

Marked Preliminary: N=5, one small repo, one agent (Claude Sonnet 4.5), two tasks. A signal, not a benchmark. The doc says so plainly, with a reproduce recipe and next steps (wire the judge oracle; scale to more tasks/repos/Codex/larger N).

Docs-only — no code change.

🤖 Generated with Claude Code

…2 data) First live measurement from the Move 2 variance probe on Bedrock. Two tasks × N=5 × Claude Sonnet 4.5 against an isolated @opencodehub/policy snapshot: the pack cut total token usage 2.18×–4.08× and cost 1.9×–3.3×, driven almost entirely by cache tokens (exploration the agent skipped because the pack handed it the structure). Output tokens barely moved, so the saving is exploration-avoided, not shorter answers. Reframes the headline from variance to token efficiency: output_hash dispersion came back null because it compares answer TEXT and a frontier model rephrases prose every run (saturated) — the variance question waits on the judge oracle. Token efficiency is a directly-measured resource number with no saturation, and it replicated across both task regimes. Scoped honestly as preliminary (N=5, 1 repo, 1 agent, 2 tasks — a signal, not a benchmark) with a reproduce recipe and next steps.

🤖 Automated release via release-please

---

<details><summary>root: 0.10.6</summary>

## [0.10.6](root-v0.10.5...root-v0.10.6) (2026-06-30)

### Bug Fixes

* **eval:** count cache tokens + per-harness model in variance probe ([#271](#271)) ([df12cf9](df12cf9))

### Documentation

* finding 0001 — OCH pack cuts agent token usage 2–4× (live Move 2 data) ([#273](#273)) ([c712ebc](c712ebc))

</details>

<details><summary>cli: 0.10.6</summary>

## [0.10.6](cli-v0.10.5...cli-v0.10.6) (2026-06-30)

### Bug Fixes

* **eval:** count cache tokens + per-harness model in variance probe ([#271](#271)) ([df12cf9](df12cf9))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

theagenticguy merged commit c712ebc into main Jun 30, 2026
38 checks passed

theagenticguy deleted the docs/pack-token-efficiency branch June 30, 2026 17:38

github-actions Bot mentioned this pull request Jun 30, 2026

chore: release main #272

Merged

theagenticguy merged 1 commit into main from docs/pack-token-efficiency
