
docs: finding 0001 — OCH pack cuts agent token usage 2–4× (live Move 2 data)#273

Merged
theagenticguy merged 1 commit into main from docs/pack-token-efficiency on Jun 30, 2026

@theagenticguy


### What

The first live measurement from the Move 2 variance probe, written up as a findings doc. On real tasks, an OCH pack cut a coding agent's total token usage 2.18×–4.08× and cost 1.9×–3.3× — by replacing the agent's repo exploration with the pack's prebuilt structure map.

| Task | Without pack | With pack | Reduction (tokens / cost) |
|---|---|---|---|
| open-ended ("where to add a rule type") | 658,318 tok / $0.64 | 161,285 tok / $0.20 | 4.08× / 3.26× |
| enumeration ("list exports") | 623,098 tok / $0.70 | 286,379 tok / $0.36 | 2.18× / 1.91× |
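As a quick arithmetic check, the sketch below recomputes each reduction as (without pack) ÷ (with pack) from the figures in the table. `RUNS` and `reduction` are illustrative names, not part of the OCH eval code. The token ratios reproduce the table's 4.08× and 2.18×. The cost ratios computed from the cent-rounded dollar figures come out at 3.20× and 1.94×, so the reported 3.26× and 1.91× were presumably computed from unrounded costs (an assumption; the raw costs are not shown here).

```python
"""Recompute the table's reduction ratios from its raw figures.

Ratio = metric without pack / metric with pack, so 2.0x means the
with-pack run used half as much.
"""

# (tokens, cost_usd) per arm, copied from the table; costs are rounded to cents.
RUNS = {
    "open-ended": {"without": (658_318, 0.64), "with": (161_285, 0.20)},
    "enumeration": {"without": (623_098, 0.70), "with": (286_379, 0.36)},
}


def reduction(without: float, with_pack: float) -> float:
    """How many times smaller the with-pack figure is."""
    return without / with_pack


for task, arms in RUNS.items():
    tok_without, cost_without = arms["without"]
    tok_with, cost_with = arms["with"]
    tok_ratio = reduction(tok_without, tok_with)
    cost_ratio = reduction(cost_without, cost_with)
    print(f"{task}: {tok_ratio:.2f}x tokens, {cost_ratio:.2f}x cost")
```

Running it prints `4.08x tokens, 3.20x cost` for open-ended and `2.18x tokens, 1.94x cost` for enumeration.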

The saving is almost entirely in cache tokens: the tokens from the file reads and tool runs the agent performs to reconstruct the repo's structure. Output tokens barely moved, so the saving comes from avoided exploration, not shorter answers. This is the opposite of the arXiv "+10% tokens" framing: OCH's pack replaces exploration rather than adding to it.
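Because the saving sits in cache tokens, a tally that counts only input and output tokens would miss most of it (the 0.10.6 release notes list a fix to count cache tokens in the variance probe). A minimal sketch, assuming per-run usage is recorded with the field names of the Anthropic Messages API `usage` object, where cache writes and cache reads are reported separately from `input_tokens`. The `Usage` class and `saving_by_category` helper are hypothetical, not the probe's actual code.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Per-run token usage. Field names mirror the Anthropic Messages API
    `usage` object; that the probe records this shape is an assumption."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0

    @property
    def cache_tokens(self) -> int:
        # Cache writes plus cache reads.
        return self.cache_creation_input_tokens + self.cache_read_input_tokens

    @property
    def total(self) -> int:
        # Cached prompt tokens are reported separately from input_tokens,
        # so a total that omits them undercounts a tool-heavy agent run.
        return self.input_tokens + self.output_tokens + self.cache_tokens


def saving_by_category(without: Usage, with_pack: Usage) -> dict[str, int]:
    """Tokens saved per category (positive = the pack arm used fewer)."""
    return {
        "input": without.input_tokens - with_pack.input_tokens,
        "output": without.output_tokens - with_pack.output_tokens,
        "cache": without.cache_tokens - with_pack.cache_tokens,
        "total": without.total - with_pack.total,
    }
```

Fed the two arms' per-run usage, a saving that is "almost entirely cache tokens" would show up as the `cache` entry accounting for nearly all of `total`, with `output` near zero.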

### Why this reframes the headline

Move 2 was specced to measure answer variance. On these tasks, output_hash dispersion came back null: the metric compares answer text, and a frontier model rephrases its prose on every run, so the metric saturates. Measuring decision convergence on prose needs the judge oracle, which the CLI does not yet wire in. Token efficiency, by contrast, is a directly measured resource number that does not saturate, and it replicated across both task regimes. It's the more defensible claim.
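The doc does not spell out how output_hash dispersion is computed. The sketch below assumes it hashes each run's exact answer text (SHA-256 here) and reports the fraction of distinct hashes across the N runs. Under that assumption, five runs that reach the same decision in different words score the same as five runs that disagree, which is why a byte-level hash saturates on prose and a judge oracle is needed to compare decisions. `answer_hash` and `hash_dispersion` are hypothetical names, and the answer strings are illustrative.

```python
import hashlib


def answer_hash(answer: str) -> str:
    """SHA-256 of the exact answer text; any rewording changes the hash."""
    return hashlib.sha256(answer.encode("utf-8")).hexdigest()


def hash_dispersion(answers: list[str]) -> float:
    """Distinct answer hashes divided by run count.

    1/N when every run is byte-identical, 1.0 when every run differs.
    """
    if not answers:
        raise ValueError("need at least one run")
    return len({answer_hash(a) for a in answers}) / len(answers)


# Illustrative answers for N=5 runs.
converged = [  # same decision, reworded each run
    "Put it in module A.",
    "Module A is the right place.",
    "Add it to module A.",
    "It belongs in module A.",
    "Use module A.",
]
diverged = [  # a different decision each run
    "Put it in module A.",
    "Put it in module B.",
    "Put it in module C.",
    "Put it in module D.",
    "Put it in module E.",
]

# Both sets saturate at 1.0, so the metric cannot tell them apart.
print(hash_dispersion(converged), hash_dispersion(diverged))  # 1.0 1.0
```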

### Honest scope

Marked Preliminary: N=5, one small repo, one agent (Claude Sonnet 4.5), two tasks. A signal, not a benchmark. The doc says so plainly, with a reproduce recipe and next steps: wire in the judge oracle, then scale to more tasks, more repos, Codex, and a larger N.

Docs-only — no code change.

🤖 Generated with Claude Code


First live measurement from the Move 2 variance probe on Bedrock. Two
tasks × N=5 × Claude Sonnet 4.5 against an isolated @opencodehub/policy
snapshot: the pack cut total token usage 2.18×–4.08× and cost 1.9×–3.3×,
driven almost entirely by cache tokens (exploration the agent skipped
because the pack handed it the structure). Output tokens barely moved, so
the saving is exploration-avoided, not shorter answers.

Reframes the headline from variance to token efficiency: output_hash
dispersion came back null because it compares answer TEXT and a frontier
model rephrases prose every run (saturated) — the variance question waits
on the judge oracle. Token efficiency is a directly-measured resource
number with no saturation, and it replicated across both task regimes.

Scoped honestly as preliminary (N=5, 1 repo, 1 agent, 2 tasks — a signal,
not a benchmark) with a reproduce recipe and next steps.
@theagenticguy merged commit c712ebc into main on Jun 30, 2026
38 checks passed
@theagenticguy deleted the docs/pack-token-efficiency branch on June 30, 2026 17:38
theagenticguy pushed a commit that referenced this pull request Jun 30, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.10.6</summary>

## [0.10.6](root-v0.10.5...root-v0.10.6) (2026-06-30)


### Bug Fixes

* **eval:** count cache tokens + per-harness model in variance probe ([#271](#271)) ([df12cf9](df12cf9))


### Documentation

* finding 0001 — OCH pack cuts agent token usage 2–4× (live Move 2 data) ([#273](#273)) ([c712ebc](c712ebc))
</details>

<details><summary>cli: 0.10.6</summary>

## [0.10.6](cli-v0.10.5...cli-v0.10.6) (2026-06-30)


### Bug Fixes

* **eval:** count cache tokens + per-harness model in variance probe ([#271](#271)) ([df12cf9](df12cf9))
</details>

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>