
feat(campaign): GEPA-distillation harness — distill a cheap analyst toward gold verdicts#141

Merged
drewstone merged 1 commit into main from feat/skill-distillation-loop
May 30, 2026


@drewstone
Contributor

Teacher→student distillation, composed from existing primitives

The expensive workflow (the 70-agent skill-audit) is the teacher: its frozen verdicts are gold. A cheap single-shot LLM analyst is the student. gepaDriver optimizes the student's prompt toward agreement with the gold labels. This is the distillation loop — collapse an expensive multi-agent verdict into one cheap prompt that reproduces it.

It composes the shipped improvement-loop primitives and reimplements none of them:

  • Loop: runImprovementLoop (outer: optimize → holdout re-score → gate → optional PR)
  • Driver: gepaDriver (reflective prompt optimizer, Pareto-combine of complementary lessons)
  • Measurement: runCampaign inside the loop, scoring the student over the train split
  • Gate: caller-supplied — heldOutGate by default, or defaultProductionGate for the full red-team / reward-hacking / canary stack

autoOnPromote: 'none' is forced inside runDistillation — the loop never opens a PR; the caller (the distill CLI) decides what to do with the winning prompt.

New module: src/campaign/distillation/ (generic + parameterized)

Distills any analyst against any gold JSONL of {input, label} — no skill data or domain logic is hardcoded in the module.
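As a rough illustration of the input format, one gold record per line could be parsed as below. This is a minimal sketch, not the module's `parseGoldJsonl`: the function name `parseGoldJsonlSketch`, the `GoldRecord` interface, and the validation rules are assumptions; only the `id`, `input`, `label`, and optional `split` fields come from the description above.

```typescript
// Hypothetical record shape; the real GoldScenario type may carry more fields.
interface GoldRecord {
  id: string;
  input: unknown; // opaque to the harness
  label: Record<string, unknown>; // opaque to the harness, read only by the comparator
  split?: string;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseGoldJsonlSketch(text: string): GoldRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line, index) => {
      const parsed = JSON.parse(line) as Partial<GoldRecord>;
      if (
        typeof parsed.id !== "string" ||
        parsed.input === undefined ||
        typeof parsed.label !== "object" ||
        parsed.label === null
      ) {
        throw new Error(`gold record ${index + 1} is missing id, input, or label`);
      }
      return {
        id: parsed.id,
        input: parsed.input,
        label: parsed.label,
        // Only carry the split tag through when the record has one.
        ...(parsed.split !== undefined ? { split: parsed.split } : {}),
      };
    });
}

const sample = [
  '{"id":"skill:foo","input":{"text":"..."},"label":{"value_verdict":"keep"},"split":"test"}',
  "",
  '{"id":"skill:bar","input":{"text":"..."},"label":{"value_verdict":"drop"}}',
].join("\n");

const records = parseGoldJsonlSketch(sample);
console.log(records.length); // 2
```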

  • gold-scenarios.ts: GoldScenario (opaque input+label), loadGoldScenarios/parseGoldJsonl, and a deterministic splitGold (honors an explicit split field, else falls back to a modulo stride). It rewrites : in ids to __ (the real gold uses skill:foo ids), because the campaign cellId has the form ${id}:${rep} and heldOutGate recovers the scenario via cellId.split(':')[0]. A colon inside the id would otherwise collapse every scenario into one bucket and zero the holdout delta. The original id is preserved on a gold-id: tag.
  • agreement-judge.ts: buildAgreementJudge returns a real substrate JudgeConfig whose score({artifact, scenario, signal}) scores the student's produced label against the scenario's gold label via an injected comparator (domain-agnostic). Default fieldAgreement: exact-match on categorical fields + Jaccard set-overlap on array fields, averaged. Pure + unit-tested.
  • run-distillation.ts: runDistillation wires baselineSurface = student prompt; dispatchWithSurface = render the surface + scenario input → call the LLM via createChatClient → parse the produced JSON label → return it as the artifact (cost + tokens are reported via ctx.cost, so the cell is never a stub); driver = gepaDriver; judge = the agreement judge; scenarios = train split; holdoutScenarios = test split; gate = heldOutGate/defaultProductionGate. Returns the winning prompt + before/after agreement on the holdout.
  • cli.ts: distill --gold <path> --baseline <prompt-file> --model <m> [--optimizer-model m] [--generations N] [--population K] [--reps R] [--categorical ...] [--array ...]. The live, token-spending entry point: invoked by hand, never in CI. Auth via LLM_API_KEY/TANGLE_API_KEY.
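The colon-id problem and the split rule described for gold-scenarios.ts can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `sanitizeId`, `splitGoldSketch`, `scenarioOf`, the stride of 4, and the exact `gold-id:<original>` tag format are all hypothetical; only the `:` to `__` rewrite, the `${id}:${rep}` cell id, and the `cellId.split(':')[0]` lookup come from the description.

```typescript
// Hypothetical stand-ins; the real module's types and signatures may differ.
interface GoldRecord {
  id: string;
  input: unknown;
  label: Record<string, unknown>;
  split?: "train" | "test";
}

interface Scenario extends GoldRecord {
  tags: string[];
}

// Replace every ':' with '__' and keep the original id on a tag (tag format assumed).
function sanitizeId(record: GoldRecord): Scenario {
  return {
    ...record,
    id: record.id.replace(/:/g, "__"),
    tags: [`gold-id:${record.id}`],
  };
}

// Honor an explicit split field; otherwise send every `stride`-th record to test.
function splitGoldSketch(
  records: GoldRecord[],
  stride = 4,
): { train: Scenario[]; test: Scenario[] } {
  const train: Scenario[] = [];
  const test: Scenario[] = [];
  records.forEach((record, index) => {
    const bucket = record.split ?? (index % stride === stride - 1 ? "test" : "train");
    (bucket === "test" ? test : train).push(sanitizeId(record));
  });
  return { train, test };
}

// How a gate might recover the scenario id from a cell id of the form `${id}:${rep}`.
const scenarioOf = (cellId: string): string => cellId.split(":")[0] ?? cellId;

const rawIds = ["skill:foo", "skill:bar"];
const unsanitized = rawIds.map((id) => scenarioOf(`${id}:0`));
const sanitized = rawIds.map((id) => scenarioOf(`${id.replace(/:/g, "__")}:0`));

console.log(unsanitized); // [ 'skill', 'skill' ]  (both scenarios collapse into one bucket)
console.log(sanitized); // [ 'skill__foo', 'skill__bar' ]
```

Because the split depends only on the record's own tag or its position in the file, the same gold file always yields the same train and test sets.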

Public surface exported from src/index.ts: loadGoldScenarios, parseGoldJsonl, splitGold, buildAgreementJudge, fieldAgreement, runDistillation + their types.

Tests — exact agreement scores + shapes (no toBeTruthy)

src/campaign/distillation/distillation.test.ts:

  • fieldAgreement: exact match = 1.0; total mismatch = 0; partial array overlap = exact Jaccard (0.5).
  • Leak-detection regression — a produced verdict matching gold on value_verdict but missing a public_leak_risk=true scores 2/3, strictly less than the full match 1.0. Without this the GEPA loop would have no gradient to teach leak detection.
  • loadGoldScenarios/splitGold on inline fixtures: correct parse, deterministic train/test split, honors the explicit split field, and sanitizes colon ids (the real skill:foo regression).
  • buildAgreementJudge: .score(...) returns the exact comparator agreement for a produced-vs-gold pair, applies only to gold-kind scenarios, throws on an out-of-range comparator score.
  • DISTILL_LIVE-gated wiring test: runs runDistillation (population 2, 1 generation, 1 rep) on a 3-scenario inline gold set using a mock chat transport for the student and a stubbed fetch for the GEPA reflection — exercises the full loop end-to-end without real tokens, and pins the holdout agreement (2/3) through the campaign + asserts a winner surface + gate decision come back.
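The scores pinned by these tests follow from the averaging rule. Below is a minimal sketch of such a comparator, assuming a three-field spec (two categorical fields and one array field) and made-up label values; `fieldAgreementSketch`, `FieldSpec`, and `jaccard` are hypothetical names, and the shipped `fieldAgreement` may handle edge cases (missing fields, empty arrays) differently.

```typescript
type Label = Record<string, unknown>;

interface FieldSpec {
  categorical: string[];
  array: string[];
}

// Jaccard overlap |A ∩ B| / |A ∪ B|. Two empty sets count as a perfect match (an assumption).
function jaccard(a: unknown[], b: unknown[]): number {
  const setA = new Set(a.map(String));
  const setB = new Set(b.map(String));
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

// Average of per-field scores: exact match on categorical fields, Jaccard on array fields.
function fieldAgreementSketch(produced: Label, gold: Label, spec: FieldSpec): number {
  const scores: number[] = [];
  for (const field of spec.categorical) {
    scores.push(produced[field] === gold[field] ? 1 : 0);
  }
  for (const field of spec.array) {
    const p = Array.isArray(produced[field]) ? (produced[field] as unknown[]) : [];
    const g = Array.isArray(gold[field]) ? (gold[field] as unknown[]) : [];
    scores.push(jaccard(p, g));
  }
  if (scores.length === 0) throw new Error("no fields to compare");
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Illustrative spec and labels; the field values are invented for the example.
const spec: FieldSpec = {
  categorical: ["value_verdict", "public_leak_risk"],
  array: ["top_actions"],
};
const gold: Label = {
  value_verdict: "keep",
  public_leak_risk: true,
  top_actions: ["redact", "rewrite"],
};

const fullMatch = fieldAgreementSketch({ ...gold }, gold, spec);
// Same verdict and actions, but the leak flag is absent: (1 + 0 + 1) / 3.
const missesLeak = fieldAgreementSketch(
  { value_verdict: "keep", top_actions: ["redact", "rewrite"] },
  gold,
  spec,
);
// Two of four distinct items shared: |{a,b}| / |{a,b,c,d}|.
const partialOverlap = jaccard(["a", "b", "c"], ["a", "b", "d"]);

console.log(missesLeak < fullMatch); // true
console.log(partialOverlap); // 0.5
```

Scoring each field separately is what gives the optimizer a signal for the leak flag: a prompt that gets the verdict right but ignores leaks is measurably worse than one that gets both.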

Intended live input

skills-internal/audits/gold/skill-verdicts.gold.jsonl (53 records) — referenced only; not copied into this repo. Verified locally that the harness loads all 53, splits them 41 train / 12 holdout (honoring the gold's own split tags), scores a self-match 1.0, and drops to 6/7 when the leak flag is flipped.
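The 6/7 figure is consistent with a seven-field comparison. The arithmetic below assumes the fields are the six --categorical names plus the one --array name from the run command, and that flipping the leak flag zeroes exactly one categorical term while every other field still matches.

```latex
\[
\text{agreement} = \frac{1}{7}\Bigl(\sum_{i=1}^{6} \mathbf{1}[p_i = g_i] + J(P, G)\Bigr),
\qquad
J(P, G) = \frac{|P \cap G|}{|P \cup G|}
\]
\[
\text{self-match: } \frac{6 + 1}{7} = 1,
\qquad
\text{leak flag flipped: } \frac{5 + 1}{7} = \frac{6}{7} \approx 0.857
\]
```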

Run a real distillation

TANGLE_API_KEY=$(cat /tmp/.tk) pnpm tsx src/campaign/distillation/cli.ts \
  --gold ~/code/skills-internal/audits/gold/skill-verdicts.gold.jsonl \
  --baseline ./baseline-skill-analyst.txt \
  --model gpt-4o-mini --optimizer-model gpt-4o \
  --generations 3 --population 4 \
  --categorical value_verdict,quality_score,generalization_rating,public_leak_risk,write_target_rating,subagent_recommended \
  --array top_actions

Verification

  • pnpm typecheck — clean
  • pnpm vitest run src/campaign/distillation/distillation.test.ts — 15 passed / 1 skipped (DISTILL_LIVE off); 16 passed (DISTILL_LIVE on)
  • biome check on the module + src/index.ts — clean

…oward gold verdicts

Compose the existing improvement-loop primitives into a teacher→student
distillation loop: an expensive workflow's frozen gold verdicts are the
teacher, a cheap single-shot LLM analyst is the student, and gepaDriver
optimizes the student's prompt toward agreement with the gold labels.

New module src/campaign/distillation/ (generic + parameterized — distills any
analyst against any gold JSONL of {input, label}):

- gold-scenarios.ts: GoldScenario (opaque input+label), loadGoldScenarios /
  parseGoldJsonl, deterministic splitGold (honors explicit split tags, else
  modulo). Sanitizes ':' in ids (real skill:foo gold) so the campaign cellId
  split that heldOutGate relies on stays correct.
- agreement-judge.ts: buildAgreementJudge — a real JudgeConfig scoring the
  student's produced label against the scenario's gold label via an injected
  comparator. Default fieldAgreement: exact-match on categorical fields +
  Jaccard on array fields, averaged. Pure.
- run-distillation.ts: runDistillation wires baselineSurface=student prompt,
  dispatchWithSurface=render+createChatClient call+parse JSON label,
  driver=gepaDriver, judge=agreement judge, train split as scenarios, test
  split as holdoutScenarios, gate=heldOutGate (default) or defaultProductionGate,
  autoOnPromote='none' (never opens a PR from inside the loop). Composes
  runImprovementLoop — reimplements none of it.
- cli.ts: distill --gold --baseline --model [--generations --population ...].
  The live token-spending entry; invoked by hand, not in CI.

Tests assert exact agreement scores + shapes. The leak-detection regression
proves a verdict matching value_verdict but missing public_leak_risk=true
scores strictly lower than a full match (2/3 vs 1.0). A DISTILL_LIVE-gated
wiring test runs the full loop on a mock chat transport + stubbed reflection
fetch — no real tokens — and pins the holdout agreement math through the
campaign.

Intended live input: skills-internal/audits/gold/skill-verdicts.gold.jsonl
(53 records). Verified the harness loads + splits it 41/12 and the agreement
judge scores a self-match 1.0 and a flipped leak flag 6/7.
Contributor

@tangletools tangletools left a comment


Approved (verified locally). Composes runImprovementLoop+gepaDriver+heldOutGate (zero reimplementation, autoOnPromote:none forced). Generic/parameterized over any gold JSONL. typecheck clean; 15 pass/1 skip (16 with DISTILL_LIVE, full loop on mock transport — no real tokens). Leak-detection regression asserts real agreement gradient (fullMatch=1, missesLeak=2/3, leak dim=0). Caught + regression-tested a real scenarioId-colon bug that would have zeroed the holdout delta.

@drewstone drewstone merged commit 1dab44c into main May 30, 2026
1 check passed
@drewstone drewstone deleted the feat/skill-distillation-loop branch May 30, 2026 21:20