feat(benchmark): add SkillSpector benchmark harness #5

Open: will-exaforce wants to merge 1 commit into main from wbeasley/add-benchmark

@will-exaforce

Summary

  • Add benchmark/, a standalone uv project that runs SkillSpector over a labeled corpus and scores its classifications
  • Persist per-unit scan results to DuckDB and ship 14 SQL queries for metrics (recall, false positives/negatives, threshold sweeps, scan timing, per-category/vector breakdowns)
  • Add a MalSkillBench dataset handler with parallel scan workers, per-worker auth handling, and per-unit timeouts
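
To make the metrics in the second bullet concrete, here is a minimal sketch of a threshold sweep in plain Python. It is not the PR's implementation: the harness computes these numbers with SQL over DuckDB, and the ScanResult fields, the 0..1 score scale, and the toy rows below are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    """One scanned unit. Field names are hypothetical, not the PR's schema."""
    unit_id: str
    is_malicious: bool  # ground-truth label from the corpus
    score: float        # scanner risk score; a 0..1 scale is assumed


def sweep(results, thresholds):
    """For each threshold, count TP/FP/FN and compute recall.

    A unit is flagged when its score is >= the threshold.
    """
    positives = sum(r.is_malicious for r in results)
    rows = []
    for t in thresholds:
        tp = sum(r.is_malicious and r.score >= t for r in results)
        fp = sum((not r.is_malicious) and r.score >= t for r in results)
        fn = positives - tp
        # Recall is undefined when the corpus has no malicious units.
        recall = tp / positives if positives else None
        rows.append({"threshold": t, "tp": tp, "fp": fp, "fn": fn, "recall": recall})
    return rows


# Toy rows for illustration only; these are not benchmark results.
TOY = [
    ScanResult("a", True, 0.9),
    ScanResult("b", True, 0.4),
    ScanResult("c", False, 0.6),
    ScanResult("d", False, 0.1),
]

if __name__ == "__main__":
    for row in sweep(TOY, [0.5, 0.3]):
        print(row)
```

Lowering the threshold in the toy data recovers the missed malicious unit without adding a false positive, which is the kind of trade-off a sweep is meant to surface.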

Details

benchmark/ is a standalone project with an editable dependency on skillspector; files at the repository root stay at upstream parity. Worker timeouts use setitimer so that sub-second --timeout values arm correctly. --overwrite also clears the database's sibling .wal file so that stale state is not replayed into a fresh DB.
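
The timeout detail can be sketched as follows. This is an illustration under assumptions, not the code in this PR: UnitTimeout, time_limit, and run_with_timeout are hypothetical names, and the sketch only works on Unix in the main thread of a process. The point it shows is that signal.setitimer accepts fractional seconds, whereas signal.alarm takes whole seconds only, and truncating a value such as 0.5 to 0 would cancel the alarm instead of arming it.

```python
import signal
import time
from contextlib import contextmanager


class UnitTimeout(Exception):
    """Raised when a unit exceeds its time budget (hypothetical name)."""


@contextmanager
def time_limit(seconds: float):
    """Arm a wall-clock timer for `seconds` (may be fractional, must be > 0).

    Unix only, and only valid in the main thread of the process.
    """
    def _on_alarm(signum, frame):
        raise UnitTimeout(f"unit exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # setitimer takes a float, so 0.25 arms a 250 ms timer.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def run_with_timeout(fn, seconds: float):
    """Run fn() under a time limit and report ('ok', value) or ('timeout', None)."""
    try:
        with time_limit(seconds):
            return ("ok", fn())
    except UnitTimeout:
        return ("timeout", None)


if __name__ == "__main__":
    print(run_with_timeout(lambda: time.sleep(1), 0.1))
    print(run_with_timeout(lambda: 42, 0.5))
```

Because each worker here is assumed to be its own process, the signal-based timer stays local to that worker; a thread-based worker pool would need a different mechanism.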

Standalone uv project that runs SkillSpector over a labeled corpus,
persists scan results to DuckDB, and ships SQL queries for metrics
(recall, false positives/negatives, threshold sweeps, timing).
will-exaforce requested review from pupapaik and smoy on June 19, 2026 21:22