Add --storage_mode preset to optimize AF2/AF3 output footprint by DimaMolod · Pull Request #627 · KosinskiLab/AlphaPulldown

DimaMolod · 2026-06-15T14:03:06Z

Summary

Both folding backends write substantial redundant data with no way to prune it without breaking the vanilla AlphaFold layout:

AF2: predicted_aligned_error is stored twice — inside result_*.pkl (~21 MB float32 per model) and as the standalone pae_*.json sidecar.
AF3: the top-level *_confidences.json is a byte-identical copy of the best sample's confidences.json, *_data.json duplicates the saved features input, and every per-sample confidences.json is large and highly compressible.
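
The byte-identical claim for AF3 can be checked independently before pruning anything. The following is a minimal sketch, not code from this PR: `is_byte_identical` is a hypothetical helper, and the two paths passed to it are whatever the top-level and best-sample files are called in your output directory.

```python
import filecmp
from pathlib import Path


def is_byte_identical(top_level: Path, per_sample: Path) -> bool:
    """Return True when both files contain exactly the same bytes.

    Hypothetical helper for illustration; not part of AlphaPulldown.
    """
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(top_level, per_sample, shallow=False)
```

Only drop the top-level copy when this returns True for the pair you intend to deduplicate.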

This PR adds a single --storage_mode preset (both backends), defaulting to vanilla so existing output stays byte-identical to native AlphaFold2/3 and remains a drop-in for downstream tools:

| Mode | AF2 | AF3 |
| --- | --- | --- |
| `vanilla` (default) | unchanged | unchanged |
| `slim` | strip `predicted_aligned_error` from pickles (kept in `pae_*.json`) + xz-compress | drop top-level `confidences`/`data` duplicates + xz-compress non-best per-sample `confidences.json` |
| `minimal` | slim + drop all result pickles | slim + delete non-best per-sample `confidences.json` |
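
To make the AF2 `slim` row concrete, here is a rough sketch of stripping one key from a result pickle and xz-compressing what remains. It is an illustration under stated assumptions, not the implementation merged in this PR: the function name `slim_af2_pickle` and the `.xz` suffix convention are hypothetical, and unpickling a real AF2 result requires the libraries that produced it (for example NumPy) to be importable.

```python
import lzma
import pickle
from pathlib import Path


def slim_af2_pickle(pkl_path: Path, drop_key: str = "predicted_aligned_error") -> Path:
    """Drop one key from a result pickle and store it xz-compressed.

    Hypothetical helper; the output naming (<name>.pkl.xz) is an assumption.
    """
    with open(pkl_path, "rb") as fh:
        result = pickle.load(fh)

    # Remove the redundant matrix; pae_*.json is assumed to still hold it.
    result.pop(drop_key, None)

    out_path = pkl_path.with_name(pkl_path.name + ".xz")
    with lzma.open(out_path, "wb") as fh:
        pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)

    # Replace the original with the compressed copy.
    pkl_path.unlink()
    return out_path
```

Reading the slimmed file back is symmetric: open it with `lzma.open(path, "rb")` and pass the handle to `pickle.load`.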

The best sample's confidences.json is always left uncompressed so AlphaJudge (which reads best-model PAE from it and has no xz support) keeps working. All structures and summary scores are retained in every mode.
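
The AF3 `slim` behaviour described above (compress every per-sample `confidences.json` except the best one) could look roughly like the sketch below. The function name, the way the best sample is passed in, and the one-directory-per-sample layout are all assumptions made for illustration; the PR's actual code may differ.

```python
import lzma
import shutil
from pathlib import Path


def compress_non_best(sample_dirs: list[Path], best_dir: Path,
                      name: str = "confidences.json") -> list[Path]:
    """xz-compress `name` in every sample directory except `best_dir`.

    Hypothetical helper; assumes one directory per sample.
    """
    compressed = []
    for sample_dir in sample_dirs:
        src = sample_dir / name
        # Leave the best sample plain so readers without xz support still work.
        if sample_dir == best_dir or not src.exists():
            continue
        dst = src.with_name(src.name + ".xz")
        with open(src, "rb") as fin, lzma.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()
        compressed.append(dst)
    return compressed
```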

Why this is safe

Pickle contents are unused by the downstream consumers:

AlphaJudge reads no pickles — AF2 PAE from pae_*.json, AF3 from confidences.json / summary_confidences.json, structures from PDB/CIF.
convert_to_modelcif.py also doesn't read pickle contents — its pickle.load is commented out; scores come from confidence_*.json / ranking_debug.json / pae_*.json, and it only needs the pickle filename (derived from ranking_debug.json).

So slim/minimal lose nothing those consumers need.

Validation

Verified on a real plasmodium_hap2 prediction directory:

AF2: 254M → 148M (−42%)
AF3: 628M → 107M (−83%)
AlphaJudge produces byte-identical scores on slimmed vs. original output, in both best and all model modes, for both backends (only the source-path column differs).
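
The percentages quoted above follow directly from the before/after sizes. A short check of the arithmetic (the helper name `percent_change` is just for illustration):

```python
def percent_change(before_mb: float, after_mb: float) -> int:
    """Relative size change in percent, rounded to the nearest integer."""
    return round((after_mb - before_mb) / before_mb * 100)


print(percent_change(254, 148))  # AF2: -42
print(percent_change(628, 107))  # AF3: -83
```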

Tests

8 new unit tests in test/unit/test_post_modelling.py covering AF2 and AF3 vanilla / slim / minimal, including the safety fallback that preserves the top-level confidences.json when the best sample lacks its own.
test/unit/test_script_entrypoints.py fixture updated for the new flag.
62/62 tests pass across the touched modules.

🤖 Generated with Claude Code

Both backends previously wrote substantial redundant data with no way to prune it without breaking the vanilla AlphaFold layout:

AF2: predicted_aligned_error is stored both inside result_*.pkl (~21 MB float32 per model) and as the standalone pae_*.json sidecar.
AF3: the top-level *_confidences.json is a byte-identical copy of the best sample's confidences.json, *_data.json duplicates the saved features input, and every per-sample confidences.json is large and compressible.

Add a single --storage_mode preset (default 'vanilla', so existing output is byte-identical to native AlphaFold2/3 and remains a drop-in for downstream tools):

vanilla - no change (default).
slim - AF2: strip predicted_aligned_error from pickles (kept in pae_*.json) and xz-compress them; AF3: drop the top-level confidences/data duplicates and xz-compress non-best per-sample confidences.json. The best sample's confidences.json is left plain so AlphaJudge (no xz support) reads best-model PAE directly.
minimal - slim plus: AF2 drops all result pickles; AF3 deletes non-best per-sample confidences.json.

All structures and summary scores are retained.

pkl contents are unused by AlphaJudge and by convert_to_modelcif (its pickle.load is commented out; scores come from the JSON sidecars), so slim/minimal lose nothing those consumers need.

Verified on a real plasmodium_hap2 prediction: slim shrinks AF2 254M->148M and AF3 628M->107M, and AlphaJudge produces byte-identical scores (best and all modes, both backends).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Deleting a non-best AF3 sample's confidences.json is not disk-only data loss: it is the sole source of that sample's full token x token PAE matrix, and once gone AlphaJudge silently falls back to summary_confidences.json (coarse per-chain-pair PAE) for that sample, degrading its PAE-derived scores without any error.

So AF3 'minimal' now behaves like 'slim' (non-best confidences are xz-compressed, not deleted); 'minimal' still differs from 'slim' on AF2, where it drops the genuinely-unused result pickles.

Paired with the AlphaJudge change to read xz/gz confidences, slim/minimal are now lossless for both --models_to_analyse best and all.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
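
For readers who consume these files outside AlphaJudge, a compression-aware loader might look like the sketch below. This is not AlphaJudge's implementation: `load_confidences` is a hypothetical name, and the convention that compressed files sit next to the plain path with an added `.xz` or `.gz` suffix is an assumption.

```python
import gzip
import json
import lzma
from pathlib import Path


def load_confidences(path: Path) -> dict:
    """Load a confidences JSON that may be plain, xz- or gz-compressed.

    Hypothetical loader; tries `path`, then `path + ".xz"`, then `path + ".gz"`.
    """
    candidates = [
        (path, open),
        (path.with_name(path.name + ".xz"), lzma.open),
        (path.with_name(path.name + ".gz"), gzip.open),
    ]
    for candidate, opener in candidates:
        if candidate.exists():
            # All three openers accept text mode with an explicit encoding.
            with opener(candidate, "rt", encoding="utf-8") as fh:
                return json.load(fh)
    raise FileNotFoundError(f"no plain or compressed file found for {path}")
```

Preferring the plain file first keeps the common case (the best sample) free of any decompression cost.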

DimaMolod mentioned this pull request Jun 15, 2026

Read xz/gz-compressed AF3 confidences.json KosinskiLab/AlphaJudge#20

Merged

DimaMolod merged commit 751373c into main Jun 16, 2026
6 checks passed

DimaMolod deleted the storage-mode-optimization branch June 16, 2026 06:27

DimaMolod merged 2 commits into main from storage-mode-optimization