Add --storage_mode preset to optimize AF2/AF3 output footprint #627

Merged: DimaMolod merged 2 commits into main from storage-mode-optimization on Jun 16, 2026

@DimaMolod (Collaborator)

Summary

Both folding backends write substantial redundant data with no way to prune it without breaking the vanilla AlphaFold layout:

  • AF2: predicted_aligned_error is stored twice — inside result_*.pkl (~21 MB float32 per model) and as the standalone pae_*.json sidecar.
  • AF3: the top-level *_confidences.json is a byte-identical copy of the best sample's confidences.json, *_data.json duplicates the saved features input, and every per-sample confidences.json is large and highly compressible.
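The byte-identical claim for the AF3 top-level copy is easy to check on any output directory. A minimal sketch, assuming the caller supplies both paths (`is_byte_identical` is a hypothetical helper name, not part of the PR):

```python
import filecmp
from pathlib import Path


def is_byte_identical(top_level: Path, best_sample: Path) -> bool:
    """Return True if both files contain exactly the same bytes."""
    # shallow=False forces a content comparison rather than trusting
    # os.stat() metadata (size and mtime) alone.
    return filecmp.cmp(top_level, best_sample, shallow=False)
```

If this returns True for the top-level `*_confidences.json` and the best sample's `confidences.json`, dropping the top-level copy removes no information.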

This PR adds a single --storage_mode preset (both backends), defaulting to vanilla so existing output stays byte-identical to native AlphaFold2/3 and remains a drop-in for downstream tools:

| Mode | AF2 | AF3 |
|---|---|---|
| vanilla (default) | unchanged | unchanged |
| slim | strip predicted_aligned_error from pickles (kept in pae_*.json) + xz-compress | drop top-level confidences/data duplicates + xz-compress non-best per-sample confidences.json |
| minimal | slim + drop all result pickles | slim + delete non-best per-sample confidences.json |

The best sample's confidences.json is always left uncompressed so AlphaJudge (which reads best-model PAE from it and has no xz support) keeps working. All structures and summary scores are retained in every mode.
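To make the AF2 `slim` step concrete, here is a rough sketch of stripping one key from a result pickle and xz-compressing what remains. It is an illustration under assumptions, not the code in this PR: the function name `slim_af2_pickle`, the `.pkl.xz` output name, and the guard on the sidecar are all hypothetical, and loading real AlphaFold2 pickles additionally requires the libraries their arrays were pickled with.

```python
import lzma
import pickle
from pathlib import Path


def slim_af2_pickle(
    pkl_path: Path,
    pae_json: Path,
    key: str = "predicted_aligned_error",
) -> Path:
    """Drop `key` from a result pickle and rewrite the pickle xz-compressed."""
    # Only unpickle files you produced yourself: pickle.load can run arbitrary code.
    with open(pkl_path, "rb") as fh:
        result = pickle.load(fh)  # assumed to be a dict

    # Strip the PAE matrix only when the JSON sidecar that also holds it exists.
    if pae_json.is_file():
        result.pop(key, None)

    out_path = pkl_path.with_name(pkl_path.name + ".xz")  # assumed naming scheme
    with lzma.open(out_path, "wb") as fh:
        pickle.dump(result, fh, protocol=pickle.HIGHEST_PROTOCOL)

    # Remove the uncompressed original only after the compressed copy is written.
    pkl_path.unlink()
    return out_path
```

Reading the result back is symmetric: `pickle.load(lzma.open(out_path, "rb"))`.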

Why this is safe

Pickle contents are unused by the downstream consumers:

  • AlphaJudge reads no pickles — AF2 PAE from pae_*.json, AF3 from confidences.json / summary_confidences.json, structures from PDB/CIF.
  • convert_to_modelcif.py also doesn't read pickle contents — its pickle.load is commented out; scores come from confidence_*.json / ranking_debug.json / pae_*.json, and it only needs the pickle filename (derived from ranking_debug.json).

So slim/minimal lose nothing those consumers need.

Validation

Verified on a real plasmodium_hap2 prediction directory:

  • AF2: 254M → 148M (−42%)
  • AF3: 628M → 107M (−83%)
  • AlphaJudge produces byte-identical scores on slimmed vs. original output, in both best and all model modes, for both backends (only the source-path column differs).
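The percentages follow directly from the directory sizes quoted above; a quick arithmetic check (the helper name `percent_reduction` is just for illustration):

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Relative size reduction, in percent of the original size."""
    return 100.0 * (before_mb - after_mb) / before_mb


print(round(percent_reduction(254, 148)))  # AF2: prints 42
print(round(percent_reduction(628, 107)))  # AF3: prints 83
```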

Tests

  • 8 new unit tests in test/unit/test_post_modelling.py covering AF2 and AF3 vanilla / slim / minimal, including the safety fallback that preserves the top-level confidences.json when the best sample lacks its own.
  • test/unit/test_script_entrypoints.py fixture updated for the new flag.
  • 62/62 tests pass across the touched modules.
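The safety fallback mentioned in the first test bullet can be pictured roughly as follows. This is a sketch of the idea only, with a hypothetical function name and caller-supplied paths; the real implementation may differ:

```python
from pathlib import Path


def drop_top_level_duplicate(top_level: Path, best_sample: Path) -> bool:
    """Delete the top-level confidences copy only if the best sample has its own.

    Returns True when the top-level file was treated as redundant and removed,
    False when it was kept as the only remaining source of best-model PAE.
    """
    if not best_sample.is_file():
        # Fallback: the best sample lacks its own file, so keep the top-level copy.
        return False
    top_level.unlink(missing_ok=True)
    return True
```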

🤖 Generated with Claude Code

Both backends previously wrote substantial redundant data with no way to
prune it without breaking the vanilla AlphaFold layout:

  AF2: predicted_aligned_error is stored both inside result_*.pkl (~21 MB
       float32 per model) and as the standalone pae_*.json sidecar.
  AF3: the top-level *_confidences.json is a byte-identical copy of the best
       sample's confidences.json, *_data.json duplicates the saved features
       input, and every per-sample confidences.json is large and compressible.

Add a single --storage_mode preset (default 'vanilla', so existing output is
byte-identical to native AlphaFold2/3 and remains a drop-in for downstream
tools):

  vanilla  - no change (default).
  slim     - AF2: strip predicted_aligned_error from pickles (kept in
             pae_*.json) and xz-compress them; AF3: drop the top-level
             confidences/data duplicates and xz-compress non-best per-sample
             confidences.json. The best sample's confidences.json is left
             plain so AlphaJudge (no xz support) reads best-model PAE directly.
  minimal  - slim plus: AF2 drops all result pickles; AF3 deletes non-best
             per-sample confidences.json. All structures and summary scores
             are retained.

pkl contents are unused by AlphaJudge and by convert_to_modelcif (its
pickle.load is commented out; scores come from the JSON sidecars), so slim/
minimal lose nothing those consumers need. Verified on a real plasmodium_hap2
prediction: slim shrinks AF2 254M->148M and AF3 628M->107M, and AlphaJudge
produces byte-identical scores (best and all modes, both backends).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Deleting a non-best AF3 sample's confidences.json is not disk-only data loss:
it is the sole source of that sample's full token x token PAE matrix, and
once gone AlphaJudge silently falls back to summary_confidences.json (coarse
per-chain-pair PAE) for that sample, degrading its PAE-derived scores without
any error.

So AF3 'minimal' now behaves like 'slim' (non-best confidences are xz-
compressed, not deleted); 'minimal' still differs from 'slim' on AF2, where it
drops the genuinely-unused result pickles. Paired with the AlphaJudge change to
read xz/gz confidences, slim/minimal are now lossless for both --models_to_analyse
best and all.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
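
For readers wondering what "read xz/gz confidences" could look like on the consumer side, here is a minimal sketch of a loader that accepts plain, xz-compressed, or gzip-compressed JSON. The function names and the `.xz` / `.gz` suffix convention are assumptions for illustration, not AlphaJudge's actual API:

```python
import gzip
import json
import lzma
from pathlib import Path

# Map a file suffix to the matching text-mode opener; plain files use open().
_OPENERS = {".xz": lzma.open, ".gz": gzip.open}


def load_json_any(path: Path) -> dict:
    """Load JSON from a plain, .xz, or .gz file, chosen by suffix."""
    opener = _OPENERS.get(path.suffix, open)
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_confidences(sample_dir: Path, name: str = "confidences.json") -> dict:
    """Prefer the plain file, then fall back to compressed variants."""
    for suffix in ("", ".xz", ".gz"):
        candidate = sample_dir / (name + suffix)
        if candidate.is_file():
            return load_json_any(candidate)
    raise FileNotFoundError(f"no {name}[.xz|.gz] in {sample_dir}")
```

With a loader of this shape, a compressed non-best sample yields the same full PAE matrix as an uncompressed one, which is what makes compression (unlike deletion) lossless for the `all` mode.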
DimaMolod merged commit 751373c into main on Jun 16, 2026
6 checks passed
DimaMolod deleted the storage-mode-optimization branch on June 16, 2026, 06:27