Read xz/gz-compressed AF3 confidences.json #20
Merged
AlphaPulldown's new --storage_mode slim/minimal may store the large AF3 per-sample confidences.json compressed (xz). AlphaJudge previously could only read plain JSON, so for a compressed (or absent) confidences.json it silently fell back to summary_confidences.json, whose PAE is a coarse per-chain-pair minimum (full token x token matrix mean ~29 vs summary ~3.7 on a real complex). That degraded every PAE-derived score for that sample with no error.

- _read_json now detects xz/gz by magic bytes (not extension) and, when the requested plain path is absent, transparently falls back to a .xz/.gz sibling.
- _find_af3_json now also offers .xz/.gz variants of each candidate, and no longer lets summary_confidences.json satisfy a "confidences" search via the *_confidences.json glob (a latent shadowing bug independent of compression).

Verified end to end: AlphaJudge scoring a slim-compressed AF3 run in --models_to_analyse all mode now reads each sample's full PAE matrix and produces byte-identical scores to the uncompressed (vanilla) run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
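To illustrate why a per-chain-pair minimum cannot stand in for the full matrix, here is a toy sketch. The 4-token matrix, the chain assignment, and the helper names are all made up for illustration; they are not taken from the run described above, and expanding the summary back to token resolution is only an assumption about how a summary-derived matrix might be built.

```python
# Synthetic 4-token complex: tokens 0-1 belong to chain A, tokens 2-3 to chain B.
# All numbers are invented for illustration only.
chain_of_token = ["A", "A", "B", "B"]
full_pae = [
    [0.5, 1.0, 28.0, 30.0],
    [1.0, 0.5, 4.0, 29.0],
    [27.0, 5.0, 0.5, 1.0],
    [31.0, 29.0, 1.0, 0.5],
]


def chain_pair_min(pae, chains):
    """Collapse a token x token matrix to one minimum per ordered chain pair."""
    best = {}
    for i, ci in enumerate(chains):
        for j, cj in enumerate(chains):
            key = (ci, cj)
            best[key] = min(best.get(key, float("inf")), pae[i][j])
    return best


def interface_mean(pae, chains):
    """Mean PAE over inter-chain token pairs only."""
    values = [
        pae[i][j]
        for i, ci in enumerate(chains)
        for j, cj in enumerate(chains)
        if ci != cj
    ]
    return sum(values) / len(values)


pair_min = chain_pair_min(full_pae, chain_of_token)

# Assumed expansion: every token pair inherits its chain pair's minimum.
summary_like = [
    [pair_min[(ci, cj)] for cj in chain_of_token] for ci in chain_of_token
]

print("full interface mean:   ", interface_mean(full_pae, chain_of_token))
print("summary interface mean:", interface_mean(summary_like, chain_of_token))
```

In this toy case one low-error token pair per direction sets the chain-pair minimum, so the summary-like interface mean is far lower than the mean of the full matrix.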
The summary-shadowing guard only skipped files literally starting with "summary_", but the official AF3 / AlphaPulldown best-model layout names the summary "<job>_summary_confidences.json", which does not start with "summary_". Match by suffix instead so both "summary_confidences.json" and "<job>_summary_confidences.json" (optionally .xz/.gz) are excluded from a "confidences" search and cannot shadow the real confidences file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
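The suffix-based guard might look roughly like the sketch below. The function names `is_summary` and `find_confidences` are hypothetical stand-ins, not AlphaJudge's actual code, and the candidate order (exact name first, then the glob, plain before compressed) is an assumption.

```python
from pathlib import Path

# Compressed variants are offered alongside the plain name.
SUFFIXES = ("", ".xz", ".gz")


def is_summary(name):
    """True for summary_confidences.json and <job>_summary_confidences.json,
    plain or compressed. Matches by suffix, not by prefix."""
    return any(name.endswith("summary_confidences.json" + s) for s in SUFFIXES)


def find_confidences(directory):
    """Hypothetical stand-in for a "confidences" search; returns a Path or None."""
    directory = Path(directory)
    # Assumed order: the exact file name first, in each compression variant.
    for suffix in SUFFIXES:
        exact = directory / f"confidences.json{suffix}"
        if exact.is_file():
            return exact
    # Then the <job>_confidences.json glob, skipping anything that is a summary.
    for suffix in SUFFIXES:
        for match in sorted(directory.glob(f"*_confidences.json{suffix}")):
            if not is_summary(match.name):
                return match
    return None
```

A prefix check such as `name.startswith("summary_")` would let `<job>_summary_confidences.json` through the glob, which is the shadowing this commit closes.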
Summary

AlphaPulldown's new --storage_mode slim/minimal (KosinskiLab/AlphaPulldown#627) may store the large AF3 per-sample confidences.json compressed (xz) to save disk. AlphaJudge could only read plain JSON, so for a compressed (or absent) confidences.json it silently fell back to summary_confidences.json, whose PAE is only a coarse per-chain-pair minimum.

On a real complex the two are not interchangeable: the full token×token matrix has mean ≈29 / max 31.7, while the summary-derived matrix has mean ≈3.7 / max 7.08. So every PAE-derived score for that sample (average_interface_pae, interface_ipSAE, pDockQ2, LIS, …) was being computed from the wrong data, with no error or warning.

Changes
- BaseParser._read_json detects xz/gz by magic bytes (not file extension) and, when the requested plain path is absent, transparently falls back to a .xz/.gz sibling.
- AF3Parser._find_af3_json now offers .xz/.gz variants of each candidate, and no longer lets summary_confidences.json satisfy a "confidences" search via the *_confidences.json glob. This latent shadowing bug exists independently of compression (it was masked only while the plain confidences.json happened to be candidate [0]).

Validation
End-to-end on a real plasmodium_hap2 AF3 run processed with --storage_mode slim (non-best confidences.json → .xz):

- Before: --models_to_analyse all read summary_confidences.json for non-best samples (PAE mean ≈3.5).
- After: non-best samples are read from the .xz file (max_pae 31.7), and the full score CSV is byte-identical to the uncompressed/vanilla run in both best and all modes.

Tests
6 new tests in test/test_parsers_and_runner.py:

- _read_json reads plain / .xz / .gz, detects compression by magic bytes (xz under a .json name), falls back to a compressed sibling, and returns {} for a truly-missing file.
- _find_af3_json finds a compressed confidences.json.xz and does not shadow confidences with summary_confidences.json.

Full parser/runner suite passes (external-data tests skip as before).
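The behaviour those _read_json tests describe can be sketched with the standard library alone. This is a minimal illustration, not the code merged in this PR: `read_json` is a hypothetical free function, and trying the .xz sibling before the .gz one is an assumption.

```python
import gzip
import json
import lzma
from pathlib import Path

# Leading bytes that identify each container format.
XZ_MAGIC = b"\xfd7zXZ\x00"
GZ_MAGIC = b"\x1f\x8b"


def read_json(path):
    """Hypothetical stand-in for BaseParser._read_json; not the project's code."""
    path = Path(path)
    if not path.is_file():
        # Plain path absent: look for a compressed sibling (x.json.xz, x.json.gz).
        siblings = [Path(f"{path}{ext}") for ext in (".xz", ".gz")]
        path = next((p for p in siblings if p.is_file()), None)
        if path is None:
            return {}  # truly missing: empty dict, as the tests describe
    raw = path.read_bytes()
    # Decide by content, not by file name.
    if raw.startswith(XZ_MAGIC):
        raw = lzma.decompress(raw)
    elif raw.startswith(GZ_MAGIC):
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

Checking the magic bytes rather than the extension means an xz stream saved under a plain .json name still decodes, which is one of the cases the tests list.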
🤖 Generated with Claude Code