Type-check rfd3na input_parsing#354
Open
lyskov-ai wants to merge 20 commits into
Open
Conversation
Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors ratchet. The errors were one dominant pattern: the dynamically-keyed `state` bag was inferred as `dict[Any, Any] | int | None`, so every state access and counter update errored. Mirror the already-landed foundry trainers/fabric precedent: a class-level `state: dict[str, Any]` + `_current_train_return: Any`, an annotated `default_state`, a documented type-ignore on the wider str|int precision API, a cast on setup_dataloaders + widening the loop params to DataLoader, `get_latest_checkpoint -> Path | None` with a cast at the call site, and the truthful `load_legacy_checkpoint -> None`. Behaviour-preserving, mypy-only (no clean CPU-test target for this cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Clear the 15 mypy errors in rfd3.inference.legacy_input_parsing and remove it from the ignore_errors ratchet (3 -> 2 remaining: engine, utils.vizualize). - 11 implicit-Optional parameter fixes in create_atom_array_from_design_specification_legacy (incl. unfix_specific -> str | list | None, honest to the body's isinstance branches). - The Optional fixes cascade union-attr/arg-type errors at exists()-guarded sites; since foundry.common.exists is not a TypeGuard it does not narrow, so the narrowing-required guards are converted to behaviour-identical 'x is not None' checks (exists(x) is by definition 'x is not None'). - unfix_residues: list[str] annotation; rewrite the 'unfix_residues, _ = ...' unpack to '...[0]' and the side-effecting list-comp to a for loop, dropping the '_' that collided with the **_ kwargs parameter. - Documented cast(str, contig) (contig is always provided or defaulted to the non-optional length, which mypy cannot follow through exists()). - Fix a real reachable NameError: import the missing spoof_helical_bundle_ss_conditioning_fn from rfd3.utils.inference (its branch is config-gated and would crash today). mypy-only slice; no behaviour change on production paths. All gates green (ruff format/check, mypy 152 files, pytest 486 passed / 10 skipped). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Fix the pre-existing bugs that held the last two models/rfd3 modules on the
ignore_errors ratchet, then remove the rfd3 override block entirely. All of
models/rfd3 now type-checks with no module-level exemptions.
engine.py:
- run(): documented `# type: ignore[override]` + honest return type
`dict[str, list[RFD3Output]] | None` for the keyword-only override of the
positional base BaseInferenceEngine.run().
- _multiply_specifications: `exists(self.out_dir)` -> `self.out_dir is not None`
so mypy narrows it for find_files_with_extension().
- normalize_inputs: early returns + a documented cast for the str-split branch
(list is invariant).
- process_input: delete the dead/redundant block that re-did normalize_inputs'
work and called .split(',') on a value that is always a list; narrow the two
loop guards to `input is not None`.
io.py: widen find_files_with_extension's supported_file_types to `set | list`
(matches its siblings; the caller passes the CIF_LIKE_EXTENSIONS set).
vizualize.py (dev script, imported nowhere): drop the broken import+call of
_add_design_annotations_from_cif_block_metadata (removed in the open-sourcing
refactor, exists nowhere -> ImportError on the .cif path); keep plain-structure
.cif/.bcif loading; document the atomworks C_DIS .as_array() ignore.
Add unit tests for the pure normalize_inputs helper.
Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/mpnn/src/mpnn to mypy's files and seed a fresh per-module ignore_errors ratchet for the 10 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 14 mpnn modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (mpnn's existing test suite is partly cluster-coupled / library-drift-broken on CPU). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/mpnn/tests to testpaths with a collect_ignore denylist in its conftest. Most of the existing mpnn suite loads structures via atomworks cached_parse (needs a DIGS PDB mirror absent in CI) and is held out, run locally on the cluster; the CPU-portable files are collected. Fix two stale numpy-bool identity assertions (np.True_ is True) in test_polymer_ligand_interface.py so it is green and collectable. Adds 28 mpnn CPU tests to the gate (520 passed, was 492). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check mpnn.samplers.samplers, mpnn.trainers.mpnn, and mpnn.transforms.feature_aggregation.user_settings, and drop them from the ignore_errors ratchet. Annotation/comment-only: narrow the inherited BatchSampler.sampler to Sampler for set_sampler_epoch + a var annotation; a documented type: ignore[arg-type] for the DictConfig **-unpack (per the RF3/RFD3 Loss(**loss) precedent) and type: ignore[override] for the refined validation_step 3rd param (per rf3); and a type: ignore[misc] for an atomworks AnnotationList2D row unpack its stub doesn't express. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check mpnn.transforms.feature_aggregation.token_encodings, mpnn.pipelines.mpnn, and mpnn.inference_engines.mpnn, and drop them from the ignore_errors ratchet. token_encodings wraps its TokenEncoding token_atoms values in np.asarray (matching the field's ndarray type, which __post_init__ already produces); pipelines.mpnn fixes an implicit-Optional model_type with a narrowing guard + list() for a tuple arg; the inference engine narrows _absolute_path_or_none's Optional result and the atom_arrays/input_dicts guard with documented asserts. Adds test_pipeline_validation.py (6 CPU tests) for the model_type validation. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
feature_collator: two implicit-Optional default_padding params + a list var annotation (behaviour-preserving). train: replace the ad-hoc local CkptConfig with foundry's CheckpointConfig (the type FabricTrainer.fit actually expects) — CkptConfig lacked parameter_freezing_config, which fit() reads, so the checkpoint-resume branch would have raised AttributeError; the swap fixes that latent footgun. Remaining train errors are third-party stub gaps (atomworks StructuralDatasetWrapper types dataset_parser as Callable but takes a parser object; torch WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub) — documented type: ignore[arg-type]. train.py is a cluster-only training script (unverified in CI); the CheckpointConfig behaviour improvement is flagged for mpnn owners. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
load_legacy_weights reads model-specific int config dims (num_rbf, num_backbone_atoms, ...) off model.graph_featurization_module, but nn.Module.__getattr__ types submodule access as Tensor | Module, so those ints are invisible to the checker. Add a small typed helper _int_config_attr(module, name) -> int wrapping getattr and route the reads through it (behaviour-identical; the concrete featurization class varies, so casting to a single class wouldn't cover all reads). Also annotate the attribute-walk local as torch.nn.Module | None. mypy-only: the loader needs a real model + legacy checkpoint (cluster-only). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Clear the last exempted mpnn module (utils.inference, 15 errors) and remove the ignore_errors ratchet block entirely, so all of models/mpnn type-checks with no module-level exemptions. Fixes: narrow two guarded Optionals with asserts (config path in cli_to_json, base_path in write_fasta); type write_structure's file_type as Literal['cif','bcif','cif.gz'] (the type to_cif_file expects, matching the docstring; the sole caller uses the default); a categories dict annotation; and two documented type: ignore[arg-type] for atomworks stub gaps (**-unpacking a heterogeneous parser-args dict into parse(); passing ndarrays to set_annotation_2d which is typed Sequence). Adds test_inference_helpers.py (15 CPU tests) for the pure parsing helpers (str2bool, none_or_type, parse_json_like, parse_list_like, _absolute_path_or_none). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add models/rfd3na/src/rfd3na to mypy's files and seed a fresh per-module ignore_errors ratchet for the 35 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 27 rfd3na modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (rfd3na's existing tests are cluster-coupled like rfd3's). rfd3na is a near-copy of rfd3, so several modules carry the same defects rfd3 already fixed. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check 8 rfd3na modules (one error each) and drop them from the ignore_errors ratchet. Mostly annotation-only: a LongTensor cast on .long(); three empty-dict var annotations; a numpy.typing NDArray for a pydantic bool-mask field; an assert narrowing a cache attr; and a type: ignore[list-item] for the Transform-class requires_previous_transforms pattern (mirroring rfd3). Also corrects a stale return annotation on hbonds_hbplus.calculate_hbonds (it was copied from the different calculate_hbonds in hbonds.py; the real return is (AtomArray, list, int), matching its callers). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check 10 more rfd3na modules and drop them from the ignore_errors ratchet (nucleic_ss_metrics had only informational notes). Most mirror the already-landed rfd3 fixes: implicit-Optional params; a str|None base attr; an isinstance re-narrow after an untyped call; PathLike|None + transform assert; crop-center int|float decls + a cast; type: ignore for the Transform-class requires_previous_transforms and the DictConfig/None resize arg. Two genuine correctness fixes (both matching rfd3): correct permute_symmetric_atom_names_'s signature to np.ndarray (it does ndarray indexing and callers pass ndarrays); and drop vizualize's broken import of _add_design_annotations_from_cif_block_metadata (deleted in open-sourcing, exists nowhere), which would ImportError on the .cif-load path. Also add a ValueError guard for an unknown dna_crop protein_contact_type (rfd3 0058). rfd3na is cluster-coupled, so these are unverified in CI. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…om the ratchet Type-check four more rfd3na modules, mirroring the already-landed rfd3 fixes: utils/io (Path wraps, cast(Tensor) on weighted_rigid_align, a list[Path] annotation, create_example_id_extractor's Callable return type, and a coords-var trajectory rewrite); trainer_utils (rebind np.array to a distinct atom_names_arr + a cast(dict) for the heterogeneous association_schemes constant); dump_validation_structures (epoch: str|None + type: ignore[override]); run_inference (cast OmegaConf.to_container to dict[str, Any]). All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check four more rfd3na modules: na_geom (implicit-Optional + partner_sym_map assert), model/RFD3 (DictConfig **-unpack ignores + coord_atom_lvl_to_be_noised Tensor|None with an inference-branch assert), design_transforms (type: ignore[list-item] for the Transform-class requires_previous_transforms), pipelines (list() wrap; fix the wrong residue_cache_dir: bool to str|Path|None; a required-in-practice association_scheme ignore; drop an orphaned import). Also correct 0072's util_transforms.encode_residues_to from int|None to str -- pipelines passes the GAP string token, matching rfd3's own correction. All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…e ratchet engine gets the full rfd3 treatment: a keyword-only run override ignore, a normalize_inputs rewrite with a documented cast, deletion of the dead .split-on-a-list block in process_input, and out_dir/input is-not-None narrows -- closing the pre-filed rfd3na-engine finding. This cascades two correct param widenings (io.find_files_with_extension to set|list; input_parsing.ensure_input_is_abspath to str|PathLike|None). inference_sampler: partial_t is a tensor (body calls .mean()), so type it Tensor|None and use a distinct float scalar downstream; assert the CFG-only ref_initializer_outputs; give gamma_0 a .get default so a missing value is a clean assert, not a None>float TypeError. trainer/rfd3na: cast(int) for the recycle count, type: ignore for Loss(**loss) and validation_step override, an assert the loss is set, and a dict annotation. All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Type-check three more rfd3na modules, fixing real defects surfaced by honest typing: two NameErrors in utils.inference (logger -> ranked_logger; an undefined scale -> ori_jitter in the jitter block) and np.array used as a type annotation (-> np.ndarray). hbonds: implicit-Optional selection masks, Tuple[str] -> Tuple[str, ...] defaults, and a .sum()-on-None guard (None mask = no selection limit). na_geom_utils: cast the heterogeneous ATOM_REGION_BY_RESI constant, correct make_coord_list's signature (it returns coordinate floats and its caller passes list[int]/list[str|None]), and drop redundant re-annotations. The NameError fixes make previously-crashing cluster-only paths work; rfd3na is cluster-coupled, so unverified in CI. mypy-only otherwise. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
The association_schemes constant is a heterogeneous nested-dict structure built up across the module with .copy()/|=/indexed-assignment/.items(), so mypy inferred its values as object and every such operation failed. Annotate it dict[str, Any] -- one change clears all 14 errors and keeps the earlier cast(dict, association_schemes[scheme]) uses non-redundant. Data-only module; annotation-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…om rfd3
Mirrors the rfd3 fix: 11 implicit-Optional params in
create_atom_array_from_design_specification_legacy with the cascaded
union-attr/arg-type errors narrowed via is-not-None + a cast(str, contig)
+ fixed_atoms or {}; a list-comp-for-side-effects rewritten as a loop;
unfix_residues annotation + [0] unpack; cast(list, backbone_atoms_*); and
the missing spoof_helical_bundle_ss_conditioning_fn import (was a
NameError).
Also ports _check_has_backbone_connections_to_nonstandard_residues from
rfd3's legacy_input_parsing (it was dropped in the rfd3na fork) so the
next slice can clear input_parsing, which imports it.
rfd3na is cluster-coupled; mypy-only otherwise.
Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Now unblocked by the 0078 port of _check_has_backbone_connections_to_ nonstandard_residues. Mirrors the rfd3 fix: add the four missing imports (create_o_atoms/create_cb_atoms, fetch_mask_from_idx, and the ported _check_has_backbone helper -- all NameErrors); fix create_motif_residue's error message to use the token's own chain_id/res_id (was undefined src_chain/src_resid); narrow accumulate_components' components to List[str] (the int was spurious) and unindexed_breaks to List[bool|None]; rewrite a list-comp-for-side-effects as a set loop; data: dict|None; and a components->component typo. rfd3na is cluster-coupled; the NameError fixes make reachable paths work but are unverified in CI. mypy-only otherwise. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Continues clearing the
models/rfd3namypy ratchet —input_parsing, leaving justfabric_trainer. Mirrors the fix already landed for the rfd3 module, and depends on the previous PR (which restored_check_has_backbone_connections_to_nonstandard_residues).NameErrors):create_o_atoms/create_cb_atomsfromrfd3na.utils.inference,fetch_mask_from_idxfromfoundry.utils.components, and_check_has_backbone_connections_to_nonstandard_residuesfromrfd3na.inference.legacy_input_parsing.create_motif_residue's error message to reference the token's ownchain_id/res_id(the oldsrc_chain/src_residwere undefined).accumulate_components: narrowcomponents_to_accumulatetoList[str](theintwas spurious — both data sources arelist[str]) andunindexed_breakstoList[bool | None].setloop;data: dict | None; fix acomponents→componenttypo.rfd3nais cluster-coupled; theNameErrorfixes make reachable paths work but aren't exercised in CI.Verification:
ruffformat/check clean;mypySuccess (238 files);pytest541 passed / 10 skipped.