Type-check rfd3na input_parsing by lyskov-ai · Pull Request #354 · RosettaCommons/foundry

lyskov-ai · 2026-07-01T03:06:35Z

Continues clearing the models/rfd3na mypy ratchet — input_parsing, leaving just fabric_trainer. Mirrors the fix already landed for the rfd3 module, and depends on the previous PR (which restored _check_has_backbone_connections_to_nonstandard_residues).

Adds four missing imports (all NameErrors): create_o_atoms/create_cb_atoms from rfd3na.utils.inference, fetch_mask_from_idx from foundry.utils.components, and _check_has_backbone_connections_to_nonstandard_residues from rfd3na.inference.legacy_input_parsing.
Fixes create_motif_residue's error message to reference the token's own chain_id/res_id (the old src_chain/src_resid were undefined).
accumulate_components: narrow components_to_accumulate to List[str] (the int was spurious — both data sources are list[str]) and unindexed_breaks to List[bool | None].
Rewrite a list-comprehension-for-side-effects as a set loop; data: dict | None; fix a components → component typo.

rfd3na is cluster-coupled; the NameError fixes make reachable paths work but aren't exercised in CI.

Verification: ruff format/check clean; mypy Success (238 files); pytest 541 passed / 10 skipped.

Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors ratchet. The errors were one dominant pattern: the dynamically-keyed `state` bag was inferred as `dict[Any, Any] | int | None`, so every state access and counter update errored. Mirror the already-landed foundry trainers/fabric precedent: a class-level `state: dict[str, Any]` + `_current_train_return: Any`, an annotated `default_state`, a documented type-ignore on the wider str|int precision API, a cast on setup_dataloaders + widening the loop params to DataLoader, `get_latest_checkpoint -> Path | None` with a cast at the call site, and the truthful `load_legacy_checkpoint -> None`. Behaviour-preserving, mypy-only (no clean CPU-test target for this cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Clear the 15 mypy errors in rfd3.inference.legacy_input_parsing and remove it from the ignore_errors ratchet (3 -> 2 remaining: engine, utils.vizualize). - 11 implicit-Optional parameter fixes in create_atom_array_from_design_specification_legacy (incl. unfix_specific -> str | list | None, honest to the body's isinstance branches). - The Optional fixes cascade union-attr/arg-type errors at exists()-guarded sites; since foundry.common.exists is not a TypeGuard it does not narrow, so the narrowing-required guards are converted to behaviour-identical 'x is not None' checks (exists(x) is by definition 'x is not None'). - unfix_residues: list[str] annotation; rewrite the 'unfix_residues, _ = ...' unpack to '...[0]' and the side-effecting list-comp to a for loop, dropping the '_' that collided with the **_ kwargs parameter. - Documented cast(str, contig) (contig is always provided or defaulted to the non-optional length, which mypy cannot follow through exists()). - Fix a real reachable NameError: import the missing spoof_helical_bundle_ss_conditioning_fn from rfd3.utils.inference (its branch is config-gated and would crash today). mypy-only slice; no behaviour change on production paths. All gates green (ruff format/check, mypy 152 files, pytest 486 passed / 10 skipped). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Fix the pre-existing bugs that held the last two models/rfd3 modules on the ignore_errors ratchet, then remove the rfd3 override block entirely. All of models/rfd3 now type-checks with no module-level exemptions. engine.py: - run(): documented `# type: ignore[override]` + honest return type `dict[str, list[RFD3Output]] | None` for the keyword-only override of the positional base BaseInferenceEngine.run(). - _multiply_specifications: `exists(self.out_dir)` -> `self.out_dir is not None` so mypy narrows it for find_files_with_extension(). - normalize_inputs: early returns + a documented cast for the str-split branch (list is invariant). - process_input: delete the dead/redundant block that re-did normalize_inputs' work and called .split(',') on a value that is always a list; narrow the two loop guards to `input is not None`. io.py: widen find_files_with_extension's supported_file_types to `set | list` (matches its siblings; the caller passes the CIF_LIKE_EXTENSIONS set). vizualize.py (dev script, imported nowhere): drop the broken import+call of _add_design_annotations_from_cif_block_metadata (removed in the open-sourcing refactor, exists nowhere -> ImportError on the .cif path); keep plain-structure .cif/.bcif loading; document the atomworks C_DIS .as_array() ignore. Add unit tests for the pure normalize_inputs helper. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/mpnn/src/mpnn to mypy's files and seed a fresh per-module ignore_errors ratchet for the 10 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 14 mpnn modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (mpnn's existing test suite is partly cluster-coupled / library-drift-broken on CPU). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/mpnn/tests to testpaths with a collect_ignore denylist in its conftest. Most of the existing mpnn suite loads structures via atomworks cached_parse (needs a DIGS PDB mirror absent in CI) and is held out, run locally on the cluster; the CPU-portable files are collected. Fix two stale numpy-bool identity assertions (np.True_ is True) in test_polymer_ligand_interface.py so it is green and collectable. Adds 28 mpnn CPU tests to the gate (520 passed, was 492). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check mpnn.samplers.samplers, mpnn.trainers.mpnn, and mpnn.transforms.feature_aggregation.user_settings, and drop them from the ignore_errors ratchet. Annotation/comment-only: narrow the inherited BatchSampler.sampler to Sampler for set_sampler_epoch + a var annotation; a documented type: ignore[arg-type] for the DictConfig **-unpack (per the RF3/RFD3 Loss(**loss) precedent) and type: ignore[override] for the refined validation_step 3rd param (per rf3); and a type: ignore[misc] for an atomworks AnnotationList2D row unpack its stub doesn't express. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check mpnn.transforms.feature_aggregation.token_encodings, mpnn.pipelines.mpnn, and mpnn.inference_engines.mpnn, and drop them from the ignore_errors ratchet. token_encodings wraps its TokenEncoding token_atoms values in np.asarray (matching the field's ndarray type, which __post_init__ already produces); pipelines.mpnn fixes an implicit-Optional model_type with a narrowing guard + list() for a tuple arg; the inference engine narrows _absolute_path_or_none's Optional result and the atom_arrays/input_dicts guard with documented asserts. Adds test_pipeline_validation.py (6 CPU tests) for the model_type validation. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

feature_collator: two implicit-Optional default_padding params + a list var annotation (behaviour-preserving). train: replace the ad-hoc local CkptConfig with foundry's CheckpointConfig (the type FabricTrainer.fit actually expects) — CkptConfig lacked parameter_freezing_config, which fit() reads, so the checkpoint-resume branch would have raised AttributeError; the swap fixes that latent footgun. Remaining train errors are third-party stub gaps (atomworks StructuralDatasetWrapper types dataset_parser as Callable but takes a parser object; torch WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub) — documented type: ignore[arg-type]. train.py is a cluster-only training script (unverified in CI); the CheckpointConfig behaviour improvement is flagged for mpnn owners. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

load_legacy_weights reads model-specific int config dims (num_rbf, num_backbone_atoms, ...) off model.graph_featurization_module, but nn.Module.__getattr__ types submodule access as Tensor | Module, so those ints are invisible to the checker. Add a small typed helper _int_config_attr(module, name) -> int wrapping getattr and route the reads through it (behaviour-identical; the concrete featurization class varies, so casting to a single class wouldn't cover all reads). Also annotate the attribute-walk local as torch.nn.Module | None. mypy-only: the loader needs a real model + legacy checkpoint (cluster-only). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Clear the last exempted mpnn module (utils.inference, 15 errors) and remove the ignore_errors ratchet block entirely, so all of models/mpnn type-checks with no module-level exemptions. Fixes: narrow two guarded Optionals with asserts (config path in cli_to_json, base_path in write_fasta); type write_structure's file_type as Literal['cif','bcif','cif.gz'] (the type to_cif_file expects, matching the docstring; the sole caller uses the default); a categories dict annotation; and two documented type: ignore[arg-type] for atomworks stub gaps (**-unpacking a heterogeneous parser-args dict into parse(); passing ndarrays to set_annotation_2d which is typed Sequence). Adds test_inference_helpers.py (15 CPU tests) for the pure parsing helpers (str2bool, none_or_type, parse_json_like, parse_list_like, _absolute_path_or_none). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/rfd3na/src/rfd3na to mypy's files and seed a fresh per-module ignore_errors ratchet for the 35 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 27 rfd3na modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (rfd3na's existing tests are cluster-coupled like rfd3's). rfd3na is a near-copy of rfd3, so several modules carry the same defects rfd3 already fixed. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check 8 rfd3na modules (one error each) and drop them from the ignore_errors ratchet. Mostly annotation-only: a LongTensor cast on .long(); three empty-dict var annotations; a numpy.typing NDArray for a pydantic bool-mask field; an assert narrowing a cache attr; and a type: ignore[list-item] for the Transform-class requires_previous_transforms pattern (mirroring rfd3). Also corrects a stale return annotation on hbonds_hbplus.calculate_hbonds (it was copied from the different calculate_hbonds in hbonds.py; the real return is (AtomArray, list, int), matching its callers). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check 10 more rfd3na modules and drop them from the ignore_errors ratchet (nucleic_ss_metrics had only informational notes). Most mirror the already-landed rfd3 fixes: implicit-Optional params; a str|None base attr; an isinstance re-narrow after an untyped call; PathLike|None + transform assert; crop-center int|float decls + a cast; type: ignore for the Transform-class requires_previous_transforms and the DictConfig/None resize arg. Two genuine correctness fixes (both matching rfd3): correct permute_symmetric_atom_names_'s signature to np.ndarray (it does ndarray indexing and callers pass ndarrays); and drop vizualize's broken import of _add_design_annotations_from_cif_block_metadata (deleted in open-sourcing, exists nowhere), which would ImportError on the .cif-load path. Also add a ValueError guard for an unknown dna_crop protein_contact_type (rfd3 0058). rfd3na is cluster-coupled, so these are unverified in CI. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…om the ratchet Type-check four more rfd3na modules, mirroring the already-landed rfd3 fixes: utils/io (Path wraps, cast(Tensor) on weighted_rigid_align, a list[Path] annotation, create_example_id_extractor's Callable return type, and a coords-var trajectory rewrite); trainer_utils (rebind np.array to a distinct atom_names_arr + a cast(dict) for the heterogeneous association_schemes constant); dump_validation_structures (epoch: str|None + type: ignore[override]); run_inference (cast OmegaConf.to_container to dict[str, Any]). All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check four more rfd3na modules: na_geom (implicit-Optional + partner_sym_map assert), model/RFD3 (DictConfig **-unpack ignores + coord_atom_lvl_to_be_noised Tensor|None with an inference-branch assert), design_transforms (type: ignore[list-item] for the Transform-class requires_previous_transforms), pipelines (list() wrap; fix the wrong residue_cache_dir: bool to str|Path|None; a required-in-practice association_scheme ignore; drop an orphaned import). Also correct 0072's util_transforms.encode_residues_to from int|None to str -- pipelines passes the GAP string token, matching rfd3's own correction. All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…e ratchet engine gets the full rfd3 treatment: a keyword-only run override ignore, a normalize_inputs rewrite with a documented cast, deletion of the dead .split-on-a-list block in process_input, and out_dir/input is-not-None narrows -- closing the pre-filed rfd3na-engine finding. This cascades two correct param widenings (io.find_files_with_extension to set|list; input_parsing.ensure_input_is_abspath to str|PathLike|None). inference_sampler: partial_t is a tensor (body calls .mean()), so type it Tensor|None and use a distinct float scalar downstream; assert the CFG-only ref_initializer_outputs; give gamma_0 a .get default so a missing value is a clean assert, not a None>float TypeError. trainer/rfd3na: cast(int) for the recycle count, type: ignore for Loss(**loss) and validation_step override, an assert the loss is set, and a dict annotation. All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check three more rfd3na modules, fixing real defects surfaced by honest typing: two NameErrors in utils.inference (logger -> ranked_logger; an undefined scale -> ori_jitter in the jitter block) and np.array used as a type annotation (-> np.ndarray). hbonds: implicit-Optional selection masks, Tuple[str] -> Tuple[str, ...] defaults, and a .sum()-on-None guard (None mask = no selection limit). na_geom_utils: cast the heterogeneous ATOM_REGION_BY_RESI constant, correct make_coord_list's signature (it returns coordinate floats and its caller passes list[int]/list[str|None]), and drop redundant re-annotations. The NameError fixes make previously-crashing cluster-only paths work; rfd3na is cluster-coupled, so unverified in CI. mypy-only otherwise. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

The association_schemes constant is a heterogeneous nested-dict structure built up across the module with .copy()/|=/indexed-assignment/.items(), so mypy inferred its values as object and every such operation failed. Annotate it dict[str, Any] -- one change clears all 14 errors and keeps the earlier cast(dict, association_schemes[scheme]) uses non-redundant. Data-only module; annotation-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…om rfd3 Mirrors the rfd3 fix: 11 implicit-Optional params in create_atom_array_from_design_specification_legacy with the cascaded union-attr/arg-type errors narrowed via is-not-None + a cast(str, contig) + fixed_atoms or {}; a list-comp-for-side-effects rewritten as a loop; unfix_residues annotation + [0] unpack; cast(list, backbone_atoms_*); and the missing spoof_helical_bundle_ss_conditioning_fn import (was a NameError). Also ports _check_has_backbone_connections_to_nonstandard_residues from rfd3's legacy_input_parsing (it was dropped in the rfd3na fork) so the next slice can clear input_parsing, which imports it. rfd3na is cluster-coupled; mypy-only otherwise. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Now unblocked by the 0078 port of _check_has_backbone_connections_to_ nonstandard_residues. Mirrors the rfd3 fix: add the four missing imports (create_o_atoms/create_cb_atoms, fetch_mask_from_idx, and the ported _check_has_backbone helper -- all NameErrors); fix create_motif_residue's error message to use the token's own chain_id/res_id (was undefined src_chain/src_resid); narrow accumulate_components' components to List[str] (the int was spurious) and unindexed_breaks to List[bool|None]; rewrite a list-comp-for-side-effects as a set loop; data: dict|None; and a components->component typo. rfd3na is cluster-coupled; the NameError fixes make reachable paths work but are unverified in CI. mypy-only otherwise. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

lyskov and others added 20 commits June 30, 2026 19:29

lyskov-ai requested a review from woodsh17 July 1, 2026 03:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Type-check rfd3na input_parsing#354

Type-check rfd3na input_parsing#354
lyskov-ai wants to merge 20 commits into
RosettaCommons:productionfrom
lyskov-ai:0079-rfd3na-mypy-input-parsing

lyskov-ai commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lyskov-ai commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants