Type-check rfd3na io / trainer-utils / dump / run_inference by lyskov-ai · Pull Request #348 · RosettaCommons/foundry

lyskov-ai · 2026-07-01T01:04:48Z

Continues clearing the models/rfd3na mypy ratchet — four modules, leaving 14. Each mirrors the fix already landed for the near-identical rfd3 module.

utils/io.py: Path(...) wraps on the to_cif_file path args; cast the alignment inputs to Tensor; annotate a list[Path]; correct create_example_id_extractor's return type to Callable[[PathLike], str] (it returns a closure); and use a separate coords variable in the trajectory loop instead of rebinding a Tensor loop var to an ndarray.
trainer/trainer_utils.py: bind np.array(atom_names) to a distinct name (rebinding the list loop var to an ndarray confused its type); and cast the selected association_schemes entry to dict (the constant has heterogeneous values, so mypy infers object).
trainer/dump_validation_structures.py: epoch: str | None and a type: ignore[override] for the keyword-dispatched Fabric hook.
run_inference.py: cast OmegaConf.to_container(...) to dict[str, Any] (its annotated return is a broad union).

All behaviour-preserving; rfd3na is cluster-coupled so these run locally, not in CI.

Verification: ruff format/check clean; mypy Success (238 files); pytest 541 passed / 10 skipped.

Clear rfd3.trainer.fabric_trainer (99 errors) from the mypy ignore-errors ratchet. The errors were one dominant pattern: the dynamically-keyed `state` bag was inferred as `dict[Any, Any] | int | None`, so every state access and counter update errored. Mirror the already-landed foundry trainers/fabric precedent: a class-level `state: dict[str, Any]` + `_current_train_return: Any`, an annotated `default_state`, a documented type-ignore on the wider str|int precision API, a cast on setup_dataloaders + widening the loop params to DataLoader, `get_latest_checkpoint -> Path | None` with a cast at the call site, and the truthful `load_legacy_checkpoint -> None`. Behaviour-preserving, mypy-only (no clean CPU-test target for this cluster-coupled trainer glue). Ratchet 4 -> 3 modules remaining. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Clear the 15 mypy errors in rfd3.inference.legacy_input_parsing and remove it from the ignore_errors ratchet (3 -> 2 remaining: engine, utils.vizualize). - 11 implicit-Optional parameter fixes in create_atom_array_from_design_specification_legacy (incl. unfix_specific -> str | list | None, honest to the body's isinstance branches). - The Optional fixes cascade union-attr/arg-type errors at exists()-guarded sites; since foundry.common.exists is not a TypeGuard it does not narrow, so the narrowing-required guards are converted to behaviour-identical 'x is not None' checks (exists(x) is by definition 'x is not None'). - unfix_residues: list[str] annotation; rewrite the 'unfix_residues, _ = ...' unpack to '...[0]' and the side-effecting list-comp to a for loop, dropping the '_' that collided with the **_ kwargs parameter. - Documented cast(str, contig) (contig is always provided or defaulted to the non-optional length, which mypy cannot follow through exists()). - Fix a real reachable NameError: import the missing spoof_helical_bundle_ss_conditioning_fn from rfd3.utils.inference (its branch is config-gated and would crash today). mypy-only slice; no behaviour change on production paths. All gates green (ruff format/check, mypy 152 files, pytest 486 passed / 10 skipped). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Fix the pre-existing bugs that held the last two models/rfd3 modules on the ignore_errors ratchet, then remove the rfd3 override block entirely. All of models/rfd3 now type-checks with no module-level exemptions. engine.py: - run(): documented `# type: ignore[override]` + honest return type `dict[str, list[RFD3Output]] | None` for the keyword-only override of the positional base BaseInferenceEngine.run(). - _multiply_specifications: `exists(self.out_dir)` -> `self.out_dir is not None` so mypy narrows it for find_files_with_extension(). - normalize_inputs: early returns + a documented cast for the str-split branch (list is invariant). - process_input: delete the dead/redundant block that re-did normalize_inputs' work and called .split(',') on a value that is always a list; narrow the two loop guards to `input is not None`. io.py: widen find_files_with_extension's supported_file_types to `set | list` (matches its siblings; the caller passes the CIF_LIKE_EXTENSIONS set). vizualize.py (dev script, imported nowhere): drop the broken import+call of _add_design_annotations_from_cif_block_metadata (removed in the open-sourcing refactor, exists nowhere -> ImportError on the .cif path); keep plain-structure .cif/.bcif loading; document the atomworks C_DIS .as_array() ignore. Add unit tests for the pure normalize_inputs helper. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/mpnn/src/mpnn to mypy's files and seed a fresh per-module ignore_errors ratchet for the 10 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 14 mpnn modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (mpnn's existing test suite is partly cluster-coupled / library-drift-broken on CPU). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/mpnn/tests to testpaths with a collect_ignore denylist in its conftest. Most of the existing mpnn suite loads structures via atomworks cached_parse (needs a DIGS PDB mirror absent in CI) and is held out, run locally on the cluster; the CPU-portable files are collected. Fix two stale numpy-bool identity assertions (np.True_ is True) in test_polymer_ligand_interface.py so it is green and collectable. Adds 28 mpnn CPU tests to the gate (520 passed, was 492). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check mpnn.samplers.samplers, mpnn.trainers.mpnn, and mpnn.transforms.feature_aggregation.user_settings, and drop them from the ignore_errors ratchet. Annotation/comment-only: narrow the inherited BatchSampler.sampler to Sampler for set_sampler_epoch + a var annotation; a documented type: ignore[arg-type] for the DictConfig **-unpack (per the RF3/RFD3 Loss(**loss) precedent) and type: ignore[override] for the refined validation_step 3rd param (per rf3); and a type: ignore[misc] for an atomworks AnnotationList2D row unpack its stub doesn't express. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check mpnn.transforms.feature_aggregation.token_encodings, mpnn.pipelines.mpnn, and mpnn.inference_engines.mpnn, and drop them from the ignore_errors ratchet. token_encodings wraps its TokenEncoding token_atoms values in np.asarray (matching the field's ndarray type, which __post_init__ already produces); pipelines.mpnn fixes an implicit-Optional model_type with a narrowing guard + list() for a tuple arg; the inference engine narrows _absolute_path_or_none's Optional result and the atom_arrays/input_dicts guard with documented asserts. Adds test_pipeline_validation.py (6 CPU tests) for the model_type validation. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

feature_collator: two implicit-Optional default_padding params + a list var annotation (behaviour-preserving). train: replace the ad-hoc local CkptConfig with foundry's CheckpointConfig (the type FabricTrainer.fit actually expects) — CkptConfig lacked parameter_freezing_config, which fit() reads, so the checkpoint-resume branch would have raised AttributeError; the swap fixes that latent footgun. Remaining train errors are third-party stub gaps (atomworks StructuralDatasetWrapper types dataset_parser as Callable but takes a parser object; torch WeightedRandomSampler accepts a Tensor despite its Sequence[float] stub) — documented type: ignore[arg-type]. train.py is a cluster-only training script (unverified in CI); the CheckpointConfig behaviour improvement is flagged for mpnn owners. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

load_legacy_weights reads model-specific int config dims (num_rbf, num_backbone_atoms, ...) off model.graph_featurization_module, but nn.Module.__getattr__ types submodule access as Tensor | Module, so those ints are invisible to the checker. Add a small typed helper _int_config_attr(module, name) -> int wrapping getattr and route the reads through it (behaviour-identical; the concrete featurization class varies, so casting to a single class wouldn't cover all reads). Also annotate the attribute-walk local as torch.nn.Module | None. mypy-only: the loader needs a real model + legacy checkpoint (cluster-only). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Clear the last exempted mpnn module (utils.inference, 15 errors) and remove the ignore_errors ratchet block entirely, so all of models/mpnn type-checks with no module-level exemptions. Fixes: narrow two guarded Optionals with asserts (config path in cli_to_json, base_path in write_fasta); type write_structure's file_type as Literal['cif','bcif','cif.gz'] (the type to_cif_file expects, matching the docstring; the sole caller uses the default); a categories dict annotation; and two documented type: ignore[arg-type] for atomworks stub gaps (**-unpacking a heterogeneous parser-args dict into parse(); passing ndarrays to set_annotation_2d which is typed Sequence). Adds test_inference_helpers.py (15 CPU tests) for the pure parsing helpers (str2bool, none_or_type, parse_json_like, parse_list_like, _absolute_path_or_none). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add models/rfd3na/src/rfd3na to mypy's files and seed a fresh per-module ignore_errors ratchet for the 35 modules that currently have type errors, so the gate stays green while they are cleared one slice per task. The other 27 rfd3na modules type-check clean under the lenient baseline and are now gated. Config-only; no pytest wiring (rfd3na's existing tests are cluster-coupled like rfd3's). rfd3na is a near-copy of rfd3, so several modules carry the same defects rfd3 already fixed. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check 8 rfd3na modules (one error each) and drop them from the ignore_errors ratchet. Mostly annotation-only: a LongTensor cast on .long(); three empty-dict var annotations; a numpy.typing NDArray for a pydantic bool-mask field; an assert narrowing a cache attr; and a type: ignore[list-item] for the Transform-class requires_previous_transforms pattern (mirroring rfd3). Also corrects a stale return annotation on hbonds_hbplus.calculate_hbonds (it was copied from the different calculate_hbonds in hbonds.py; the real return is (AtomArray, list, int), matching its callers). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Type-check 10 more rfd3na modules and drop them from the ignore_errors ratchet (nucleic_ss_metrics had only informational notes). Most mirror the already-landed rfd3 fixes: implicit-Optional params; a str|None base attr; an isinstance re-narrow after an untyped call; PathLike|None + transform assert; crop-center int|float decls + a cast; type: ignore for the Transform-class requires_previous_transforms and the DictConfig/None resize arg. Two genuine correctness fixes (both matching rfd3): correct permute_symmetric_atom_names_'s signature to np.ndarray (it does ndarray indexing and callers pass ndarrays); and drop vizualize's broken import of _add_design_annotations_from_cif_block_metadata (deleted in open-sourcing, exists nowhere), which would ImportError on the .cif-load path. Also add a ValueError guard for an unknown dna_crop protein_contact_type (rfd3 0058). rfd3na is cluster-coupled, so these are unverified in CI. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…om the ratchet Type-check four more rfd3na modules, mirroring the already-landed rfd3 fixes: utils/io (Path wraps, cast(Tensor) on weighted_rigid_align, a list[Path] annotation, create_example_id_extractor's Callable return type, and a coords-var trajectory rewrite); trainer_utils (rebind np.array to a distinct atom_names_arr + a cast(dict) for the heterogeneous association_schemes constant); dump_validation_structures (epoch: str|None + type: ignore[override]); run_inference (cast OmegaConf.to_container to dict[str, Any]). All behaviour-preserving; mypy-only. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

lyskov and others added 14 commits June 30, 2026 19:29

lyskov-ai requested a review from woodsh17 July 1, 2026 01:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Type-check rfd3na io / trainer-utils / dump / run_inference#348

Type-check rfd3na io / trainer-utils / dump / run_inference#348
lyskov-ai wants to merge 14 commits into
RosettaCommons:productionfrom
lyskov-ai:0073-rfd3na-mypy-io-trainer-utils

lyskov-ai commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lyskov-ai commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants