
Feature: PCA aggregation for n >= 4 channels in render_images #450

@timtreis

Problem

For images with n ≥ 4 channels, render_images falls back to an additive-RGB "stack" strategy (render.py:1597–1630): each channel gets a categorical seed color, channels are mapped through cmap → black linear maps, then summed and clipped to [0, 1].
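Below is a minimal NumPy sketch of that additive composite. It assumes each channel has already been normalized to [0, 1]. It also assumes each channel's colormap is a linear ramp from black to its seed color, so intensity simply scales the seed RGB. `additive_stack` is a hypothetical helper, not the code in render.py. With four fully co-expressed channels, the per-pixel sum overflows every RGB channel and clips to white.

```python
import numpy as np


def additive_stack(img: np.ndarray, colors: np.ndarray) -> np.ndarray:
    """Composite a (c, y, x) image in [0, 1] into a (y, x, 3) RGB array.

    Assumption: each channel scales its seed color (a black -> color ramp);
    the per-channel RGB layers are summed and the result is clipped to [0, 1].
    """
    # (c, y, x, 1) * (c, 1, 1, 3) -> (c, y, x, 3), then sum over channels
    layers = img[..., None] * colors[:, None, None, :]
    return np.clip(layers.sum(axis=0), 0.0, 1.0)


# Four channels with distinct seed colors, all fully "on" at every pixel
colors = np.array([
    [0.0, 0.0, 1.0],  # blue
    [0.0, 1.0, 0.0],  # green
    [1.0, 0.0, 0.0],  # red
    [1.0, 0.0, 1.0],  # magenta
])
img = np.ones((4, 2, 2))
rgb = additive_stack(img, colors)
print(rgb[0, 0])  # every RGB channel clips to 1.0, i.e. white
```

The per-pixel sum is what makes attribution hard. Once two or more channels push the same RGB component past 1.0, the clipped result no longer shows which channels contributed.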

This works for n ≤ 3 (it's the standard fluorescence-microscopy idiom) but degrades sharply at n ≥ 4:

  • Additive blending saturates fast — dense co-expression regions collapse to white, and individual-channel signal becomes hard to attribute.
  • Seed colors are picked by channel index, not by data variation — uninformative channels claim equal visual real estate alongside the structural ones.
  • For multiplexed assays (CODEX, IMC, MERFISH/Xenium morphology, CycIF) with 4–60 channels, the result is essentially unreadable; see #534 for a real user reproduction (a Xenium Explorer-style multi-stain segmentation panel).

The in-code TODO at render.py:1630 ("update when pca is added as strategy") points at the same gap.

What is not the gap

Per-channel norm support (a list of Normalize objects, one per channel) is already in main — see tests/pl/test_render_images.py:474. That fixes the dynamic-range half of #534 but does not address muddy composition: a recent verification on a synthetic 4-channel Xenium-morphology stand-in found that tuning per-channel norm to each peak reduced saturation but the additive overlap still produces blob-soup. The two problems are independent; this issue tracks only the compositing half.
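The sketch below illustrates why the two problems are independent. It uses matplotlib `Normalize` objects, as in the API sketch, and a hypothetical helper `apply_per_channel_norm`. Per-channel norm maps each channel into [0, 1] against its own range, but it does nothing about how the channels overlap once they are summed. This is an illustration of the idea, not the code path in main.

```python
import numpy as np
from matplotlib.colors import Normalize


def apply_per_channel_norm(img: np.ndarray, norms: list) -> np.ndarray:
    """Normalize a (c, y, x) image with one Normalize object per channel."""
    if len(norms) != img.shape[0]:
        raise ValueError(f"Expected {img.shape[0]} norms, got {len(norms)}.")
    # Normalize.__call__ returns a masked array; keep only the float data
    return np.stack([np.asarray(norm(ch), dtype=float) for norm, ch in zip(norms, img)])


# Four channels with very different raw ranges, co-expressed at the same pixel
raw = np.array([5000.0, 8000.0, 50000.0, 3000.0]).reshape(4, 1, 1)
norms = [Normalize(0, p, clip=True) for p in (5000, 8000, 50000, 3000)]
normed = apply_per_channel_norm(raw, norms)

print(normed.ravel())            # each channel now sits at its own peak, 1.0
print(normed.sum(axis=0).max())  # the additive overlap still reaches 4.0 before clipping
```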

Proposed solution: PCA reduction strategy

Add a multichannel_strategy: Literal["stack", "pca"] | None = None kwarg to render_images. When it is None (the default), resolve to "stack" (today's behavior) for n_channels ≤ 3 and to "pca" for n_channels ≥ 4. Log the chosen strategy in one line, like the existing stack log.
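Here is a sketch of how the proposed default could be resolved and logged. `resolve_multichannel_strategy` is a hypothetical private helper, and the log message wording is illustrative. Only the kwarg name, the two allowed values, the n ≤ 3 / n ≥ 4 split and the "fewer than 3 channels: raise" rule come from this proposal.

```python
import logging
from typing import Literal, Optional

logger = logging.getLogger(__name__)

Strategy = Literal["stack", "pca"]


def resolve_multichannel_strategy(n_channels: int, strategy: Optional[Strategy] = None) -> Strategy:
    """Pick the compositing strategy for an image with n_channels channels."""
    if strategy is None:
        # Auto mode: keep today's additive stack for <= 3 channels, switch to PCA above that
        strategy = "stack" if n_channels <= 3 else "pca"
    elif strategy not in ("stack", "pca"):
        raise ValueError(f"Unknown multichannel_strategy {strategy!r}.")
    if strategy == "pca" and n_channels < 3:
        # PCA -> RGB needs at least 3 input channels to produce 3 components
        raise ValueError("multichannel_strategy='pca' requires at least 3 channels.")
    logger.info("Compositing %d channels with multichannel_strategy=%r.", n_channels, strategy)
    return strategy
```

An explicit value always wins over the auto-switch, so users who prefer the additive look at n ≥ 4 can still pass "stack".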

Algorithm for "pca":

  1. Reshape (c, y, x) → (c, h·w) after per-channel norm has been applied (so the reduction sees normalized intensities, not raw counts).
  2. Run sklearn.decomposition.PCA(n_components=3, random_state=0) on the transposed (h·w, c) matrix, treating pixels as samples and channels as features.
  3. Reshape (h·w, 3) → (3, y, x) and rescale each component to [0, 1] independently.
  4. Stack as RGB and render through the existing 3-channel path.

The 3-component cap is intentional: PCA → RGB is the standard multiplex-visualization recipe (used by napari plugins, MCMICRO, Steinbock). For interpretability, log the explained-variance ratio per component.
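The following is a minimal sketch of steps 1 to 4, plus the explained-variance logging. It makes several assumptions:

  • The input is an already-normalized, NaN-free (c, y, x) NumPy array that fits in memory, i.e. it has already been rasterized, per the Dask edge case.
  • `pca_to_rgb` is a hypothetical helper name.
  • The result is returned channels-last, (y, x, 3), for matplotlib's `imshow`.

The sign-normalization and zero-padding edge cases are left out to keep the core recipe visible.

```python
import logging
from typing import Tuple

import numpy as np
from sklearn.decomposition import PCA

logger = logging.getLogger(__name__)


def pca_to_rgb(img: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Reduce a normalized (c, y, x) image to a (y, x, 3) RGB array via PCA."""
    c, h, w = img.shape
    # Step 1: (c, y, x) -> (c, h*w), transposed so pixels are samples and channels are features
    X = img.reshape(c, h * w).T  # (h*w, c)
    # Step 2: project every pixel onto the top 3 principal components
    pca = PCA(n_components=3, random_state=0)
    scores = pca.fit_transform(X)  # (h*w, 3)
    # Step 3: rescale each component to [0, 1] independently, then reshape to (3, y, x)
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0  # a constant component maps to 0 instead of dividing by zero
    comps = ((scores - lo) / span).T.reshape(3, h, w)
    # Step 4: move the component axis last to get an RGB image
    rgb = np.moveaxis(comps, 0, -1)
    logger.info("PCA explained variance ratio per component: %s", np.round(pca.explained_variance_ratio_, 3))
    return rgb, pca.explained_variance_ratio_


rng = np.random.default_rng(0)
img = rng.random((6, 32, 32))  # stand-in for a normalized 6-channel image
rgb, evr = pca_to_rgb(img)
print(rgb.shape, evr)
```

Rescaling each component independently spends each RGB channel's full range on one component. The trade-off is that absolute brightness is no longer comparable across components, which is one reason to log the explained-variance ratio alongside the render.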

API sketch

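```python
from matplotlib.colors import Normalize
import spatialdata_plot  # noqa: F401  (registers the .pl accessor)

# `sdata` is assumed to be a SpatialData object holding a "morphology_focus" image
sdata.pl.render_images(
    "morphology_focus",
    channel=["DAPI", "ATP1A1/CD45/E-Cadherin", "18S", "AlphaSMA/Vimentin"],
    norm=[Normalize(0, p, clip=True) for p in (5000, 8000, 50000, 3000)],
    multichannel_strategy="pca",  # proposed kwarg; explicit here, default auto-switches at n ≥ 4
).pl.show()
```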

Edge cases

  • Interaction with cmap / palette: PCA produces 3 abstract components without per-channel identity, so per-channel cmap and palette lists do not apply. Either ignore them with a warning or raise an error on the combination. Lean: warn + ignore.
  • Channel legend: channels_as_legend=True cannot map back to source channels under PCA. Either ignore with a warning or instead emit a small bar of explained-variance ratios.
  • Dask-backed sources: PCA needs a materialized matrix. Compute once after rasterization / multiscale-best-scale selection so we work on the canvas-size array, not the raw source.
  • NaN propagation: error early (consistent with current render_images NaN policy).
  • Reproducibility: random_state=0 and sign-normalize each component (deterministic sign by max-abs convention) so the rendered colors are stable across runs.
  • Fewer than 3 channels: "pca" is meaningless; raise.
  • transfunc interaction: apply transfunc before PCA, just as it runs before the existing rasterize/composite step.
  • n_components < 3 (rank-deficient input): zero-pad missing components so the RGB stack still has 3 channels.
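Several of these edge cases reduce to small array operations. The hedged NumPy sketch below uses hypothetical helper names:

  • `validate_pca_input` performs the up-front NaN and fewer-than-3-channels checks.
  • `sign_normalize` implements the max-abs sign convention: each component is flipped so its largest-magnitude loading is positive, and the same flip is applied to the pixel scores.
  • `pad_components` zero-pads when fewer than 3 components survive.

The issue does not specify how a rank-deficient fit ends up with fewer than 3 components (for example, by dropping near-zero-variance components). `pad_components` simply assumes it receives whatever components survived.

```python
import numpy as np


def validate_pca_input(img: np.ndarray) -> None:
    """Reject inputs the 'pca' strategy cannot handle, before any work is done."""
    if img.ndim != 3:
        raise ValueError(f"Expected a (c, y, x) array, got shape {img.shape}.")
    if img.shape[0] < 3:
        raise ValueError("multichannel_strategy='pca' requires at least 3 channels.")
    if np.isnan(img).any():
        raise ValueError("Image contains NaN values.")


def sign_normalize(scores: np.ndarray, components: np.ndarray):
    """Flip each component so its largest-|loading| entry is positive.

    scores: (n_pixels, k) projected data; components: (k, c) loadings.
    The same sign is applied to both so scores stay consistent with loadings.
    """
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), idx])
    signs[signs == 0] = 1.0  # an all-zero component keeps its sign
    return scores * signs, components * signs[:, None]


def pad_components(scores: np.ndarray, n_out: int = 3) -> np.ndarray:
    """Zero-pad (n_pixels, k) scores with k < n_out to (n_pixels, n_out)."""
    n_pixels, k = scores.shape
    if k >= n_out:
        return scores[:, :n_out]
    return np.hstack([scores, np.zeros((n_pixels, n_out - k), dtype=scores.dtype)])
```

With a fitted scikit-learn model, the sign fix would be applied as `sign_normalize(pca.transform(X), pca.components_)`. It has to happen before the per-component [0, 1] rescale, because flipping a component after rescaling would swap which end of its range renders bright.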

Out of scope
