Utilities
Standardization
The multiplex standardization stack currently lives in
spora_io.utils.dataset.standardize.
Important entry points:
- build_standardizer
- BaseStandardizer
- IdentityStandardizer
- StatsBackedStandardizer
- QuantileClippingStandardizer
- QuantileClippingLog1PStandardizer
These classes operate on the parquet-backed standardization layout under
<modality>/<resolution>/standardization/<spec>/.
Image Transforms
The filter factory used by multiplex datasets lives in
spora_io.utils.dataset.transforms.FilterFactory.
Tiling
- spora_io.utils.helpers.tile.best_mask_tiling_try_to_stop(mask, tile_size, stride=None, tolerance=0.2, coverage_goal=0.99, min_gain_ratio=0.05, max_tiles=None, allow_overlap=True, progress=False, progress_desc='Tiling')
Find a good tiling of the unmasked region with adaptive stopping.
Stopping is controlled by two criteria; the loop stops only when both are true:
covered_valid / total_valid >= coverage_goal
best_gain / tile_area < min_gain_ratio
This makes the two parameters complementary:
coverage_goal=0.98, min_gain_ratio=0.05: Runs past 0.98 as long as tiles still contribute ≥ 5 % new pixels, potentially reaching near-full coverage for free.
coverage_goal=1.0, min_gain_ratio=0.05: Aims for full coverage but bails early once tiles become mostly redundant (< 5 % new pixels), avoiding useless overlap.
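As a rough illustration of the two-criterion rule above, the sketch below evaluates both conditions for a single iteration. The helper name should_stop and the pixel counts are hypothetical and not part of spora_io. Other exits, such as max_tiles, are deliberately not modelled.

```python
def should_stop(covered_valid, total_valid, best_gain, tile_area,
                coverage_goal=0.99, min_gain_ratio=0.05):
    """Hypothetical helper: True only when both documented criteria hold."""
    coverage_reached = covered_valid / total_valid >= coverage_goal
    gain_too_small = best_gain / tile_area < min_gain_ratio
    return coverage_reached and gain_too_small


tile_area = 64 * 64  # 4096 pixels per tile (illustrative value)

# Goal met (98.5 % >= 98 %), but the best tile still adds ~9.8 % new pixels.
print(should_stop(9850, 10000, 400, tile_area, coverage_goal=0.98))  # False
# Goal met and the best tile adds only ~2.4 % new pixels.
print(should_stop(9850, 10000, 100, tile_area, coverage_goal=0.98))  # True
# Marginal gain is small, but coverage (90 %) is still below the goal.
print(should_stop(9000, 10000, 100, tile_area, coverage_goal=0.98))  # False
```

Combining the two checks with a logical "and" is what lets a run continue past coverage_goal while tiles remain productive.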
Set min_gain_ratio=0.0 to recover the original hard-cutoff behaviour (stops exactly at coverage_goal).
- Parameters:
mask (ndarray) – Binary mask of shape (H, W), with 1 = valid/unmasked, 0 = masked.
tile_size (int) – Tile size C, so each tile is C x C.
stride (int) – Sliding stride. Defaults to tile_size (non-overlapping grid).
tolerance (float) – Maximum fraction of invalid pixels allowed inside a tile (0 = strict).
coverage_goal (float) – Soft lower bound on coverage: the loop will not stop below this unless gains have already hit zero.
min_gain_ratio (float) – Soft upper bound on marginal efficiency: once the best remaining tile covers less than this fraction of its area in new pixels, and coverage_goal has been reached, the loop stops. Range [0, 1). Default 0.05.
max_tiles (int) – Hard cap on number of selected tiles.
allow_overlap (bool) – If False, selected tiles cannot overlap each other.
progress (bool) – Show a tqdm progress bar on stderr.
progress_desc (str) – Label prefix on the progress bar.
- Returns:
tiles (list[Tile])
stats (dict)
covered_mask (np.ndarray)
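To make the stride and tolerance parameters more concrete, here is a minimal sketch of how candidate tiles might be enumerated and filtered. It is not the library's implementation: the function candidate_tiles is hypothetical, it uses nested lists instead of an ndarray to stay dependency-free, and it assumes that tiles must fit entirely inside the mask and that a tile whose invalid fraction equals tolerance is still accepted.

```python
def candidate_tiles(mask, tile_size, stride=None, tolerance=0.2):
    """Hypothetical sketch: top-left corners of tiles within the tolerance."""
    stride = tile_size if stride is None else stride  # default: non-overlapping grid
    height, width = len(mask), len(mask[0])
    kept = []
    for top in range(0, height - tile_size + 1, stride):
        for left in range(0, width - tile_size + 1, stride):
            # Count valid (1) pixels inside the tile.
            valid = sum(
                mask[r][c]
                for r in range(top, top + tile_size)
                for c in range(left, left + tile_size)
            )
            invalid_fraction = 1 - valid / (tile_size * tile_size)
            if invalid_fraction <= tolerance:  # boundary handling is an assumption
                kept.append((top, left))
    return kept


mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

# Non-overlapping grid: the fully masked bottom-right tile is rejected.
print(candidate_tiles(mask, tile_size=2))  # [(0, 0), (0, 2), (2, 0)]
# A stride of 1 produces overlapping candidates along the mask boundary.
print(candidate_tiles(mask, tile_size=2, stride=1))
```

A smaller stride yields more candidates to choose from, at the cost of more overlap between them.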