Utilities

Standardization

The multiplex standardization stack currently lives in spora_io.utils.dataset.standardize.

Important entry points:

  • build_standardizer

  • BaseStandardizer

  • IdentityStandardizer

  • StatsBackedStandardizer

  • QuantileClippingStandardizer

  • QuantileClippingLog1PStandardizer

These classes operate on the parquet-backed standardization layout under <modality>/<resolution>/standardization/<spec>/.
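As a rough illustration of that layout, the directory for one standardization spec could be assembled as below. This is only a sketch: the helper name `standardization_dir`, the dataset root, and the example values for modality, resolution, and spec are all hypothetical and not taken from spora_io.

```python
from pathlib import PurePosixPath


def standardization_dir(root, modality, resolution, spec):
    """Join the layout components <modality>/<resolution>/standardization/<spec>.

    `root` is a hypothetical dataset root; this section does not say what the
    layout is relative to.
    """
    return PurePosixPath(root) / modality / resolution / "standardization" / spec


# All values below are made-up placeholders.
example = standardization_dir("my_dataset", "multiplex", "level0", "quantile_clip")
print(example.as_posix())
```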

Image Transforms

The filter factory used by multiplex datasets lives in spora_io.utils.dataset.transforms.FilterFactory.

class spora_io.utils.dataset.transforms.FilterFactory(filters_to_apply, filter_params)

A factory class to create and apply a sequence of filters to an input tensor.

Parameters:
  • filters_to_apply (List[str])

  • filter_params (Dict[str, Dict[str, Any]])
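The general pattern (an ordered list of filter names plus a per-filter parameter dictionary) can be sketched as follows. This is a stand-in written for illustration, not the spora_io implementation: the class `SketchFilterFactory`, its `apply` method, the registry, and the `clip` and `scale` filters are assumptions, and plain Python lists stand in for tensors.

```python
from typing import Any, Callable, Dict, List


# Hypothetical filters; the real filter names are not listed in this section.
def clip(values: List[float], low: float, high: float) -> List[float]:
    return [min(max(v, low), high) for v in values]


def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


REGISTRY: Dict[str, Callable[..., List[float]]] = {"clip": clip, "scale": scale}


class SketchFilterFactory:
    """Apply named filters in order, each with its own keyword parameters."""

    def __init__(self, filters_to_apply: List[str],
                 filter_params: Dict[str, Dict[str, Any]]):
        unknown = [name for name in filters_to_apply if name not in REGISTRY]
        if unknown:
            raise ValueError(f"Unknown filters: {unknown}")
        self.filters_to_apply = filters_to_apply
        self.filter_params = filter_params

    def apply(self, values: List[float]) -> List[float]:
        # Filters run in the order given; missing parameter entries mean "no kwargs".
        for name in self.filters_to_apply:
            values = REGISTRY[name](values, **self.filter_params.get(name, {}))
        return values


factory = SketchFilterFactory(
    ["clip", "scale"],
    {"clip": {"low": 0.0, "high": 1.0}, "scale": {"factor": 2.0}},
)
result = factory.apply([-1.0, 0.5, 3.0])
```

Keeping the order in `filters_to_apply` and the arguments in `filter_params` separate lets the same parameter dictionary be reused while filters are reordered or dropped.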

Tiling

spora_io.utils.helpers.tile.best_mask_tiling_try_to_stop(mask, tile_size, stride=None, tolerance=0.2, coverage_goal=0.99, min_gain_ratio=0.05, max_tiles=None, allow_overlap=True, progress=False, progress_desc='Tiling')

Find a good tiling of the unmasked region with adaptive stopping.

The loop stops only when both of the following criteria hold:

  1. covered_valid / total_valid >= coverage_goal

  2. best_gain / tile_area < min_gain_ratio

This makes the two parameters complementary:

  • coverage_goal=0.98, min_gain_ratio=0.05

    Continues past 98 % coverage as long as the best remaining tile still contributes at least 5 % of its area in new pixels, so coverage can approach 100 % without adding redundant tiles.

  • coverage_goal=1.0, min_gain_ratio=0.05

    Aims for full coverage but bails early once tiles become mostly redundant (< 5 % new pixels), avoiding useless overlap.

Set min_gain_ratio=0.0 to recover the original hard-cutoff behaviour (stops exactly at coverage_goal).
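A literal reading of the two numbered criteria can be written as a small predicate. This is a sketch under that reading only: the function name `should_stop` is hypothetical, and it does not model the max_tiles cap, the zero-gain exit, or any special handling the implementation may apply when min_gain_ratio is 0.0.

```python
def should_stop(covered_valid, total_valid, best_gain, tile_area,
                coverage_goal=0.99, min_gain_ratio=0.05):
    """Return True only when both listed criteria hold."""
    coverage_reached = covered_valid / total_valid >= coverage_goal
    gain_exhausted = best_gain / tile_area < min_gain_ratio
    return coverage_reached and gain_exhausted


# Coverage goal met, but the best tile still adds 10 % new pixels: keep going.
keep_going = should_stop(985, 1000, best_gain=10, tile_area=100,
                         coverage_goal=0.98, min_gain_ratio=0.05)

# Coverage goal met and the best tile adds only 4 % new pixels: stop.
stop_now = should_stop(990, 1000, best_gain=4, tile_area=100,
                       coverage_goal=0.98, min_gain_ratio=0.05)
```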

Parameters:
  • mask (ndarray) – Binary mask of shape (H, W), with 1 = valid/unmasked, 0 = masked.

  • tile_size (int) – Tile size C, so each tile is C x C.

  • stride (int) – Sliding stride. Defaults to tile_size (non-overlapping grid).

  • tolerance (float) – Maximum fraction of invalid pixels allowed inside a tile (0 = strict).

  • coverage_goal (float) – Soft lower bound on coverage — the loop will not stop below this unless gains have already hit zero.

  • min_gain_ratio (float) – Marginal-gain threshold: once the best remaining tile would cover less than this fraction of its area in new pixels, and coverage_goal has been reached, the loop stops. Range [0, 1). Default 0.05.

  • max_tiles (int) – Hard cap on number of selected tiles.

  • allow_overlap (bool) – If False, selected tiles cannot overlap each other.

  • progress (bool) – Show a tqdm progress bar on stderr.

  • progress_desc (str) – Label prefix on the progress bar.

Returns:

  • tiles (list[Tile])

  • stats (dict)

  • covered_mask (np.ndarray)
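To make the interplay of tolerance, coverage_goal, and min_gain_ratio concrete, here is a simplified greedy loop in pure Python. It is an illustrative sketch, not the library's algorithm: the name `greedy_tiling_sketch` is hypothetical, the mask is a list of lists rather than an ndarray, candidates are restricted to windows fully inside the mask, and max_tiles, allow_overlap, and progress reporting are left out. It returns only the tile origins and the achieved coverage.

```python
def greedy_tiling_sketch(mask, tile_size, stride=None, tolerance=0.2,
                         coverage_goal=0.99, min_gain_ratio=0.05):
    stride = stride or tile_size
    height, width = len(mask), len(mask[0])
    area = tile_size * tile_size
    total_valid = sum(sum(row) for row in mask)
    covered = [[0] * width for _ in range(height)]

    def cells(y, x):
        return [(r, c) for r in range(y, y + tile_size)
                for c in range(x, x + tile_size)]

    # Keep only windows whose invalid-pixel fraction is within tolerance.
    candidates = []
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            valid = sum(mask[r][c] for r, c in cells(y, x))
            if 1 - valid / area <= tolerance:
                candidates.append((y, x))

    def gain(origin):
        # Number of valid pixels this tile would newly cover.
        return sum(1 for r, c in cells(*origin)
                   if mask[r][c] and not covered[r][c])

    tiles, covered_valid = [], 0
    while candidates and covered_valid < total_valid:
        best = max(candidates, key=gain)
        best_gain = gain(best)
        if best_gain == 0:
            break  # nothing left to gain
        if (covered_valid / total_valid >= coverage_goal
                and best_gain / area < min_gain_ratio):
            break  # both stopping criteria hold
        for r, c in cells(*best):
            covered[r][c] = 1
        covered_valid += best_gain
        tiles.append(best)
        candidates.remove(best)

    coverage = covered_valid / total_valid if total_valid else 0.0
    return tiles, coverage


# 4 x 4 mask whose top-left 2 x 2 block is masked out (0 = masked).
demo_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
demo_tiles, demo_coverage = greedy_tiling_sketch(demo_mask, tile_size=2)
```

In this toy case the fully masked top-left window fails the tolerance check, so only three tiles are selected and they cover every valid pixel.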

spora_io.utils.helpers.tile.get_grid_tile(mask, tile_size, stride=None, tolerance=0.85)

Return fixed-grid tiles, padding image edges with background.

Parameters:
  • mask (ndarray)

  • tile_size (int)

  • stride (int)

  • tolerance (float)
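The geometry of a fixed grid with edge padding can be sketched as below. This shows only one plausible way to place tile origins so that the grid covers the whole image; the helper names `grid_origins` and `padding_needed` are hypothetical, and how get_grid_tile actually pads or applies tolerance is not described in this section.

```python
import math


def grid_origins(height, width, tile_size, stride=None):
    """Top-left (y, x) origins of a regular grid that covers the full image."""
    stride = stride or tile_size

    def axis(length):
        if length <= tile_size:
            return [0]
        steps = math.ceil((length - tile_size) / stride)
        return [i * stride for i in range(steps + 1)]

    return [(y, x) for y in axis(height) for x in axis(width)]


def padding_needed(length, tile_size, last_origin):
    """Rows or columns of background required past the image edge."""
    return max(0, last_origin + tile_size - length)


# A 5 x 4 image tiled with 2 x 2 tiles: the last row of tiles overhangs by one pixel.
origins = grid_origins(5, 4, tile_size=2)
pad_rows = padding_needed(5, 2, last_origin=max(y for y, _ in origins))
pad_cols = padding_needed(4, 2, last_origin=max(x for _, x in origins))
```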

class spora_io.utils.helpers.tile.Tile(y, x, h, w, valid_ratio, gain)

Parameters:
  • y (int)

  • x (int)

  • h (int)

  • w (int)

  • valid_ratio (float)

  • gain (int)
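A record with these six fields could look like the sketch below. The real Tile may be a dataclass, a NamedTuple, or something else; `TileSketch` is a hypothetical stand-in, and the field meanings in the comments (top-left corner, size, fraction of valid pixels, newly covered pixel count) are inferred from the names rather than documented here.

```python
from typing import NamedTuple


class TileSketch(NamedTuple):
    y: int              # assumed: top row of the tile
    x: int              # assumed: left column of the tile
    h: int              # assumed: tile height
    w: int              # assumed: tile width
    valid_ratio: float  # assumed: fraction of valid pixels inside the tile
    gain: int           # assumed: newly covered valid pixels when selected


tile = TileSketch(y=0, x=64, h=64, w=64, valid_ratio=0.9, gain=3500)

# Slices for cropping a 2-D array, e.g. image[rows, cols].
rows = slice(tile.y, tile.y + tile.h)
cols = slice(tile.x, tile.x + tile.w)
```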

General

spora_io.utils.utils.get_modalities_of_dataset(dataset_name, base_path)

spora_io._config.get_datasets_dir()

Return the root datasets directory.

Return type:

Path