Dataset Classes

These classes define the main loading interfaces for the current on-disk dataset format.

Base

class spora_io.datasets.base.BaseImagingDataset(name, modality, resolution, path=None, tile_size=None, load_cell_metadata=False, verbose=True, label=None, labels_to_keep=None, label_modifying_fn=None, label_type='classification', tile_strategy=None, split=None)[source]

Bases: ABC

Base class for all imaging datasets.

Parameters:
  • name (str)

  • modality (ModKey)

  • resolution (float | str)

  • path (os.PathLike | str | None)

  • tile_size (Optional[int])

  • load_cell_metadata (bool)

  • verbose (bool)

  • label (Optional[str])

  • labels_to_keep (Optional[Sequence[str]])

  • label_modifying_fn (Optional[Callable])

  • label_type (str)

  • tile_strategy (Optional[str])

  • split (Optional[str])

name

The name of the dataset.

Type:

str

path

The root path to the dataset. If omitted, this resolves to SPORA_DATASETS_DIR / name.

Type:

Path
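
The default-path rule can be sketched as follows. This is only an illustration: resolve_dataset_path is a hypothetical helper, and SPORA_DATASETS_DIR is assumed here to be a plain Path constant with a placeholder value.

```python
from pathlib import Path

# Assumption: placeholder root; the real SPORA_DATASETS_DIR is defined by the library.
SPORA_DATASETS_DIR = Path("/data/spora_datasets")


def resolve_dataset_path(name, path=None):
    """Return the explicit path if one is given, else SPORA_DATASETS_DIR / name."""
    if path is None:
        return SPORA_DATASETS_DIR / name
    return Path(path)
```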

modality

The modality of the dataset.

Type:

ModKey

resolution

The resolution of the dataset in microns per pixel (mpp), formatted as a string with the decimal point replaced by an underscore (e.g. “0_5mpp”).

Type:

str
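
A minimal sketch of how a numeric resolution could map to this string form. format_resolution is a hypothetical helper; passing strings through unchanged and rendering whole numbers as e.g. “1_0mpp” are assumptions, not documented behavior.

```python
def format_resolution(resolution):
    """Format a resolution in microns per pixel as a string such as '0_5mpp'."""
    if isinstance(resolution, str):
        # Assumption: a string is already in the on-disk form.
        return resolution
    # Replace the decimal point with an underscore and append the unit.
    return f"{float(resolution)}".replace(".", "_") + "mpp"
```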

tile_size

The tile size in pixels. If None, tiling functionality will be disabled.

Type:

Optional[int]

tile_strategy

The tiling strategy used for the dataset. If None, defaults to “default”. This is used to determine the subdirectory under tiling/<resolution>/ where tile coordinates are stored.

Type:

Optional[str]
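
The directory rule described above can be sketched like this. tile_coordinates_dir is a hypothetical helper; only the tiling/<resolution>/<strategy> layout and the “default” fallback come from the description.

```python
from pathlib import Path


def tile_coordinates_dir(root, resolution, tile_strategy=None):
    """Return <root>/tiling/<resolution>/<strategy>, using 'default' when no strategy is set."""
    strategy = tile_strategy if tile_strategy is not None else "default"
    return Path(root) / "tiling" / resolution / strategy
```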

label

The name of the label column in the tissue metadata. If None, no labels will be loaded.

Type:

Optional[str]

labels_to_keep

The list of label values to keep if label is not None. If None, all labels will be kept.

Type:

Optional[Sequence[str]]

label_modifying_fn

A function to modify the labels after loading. For example, this can be used to binarize labels or group certain labels together. If None, labels will not be modified.

Type:

Optional[Callable]

label_type

The type of the label, either “classification” or “regression”. This is used to determine how to encode the labels. Default is “classification”.

Type:

str
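
The four label attributes interact roughly as sketched below. prepare_labels is a hypothetical helper, and the order of operations (filter, then modify, then encode) as well as the sorted class order are assumptions.

```python
def prepare_labels(raw_labels, labels_to_keep=None, label_modifying_fn=None,
                   label_type="classification"):
    """Filter, modify, and encode labels given as {tissue_id: raw_label}."""
    labels = dict(raw_labels)
    if labels_to_keep is not None:
        keep = set(labels_to_keep)
        labels = {tid: lab for tid, lab in labels.items() if lab in keep}
    if label_modifying_fn is not None:
        # For example, binarize or group labels.
        labels = {tid: label_modifying_fn(lab) for tid, lab in labels.items()}
    if label_type == "classification":
        # Assumption: classes are indexed in sorted order.
        classes = sorted(set(labels.values()))
        class_to_index = {c: i for i, c in enumerate(classes)}
        return {tid: class_to_index[lab] for tid, lab in labels.items()}, classes
    if label_type == "regression":
        return {tid: float(lab) for tid, lab in labels.items()}, None
    raise ValueError(f"Unknown label_type: {label_type!r}")
```

Here a label_modifying_fn such as `lambda g: "low" if g == "grade1" else "high"` would collapse several grades into two classes before encoding.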

get_tissue_ids(kind='modality')[source]

Get the unique tissue IDs from the tissue annotations.

Parameters:
  • kind (str) – The kind of tissue IDs to retrieve. Default is “modality”, which returns tissue ids for tissues that have the specified modality. If “all”, returns tissue ids for all tissues in the dataset.

Returns:

An array of unique tissue IDs.

Return type:

ndarray[tuple[Any, ...], dtype[str_]]
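
The difference between kind=“modality” and kind=“all” can be illustrated on toy data. select_tissue_ids and the record layout (a “modalities” set per tissue) are assumptions made for this sketch, which returns a sorted list rather than an ndarray.

```python
def select_tissue_ids(annotations, modality, kind="modality"):
    """Return unique tissue IDs, optionally restricted to one modality."""
    if kind == "all":
        rows = annotations
    elif kind == "modality":
        rows = [r for r in annotations if modality in r["modalities"]]
    else:
        raise ValueError(f"Unknown kind: {kind!r}")
    return sorted({r["tissue_id"] for r in rows})
```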

abstractmethod get_tissue(tissue_id, kind='complete', preprocess=True, image_mode='CHW')[source]

Get a tissue image by tissue ID.

Subclasses define the valid kind values and preprocessing behavior.

Return type:

HETissue | MultiplexTissue | IHCTissue

Parameters:
  • tissue_id (str)

  • preprocess (bool)

  • image_mode (str)

get_tissue_mask(tissue_id)[source]

Get the tissue mask for a given tissue id. If the tissue masks directory does not exist, return a full mask.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the mask for.

Returns:

The tissue mask as a TissueMask instance.

Return type:

TissueMask
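
The full-mask fallback can be sketched as below. load_mask_or_full is a hypothetical helper; storing masks as <tissue_id>.npy files and returning a plain boolean array instead of a TissueMask are assumptions.

```python
from pathlib import Path

import numpy as np


def load_mask_or_full(masks_dir, tissue_id, image_shape):
    """Load a boolean tissue mask, or return an all-True mask if the directory is missing."""
    masks_dir = Path(masks_dir)
    if not masks_dir.is_dir():
        # No masks directory: treat every pixel as tissue.
        return np.ones(image_shape, dtype=bool)
    # Assumption: one .npy file per tissue ID.
    return np.load(masks_dir / f"{tissue_id}.npy").astype(bool)
```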

abstractmethod get_tile(tissue_id, tile_id, kind='complete', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile id. Tile IDs are precomputed during tiling. If tile coordinates are not available, this method should return a random tile from the tissue image. Subclasses that implement tiling should override this method to load the tile based on the precomputed tile coordinates.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • tile_id (int) – The tile ID to retrieve.

  • kind (str) – The kind of tile image to retrieve. Default is “complete”.

  • image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.

  • preprocess (bool) – Whether to preprocess the tile image (e.g. normalize) before returning it. Default is True.

Returns:

The tile image as a Tissue instance.

Return type:

HETissue | MultiplexTissue | IHCTissue

abstractmethod get_tile_by_coordinates(tissue_id, row, col, kind='complete', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile coordinates. Use this method when tile coordinates are not precomputed, in which case get_tile is expected to return a random tile.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • row (int) – The row coordinate of the tile.

  • col (int) – The column coordinate of the tile.

  • kind (str) – The kind of tile image to retrieve. Default is “complete”.

  • image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.

  • preprocess (bool) – Whether to preprocess the tile image (e.g. normalize) before returning it. Default is True.

Returns:

The tile image as a Tissue instance.

Return type:

HETissue | MultiplexTissue | IHCTissue
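
A coordinate-based tile lookup amounts to a crop followed by an optional layout change. The sketch below assumes a channel-first NumPy image and treats row and col as the top-left pixel of the tile; crop_tile is a hypothetical helper, not the library's implementation.

```python
import numpy as np


def crop_tile(image_chw, row, col, tile_size, image_mode="CHW"):
    """Crop a square tile from a (C, H, W) image and return it in the requested layout."""
    tile = image_chw[:, row:row + tile_size, col:col + tile_size]
    if image_mode == "CHW":
        return tile
    if image_mode == "HWC":
        # Move the channel axis to the end.
        return np.transpose(tile, (1, 2, 0))
    raise ValueError(f"Unknown image_mode: {image_mode!r}")
```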

get_cell_instance_mask(tissue_id, use_instances_from_virtues=False)[source]

Get the cell instance mask for a given tissue id.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the cell instance mask for.

  • use_instances_from_virtues (bool) – Whether to use instances generated using the VirTues foundation model.

Returns:

The cell instance mask as a CellMask instance.

Return type:

CellMask

get_cell_task_mask(tissue_id, mask_type)[source]

Get the cell task mask for a given tissue id and mask type.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the cell task mask for.

  • mask_type (str) – The type of cell task mask to retrieve. Valid options can be retrieved from get_cell_task_mask_types method.

Returns:

The cell task mask as a CellMask instance.

Return type:

CellMask

get_cell_task_mask_types()[source]

Get the available cell task mask types for the dataset.

Returns:

A list of available cell task mask types.

Return type:

Sequence[str]

get_tissue_by_patient(patient_id, kind='complete', preprocess=True, image_mode='CHW')[source]

Get all tissue images associated with a patient ID.

Return type:

Sequence[HETissue | MultiplexTissue | IHCTissue]

Parameters:
  • patient_id (str)

  • preprocess (bool)

  • image_mode (str)
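
Conceptually, a per-patient lookup selects the tissue IDs that belong to the patient and loads each one. In this sketch tissues_for_patient, the “patient_id” field, and the get_tissue callable are all assumptions for illustration.

```python
def tissues_for_patient(annotations, patient_id, get_tissue):
    """Load every tissue whose annotation row matches the patient ID."""
    tissue_ids = [r["tissue_id"] for r in annotations if r["patient_id"] == patient_id]
    return [get_tissue(tissue_id) for tissue_id in tissue_ids]
```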

H&E

class spora_io.datasets.he.HEImagingDataset(name, resolution, path=None, load_cell_metadata=False, verbose=True, mean_std_type='imagenet', tile_size=None, tile_strategy=None, split=None, **kwargs)[source]

Bases: BaseImagingDataset

Class for handling H&E stained imaging datasets.

Parameters:
  • name (str)

  • resolution (float | str)

  • path (os.PathLike | str | None)

  • load_cell_metadata (bool)

  • verbose (bool)

  • mean_std_type (str)

  • tile_size (Optional[int])

  • tile_strategy (Optional[str])

  • split (Optional[str])

IMAGENET_MEAN

The mean values for ImageNet normalization.

Type:

torch.Tensor

IMAGENET_STD

The standard deviation values for ImageNet normalization.

Type:

torch.Tensor

HIBOU_MEAN

The mean values for HIBOU normalization.

Type:

torch.Tensor

HIBOU_STD

The standard deviation values for HIBOU normalization.

Type:

torch.Tensor

mean_std_type

The type of mean and standard deviation to use for normalization. Valid options are “imagenet” and “hibou”.

Type:

str
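
For reference, “imagenet” normalization is sketched below with the widely used ImageNet channel statistics. It uses NumPy instead of torch tensors and assumes uint8 RGB input scaled to [0, 1] first; normalize_rgb is a hypothetical helper, and the HIBOU statistics are not reproduced here.

```python
import numpy as np

# Standard ImageNet RGB statistics for inputs scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_rgb(image_chw, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale a uint8 (3, H, W) image to [0, 1], then standardize each channel."""
    image = image_chw.astype(np.float32) / 255.0
    # Broadcast the per-channel statistics over height and width.
    return (image - mean[:, None, None]) / std[:, None, None]
```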

get_tissue(tissue_id, kind='complete', preprocess=True, image_mode='CHW')[source]

Get the tissue image for a given tissue id, normalized after channel filtering.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the image for.

  • kind (str) – The kind of tissue image to retrieve. Only “complete” is supported for H&E datasets since there is only one modality channel. Default is “complete”.

  • preprocess (bool) – If True, preprocess the image (normalize). Default is True.

Returns:

The normalized tissue image as an HETissue instance.

Return type:

HETissue

get_tile_by_coordinates(tissue_id, row, col, kind='complete', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile coordinates.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • row (int) – The row coordinate of the tile.

  • col (int) – The column coordinate of the tile.

  • kind (str) – The kind of tile image to retrieve. Default is “complete”.

  • image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.

  • preprocess (bool) – If True, preprocess the image (normalize). Default is True.

Returns:

The specific tile as an HETissue instance.

Return type:

HETissue

get_tile(tissue_id, tile_id, kind='complete', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile id.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • tile_id (int) – The tile ID to retrieve.

  • kind (str)

  • image_mode (str)

  • preprocess (bool)

Returns:

The specific tile as an HETissue instance.

Return type:

HETissue

Multiplex (IMC, CODEX, CycIF, etc.)

class spora_io.datasets.multiplex.MultiplexImagingDataset(name, modality, standardization, resolution, path=None, tile_size=None, verbose=True, load_cell_metadata=False, disable_quantile_mask=False, filter_list=None, use_mean_std=True, return_uniprot_ids=True, replace_nuclear_uniprot_ids=False, tile_strategy=None, split=None, **kwargs)[source]

Bases: BaseImagingDataset

Class for handling multiplex imaging datasets.

Parameters:
  • name (str)

  • modality (str)

  • standardization (str)

  • resolution (float | str)

  • path (os.PathLike | str | None)

  • tile_size (Optional[int])

  • verbose (bool)

  • load_cell_metadata (bool)

  • disable_quantile_mask (bool)

  • filter_list (List[str] | None)

  • use_mean_std (bool)

  • return_uniprot_ids (bool)

  • replace_nuclear_uniprot_ids (bool)

  • tile_strategy (Optional[str])

  • split (Optional[str])

VALID_MODALITIES

A set of valid modalities for multiplex imaging datasets. Valid options are “imc”, “codex”, and “cycif”.

Type:

set

standardization

The type of standardization to apply to the images. This is passed to the build_standardizer function to create a standardizer instance.

Type:

str

disable_quantile_mask

Quantile masking finds channels with zero quantile or zero variance and excludes them from standardization. Setting this to True disables that behavior so that all channels are included in standardization. This is passed to the build_standardizer function.

Type:

bool

filter_list

A list of filter names to apply to the dataset. Currently supported filters are “gaussian_blur” and “median_filter”. These are applied sequentially, and the parameters for each filter can be specified in the filter_params dictionary in kwargs, with the filter name as the key and the parameters as a dictionary of parameters for that filter.

Type:

List[str]
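
The sequential application of filter_list with per-filter parameters can be sketched as a simple dispatch loop. Everything below is illustrative: the filters are toy 1D stand-ins (a moving average in place of a Gaussian blur), and the size parameter name is hypothetical.

```python
from statistics import median


def box_blur(values, size=3):
    """Toy 1D moving average standing in for a Gaussian blur; windows are truncated at the edges."""
    half = size // 2
    windows = (values[max(0, i - half):i + half + 1] for i in range(len(values)))
    return [sum(w) / len(w) for w in windows]


def median_filter(values, size=3):
    """Toy 1D median filter; windows are truncated at the edges."""
    half = size // 2
    windows = (values[max(0, i - half):i + half + 1] for i in range(len(values)))
    return [median(w) for w in windows]


# Filter names follow the documented options; the implementations are stand-ins.
FILTERS = {"gaussian_blur": box_blur, "median_filter": median_filter}


def apply_filters(values, filter_list=None, filter_params=None):
    """Apply the named filters in order, looking up each filter's parameters by name."""
    filter_params = filter_params or {}
    for name in filter_list or []:
        values = FILTERS[name](values, **filter_params.get(name, {}))
    return values
```

A call might pass filter_list=["gaussian_blur", "median_filter"] together with filter_params={"gaussian_blur": {"size": 3}, "median_filter": {"size": 3}}.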

use_mean_std

Whether to use mean and standard deviation for standardization. If False, only quantile normalization will be applied if not disabled. This is passed to the build_standardizer function.

Type:

bool

return_uniprot_ids

Whether to return uniprot IDs for the channels. This requires a “uniprot_id” column in the channels.parquet file. If this column is not present, uniprot IDs will not be returned regardless of this setting.

Type:

bool

replace_nuclear_uniprot_ids

Whether to replace UniProt IDs for channels flagged with is_nuclear_marker with Histone H3’s UniProt ID (P68431). This lets nuclear markers participate in uniprot_filtered loading even when the source channel lacks a protein-specific UniProt ID.

Type:

bool
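
The replacement rule can be illustrated on plain channel records. The record layout and the helper name substitute_nuclear_uniprot_ids are assumptions; only the is_nuclear_marker flag and the Histone H3 ID come from the description above.

```python
HISTONE_H3_UNIPROT_ID = "P68431"


def substitute_nuclear_uniprot_ids(channels):
    """Return UniProt IDs per channel, using Histone H3's ID for nuclear markers."""
    return [
        HISTONE_H3_UNIPROT_ID if ch["is_nuclear_marker"] else ch["uniprot_id"]
        for ch in channels
    ]
```

With this substitution a DNA intercalator channel that has no protein-specific ID still carries a valid ID, so it is not dropped by “uniprot_filtered” loading.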

get_channel_names(tissue_id, kind='complete', measured_mask=None, qc_mask=None, filtered_mask=None)[source]

Get the channel names for a given tissue id and kind.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the channel names for.

  • kind (str) – The kind of tissue image to retrieve channel names for. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.

  • measured_mask (np.ndarray | None) – A boolean array indicating which channels are measured for the given tissue. If None, it will be retrieved from the image_channel_map. This is used to determine which channels to consider when applying the kind filtering.

  • qc_mask (np.ndarray | None) – A boolean array indicating which channels pass quality control for the given tissue. If None, it will be computed from the quality_control_mask and measured_mask. This is used when kind is “qc_filtered” or “uniprot_filtered” to determine which channels to include.

  • filtered_mask (np.ndarray | None) – A boolean array indicating which channels have valid UniProt IDs for the given tissue. If None, it will be computed from the uniprot_mask and qc_mask. This is used when kind is “uniprot_filtered” to determine which channels to include.

Returns:

The channel names as a 1D array of shape (n_channels,).

Return type:

ndarray[tuple[Any, ...], dtype[str_]]
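
The three kind values correspond to progressively stricter channel masks. The sketch below assumes the masks are combined with a logical AND, which matches the parameter descriptions but is not confirmed; select_channel_names is a hypothetical helper.

```python
import numpy as np


def select_channel_names(channel_names, measured_mask, quality_control_mask,
                         uniprot_mask, kind="complete"):
    """Pick channel names for one tissue according to the requested kind."""
    # Assumption: each stricter mask is the AND of the previous one with a new criterion.
    qc_mask = quality_control_mask & measured_mask
    filtered_mask = uniprot_mask & qc_mask
    masks = {
        "complete": measured_mask,
        "qc_filtered": qc_mask,
        "uniprot_filtered": filtered_mask,
    }
    if kind not in masks:
        raise ValueError(f"Unknown kind: {kind!r}")
    return channel_names[masks[kind]]
```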

get_uniprot_ids(tissue_id, kind='complete', measured_mask=None, qc_mask=None, filtered_mask=None)[source]

Get the uniprot IDs for a given tissue id and kind. This will return None if return_uniprot_ids is False or if there are no valid uniprot IDs for the given kind.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the uniprot IDs for.

  • kind (str) – The kind of tissue image to retrieve uniprot IDs for. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.

  • measured_mask (np.ndarray | None) – A boolean array indicating which channels are measured for the given tissue. If None, it will be retrieved from the image_channel_map. This is used to determine which channels to consider when applying the kind filtering.

  • qc_mask (np.ndarray | None) – A boolean array indicating which channels pass quality control for the given tissue. If None, it will be computed from the quality_control_mask and measured_mask. This is used when kind is “qc_filtered” or “uniprot_filtered” to determine which channels to include.

  • filtered_mask (np.ndarray | None) – A boolean array indicating which channels have valid UniProt IDs for the given tissue. If None, it will be computed from the uniprot_mask and qc_mask. This is used when kind is “uniprot_filtered” to determine which channels to include.

Returns:

The UniProt IDs as a 1D array of shape (n_channels,). Returns None if return_uniprot_ids is False.

Return type:

ndarray[tuple[Any, ...], dtype[object_]] | None

get_tissue(tissue_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]

Get the tissue image for a given tissue id, with options for filtering channels and preprocessing.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the image for.

  • kind (str) – The kind of tissue image to retrieve. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”. Default is “uniprot_filtered”.

  • preprocess (bool) – Whether to preprocess the image using the standardizer. Default is True.

  • image_mode (str) – The image mode of the tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

The tissue image as a MultiplexTissue instance.

Return type:

MultiplexTissue

get_tile_by_coordinates(tissue_id, row, col, kind='uniprot_filtered', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile row and column coordinates.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • row (int) – The row coordinate of the top-left corner of the tile.

  • col (int) – The column coordinate of the top-left corner of the tile.

  • preprocess (bool) – Whether to preprocess the tile using the standardizer. Default is True.

  • kind (str) – The kind of tissue image to retrieve for the tile. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”. Default is “uniprot_filtered”.

  • image_mode (str) – The image mode of the tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

The specific tile as a MultiplexTissue instance.

Return type:

MultiplexTissue

get_tile(tissue_id, tile_id, kind='uniprot_filtered', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile id.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • tile_id (int) – The tile ID to retrieve.

  • preprocess (bool) – Whether to preprocess the tile using the standardizer. Default is True.

  • kind (str) – The kind of tissue image to retrieve for the tile. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”. Default is “uniprot_filtered”.

  • image_mode (str) – The image mode of the tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

The specific tile as a MultiplexTissue instance.

Return type:

MultiplexTissue

IHC

class spora_io.datasets.ihc.SingleIHCImagingDataset(name, marker_name, resolution, tile_size, path=None, load_cell_metadata=False, verbose=True, mean_std_type='imagenet', tile_strategy=None, split=None, **kwargs)[source]

Bases: BaseImagingDataset

Class for handling IHC stained imaging datasets.

Parameters:
  • name (str)

  • marker_name (str)

  • resolution (float | str)

  • tile_size (int)

  • path (os.PathLike | str | None)

  • load_cell_metadata (bool)

  • verbose (bool)

  • mean_std_type (str)

  • tile_strategy (Optional[str])

  • split (Optional[str])

IMAGENET_MEAN

The mean values for ImageNet normalization.

Type:

torch.Tensor

IMAGENET_STD

The standard deviation values for ImageNet normalization.

Type:

torch.Tensor

HIBOU_MEAN

The mean values for HIBOU normalization.

Type:

torch.Tensor

HIBOU_STD

The standard deviation values for HIBOU normalization.

Type:

torch.Tensor

mean_std_type

The type of mean and standard deviation to use for normalization. Valid options are “imagenet” and “hibou”.

Type:

str

get_tissue(tissue_id, kind='complete', preprocess=True, image_mode='CHW')[source]

Get the tissue image for a given tissue id, normalized after channel filtering.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the image for.

  • kind (str) – The kind of tissue image to retrieve. Valid options are “complete”, “qc_filtered”, and “filtered”. Because there is only one modality channel, “qc_filtered” and “filtered” return the same image. Default is “complete”.

  • preprocess (bool) – If True, preprocess the image (standardize). Default is True.

  • image_mode (str)

Returns:

The normalized tissue image as an IHCTissue instance.

Return type:

IHCTissue

get_tile_by_coordinates(tissue_id, row, col, kind='complete', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile coordinates.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • row (int) – The row coordinate of the tile.

  • col (int) – The column coordinate of the tile.

  • kind (str) – The kind of tile image to retrieve. Only “complete” is supported for IHC datasets.

  • image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”.

  • preprocess (bool) – Whether to preprocess the tile image before returning it.

Returns:

The specific tile as an IHCTissue instance.

Return type:

IHCTissue

get_tile(tissue_id, tile_id, kind='complete', image_mode='CHW', preprocess=True)[source]

Get a specific tile based on the tissue id and tile id.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • kind (str) – The kind of tile image to retrieve. Only “complete” is supported for IHC datasets since there is only one modality channel. Default is “complete”.

  • image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.

  • preprocess (bool) – Whether to preprocess the tile image (e.g. normalize) before returning it. Default is True.

  • tile_id (int) – The tile ID to retrieve.

Returns:

The specific tile as an IHCTissue instance.

Return type:

IHCTissue

Composed (Multi-modal)

class spora_io.datasets.compose.ComposedImagingDataset(name, modalities, tile_size, resolution, path=None, verbose=True, load_cell_metadata=False, tile_strategy=None, split=None, *, modality_kwargs=None)[source]

Bases: object

Compose multiple unimodal datasets (HE, Multiplex, etc.) into a single handle.

  • Uniform interface to fetch tissues/tiles per modality.

  • Ensures consistent tile strategy across modalities by construction.

  • Extensible via modality_kwargs to pass per-modality constructor arguments.

Notes on behavior:

  • get_tissue_ids() returns the union of tissue IDs across all instantiated modalities.

  • get_modalities_of_tissue(tissue_id) lists which modalities contain that tissue ID.

  • Marker-specific helpers (indices/metadata) are forwarded to each unimodal dataset when available.

Parameters:
  • name (str)

  • modalities (Iterable[ModKey])

  • tile_size (int)

  • resolution (float | str)

  • path (Union[str, Path] | None)

  • verbose (bool)

  • load_cell_metadata (bool)

  • tile_strategy (Optional[str])

  • split (Optional[str])

  • modality_kwargs (Optional[Mapping[str, Mapping[str, Any]]])

get_dataset(modality)[source]

Get the unimodal dataset instance for a given modality key.

Parameters:
  • modality (Union[str, HEModality, IHCModality, IMCModality, CODEXModality, CycIFModality, MIBIModality]) – The modality key (string or object with ‘name’ attribute) to retrieve the dataset for.

Returns:

The unimodal dataset instance corresponding to the modality key.

Return type:

Any

Raises:

KeyError – If the modality key is not part of this composed dataset.
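
The lookup accepts either a string or an object with a name attribute. A minimal stand-in for that behavior is sketched below; ComposedSketch and modality_key are hypothetical and only mirror the documented contract, including the KeyError.

```python
from types import SimpleNamespace


def modality_key(modality):
    """Normalize a modality given as a string or as an object with a 'name' attribute."""
    return modality if isinstance(modality, str) else modality.name


class ComposedSketch:
    """Toy container mapping modality keys to unimodal datasets."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def get_dataset(self, modality):
        key = modality_key(modality)
        if key not in self._datasets:
            raise KeyError(f"Modality {key!r} is not part of this composed dataset")
        return self._datasets[key]


# Usage: string keys and objects with a .name attribute resolve to the same entry.
composed = ComposedSketch({"he": "he-dataset", "imc": "imc-dataset"})
he = composed.get_dataset("he")
imc = composed.get_dataset(SimpleNamespace(name="imc"))
```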

get_available_modalities()[source]

Get the list of available modalities in this composed dataset.

Returns:

A list of modality keys representing the available modalities.

Return type:

List[str]

get_tissue_ids(modality=None)[source]

Get the list of tissue IDs available in the dataset. If a modality is specified, return only tissue IDs for that modality.

Parameters:
  • modality (Union[str, HEModality, IHCModality, IMCModality, CODEXModality, CycIFModality, MIBIModality, None]) – The modality key to filter tissue IDs by. If None, returns tissue IDs across all modalities.

Returns:

A list of tissue IDs available in the dataset (filtered by modality if specified).

Return type:

List[str]

get_modalities_of_tissue(tissue_id)[source]

Get the list of modalities available for a given tissue ID.

Parameters:
  • tissue_id (str) – The tissue ID to query modalities for.

Returns:

A list of modality keys representing the modalities available for the given tissue ID.

Return type:

List[str]
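
The relationship between get_tissue_ids and get_modalities_of_tissue can be shown on a mapping from modality key to tissue IDs. Both helpers below are hypothetical, and the sorted output order is an assumption.

```python
def all_tissue_ids(ids_by_modality, modality=None):
    """Return tissue IDs for one modality, or the union across all modalities."""
    if modality is not None:
        return sorted(ids_by_modality[modality])
    return sorted(set().union(*ids_by_modality.values()))


def modalities_of_tissue(ids_by_modality, tissue_id):
    """List the modalities whose dataset contains the given tissue ID."""
    return [m for m, ids in ids_by_modality.items() if tissue_id in ids]
```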

get_unimodal_tissue(tissue_id, modality, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]

Get the tissue image for a given tissue ID and modality, with options for kind of image and preprocessing.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve.

  • modality (Union[str, HEModality, IHCModality, IMCModality, CODEXModality, CycIFModality, MIBIModality]) – The modality key to specify which unimodal dataset to query.

  • kind (str) – The kind of tissue image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.

  • preprocess (bool) – If True, preprocess the image (normalize). Default is True.

  • image_mode (str) – The desired image mode of the returned tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

The tissue image as returned by the unimodal dataset’s get_tissue method.

Return type:

Tissue

get_unimodal_tissue_mask(tissue_id, modality)[source]

Get the quality control mask for a given tissue ID and modality.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the mask for.

  • modality (Union[str, HEModality, IHCModality, IMCModality, CODEXModality, CycIFModality, MIBIModality]) – The modality key to specify which unimodal dataset to query.

Returns:

The quality control mask as returned by the unimodal dataset’s get_tissue_mask method.

Return type:

np.ndarray

get_unimodal_tissue_size(tissue_id, modality)[source]

Get the tissue size (C,H,W) for a given tissue ID and modality.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the size for.

  • modality (Union[str, HEModality, IHCModality, IMCModality, CODEXModality, CycIFModality, MIBIModality]) – The modality key to specify which unimodal dataset to query.

Returns:

The tissue size (C,H,W) as returned by the unimodal dataset’s _get_tissue_size method.

Return type:

Tuple[int, int, int]

get_unimodal_tile(tissue_id, tile_id, modality, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]

Get a specific tile for a given tissue ID and modality.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve the tile for.

  • tile_id (int) – The tile ID to retrieve.

  • modality (Union[str, HEModality, IHCModality, IMCModality, CODEXModality, CycIFModality, MIBIModality]) – The modality key to specify which unimodal dataset to query.

  • kind (str) – The kind of image to retrieve. Valid options depend on modality.

  • preprocess (bool) – If True, preprocess the tile before returning.

  • image_mode (str) – The returned image layout, usually “CHW” or “HWC”.

Returns:

The tile image as returned by the unimodal dataset’s get_tile method.

Return type:

Tissue

get_composed_tissue(tissue_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]

Get a composed tissue sample for a given tissue ID, which includes all available modalities for that tissue.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve.

  • kind (str) – The kind of tissue image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.

  • preprocess (bool) – If True, preprocess the images (normalize). Default is True.

  • image_mode (str) – The desired image mode of the returned tissue images. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

A ComposedTissue instance containing the tissue ID and a dictionary of modality-specific Tissue instances.

Return type:

ComposedTissue

get_composed_tile(tissue_id, tile_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]

Get a composed tile for a given tissue ID and tile ID, which includes all available modalities for that tissue.

Parameters:
  • tissue_id (str) – The tissue ID to retrieve.

  • tile_id (int) – The tile ID to retrieve.

  • kind (str) – The kind of tile image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.

  • preprocess (bool) – If True, preprocess the images (normalize). Default is True.

  • image_mode (str) – The desired image mode of the returned tile images. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

A ComposedTissue instance containing the tissue ID and a dictionary of modality-specific tile images.

Return type:

ComposedTissue

get_composed_tissue_by_patient(patient_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]

Get composed tissue samples for all tissues associated with a given patient ID.

Parameters:
  • patient_id (str) – The patient ID to retrieve tissues for.

  • kind (str) – The kind of tissue image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.

  • preprocess (bool) – If True, preprocess the images (normalize). Default is True.

  • image_mode (str) – The desired image mode of the returned tissue images. Valid options are “CHW” and “HWC”. Default is “CHW”.

Returns:

A list of ComposedTissue instances for each tissue associated with the patient.

Return type:

Sequence[ComposedTissue]

SporaDataset (Multi-cohort)

class spora_io.datasets.spora.SporaDataset(dataset_names, *, datasets_dir=None, modalities=None, resolution=1.0, tile_size=None, tile_strategy='default', sampling_unit=None, verbose=True, load_cell_metadata=False, split=None, modality_kwargs=None, dataset_modality_kwargs=None, seed=None)[source]

Bases: object

Dataset-of-datasets wrapper for sampling tissues or tiles across cohorts.

SporaDataset instantiates one ComposedImagingDataset per dataset name, then builds either a tissue index or a concatenated tile-coordinate index. Samples are returned with a dataset name, tissue id, optional tile id, and a modality-to-tissue/tile mapping.
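
The two index types can be pictured as flat lists of keys. This sketch is an assumption about the shape of those indices, not the actual implementation; in particular, tile IDs are assumed to run from 0 to n_tiles - 1 per tissue, and both helper names are hypothetical.

```python
def build_tissue_index(tissues_by_dataset):
    """Flatten {dataset_name: [tissue_id, ...]} into (dataset_name, tissue_id) entries."""
    return [(ds, tid) for ds, tids in tissues_by_dataset.items() for tid in tids]


def build_tile_index(tile_counts):
    """Flatten {dataset_name: {tissue_id: n_tiles}} into (dataset_name, tissue_id, tile_id) entries."""
    return [
        (ds, tid, tile_id)
        for ds, per_tissue in tile_counts.items()
        for tid, n_tiles in per_tissue.items()
        for tile_id in range(n_tiles)
    ]
```

Sampling then reduces to picking an entry from the chosen index and asking the matching composed dataset for the tissue or tile.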

Parameters:
  • dataset_names (str | Iterable[str])

  • datasets_dir (str | Path | None)

  • modalities (str | Iterable[str] | None)

  • resolution (float | str)

  • tile_size (int | None)

  • tile_strategy (str)

  • sampling_unit (SamplingUnit | None)

  • verbose (bool)

  • load_cell_metadata (bool)

  • split (str | None)

  • modality_kwargs (Mapping[str, Mapping[str, Any]] | None)

  • dataset_modality_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None)

  • seed (int | None)
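
One plausible reading of modality_kwargs and dataset_modality_kwargs is that the latter overrides the former for a specific dataset. The sketch below encodes that reading; the precedence, the dataset-then-modality nesting, and the helper name resolve_modality_kwargs are assumptions.

```python
def resolve_modality_kwargs(dataset_name, modality, modality_kwargs=None,
                            dataset_modality_kwargs=None):
    """Merge shared per-modality kwargs with dataset-specific overrides."""
    merged = dict((modality_kwargs or {}).get(modality, {}))
    # Assumption: dataset-specific entries take precedence over shared ones.
    overrides = (dataset_modality_kwargs or {}).get(dataset_name, {}).get(modality, {})
    merged.update(overrides)
    return merged
```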