Dataset Classes
These classes define the main loading interfaces for the current on-disk dataset format.
Base
- class spora_io.datasets.base.BaseImagingDataset(name, modality, resolution, path=None, tile_size=None, load_cell_metadata=False, verbose=True, label=None, labels_to_keep=None, label_modifying_fn=None, label_type='classification', tile_strategy=None, split=None)[source]
Bases: ABC
Base class for all imaging datasets.
- Parameters:
name (str)
modality (ModKey)
resolution (float | str)
path (os.PathLike | str | None)
tile_size (Optional[int])
load_cell_metadata (bool)
verbose (bool)
label (Optional[str])
labels_to_keep (Optional[Sequence[str]])
label_modifying_fn (Optional[Callable])
label_type (str)
tile_strategy (Optional[str])
split (Optional[str])
- name
The name of the dataset.
- Type:
str
- path
The root path to the dataset. If omitted, this resolves to SPORA_DATASETS_DIR / name.
- Type:
Path
- modality
The modality of the dataset.
- Type:
ModKey
- resolution
The resolution of the dataset in microns per pixel (mpp), formatted as a string with underscores in place of decimal points (e.g. “0_5mpp”).
- Type:
str
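As a rough illustration of the string form described above, the sketch below converts a numeric resolution into a label such as “0_5mpp”. The helper name format_resolution is hypothetical and not part of spora_io; how the library itself formats whole numbers such as 1.0 is an assumption here.

```python
def format_resolution(resolution):
    """Return a resolution label such as '0_5mpp' (hypothetical helper)."""
    if isinstance(resolution, str):
        # Assumption: a string is already in the on-disk form, so pass it through.
        return resolution
    # str(float) keeps the decimal part, e.g. 0.5 -> '0.5', then '.' becomes '_'.
    return str(float(resolution)).replace(".", "_") + "mpp"


label = format_resolution(0.5)  # '0_5mpp'
```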
- tile_size
The tile size in pixels. If None, tiling functionality will be disabled.
- Type:
Optional[int]
- tile_strategy
The tiling strategy used for the dataset. If None, defaults to “default”. This is used to determine the subdirectory under tiling/<resolution>/ where tile coordinates are stored.
- Type:
Optional[str]
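The directory lookup described for tile_strategy can be sketched as follows. The function name tile_coordinates_dir is hypothetical, and placing tiling/ directly under the dataset root is an assumption; only the tiling/<resolution>/<strategy> nesting and the “default” fallback come from the description above.

```python
from pathlib import Path


def tile_coordinates_dir(root, resolution, tile_strategy=None):
    """Build the directory that would hold tile coordinates (illustrative only)."""
    # A missing strategy falls back to "default", as documented for tile_strategy.
    strategy = tile_strategy if tile_strategy is not None else "default"
    # Assumption: the tiling/ directory sits directly under the dataset root.
    return Path(root) / "tiling" / resolution / strategy


default_dir = tile_coordinates_dir("data/cohort_a", "0_5mpp")
```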
- label
The name of the label column in the tissue metadata. If None, no labels will be loaded.
- Type:
Optional[str]
- labels_to_keep
The list of label values to keep if label is not None. If None, all labels will be kept.
- Type:
Optional[Sequence[str]]
- label_modifying_fn
A function to modify the labels after loading. For example, this can be used to binarize labels or group certain labels together. If None, labels will not be modified.
- Type:
Optional[Callable]
- label_type
The type of the label, either “classification” or “regression”. This is used to determine how to encode the labels. Default is “classification”.
- Type:
str
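The interplay of labels_to_keep, label_modifying_fn, and label_type might look like the sketch below. This is not the library's implementation: the function prepare_labels is hypothetical, and both the order of the steps (filter, then modify, then encode) and the sorted class ordering are assumptions.

```python
def prepare_labels(raw_labels, labels_to_keep=None, label_modifying_fn=None,
                   label_type="classification"):
    """Filter, modify, and encode labels keyed by tissue ID (illustrative only)."""
    # Step 1: drop tissues whose label value is not in labels_to_keep.
    kept = {
        tissue_id: value
        for tissue_id, value in raw_labels.items()
        if labels_to_keep is None or value in labels_to_keep
    }
    # Step 2: optionally regroup or binarize the remaining labels.
    if label_modifying_fn is not None:
        kept = {tissue_id: label_modifying_fn(value) for tissue_id, value in kept.items()}
    # Step 3: encode according to the label type.
    if label_type == "regression":
        return {tissue_id: float(value) for tissue_id, value in kept.items()}, None
    if label_type != "classification":
        raise ValueError(f"unsupported label_type: {label_type!r}")
    classes = sorted(set(kept.values()))
    class_to_index = {name: index for index, name in enumerate(classes)}
    return {tissue_id: class_to_index[value] for tissue_id, value in kept.items()}, classes


raw = {"t1": "grade_1", "t2": "grade_2", "t3": "grade_3", "t4": "unknown"}
encoded, classes = prepare_labels(
    raw,
    labels_to_keep=["grade_1", "grade_2", "grade_3"],
    label_modifying_fn=lambda value: "low" if value == "grade_1" else "high",
)
```

Here the “unknown” tissue is dropped, the three grades are collapsed into two groups, and the groups are mapped to integer class indices.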
- get_tissue_ids(kind='modality')[source]
Get the unique tissue IDs from the tissue annotations.
- Parameters:
kind (str) – The kind of tissue IDs to retrieve. Default is “modality”, which returns tissue ids for tissues that have the specified modality. If “all”, returns tissue ids for all tissues in the dataset.
- Returns:
An array of unique tissue IDs.
- Return type:
ndarray[tuple[Any,...],dtype[str_]]
- abstractmethod get_tissue(tissue_id, kind='complete', preprocess=True, image_mode='CHW')[source]
Get a tissue image by tissue ID.
Subclasses define the valid kind values and preprocessing behavior.
- Return type:
- Parameters:
tissue_id (str)
preprocess (bool)
image_mode (str)
- get_tissue_mask(tissue_id)[source]
Get the tissue mask for a given tissue id. If the tissue masks directory does not exist, return a full mask.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the mask for.
- Returns:
The tissue mask as a TissueMask instance.
- Return type:
- abstractmethod get_tile(tissue_id, tile_id, kind='complete', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile id. Tile IDs are precomputed during tiling. If tile coordinates are not available, this method should return a random tile from the tissue image. Subclasses that implement tiling should override this method to load the tile based on the precomputed tile coordinates.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
tile_id (int) – The tile ID to retrieve.
kind (str) – The kind of tile image to retrieve. Default is “complete”.
image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.
preprocess (bool) – Whether to preprocess the tile image (e.g. normalize) before returning it. Default is True.
- Returns:
The tile image as a Tissue instance.
- Return type:
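The random-tile fallback mentioned for get_tile could be sketched like this. The helper random_tile_origin is hypothetical; treating the sampled position as the tile's top-left corner and requiring the tile to fit entirely inside the image are both assumptions.

```python
import random


def random_tile_origin(height, width, tile_size, rng=None):
    """Pick a random top-left corner so the tile fits inside the image."""
    if rng is None:
        rng = random.Random()
    if tile_size > height or tile_size > width:
        raise ValueError("tile_size exceeds the image dimensions")
    # randint is inclusive on both ends, so the last valid origin is size - tile_size.
    row = rng.randint(0, height - tile_size)
    col = rng.randint(0, width - tile_size)
    return row, col


row, col = random_tile_origin(height=100, width=80, tile_size=32, rng=random.Random(0))
```

The resulting row and col could then be passed to a coordinate-based loader such as get_tile_by_coordinates.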
- abstractmethod get_tile_by_coordinates(tissue_id, row, col, kind='complete', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile coordinates. This method is used when tile coordinates are not precomputed and get_tile should return a random tile.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
row (int) – The row coordinate of the tile.
col (int) – The column coordinate of the tile.
kind (str) – The kind of tile image to retrieve. Default is “complete”.
image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.
preprocess (bool) – Whether to preprocess the tile image (e.g. normalize) before returning it. Default is True.
- Returns:
The tile image as a Tissue instance.
- Return type:
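To make the row, col, and image_mode parameters concrete, the sketch below crops a tile from a channel-first image and converts it to channel-last layout. It uses plain nested lists rather than the library's Tissue objects, the helper names are hypothetical, and it assumes row and col address the tile's top-left corner.

```python
def crop_tile_chw(image, row, col, tile_size):
    """Crop a square tile from a [C][H][W] nested list."""
    return [
        [line[col:col + tile_size] for line in channel[row:row + tile_size]]
        for channel in image
    ]


def chw_to_hwc(image):
    """Reorder a [C][H][W] nested list into [H][W][C]."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[c][h][w] for c in range(channels)] for w in range(width)]
        for h in range(height)
    ]


# Two channels, 3 rows, 4 columns; each value encodes its own position.
image = [[[c * 100 + h * 10 + w for w in range(4)] for h in range(3)] for c in range(2)]
tile_chw = crop_tile_chw(image, row=1, col=2, tile_size=2)
tile_hwc = chw_to_hwc(tile_chw)
```

In “CHW” mode the first index selects the channel; in “HWC” mode each pixel carries its channel values together.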
- get_cell_instance_mask(tissue_id, use_instances_from_virtues=False)[source]
Get the cell instance mask for a given tissue id.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the cell instance mask for.
use_instances_from_virtues (bool) – Whether to use instances generated using the VirTues foundation model.
- Returns:
The cell instance mask as a CellMask instance.
- Return type:
- get_cell_task_mask(tissue_id, mask_type)[source]
Get the cell task mask for a given tissue id and mask type.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the cell task mask for.
mask_type (str) – The type of cell task mask to retrieve. Valid options can be retrieved from the get_cell_task_mask_types method.
- Returns:
The cell task mask as a CellMask instance.
- Return type:
- get_cell_task_mask_types()[source]
Get the available cell task mask types for the dataset.
- Returns:
A list of available cell task mask types.
- Return type:
Sequence[str]
- get_tissue_by_patient(patient_id, kind='complete', preprocess=True, image_mode='CHW')[source]
Get all tissue images associated with a patient ID.
- Return type:
Sequence[HETissue|MultiplexTissue|IHCTissue]
- Parameters:
patient_id (str)
preprocess (bool)
image_mode (str)
H&E
- class spora_io.datasets.he.HEImagingDataset(name, resolution, path=None, load_cell_metadata=False, verbose=True, mean_std_type='imagenet', tile_size=None, tile_strategy=None, split=None, **kwargs)[source]
Bases: BaseImagingDataset
Class for handling H&E stained imaging datasets.
- Parameters:
name (str)
resolution (float | str)
path (os.PathLike | str | None)
load_cell_metadata (bool)
verbose (bool)
mean_std_type (str)
tile_size (Optional[int])
tile_strategy (Optional[str])
split (Optional[str])
- IMAGENET_MEAN
The mean values for ImageNet normalization.
- Type:
torch.Tensor
- IMAGENET_STD
The standard deviation values for ImageNet normalization.
- Type:
torch.Tensor
- HIBOU_MEAN
The mean values for HIBOU normalization.
- Type:
torch.Tensor
- HIBOU_STD
The standard deviation values for HIBOU normalization.
- Type:
torch.Tensor
- mean_std_type
The type of mean and standard deviation to use for normalization. Valid options are “imagenet” and “hibou”.
- Type:
str
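As an illustration of what mean/std normalization does per channel, the sketch below applies the widely used ImageNet RGB statistics to a single pixel. Scaling 8-bit values to [0, 1] before normalizing is an assumption about the preprocessing, and the function normalize_pixel is hypothetical; the HIBOU statistics are not reproduced here because this page does not list them.

```python
# Standard ImageNet RGB statistics for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel given as 0-255 values (illustrative only)."""
    # Assumption: values are rescaled to [0, 1] before subtracting the mean.
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


white = normalize_pixel((255, 255, 255))
```

Selecting “hibou” instead of “imagenet” would swap in a different mean and standard deviation while keeping the same arithmetic.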
- get_tissue(tissue_id, kind='complete', preprocess=True, image_mode='CHW')[source]
Get the normalized tissue image, after channel filtering, for a given tissue ID.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the image for.
kind (str) – The kind of tissue image to retrieve. Only “complete” is supported for H&E datasets since there is only one modality channel. Default is “complete”.
preprocess (bool) – If True, preprocess the image (normalize). Default is True.
- Returns:
The normalized tissue image as an HETissue instance.
- Return type:
- get_tile_by_coordinates(tissue_id, row, col, kind='complete', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue ID and tile coordinates.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
row (int) – The row coordinate of the tile.
col (int) – The column coordinate of the tile.
kind (str) – The kind of tile image to retrieve. Default is “complete”.
image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.
preprocess (bool) – If True, preprocess the image (normalize). Default is True.
- Returns:
The specific tile as an HETissue instance.
- Return type:
- get_tile(tissue_id, tile_id, kind='complete', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile id.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
tile_id (int) – The tile ID to retrieve.
kind (str)
image_mode (str)
preprocess (bool)
- Returns:
The specific tile as an HETissue instance.
- Return type:
Multiplex (IMC, CODEX, CycIF, etc.)
- class spora_io.datasets.multiplex.MultiplexImagingDataset(name, modality, standardization, resolution, path=None, tile_size=None, verbose=True, load_cell_metadata=False, disable_quantile_mask=False, filter_list=None, use_mean_std=True, return_uniprot_ids=True, replace_nuclear_uniprot_ids=False, tile_strategy=None, split=None, **kwargs)[source]
Bases: BaseImagingDataset
Class for handling multiplex imaging datasets.
- Parameters:
name (str)
modality (str)
standardization (str)
resolution (float | str)
path (os.PathLike | str | None)
tile_size (Optional[int])
verbose (bool)
load_cell_metadata (bool)
disable_quantile_mask (bool)
filter_list (List[str] | None)
use_mean_std (bool)
return_uniprot_ids (bool)
replace_nuclear_uniprot_ids (bool)
tile_strategy (Optional[str])
split (Optional[str])
- VALID_MODALITIES
A set of valid modalities for multiplex imaging datasets. Valid options are “imc”, “codex”, and “cycif”.
- Type:
set
- standardization
The type of standardization to apply to the images. This is passed to the build_standardizer function to create a standardizer instance.
- Type:
str
- disable_quantile_mask
Quantile masking searches for channels whose quantile or variance is 0 and excludes them from standardization. Setting this to True disables that behavior and includes all channels in standardization. This is passed to the build_standardizer function.
- Type:
bool
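One way to picture the quantile mask is the sketch below, which flags channels that carry no usable signal. It is not the library's build_standardizer logic: the function usable_channel_mask is hypothetical, and the choice of the 0.99 quantile and the simple index-based quantile estimate are assumptions.

```python
from statistics import pvariance


def usable_channel_mask(channels, q=0.99):
    """Return True for channels with a non-zero upper quantile and variance."""
    mask = []
    for values in channels:
        ordered = sorted(values)
        # Simple quantile estimate: take the element at the scaled index.
        upper = ordered[int(q * (len(ordered) - 1))]
        mask.append(upper > 0 and pvariance(values) > 0)
    return mask


channels = [
    [0, 0, 0, 0],  # empty channel: zero quantile and zero variance
    [0, 1, 2, 3],  # informative channel
    [5, 5, 5, 5],  # constant channel: zero variance
]
mask = usable_channel_mask(channels)
```

With disable_quantile_mask=True, the equivalent of this mask would be skipped and every channel would be standardized.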
- filter_list
A list of filter names to apply to the dataset. Currently supported filters are “gaussian_blur” and “median_filter”. These are applied sequentially, and the parameters for each filter can be specified in the filter_params dictionary in kwargs, with the filter name as the key and the parameters as a dictionary of parameters for that filter.
- Type:
List[str]
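The sequential application of filter_list with per-filter parameters can be sketched as below. The filters here are simplified 1D stand-ins with edge clamping, not the library's image filters, and the parameter names size and sigma are hypothetical; only the filter names and the filter_params layout (filter name as key, parameter dictionary as value) follow the description above.

```python
import math
from statistics import median


def median_filter_1d(signal, size=3):
    """Sliding-window median with clamped edges (1D stand-in)."""
    half, last = size // 2, len(signal) - 1
    return [
        median(signal[min(max(i + k, 0), last)] for k in range(-half, half + 1))
        for i in range(len(signal))
    ]


def gaussian_blur_1d(signal, sigma=1.0):
    """Gaussian-weighted average with clamped edges (1D stand-in)."""
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total, last = sum(weights), len(signal) - 1
    return [
        sum(w * signal[min(max(i + k, 0), last)] for k, w in zip(offsets, weights)) / total
        for i in range(len(signal))
    ]


REGISTRY = {"gaussian_blur": gaussian_blur_1d, "median_filter": median_filter_1d}


def apply_filters(signal, filter_list=None, filter_params=None):
    """Apply the named filters in order, each with its own parameters."""
    filter_params = filter_params or {}
    for name in filter_list or []:
        if name not in REGISTRY:
            raise ValueError(f"unsupported filter: {name!r}")
        signal = REGISTRY[name](signal, **filter_params.get(name, {}))
    return signal


filter_params = {"median_filter": {"size": 3}, "gaussian_blur": {"sigma": 1.0}}
smoothed = apply_filters([0, 0, 9, 0, 0], ["median_filter", "gaussian_blur"], filter_params)
```

Because the median filter runs first, the isolated spike is removed before the blur is applied; reversing the list order would give a different result.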
- use_mean_std
Whether to use mean and standard deviation for standardization. If False, only quantile normalization will be applied if not disabled. This is passed to the build_standardizer function.
- Type:
bool
- return_uniprot_ids
Whether to return uniprot IDs for the channels. This requires a “uniprot_id” column in the channels.parquet file. If this column is not present, uniprot IDs will not be returned regardless of this setting.
- Type:
bool
- replace_nuclear_uniprot_ids
Whether to replace UniProt IDs for channels flagged with is_nuclear_marker with Histone H3’s UniProt ID (P68431). This lets nuclear markers participate in uniprot_filtered loading even when the source channel lacks a protein-specific UniProt ID.
- Type:
bool
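The replacement described for replace_nuclear_uniprot_ids amounts to the substitution sketched below. The function name and the list-based inputs are hypothetical (the library reads these columns from its channel metadata), and "ID_A" is a placeholder rather than a real accession.

```python
HISTONE_H3_UNIPROT_ID = "P68431"


def replace_nuclear_ids(uniprot_ids, is_nuclear_marker):
    """Substitute Histone H3's ID for every channel flagged as a nuclear marker."""
    return [
        HISTONE_H3_UNIPROT_ID if nuclear else uniprot_id
        for uniprot_id, nuclear in zip(uniprot_ids, is_nuclear_marker)
    ]


# "ID_A" is a placeholder; None marks a channel without a UniProt ID.
ids = ["ID_A", None, None]
flags = [False, True, False]
replaced = replace_nuclear_ids(ids, flags)
```

After the substitution, the flagged DNA-stain channel has a valid ID and would survive “uniprot_filtered” loading, while the unflagged channel without an ID would still be dropped.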
- get_channel_names(tissue_id, kind='complete', measured_mask=None, qc_mask=None, filtered_mask=None)[source]
Get the channel names for a given tissue id and kind.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the channel names for.
kind (str) – The kind of tissue image to retrieve channel names for. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.
measured_mask (np.ndarray | None) – A boolean array indicating which channels are measured for the given tissue. If None, it will be retrieved from the image_channel_map. This is used to determine which channels to consider when applying the kind filtering.
qc_mask (np.ndarray | None) – A boolean array indicating which channels pass quality control for the given tissue. If None, it will be computed from the quality_control_mask and measured_mask. This is used when kind is “qc_filtered” or “uniprot_filtered” to determine which channels to include.
filtered_mask (np.ndarray | None) – A boolean array indicating which channels have valid UniProt IDs for the given tissue. If None, it will be computed from the uniprot_mask and qc_mask. This is used when kind is “uniprot_filtered” to determine which channels to include.
- Returns:
The channel names as a 1D array of shape (n_channels,).
- Return type:
ndarray[tuple[Any,...],dtype[str_]]
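The relationship between the three masks and the kind argument can be sketched as a chain of boolean filters. This is an interpretation of the parameter descriptions above, not the library's code: the function select_channels is hypothetical, combining the masks with a logical AND is an assumption, and the channel names are made up for illustration.

```python
def select_channels(names, kind, measured, qc_pass, has_uniprot):
    """Return channel names for 'complete', 'qc_filtered', or 'uniprot_filtered'."""
    # Each stage narrows the previous one: measured -> passes QC -> has a UniProt ID.
    qc_mask = [m and q for m, q in zip(measured, qc_pass)]
    filtered_mask = [q and u for q, u in zip(qc_mask, has_uniprot)]
    masks = {
        "complete": measured,
        "qc_filtered": qc_mask,
        "uniprot_filtered": filtered_mask,
    }
    if kind not in masks:
        raise ValueError(f"unsupported kind: {kind!r}")
    return [name for name, keep in zip(names, masks[kind]) if keep]


names = ["DNA1", "CD3", "CD20", "Background"]
measured = [True, True, True, False]
qc_pass = [True, True, False, True]
has_uniprot = [False, True, True, True]

complete = select_channels(names, "complete", measured, qc_pass, has_uniprot)
qc_filtered = select_channels(names, "qc_filtered", measured, qc_pass, has_uniprot)
uniprot_filtered = select_channels(names, "uniprot_filtered", measured, qc_pass, has_uniprot)
```

Passing precomputed masks, as the optional arguments allow, avoids recomputing the same chain when both channel names and UniProt IDs are requested for one tissue.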
- get_uniprot_ids(tissue_id, kind='complete', measured_mask=None, qc_mask=None, filtered_mask=None)[source]
Get the uniprot IDs for a given tissue id and kind. This will return None if return_uniprot_ids is False or if there are no valid uniprot IDs for the given kind.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the uniprot IDs for.
kind (str) – The kind of tissue image to retrieve uniprot IDs for. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.
measured_mask (np.ndarray | None) – A boolean array indicating which channels are measured for the given tissue. If None, it will be retrieved from the image_channel_map. This is used to determine which channels to consider when applying the kind filtering.
qc_mask (np.ndarray | None) – A boolean array indicating which channels pass quality control for the given tissue. If None, it will be computed from the quality_control_mask and measured_mask. This is used when kind is “qc_filtered” or “uniprot_filtered” to determine which channels to include.
filtered_mask (np.ndarray | None) – A boolean array indicating which channels have valid UniProt IDs for the given tissue. If None, it will be computed from the uniprot_mask and qc_mask. This is used when kind is “uniprot_filtered” to determine which channels to include.
- Returns:
The UniProt IDs as a 1D array of shape (n_channels,). Returns None if return_uniprot_ids is False.
- Return type:
ndarray[tuple[Any,...],dtype[object_]] |None
- get_tissue(tissue_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]
Get the tissue image for a given tissue id, with options for filtering channels and preprocessing.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the image for.
kind (str) – The kind of tissue image to retrieve. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”. Default is “uniprot_filtered”.
preprocess (bool) – Whether to preprocess the image using the standardizer. Default is True.
image_mode (str) – The image mode of the tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
The tissue image as a MultiplexTissue instance.
- Return type:
- get_tile_by_coordinates(tissue_id, row, col, kind='uniprot_filtered', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile row and column coordinates.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
row (int) – The row coordinate of the top-left corner of the tile.
col (int) – The column coordinate of the top-left corner of the tile.
preprocess (bool) – Whether to preprocess the tile using the standardizer. Default is True.
kind (str) – The kind of tissue image to retrieve for the tile. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”. Default is “uniprot_filtered”.
image_mode (str) – The image mode of the tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
The specific tile as a MultiplexTissue instance.
- Return type:
- get_tile(tissue_id, tile_id, kind='uniprot_filtered', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile id.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
tile_id (int) – The tile ID to retrieve.
preprocess (bool) – Whether to preprocess the tile using the standardizer. Default is True.
kind (str) – The kind of tissue image to retrieve for the tile. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”. Default is “uniprot_filtered”.
image_mode (str) – The image mode of the tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
The specific tile as a MultiplexTissue instance.
- Return type:
IHC
- class spora_io.datasets.ihc.SingleIHCImagingDataset(name, marker_name, resolution, tile_size, path=None, load_cell_metadata=False, verbose=True, mean_std_type='imagenet', tile_strategy=None, split=None, **kwargs)[source]
Bases: BaseImagingDataset
Class for handling IHC stained imaging datasets.
- Parameters:
name (str)
marker_name (str)
resolution (float | str)
tile_size (int)
path (os.PathLike | str | None)
load_cell_metadata (bool)
verbose (bool)
mean_std_type (str)
tile_strategy (Optional[str])
split (Optional[str])
- IMAGENET_MEAN
The mean values for ImageNet normalization.
- Type:
torch.Tensor
- IMAGENET_STD
The standard deviation values for ImageNet normalization.
- Type:
torch.Tensor
- HIBOU_MEAN
The mean values for HIBOU normalization.
- Type:
torch.Tensor
- HIBOU_STD
The standard deviation values for HIBOU normalization.
- Type:
torch.Tensor
- mean_std_type
The type of mean and standard deviation to use for normalization. Valid options are “imagenet” and “hibou”.
- Type:
str
- get_tissue(tissue_id, kind='complete', preprocess=True, image_mode='CHW')[source]
Get the normalized tissue image, after channel filtering, for a given tissue ID.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the image for.
kind (str) – The kind of tissue image to retrieve. Default is “complete”. Valid options are “complete”, “qc_filtered”, and “filtered”. “qc_filtered” and “filtered” return the same image since there is only one modality channel.
preprocess (bool) – If True, preprocess the image (standardize). Default is True.
image_mode (str)
- Returns:
The normalized tissue image as an IHCTissue instance.
- Return type:
- get_tile_by_coordinates(tissue_id, row, col, kind='complete', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile coordinates.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
row (int) – The row coordinate of the tile.
col (int) – The column coordinate of the tile.
kind (str) – The kind of tile image to retrieve. Only “complete” is supported for IHC datasets.
image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”.
preprocess (bool) – Whether to preprocess the tile image before returning it.
- Returns:
The specific tile as an IHCTissue instance.
- Return type:
- get_tile(tissue_id, tile_id, kind='complete', image_mode='CHW', preprocess=True)[source]
Get a specific tile based on the tissue id and tile id.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
kind (str) – The kind of tile image to retrieve. Only “complete” is supported for IHC datasets since there is only one modality channel. Default is “complete”.
image_mode (str) – The image mode of the tile image. Valid options are “CHW” and “HWC”. Default is “CHW”.
preprocess (bool) – Whether to preprocess the tile image (e.g. normalize) before returning it. Default is True.
tile_id (int) – The tile ID to retrieve.
- Returns:
The specific tile as an IHCTissue instance.
- Return type:
Composed (Multi-modal)
- class spora_io.datasets.compose.ComposedImagingDataset(name, modalities, tile_size, resolution, path=None, verbose=True, load_cell_metadata=False, tile_strategy=None, split=None, *, modality_kwargs=None)[source]
Bases: object
Compose multiple unimodal datasets (HE, Multiplex, etc.) into a single handle.
Uniform interface to fetch tissues/tiles per modality.
Ensures consistent tile strategy across modalities by construction.
Extensible via modality_kwargs to pass per-modality constructor arguments.
Notes on behavior:
- get_tissue_ids() returns the union of tissue IDs across all instantiated modalities.
- get_modalities_of_tissue(tissue_id) lists which modalities contain that tissue ID.
- Marker-specific helpers (indices/metadata) are forwarded to each unimodal dataset when available.
- Parameters:
name (str)
modalities (Iterable[ModKey])
tile_size (int)
resolution (float | str)
path (Union[str, Path] | None)
verbose (bool)
load_cell_metadata (bool)
tile_strategy (Optional[str])
split (Optional[str])
modality_kwargs (Optional[Mapping[str, Mapping[str, Any]]])
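A possible shape for modality_kwargs, and one way shared arguments could be combined with per-modality ones, is sketched below. The helper build_constructor_kwargs is hypothetical, the modality keys "he" and "imc" are assumed spellings, and letting per-modality values override shared ones is an assumption rather than documented behavior.

```python
def build_constructor_kwargs(shared, modalities, modality_kwargs=None):
    """Merge shared arguments with per-modality overrides (illustrative only)."""
    modality_kwargs = modality_kwargs or {}
    return {
        # Later keys win, so per-modality values take precedence over shared ones.
        modality: {**shared, **modality_kwargs.get(modality, {})}
        for modality in modalities
    }


shared = {"name": "cohort_a", "resolution": "0_5mpp", "tile_size": 256}
modality_kwargs = {
    "he": {"mean_std_type": "hibou"},
    "imc": {"return_uniprot_ids": False},
}
per_modality = build_constructor_kwargs(shared, ["he", "imc"], modality_kwargs)
```

Because tile_size and tile_strategy live in the shared arguments, every unimodal dataset receives the same tiling configuration.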
- get_dataset(modality)[source]
Get the unimodal dataset instance for a given modality key.
- Parameters:
modality (Union[str,HEModality,IHCModality,IMCModality,CODEXModality,CycIFModality,MIBIModality]) – The modality key (string or object with ‘name’ attribute) to retrieve the dataset for.
- Returns:
The unimodal dataset instance corresponding to the modality key.
- Return type:
Any
- Raises:
KeyError – If the modality key is not part of this composed dataset.
- get_available_modalities()[source]
Get the list of available modalities in this composed dataset.
- Returns:
A list of modality keys representing the available modalities.
- Return type:
List[str]
- get_tissue_ids(modality=None)[source]
Get the list of tissue IDs available in the dataset. If a modality is specified, return only tissue IDs for that modality.
- Parameters:
modality (Union[str,HEModality,IHCModality,IMCModality,CODEXModality,CycIFModality,MIBIModality,None]) – The modality key to filter tissue IDs by. If None, returns tissue IDs across all modalities.
- Returns:
A list of tissue IDs available in the dataset (filtered by modality if specified).
- Return type:
List[str]
- get_modalities_of_tissue(tissue_id)[source]
Get the list of modalities available for a given tissue ID.
- Parameters:
tissue_id (str) – The tissue ID to query modalities for.
- Returns:
A list of modality keys representing the modalities available for the given tissue ID.
- Return type:
List[str]
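The union and membership behavior of get_tissue_ids and get_modalities_of_tissue can be illustrated with plain dictionaries. The helper functions below are hypothetical stand-ins, the modality keys are assumed spellings, and returning the union in sorted order is an assumption.

```python
def union_tissue_ids(ids_by_modality, modality=None):
    """Return IDs for one modality, or the union across all modalities."""
    if modality is not None:
        return sorted(ids_by_modality[modality])
    return sorted(set().union(*ids_by_modality.values()))


def modalities_of_tissue(ids_by_modality, tissue_id):
    """List the modalities whose tissue IDs include tissue_id."""
    return [m for m, ids in ids_by_modality.items() if tissue_id in ids]


ids_by_modality = {"he": ["t1", "t2"], "imc": ["t2", "t3"]}
all_ids = union_tissue_ids(ids_by_modality)
he_ids = union_tissue_ids(ids_by_modality, "he")
shared = modalities_of_tissue(ids_by_modality, "t2")
```

A tissue present in only one modality still appears in the union, so callers that need paired data should check the modality list for each tissue first.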
- get_unimodal_tissue(tissue_id, modality, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]
Get the tissue image for a given tissue ID and modality, with options for kind of image and preprocessing.
- Parameters:
tissue_id (str) – The tissue ID to retrieve.
modality (Union[str,HEModality,IHCModality,IMCModality,CODEXModality,CycIFModality,MIBIModality]) – The modality key to specify which unimodal dataset to query.
kind (str) – The kind of tissue image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.
preprocess (bool) – If True, preprocess the image (normalize). Default is True.
image_mode (str) – The desired image mode of the returned tissue image. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
The tissue image as returned by the unimodal dataset’s get_tissue method.
- Return type:
Tissue
- get_unimodal_tissue_mask(tissue_id, modality)[source]
Get the quality control mask for a given tissue ID and modality.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the mask for.
modality (Union[str,HEModality,IHCModality,IMCModality,CODEXModality,CycIFModality,MIBIModality]) – The modality key to specify which unimodal dataset to query.
- Returns:
The quality control mask as returned by the unimodal dataset’s get_tissue_mask method.
- Return type:
np.ndarray
- get_unimodal_tissue_size(tissue_id, modality)[source]
Get the tissue size (C,H,W) for a given tissue ID and modality.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the size for.
modality (Union[str,HEModality,IHCModality,IMCModality,CODEXModality,CycIFModality,MIBIModality]) – The modality key to specify which unimodal dataset to query.
- Returns:
The tissue size (C,H,W) as returned by the unimodal dataset’s _get_tissue_size method.
- Return type:
Tuple[int,int,int]
- get_unimodal_tile(tissue_id, tile_id, modality, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]
Get a specific tile for a given tissue ID and modality.
- Parameters:
tissue_id (str) – The tissue ID to retrieve the tile for.
tile_id (int) – The tile ID to retrieve.
modality (Union[str,HEModality,IHCModality,IMCModality,CODEXModality,CycIFModality,MIBIModality]) – The modality key to specify which unimodal dataset to query.
kind (str) – The kind of image to retrieve. Valid options depend on modality.
preprocess (bool) – If True, preprocess the tile before returning.
image_mode (str) – The returned image layout, usually “CHW” or “HWC”.
- Returns:
The tile image as returned by the unimodal dataset’s get_tile method.
- Return type:
Tissue
- get_composed_tissue(tissue_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]
Get a composed tissue sample for a given tissue ID, which includes all available modalities for that tissue.
- Parameters:
tissue_id (str) – The tissue ID to retrieve.
kind (str) – The kind of tissue image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.
preprocess (bool) – If True, preprocess the images (normalize). Default is True.
image_mode (str) – The desired image mode of the returned tissue images. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
A ComposedTissue instance containing the tissue ID and a dictionary of modality-specific Tissue instances.
- Return type:
- get_composed_tile(tissue_id, tile_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]
Get a composed tile for a given tissue ID and tile ID, which includes all available modalities for that tissue.
- Parameters:
tissue_id (str) – The tissue ID to retrieve.
tile_id (int) – The tile ID to retrieve.
kind (str) – The kind of tile image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.
preprocess (bool) – If True, preprocess the images (normalize). Default is True.
image_mode (str) – The desired image mode of the returned tile images. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
A ComposedTissue instance containing the tissue ID and a dictionary of modality-specific tile images.
- Return type:
- get_composed_tissue_by_patient(patient_id, kind='uniprot_filtered', preprocess=True, image_mode='CHW')[source]
Get composed tissue samples for all tissues associated with a given patient ID.
- Parameters:
patient_id (str) – The patient ID to retrieve tissues for.
kind (str) – The kind of tissue image to retrieve. Default is “uniprot_filtered”. Valid options are “complete”, “qc_filtered”, and “uniprot_filtered”.
preprocess (bool) – If True, preprocess the images (normalize). Default is True.
image_mode (str) – The desired image mode of the returned tissue images. Valid options are “CHW” and “HWC”. Default is “CHW”.
- Returns:
A list of ComposedTissue instances for each tissue associated with the patient.
- Return type:
Sequence[ComposedTissue]
SporaDataset (Multi-cohort)
- class spora_io.datasets.spora.SporaDataset(dataset_names, *, datasets_dir=None, modalities=None, resolution=1.0, tile_size=None, tile_strategy='default', sampling_unit=None, verbose=True, load_cell_metadata=False, split=None, modality_kwargs=None, dataset_modality_kwargs=None, seed=None)[source]
Bases: object
Dataset-of-datasets wrapper for sampling tissues or tiles across cohorts.
SporaDataset instantiates one ComposedImagingDataset per dataset name, then builds either a tissue index or a concatenated tile-coordinate index. Samples are returned with a dataset name, tissue id, optional tile id, and a modality-to-tissue/tile mapping.
- Parameters:
dataset_names (str | Iterable[str])
datasets_dir (str | Path | None)
modalities (str | Iterable[str] | None)
resolution (float | str)
tile_size (int | None)
tile_strategy (str)
sampling_unit (SamplingUnit | None)
verbose (bool)
load_cell_metadata (bool)
split (str | None)
modality_kwargs (Mapping[str, Mapping[str, Any]] | None)
dataset_modality_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None)
seed (int | None)
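A concatenated tile index of the kind mentioned in the class description can be sketched with cumulative offsets and a binary search. This is not SporaDataset's implementation: the class ConcatTileIndex is hypothetical, and it assumes tile IDs are contiguous and zero-based within each tissue.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatTileIndex:
    """Map a flat sample index to (dataset_name, tissue_id, tile_id)."""

    def __init__(self, tiles_per_tissue):
        # tiles_per_tissue: list of (dataset_name, tissue_id, n_tiles) entries.
        self.entries = [entry for entry in tiles_per_tissue if entry[2] > 0]
        # offsets[i] is the number of tiles in entries[0..i], inclusive.
        self.offsets = list(accumulate(n for _, _, n in self.entries))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def locate(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First entry whose cumulative count exceeds the flat index.
        position = bisect_right(self.offsets, index)
        start = self.offsets[position - 1] if position else 0
        dataset_name, tissue_id, _ = self.entries[position]
        return dataset_name, tissue_id, index - start


tile_index = ConcatTileIndex([
    ("cohort_a", "t1", 3),
    ("cohort_a", "t2", 2),
    ("cohort_b", "t9", 4),
])
sample = tile_index.locate(5)  # ('cohort_b', 't9', 0)
```

The returned triple matches the sample fields described above (dataset name, tissue id, tile id); a tissue-level index would simply enumerate the entries without the tile offset.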