spora-io
spora_io is the data-loading layer for the current SPORA spatial
proteomics dataset format. It turns curated on-disk datasets into typed Python
objects for model training, visualization, and benchmarking.
The library is built around a few explicit contracts:
Modality-specific loaders for H&E, single-marker IHC, and multiplex proteomics images such as IMC, CODEX, CyCIF, and MIBI.
Composed multimodal loading for tissues that have aligned views across multiple modalities.
Multi-cohort sampling through
SporaDataset, which builds a global tissue or tile index across several dataset folders.Shared segmentation and tiling from
segmentations/<resolution>/...andtiling/<resolution>/<strategy>/....Stats-backed multiplex standardization from parquet files under
<modality>/<resolution>/standardization/<spec>/....
Minimal Example
from spora_io import SporaDataset
dataset = SporaDataset(
["schurch2020coordinated", "lin2022multiplexed"],
modalities=["codex", "imc"],
resolution=1.0,
tile_size=224,
sampling_unit="tiles",
split="train",
modality_kwargs={
"codex": {"standardization": "quantile_clipping/uq_0.99_image"},
"imc": {"standardization": "quantile_clipping/uq_0.99_image"},
},
)
sample = dataset.sample_random_tile()
sample["dataset_name"]
sample["tissue_id"]
sample["tile_id"]
sample["modalities"]
When To Use Which Class
Class |
Use case |
|---|---|
|
Load one H&E modality from one dataset. |
|
Load one marker-specific IHC modality from one dataset. |
|
Load one multiplex modality with channel metadata and standardization. |
|
Load multiple modalities from one dataset using shared tissue IDs. |
|
Sample tissues or tiles across multiple datasets. |
Contents
Getting Started
API Reference