Pre Module API Reference
Module for pre-processing to generate LandscapeFiles from ST data.
add_clustering_from_adata(adata, path_dega_files, cluster_key='leiden', segmentation_name=None)
Add cell clustering data from an AnnData object to LandscapeFiles.
This function exports clustering assignments and associated colors from an AnnData object to the LandscapeFiles format, enabling the Landscape and Yearbook widgets to use custom clustering results.
Parameters
adata : AnnData
AnnData object containing clustering results in obs[cluster_key].
Colors can be provided in uns[f"{cluster_key}_colors"].
path_dega_files : str or Path
Path to the LandscapeFiles directory.
cluster_key : str, default "leiden"
Column name in adata.obs containing cluster assignments.
segmentation_name : str, optional
Name for this segmentation/clustering result. If provided, files will be
saved as cell_clusters_{segmentation_name}/. If None, files are saved
to the default cell_clusters/ directory.
Returns
None
Examples

    import scanpy as sc
    import celldega as dega

    # Load and cluster your data (Leiden needs a neighbor graph; skip
    # sc.pp.neighbors if the file already contains one)
    adata = sc.read_h5ad("my_data.h5ad")
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata, resolution=0.5)

    # Add clustering to LandscapeFiles
    dega.pre.add_clustering_from_adata(
        adata,
        path_dega_files="./my_landscape_files",
        cluster_key="leiden",
    )

    # For a custom segmentation with a specific name
    dega.pre.add_clustering_from_adata(
        adata,
        path_dega_files="./my_landscape_files",
        cluster_key="leiden",
        segmentation_name="cellpose2",
    )

Notes
The Landscape widget can use the custom clustering by setting the
segmentation parameter to match the segmentation_name.
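
When colors are present, Scanpy stores them in `adata.uns[f"{cluster_key}_colors"]` as a list in the same order as `adata.obs[cluster_key].cat.categories`. The sketch below shows how such a category-to-color mapping can be paired with per-cell assignments. It uses plain lists and dictionaries in place of an AnnData object, and the fallback palette used when no colors are stored is a hypothetical choice, not necessarily what `add_clustering_from_adata` does.

```python
from itertools import cycle

# Hypothetical fallback palette, used only when no colors are stored in uns.
FALLBACK_PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def cluster_color_map(categories, colors=None):
    """Pair each cluster category with a color.

    categories: labels in category order (e.g. adata.obs["leiden"].cat.categories)
    colors: optional list aligned with categories (e.g. adata.uns["leiden_colors"])
    """
    categories = [str(c) for c in categories]
    if colors is not None:
        if len(colors) != len(categories):
            raise ValueError("Expected exactly one color per cluster category.")
        return dict(zip(categories, colors))
    # No stored colors: cycle through the fallback palette.
    return dict(zip(categories, cycle(FALLBACK_PALETTE)))


def cell_colors(cell_clusters, color_map):
    """Look up each cell's color from its cluster label."""
    return {cell: color_map[str(label)] for cell, label in cell_clusters.items()}


cmap = cluster_color_map(["0", "1", "2"], ["#aa0000", "#00aa00", "#0000aa"])
print(cell_colors({"cell_a": "1", "cell_b": "0"}, cmap))
```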
add_custom_segmentation(technology, path_dega_files, path_segmentation_files, image_scale=1, tile_size=250)
Add custom segmentation to existing landscape files.
Parameters:
- technology: Technology type (e.g., "Xenium", "MERSCOPE", "custom")
- path_dega_files: Path to landscape files
- path_segmentation_files: Path to segmentation files
- image_scale: Image scale factor (default 1)
- tile_size: Tile size for processing (default 250)
cluster_gene_expression(technology, path_dega_files, cbg, data_dir=None, segmentation_approach='default')
Calculates cluster-specific gene expression signatures for Xenium data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| technology | str | The technology used (e.g., "Xenium" or "MERSCOPE"). Currently, only "Xenium" is supported. | required |
| data_dir | str | Path to the directory containing the Xenium data. | None |
| path_dega_files | str | Path to the directory where the gene expression signature file will be saved. | required |
| cbg | DataFrame | A cell-by-gene matrix where rows represent cells and columns represent genes. The index of the DataFrame should match the cell IDs in the Xenium metadata. | required |

Raises:
| Type | Description |
|---|---|
| ValueError | If the specified technology is not supported. |
| FileNotFoundError | If the required input files are not found. |

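
A cluster-specific signature is commonly summarized as the mean expression of each gene across the cells of a cluster. The sketch below computes that with plain dictionaries standing in for the `cbg` DataFrame and the cluster assignments; it illustrates the general idea under that assumption and is not necessarily the exact statistic `cluster_gene_expression` writes.

```python
from collections import defaultdict


def cluster_mean_expression(cbg, clusters):
    """Mean expression of every gene within every cluster.

    cbg: {cell_id: {gene: count}}; a gene missing for a cell counts as 0.
    clusters: {cell_id: cluster_label}; cells absent from cbg are skipped.
    """
    genes = sorted({gene for counts in cbg.values() for gene in counts})
    totals = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)

    for cell, label in clusters.items():
        counts = cbg.get(cell)
        if counts is None:
            continue
        n_cells[label] += 1
        for gene, value in counts.items():
            totals[label][gene] += value

    # Dividing by cluster size (not by nonzero cells) keeps sparse zeros in the mean.
    return {
        label: {gene: totals[label][gene] / n for gene in genes}
        for label, n in n_cells.items()
    }


cbg = {"c1": {"GeneA": 4}, "c2": {"GeneA": 2, "GeneB": 6}, "c3": {"GeneB": 1}}
print(cluster_mean_expression(cbg, {"c1": "0", "c2": "0", "c3": "1"}))
```

With pandas, the same result comes from grouping the rows of `cbg` by the cluster labels and taking the mean.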
create_cluster_and_meta_cluster(technology, path_dega_files, data_dir=None, segmentation_approach='default')
Creates cell clusters and meta cluster files for visualization. Currently supports only Xenium.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| technology | str | The technology used (e.g., "Xenium" or "MERSCOPE"). Currently, only "Xenium" is supported. | required |
| data_dir | str | Path to the directory containing the Xenium data. | None |
| path_dega_files | str | Path to the directory where the cluster and meta cluster files will be saved. | required |

Raises:
| Type | Description |
|---|---|
| ValueError | If the specified technology is not supported. |
| FileNotFoundError | If the required input files are not found. |

create_image_tiles(technology, data_dir, path_dega_files, image_tile_layer='dapi')
Creates image tiles for visualization from the source image of the specified technology (the morphology image for Xenium).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| technology | str | The technology used (e.g., "Xenium", "MERSCOPE", "VisiumHD", "H&E"). | required |
| data_dir | str | Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif). | required |
| path_dega_files | str | Path to the directory where the image tiles and pyramid will be saved. | required |
| image_tile_layer | str | Specifies which image layers to process. Options for Xenium are 'dapi' (default) or 'all'. | 'dapi' |

Raises:
| Type | Description |
|---|---|
| ValueError | If the specified technology is not supported or if the image_tile_layer is invalid. |
| FileNotFoundError | If the required input image file is not found. |

create_image_tiles_h_and_e(data_dir, path_dega_files, image_tile_layer)
Creates image tiles for visualization from the H&E image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_dir | str | Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif). | required |
| path_dega_files | str | Path to the directory where the image tiles and pyramid will be saved. | required |
| image_tile_layer | str | Specifies the name of the H&E image to process. | required |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the required input image file is not found. |

create_image_tiles_merscope(data_dir, path_dega_files, image_tile_layer='dapi')
Creates image tiles for visualization from the MERSCOPE morphology image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_dir | str | Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif). | required |
| path_dega_files | str | Path to the directory where the image tiles and pyramid will be saved. | required |
| image_tile_layer | str | Specifies which image layers to process. Options are 'dapi' (default) or 'all'. | 'dapi' |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the required input image file is not found. |

create_image_tiles_xenium(data_dir, path_dega_files, image_tile_layer='dapi')
Creates image tiles for visualization from the Xenium morphology image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_dir | str | Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif). | required |
| path_dega_files | str | Path to the directory where the image tiles and pyramid will be saved. | required |
| image_tile_layer | str | Specifies which image layers to process. Options are 'dapi' (default) or 'all'. | 'dapi' |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the required input image file is not found. |

get_image_info(technology, image_tile_layer='dapi')
Retrieve image information for a given technology and image tile layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| technology | str | The technology for which image information is requested. Currently supports 'Xenium' and 'MERSCOPE'. | required |
| image_tile_layer | str | The type of image tile layer to retrieve information for. Options are 'dapi' or 'all'. Defaults to 'dapi'. | 'dapi' |

Returns:
| Type | Description |
|---|---|
| list[dict] | A list of dictionaries containing image information, including name, button name, and color. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the technology is not supported or the image_tile_layer is invalid. |

get_max_zoom_level(path_image_pyramid)
Returns the maximum zoom level based on the highest-numbered directory in the specified path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path_image_pyramid | str | Path to the directory containing zoom level directories. | required |

Returns:
| Type | Description |
|---|---|
| int | The maximum zoom level. |

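
Comparing directory names as integers matters here: sorted as strings, "9" would rank above "12". A minimal pathlib sketch of the documented behaviour follows; the helper name `max_zoom_level` and the ValueError raised for a directory without zoom levels are assumptions, not the function's actual error handling.

```python
import tempfile
from pathlib import Path


def max_zoom_level(path_image_pyramid):
    """Return the largest integer-named subdirectory of path_image_pyramid."""
    levels = [
        int(entry.name)
        for entry in Path(path_image_pyramid).iterdir()
        if entry.is_dir() and entry.name.isdigit()
    ]
    if not levels:
        raise ValueError(f"No zoom level directories found in {path_image_pyramid}")
    return max(levels)


# Usage: a pyramid with levels 0..12 plus an unrelated descriptor file.
with tempfile.TemporaryDirectory() as tmp:
    for level in range(13):
        (Path(tmp) / str(level)).mkdir()
    (Path(tmp) / "dapi.dzi").write_text("")
    print(max_zoom_level(tmp))  # 12
```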
make_chromium_from_anndata(adata, path_dega_files)
Generate minimal LandscapeFiles from a Chromium AnnData object.
Parameters
adata : anndata.AnnData
AnnData object containing scRNA-seq count data.
path_dega_files : str or Path
Directory where LandscapeFiles will be written.
Raises
ValueError
If the expression matrix contains non-integer values.
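
Because the function expects raw integer counts, a quick pre-check can catch normalized or log-transformed matrices before any files are written. The helper below is hypothetical; it works on any iterable of values (for a SciPy sparse matrix, `adata.X.data` holds the stored entries).

```python
def require_integer_counts(values):
    """Raise ValueError if any expression value is not a whole number.

    values: iterable of numbers, e.g. the stored entries of a sparse matrix.
    """
    for value in values:
        # float.is_integer() rejects 1.5 as well as NaN and infinity.
        if not float(value).is_integer():
            raise ValueError(
                f"Expression matrix contains non-integer value {value!r}; "
                "pass raw counts rather than normalized data."
            )


require_integer_counts([1, 3.0, 12])  # passes: 3.0 is a whole number
```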
make_deepzoom_pyramid(image_path, output_path, pyramid_name, tile_size=512, overlap=0, suffix='.jpeg')
Creates a DeepZoom image pyramid from a JPEG image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_path | str | Path to the JPEG image file. | required |
| output_path | str | Directory to save the DeepZoom pyramid. | required |
| pyramid_name | str | Name of the pyramid directory. | required |
| tile_size | int | Tile size for the DeepZoom pyramid. Defaults to 512. | 512 |
| overlap | int | Overlap size for the DeepZoom pyramid. Defaults to 0. | 0 |
| suffix | str | Suffix for the DeepZoom pyramid tiles. Defaults to ".jpeg". | '.jpeg' |

Returns:
None
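
DeepZoom pyramids (the DZI convention) store the full-resolution image at the highest level and halve the width and height, rounding up, at each level below it until level 0 is 1x1. The sketch below computes the level count and the tile grid of each level for a given `tile_size`; it assumes `overlap=0` and follows the general DZI convention rather than celldega's own code.

```python
import math


def deepzoom_levels(width, height, tile_size=512):
    """List (level, width, height, tiles_x, tiles_y) for every DeepZoom level.

    Level 0 is a 1x1 image; the highest level holds the full-resolution image.
    Assumes overlap=0, so tiles simply partition each level.
    """
    # For positive integers this equals ceil(log2(max(width, height))).
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        level_w = math.ceil(width / scale)
        level_h = math.ceil(height / scale)
        tiles_x = math.ceil(level_w / tile_size)
        tiles_y = math.ceil(level_h / tile_size)
        levels.append((level, level_w, level_h, tiles_x, tiles_y))
    return levels


# Usage: the three highest-resolution levels of a 5000 x 3000 px image.
for row in deepzoom_levels(5000, 3000)[-3:]:
    print(row)
```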
make_meta_cell_image_coord(technology, path_transformation_matrix, path_meta_cell_micron, path_meta_cell_image, image_scale=1, sample=None, paths=None, dataset=None)
Applies an affine transformation to cell coordinates in microns and saves the transformed coordinates in pixels.
Parameters
technology : str
The technology used to generate the data. Xenium and MERSCOPE are supported.
path_transformation_matrix : str
Path to the transformation matrix file.
path_meta_cell_micron : str
Path to the meta cell file with coordinates in microns.
path_meta_cell_image : str
Path to save the meta cell file with coordinates in pixels.
image_scale : float, default 1
Scaling factor to convert micron coordinates to pixel coordinates.
Returns
None
Examples

    make_meta_cell_image_coord(
        technology="Xenium",
        path_transformation_matrix="data/transformation_matrix.csv",
        path_meta_cell_micron="data/meta_cell_micron.csv",
        path_meta_cell_image="data/meta_cell_image.parquet",
    )

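
An affine micron-to-pixel mapping can be written as a 3x3 matrix acting on homogeneous coordinates (x, y, 1). The sketch below applies such a matrix and then multiplies by `image_scale`; the order of those two steps and the example matrix values are assumptions for illustration, not read from celldega or from a real transformation file.

```python
def apply_affine(points, matrix, image_scale=1.0):
    """Map (x, y) micron coordinates to pixel coordinates.

    matrix: 3x3 affine matrix as nested lists; its last row is assumed to be [0, 0, 1].
    image_scale: extra factor applied after the affine step (assumed ordering).
    """
    (a, b, tx), (c, d, ty), _ = matrix
    out = []
    for x, y in points:
        px = a * x + b * y + tx
        py = c * x + d * y + ty
        out.append((px * image_scale, py * image_scale))
    return out


# Hypothetical matrix: 4.7 pixels per micron plus a translation.
M = [[4.7, 0.0, 100.0],
     [0.0, 4.7, -50.0],
     [0.0, 0.0, 1.0]]
print(apply_affine([(0.0, 0.0), (10.0, 20.0)], M, image_scale=0.5))
```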
make_meta_gene(cbg, path_output)
Creates a DataFrame with genes and their assigned colors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cbg | DataFrame | A sparse DataFrame with genes as columns and barcodes as rows. | required |
| path_output | str | Path to save the meta gene file. | required |

Returns:
None
make_trx_tiles(technology, path_trx, path_transformation_matrix=None, path_trx_tiles=None, coarse_tile_factor=10, tile_size=250, chunk_size=1000000, verbose=False, image_scale=1, max_workers=1, streaming_tile_assignment=None)
Processes transcript data by dividing it into coarse-grain and fine-grain tiles, applying transformations, and saving the results in a parallelized manner.
Parameters
technology : str
The technology used for generating the transcript data (e.g., "MERSCOPE" or "Xenium").
path_trx : str
Path to the file containing the transcript data.
path_transformation_matrix : str, optional
Path to the file containing the transformation matrix (CSV file).
path_trx_tiles : str, optional
Directory path where the output files (Parquet files) for each tile will be saved.
coarse_tile_factor : int, optional
Size of each coarse-grain tile relative to the fine tile size (default is 10).
tile_size : int, optional
Size of each fine-grain tile in microns (default is 250).
chunk_size : int, optional
Number of rows to process per chunk for memory efficiency (default is 1000000).
verbose : bool, optional
Flag to enable verbose output (default is False).
image_scale : float, optional
Scale factor to apply to the transcript coordinates (default is 1.0).
max_workers : int, optional
Maximum number of parallel workers for processing tiles (default is 1).
streaming_tile_assignment : bool or None, optional
If True, stream transformed coordinates to Parquet shards and spill per spatial tile
(same strategy as row-group mode) instead of concatenating all rows and using
partition_by / coarse filters on one huge frame. If None, enable automatically
when row count is at least STREAMING_TILE_ASSIGN_ROW_THRESHOLD.
Returns
dict
A dictionary containing the bounds of the processed data in both x and y directions.
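
Conceptually, a transcript lands in a fine tile by integer division of its transformed coordinates by `tile_size`, and in a coarse tile by dividing by `tile_size * coarse_tile_factor`. The sketch below shows that arithmetic together with the x/y bounds the function reports; it assumes the tile grid starts at coordinate 0, and the function name and dictionary keys are hypothetical, not the ones `make_trx_tiles` uses.

```python
import math


def assign_tiles(points, tile_size=250, coarse_tile_factor=10):
    """Assign (x, y) points to fine and coarse tiles and report their bounds.

    Assumes the tile grid starts at (0, 0); the dictionary keys are illustrative.
    """
    coarse_size = tile_size * coarse_tile_factor
    assignments = []
    for x, y in points:
        fine = (math.floor(x / tile_size), math.floor(y / tile_size))
        coarse = (math.floor(x / coarse_size), math.floor(y / coarse_size))
        assignments.append({"fine": fine, "coarse": coarse})

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    bounds = {"x_min": min(xs), "x_max": max(xs), "y_min": min(ys), "y_max": max(ys)}
    return assignments, bounds


tiles, bounds = assign_tiles([(120.0, 30.0), (2600.0, 499.9)])
print(tiles)
print(bounds)
```

With the defaults, each coarse tile spans 10 x 10 fine tiles (2500 units on a side).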
pack_image_tiles_to_parquet(pyramid_dir, channel_name, output_path, image_format='.webp', delete_source_tiles=True, max_row_groups_per_file=2000)
Pack all image tiles from a DeepZoom pyramid into chunked parquet files with row groups.
Each zoom level's tiles are stored as row groups, allowing efficient range-based access. The formula for row group index is: row_group_index = sum of tiles in previous zoom levels + tile_x * num_tiles_y + tile_y
For large datasets, tiles are split across multiple parquet files, each containing
at most max_row_groups_per_file row groups.
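
The row-group formula above can be checked with a few lines of arithmetic. The sketch takes the tile grid size at each zoom level and returns the global row-group index of a tile; the split into chunk files assumes row groups are numbered consecutively across files, which is an assumption about how `max_row_groups_per_file` is applied. Both helper names are hypothetical.

```python
def row_group_index(tiles_per_level, zoom, tile_x, tile_y):
    """Global row-group index of tile (tile_x, tile_y) at the given zoom level.

    tiles_per_level: list of (num_tiles_x, num_tiles_y), indexed by zoom level.
    Implements: sum of tiles in previous levels + tile_x * num_tiles_y + tile_y.
    """
    num_tiles_x, num_tiles_y = tiles_per_level[zoom]
    if not (0 <= tile_x < num_tiles_x and 0 <= tile_y < num_tiles_y):
        raise IndexError("tile lies outside the grid for this zoom level")
    offset = sum(nx * ny for nx, ny in tiles_per_level[:zoom])
    return offset + tile_x * num_tiles_y + tile_y


def locate_row_group(index, max_row_groups_per_file=2000):
    """Split a global index into (chunk file number, row group within that file).

    Assumes chunk files are filled in order with max_row_groups_per_file groups each.
    """
    return divmod(index, max_row_groups_per_file)


# Hypothetical pyramid with four zoom levels.
levels = [(1, 1), (1, 1), (2, 2), (3, 4)]
idx = row_group_index(levels, zoom=3, tile_x=2, tile_y=1)
print(idx, locate_row_group(idx, max_row_groups_per_file=5))  # 15 (3, 0)
```

Because tile_y varies fastest, all tiles in one column (fixed tile_x) of a zoom level occupy consecutive row groups.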
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pyramid_dir | str | Path to the pyramid_images directory. | required |
| channel_name | str | Name of the image channel (e.g., "dapi"). | required |
| output_path | str | Path to the output directory (will contain chunk_X.parquet files). | required |
| image_format | str | Image file extension (default ".webp"). | '.webp' |
| delete_source_tiles | bool | If True, delete the original tile files after packing. | True |
| max_row_groups_per_file | int | Maximum row groups per file (default 2000). | 2000 |

Returns:
| Type | Description |
|---|---|
| dict | Image tile metadata including grid info per zoom level and image dimensions. |

read_cbg_mtx(base_path, barcodes_name='barcodes', features_name='features', technology=None)
Read the cell-by-gene matrix from the mtx files.
Parameters
base_path : str
The base path to the directory containing the mtx files.
Returns
cbg : pandas.DataFrame
A sparse DataFrame with genes as columns and barcodes as rows.
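
A common layout for these files is the 10x Genomics one: a coordinate-format Matrix Market file with genes as rows and barcodes as columns, plus `features.tsv` and `barcodes.tsv` listing the row and column labels. The stdlib-only sketch below parses uncompressed files of that shape into a `{barcode: {gene: count}}` mapping, transposed to barcodes-as-rows like the documented return value. The exact file names, the use of the second features column as the gene name, and the lack of gzip support are assumptions; `read_cbg_mtx` itself returns a sparse pandas DataFrame.

```python
from pathlib import Path


def read_mtx_counts(base_path, barcodes_name="barcodes", features_name="features"):
    """Parse matrix.mtx, features.tsv and barcodes.tsv into {barcode: {gene: count}}.

    Assumes uncompressed files and a coordinate-format Matrix Market file with
    genes as rows and barcodes as columns (the 10x layout).
    """
    base = Path(base_path)
    barcodes = (base / f"{barcodes_name}.tsv").read_text().split()

    genes = []
    for line in (base / f"{features_name}.tsv").read_text().splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Column 2 holds the gene symbol in 10x files; fall back to column 1.
        genes.append(fields[1] if len(fields) > 1 else fields[0])

    cbg = {barcode: {} for barcode in barcodes}
    size_line_seen = False
    with open(base / "matrix.mtx") as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # banner, comment and blank lines
            if not size_line_seen:
                size_line_seen = True  # the "n_rows n_cols n_entries" line
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based; store as barcode -> gene.
            cbg[barcodes[int(col) - 1]][genes[int(row) - 1]] = float(value)
    return cbg
```

Gene symbols are not guaranteed to be unique, so a real reader needs a rule for duplicates (for example, keeping the gene IDs from the first column).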
remove_intermediate_files(path_dega_files)
Remove intermediate image files.
Parameters:
- path_dega_files: Path to landscape files directory
resolve_xenium_morphology_ome_path(data_dir)
Locate the morphology OME-TIFF for Xenium-class bundles (including Atera WTA preview).
Standard Xenium output uses morphology_focus/morphology_focus_0000.ome.tif.
Some v4-compatible and Atera preview bundles use other names under morphology_focus/
or ship morphology.ome.tif at the bundle root.
Resolution order:
1. morphology_focus/morphology_focus_0000.ome.tif (classic Xenium)
2. First morphology_focus/morphology_focus_*.ome.tif (lexicographic sort)
3. First morphology_focus/*.ome.tif if no morphology_focus_* match
4. morphology.ome.tif at bundle root
Parameters
data_dir
Path to the outs directory (e.g. containing experiment.xenium).
Returns
Path
Path to an existing .ome.tif file.
Raises
FileNotFoundError
If no supported morphology TIFF is found.
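
The resolution order above maps directly onto pathlib checks. This sketch mirrors the four documented steps; it is written from the description, so details such as how a matching directory (rather than a file) is handled may differ from the real function, and the name `resolve_morphology_path` is hypothetical.

```python
from pathlib import Path


def resolve_morphology_path(data_dir):
    """Return the first morphology OME-TIFF found, following the documented order."""
    data_dir = Path(data_dir)
    focus = data_dir / "morphology_focus"

    # 1. Classic Xenium file name.
    classic = focus / "morphology_focus_0000.ome.tif"
    if classic.is_file():
        return classic

    if focus.is_dir():
        # 2. Lexicographically first morphology_focus_*.ome.tif.
        matches = sorted(focus.glob("morphology_focus_*.ome.tif"))
        # 3. Otherwise the first *.ome.tif in morphology_focus/.
        if not matches:
            matches = sorted(focus.glob("*.ome.tif"))
        if matches:
            return matches[0]

    # 4. morphology.ome.tif at the bundle root.
    root_file = data_dir / "morphology.ome.tif"
    if root_file.is_file():
        return root_file

    raise FileNotFoundError(f"No supported morphology OME-TIFF found under {data_dir}")
```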
save_landscape_parameters(technology, path_dega_files, image_name='dapi_files', tile_size=1000, image_info=None, image_format='.webp', use_int_index=True, segmentation_approach='default', use_row_groups=False, tile_grid_info=None, image_tile_info=None, trx_chunk_info=None, cell_chunk_info=None, cbg_chunk_info=None)
Saves the landscape parameters to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| technology | str | The technology used to generate the data. | required |
| path_dega_files | str | Path to the directory where landscape files are stored. | required |
| image_name | str | Name of the image directory. Defaults to "dapi_files". | 'dapi_files' |
| tile_size | int | Tile size for the image pyramid. Defaults to 1000. | 1000 |
| image_info | dict | Additional image metadata. Defaults to None. | None |
| image_format | str | Format of the image files. Defaults to ".webp". | '.webp' |
| use_int_index | bool | Use integer names for cell_tile and trx_tile. | True |
| use_row_groups | bool | If True, tiles are stored as row groups. Defaults to False. | False |
| tile_grid_info | dict | Tile grid metadata when using row groups. | None |
| image_tile_info | dict | Image tile metadata from pack_image_tiles_to_parquet. | None |
| trx_chunk_info | dict | Chunk info for transcript parquet files. | None |
| cell_chunk_info | dict | Chunk info for cell segmentation parquet files. | None |
| cbg_chunk_info | dict | Chunk info for CBG parquet files. | None |

Returns:
None
write_identity_transform(path_dega_files)
Write an identity transform matrix for IST data.
write_xenium_transform(data_dir, path_dega_files, transform_fname='micron_to_image_transform.csv')
Extracts the transformation matrix from the Xenium cells.zarr.zip file and saves it as a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_dir | str | Path to the directory containing the Xenium data (e.g., cells.zarr.zip). | required |
| path_dega_files | str | Path to the directory where the transformation matrix CSV will be saved. | required |
| transform_fname | str | Name of the output CSV file. Defaults to "micron_to_image_transform.csv". | 'micron_to_image_transform.csv' |

Returns:
| Type | Description |
|---|---|
| numpy.ndarray | The full transformation matrix extracted from the Xenium cells.zarr.zip file. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the cells.zarr.zip file does not exist in the specified data_dir. |
| KeyError | If the transformation matrix is not found in the Zarr file under the expected path. |
| Exception | If an unexpected error occurs while processing the Zarr file. |