
Pre Module API Reference

Pre-processing module that generates LandscapeFiles from spatial transcriptomics (ST) data.

add_clustering_from_adata(adata, path_dega_files, cluster_key='leiden', segmentation_name=None)

Add cell clustering data from an AnnData object to LandscapeFiles.

This function exports clustering assignments and associated colors from an AnnData object to the LandscapeFiles format, enabling the Landscape and Yearbook widgets to use custom clustering results.

Parameters

adata : AnnData
    AnnData object containing clustering results in obs[cluster_key]. Colors can be provided in uns[f"{cluster_key}_colors"].
path_dega_files : str or Path
    Path to the LandscapeFiles directory.
cluster_key : str, default "leiden"
    Column name in adata.obs containing cluster assignments.
segmentation_name : str, optional
    Name for this segmentation/clustering result. If provided, files are saved to cell_clusters_{segmentation_name}/. If None, files are saved to the default cell_clusters/ directory.

Returns

None

Examples

import scanpy as sc
import celldega as dega

# Load and cluster your data
adata = sc.read_h5ad("my_data.h5ad")
sc.tl.leiden(adata, resolution=0.5)

# Add clustering to LandscapeFiles
dega.pre.add_clustering_from_adata(
    adata,
    path_dega_files="./my_landscape_files",
    cluster_key="leiden",
)

# For a custom segmentation with a specific name
dega.pre.add_clustering_from_adata(
    adata,
    path_dega_files="./my_landscape_files",
    cluster_key="leiden",
    segmentation_name="cellpose2",
)

Notes

The Landscape widget can use the custom clustering by setting the segmentation parameter to match the segmentation_name.
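The output-directory rule described for segmentation_name can be sketched as follows. This is a minimal illustration only: cluster_output_dir is a hypothetical helper, not part of celldega, and it mirrors nothing beyond the naming convention stated in the parameter description.

```python
from pathlib import Path


def cluster_output_dir(path_dega_files, segmentation_name=None):
    """Return the directory that would hold the exported clustering files.

    Hypothetical helper: it only mirrors the documented naming rule.
    """
    base = Path(path_dega_files)
    if segmentation_name is None:
        # Default clustering goes to cell_clusters/
        return base / "cell_clusters"
    # Named segmentations get their own cell_clusters_{segmentation_name}/ directory
    return base / f"cell_clusters_{segmentation_name}"


print(cluster_output_dir("./my_landscape_files"))
print(cluster_output_dir("./my_landscape_files", "cellpose2"))
```

Keeping each named segmentation in its own directory lets several clustering results coexist in one LandscapeFiles directory.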

add_custom_segmentation(technology, path_dega_files, path_segmentation_files, image_scale=1, tile_size=250)

Add custom segmentation to existing landscape files.

Parameters:
- technology: Technology type (e.g., "Xenium", "MERSCOPE", "custom")
- path_dega_files: Path to landscape files
- path_segmentation_files: Path to segmentation files
- image_scale: Image scale factor
- tile_size: Tile size for processing

cluster_gene_expression(technology, path_dega_files, cbg, data_dir=None, segmentation_approach='default')

Calculates cluster-specific gene expression signatures for Xenium data.

Parameters:

Name Type Description Default
technology str

The technology used (e.g., "Xenium" or "MERSCOPE"). Currently, only "Xenium" is supported.

required
data_dir str

Path to the directory containing the Xenium data.

None
path_dega_files str

Path to the directory where the gene expression signature file will be saved.

required
cbg DataFrame

A cell-by-gene matrix where rows represent cells and columns represent genes. The index of the DataFrame should match the cell IDs in the Xenium metadata.

required

Raises:

Type Description
ValueError

If the specified technology is not supported.

FileNotFoundError

If the required input files are not found.

create_cluster_and_meta_cluster(technology, path_dega_files, data_dir=None, segmentation_approach='default')

Creates cell clusters and meta cluster files for visualization. Currently supports only Xenium.

Parameters:

Name Type Description Default
technology str

The technology used (e.g., "Xenium" or "MERSCOPE"). Currently, only "Xenium" is supported.

required
data_dir str

Path to the directory containing the Xenium data.

None
path_dega_files str

Path to the directory where the cluster and meta cluster files will be saved.

required

Raises:

Type Description
ValueError

If the specified technology is not supported.

FileNotFoundError

If the required input files are not found.

create_image_tiles(technology, data_dir, path_dega_files, image_tile_layer='dapi')

Creates image tiles for visualization from the image data of the specified technology.

Parameters:

Name Type Description Default
technology str

The technology used (e.g., "Xenium", "MERSCOPE", "VisiumHD", "H&E").

required
data_dir str

Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif).

required
path_dega_files str

Path to the directory where the image tiles and pyramid will be saved.

required
image_tile_layer str

Specifies which image layers to process. Options for Xenium are 'dapi' (default) or 'all'.

'dapi'

Raises:

Type Description
ValueError

If the specified technology is not supported or if the image_tile_layer is invalid.

FileNotFoundError

If the required input image file is not found.

create_image_tiles_h_and_e(data_dir, path_dega_files, image_tile_layer)

Creates image tiles for visualization from the H&E image.

Parameters:

Name Type Description Default
data_dir str

Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif).

required
path_dega_files str

Path to the directory where the image tiles and pyramid will be saved.

required
image_tile_layer str

Specifies the name of the H&E image to process.

required

Raises: FileNotFoundError: If the required input image file is not found.

create_image_tiles_merscope(data_dir, path_dega_files, image_tile_layer='dapi')

Creates image tiles for visualization from the MERSCOPE image.

Parameters:

Name Type Description Default
data_dir str

Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif).

required
path_dega_files str

Path to the directory where the image tiles and pyramid will be saved.

required
image_tile_layer str

Specifies which image layers to process. Options are 'dapi' (default) or 'all'.

'dapi'

Raises: FileNotFoundError: If the required input image file is not found.

create_image_tiles_xenium(data_dir, path_dega_files, image_tile_layer='dapi')

Creates image tiles for visualization from the Xenium morphology image.

Parameters:

Name Type Description Default
data_dir str

Path to the directory containing the data (e.g., morphology_focus_0000.ome.tif).

required
path_dega_files str

Path to the directory where the image tiles and pyramid will be saved.

required
image_tile_layer str

Specifies which image layers to process. Options are 'dapi' (default) or 'all'.

'dapi'

Raises: FileNotFoundError: If the required input image file is not found.

get_image_info(technology, image_tile_layer='dapi')

Retrieve image information for a given technology and image tile layer.

Parameters:

Name Type Description Default
technology str

The technology for which image information is requested. Currently supports 'Xenium' and 'MERSCOPE'.

required
image_tile_layer str

The type of image tile layer to retrieve information for. Options are 'dapi' or 'all'. Defaults to 'dapi'.

'dapi'

Returns:

Type Description
list[dict]

A list of dictionaries containing image information, including name, button name, and color.

Raises:

Type Description
ValueError

If the technology is not supported or the image_tile_layer is invalid.

get_max_zoom_level(path_image_pyramid)

Returns the maximum zoom level based on the highest-numbered directory in the specified path.

Parameters:

Name Type Description Default
path_image_pyramid str

Path to the directory containing zoom level directories.

required

Returns:

Name Type Description
int

The maximum zoom level.
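The lookup described for get_max_zoom_level can be sketched as below. This is a minimal sketch, assuming zoom levels are stored as subdirectories with purely numeric names. The function name max_zoom_level and the ValueError for an empty pyramid are illustrative assumptions, not the library's documented behavior.

```python
import os
import tempfile


def max_zoom_level(path_image_pyramid):
    """Return the largest integer-named subdirectory in path_image_pyramid."""
    levels = [
        int(entry)
        for entry in os.listdir(path_image_pyramid)
        if entry.isdigit()
        and os.path.isdir(os.path.join(path_image_pyramid, entry))
    ]
    if not levels:
        # Assumption: an empty pyramid is treated as an error in this sketch
        raise ValueError(f"no zoom level directories found in {path_image_pyramid}")
    return max(levels)


# Demo on a throwaway pyramid with levels 0, 1, 2 and 10
with tempfile.TemporaryDirectory() as pyramid:
    for level in ("0", "1", "2", "10"):
        os.mkdir(os.path.join(pyramid, level))
    demo_level = max_zoom_level(pyramid)

print(demo_level)
```

Converting the names to integers before comparing matters: a plain string comparison would rank "2" above "10".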

make_chromium_from_anndata(adata, path_dega_files)

Generate minimal LandscapeFiles from a Chromium AnnData object.

Parameters

adata : anndata.AnnData
    AnnData object containing scRNA-seq count data.
path_dega_files : str or Path
    Directory where LandscapeFiles will be written.

Raises

ValueError If the expression matrix contains non-integer values.
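The integer-count requirement behind this ValueError can be illustrated with a small validation sketch. It assumes a dense matrix given as nested lists, whereas a real AnnData matrix is typically a NumPy or SciPy sparse array. assert_integer_counts is a hypothetical helper, not a celldega function.

```python
def assert_integer_counts(rows):
    """Raise ValueError if any value in a row-major matrix is not a whole number."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # float.is_integer() is False for fractional values and for NaN
            if not float(value).is_integer():
                raise ValueError(
                    f"non-integer count {value!r} at row {i}, column {j}"
                )


# Whole-number floats such as 3.0 are accepted
assert_integer_counts([[0, 3.0], [1, 2]])

# A normalized or log-transformed value is rejected
try:
    assert_integer_counts([[0.5, 1]])
except ValueError as err:
    print(err)
```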

make_deepzoom_pyramid(image_path, output_path, pyramid_name, tile_size=512, overlap=0, suffix='.jpeg')

Creates a DeepZoom image pyramid from a JPEG image.

Parameters:

Name Type Description Default
image_path str

Path to the JPEG image file.

required
output_path str

Directory to save the DeepZoom pyramid.

required
pyramid_name str

Name of the pyramid directory.

required
tile_size int

Tile size for the DeepZoom pyramid. Defaults to 512.

512
overlap int

Overlap size for the DeepZoom pyramid. Defaults to 0.

0
suffix str

Suffix for the DeepZoom pyramid tiles. Defaults to ".jpeg".

'.jpeg'

Returns:

Type Description

None
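To show what a DeepZoom pyramid built by make_deepzoom_pyramid contains, the sketch below computes the level layout from the image size. It follows the standard Deep Zoom convention (the top level holds the full-resolution image and each lower level halves it, down to 1x1). Whether make_deepzoom_pyramid produces exactly this layout is an assumption, and deepzoom_levels is a hypothetical helper.

```python
import math


def deepzoom_levels(width, height, tile_size=512):
    """List the size and tile grid of every level in a Deep Zoom pyramid."""
    # The highest level is the full-resolution image; level 0 is 1x1 pixel
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        level_w = math.ceil(width / scale)
        level_h = math.ceil(height / scale)
        levels.append(
            {
                "level": level,
                "width": level_w,
                "height": level_h,
                "cols": math.ceil(level_w / tile_size),
                "rows": math.ceil(level_h / tile_size),
            }
        )
    return levels


for info in deepzoom_levels(2048, 1024):
    print(info)
```

With the default tile_size of 512, a 2048 x 1024 image yields 12 levels, and only the top two levels need more than one tile.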

make_meta_cell_image_coord(technology, path_transformation_matrix, path_meta_cell_micron, path_meta_cell_image, image_scale=1, sample=None, paths=None, dataset=None)

Applies an affine transformation to cell coordinates in microns and saves the transformed coordinates in pixels.

Parameters

technology : str
    The technology used to generate the data. Xenium and MERSCOPE are supported.
path_transformation_matrix : str
    Path to the transformation matrix file.
path_meta_cell_micron : str
    Path to the meta cell file with coordinates in microns.
path_meta_cell_image : str
    Path to save the meta cell file with coordinates in pixels.
image_scale : float, default 1
    Scaling factor to convert micron coordinates to pixel coordinates.

Returns

None

Examples

make_meta_cell_image_coord(
    technology='Xenium',
    path_transformation_matrix='data/transformation_matrix.csv',
    path_meta_cell_micron='data/meta_cell_micron.csv',
    path_meta_cell_image='data/meta_cell_image.parquet',
)
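The micron-to-pixel conversion performed by make_meta_cell_image_coord can be sketched for a single point. This sketch assumes a 3x3 homogeneous affine matrix and assumes that image_scale multiplies the transformed coordinates. Both are illustrative assumptions, and apply_affine is a hypothetical helper rather than the library's implementation.

```python
def apply_affine(matrix, x, y, image_scale=1.0):
    """Map one (x, y) point in microns to pixel coordinates.

    matrix is a 3x3 homogeneous affine matrix given as nested lists.
    """
    px = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    py = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    # Assumption: image_scale is applied after the affine transform
    return px * image_scale, py * image_scale


# Example matrix: 4 pixels per micron plus a (10, 20) pixel offset
transform = [
    [4.0, 0.0, 10.0],
    [0.0, 4.0, 20.0],
    [0.0, 0.0, 1.0],
]

print(apply_affine(transform, 2.5, 1.0))
print(apply_affine(transform, 2.5, 1.0, image_scale=0.5))
```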

make_meta_gene(cbg, path_output)

Creates a DataFrame with genes and their assigned colors.

Parameters:

Name Type Description Default
cbg DataFrame

A sparse DataFrame with genes as columns and barcodes as rows.

required
path_output str

Path to save the meta gene file.

required

Returns:

Type Description

None

make_trx_tiles(technology, path_trx, path_transformation_matrix=None, path_trx_tiles=None, coarse_tile_factor=10, tile_size=250, chunk_size=1000000, verbose=False, image_scale=1, max_workers=1, streaming_tile_assignment=None)

Processes transcript data by dividing it into coarse-grain and fine-grain tiles, applying transformations, and saving the results in a parallelized manner.

Parameters

technology : str
    The technology used to generate the transcript data (e.g., "MERSCOPE" or "Xenium").
path_trx : str
    Path to the file containing the transcript data.
path_transformation_matrix : str, optional
    Path to the file containing the transformation matrix (CSV file).
path_trx_tiles : str, optional
    Directory path where the output files (Parquet files) for each tile will be saved.
coarse_tile_factor : int, optional
    Size of each coarse-grain tile relative to the fine tile size, as a multiplicative factor (default is 10).
tile_size : int, optional
    Size of each fine-grain tile in microns (default is 250).
chunk_size : int, optional
    Number of rows to process per chunk for memory efficiency (default is 1000000).
verbose : bool, optional
    Flag to enable verbose output (default is False).
image_scale : float, optional
    Scale factor to apply to the transcript coordinates (default is 1.0).
max_workers : int, optional
    Maximum number of parallel workers for processing tiles (default is 1).
streaming_tile_assignment : bool or None, optional
    If True, stream transformed coordinates to Parquet shards and spill per spatial tile (the same strategy as row-group mode) instead of concatenating all rows and using partition_by / coarse filters on one huge frame. If None, streaming is enabled automatically when the row count is at least STREAMING_TILE_ASSIGN_ROW_THRESHOLD.

Returns

dict
    A dictionary containing the bounds of the processed data in both x and y directions.
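The two-level tiling that make_trx_tiles describes can be sketched on a handful of points. This is a simplified illustration under stated assumptions: tiles are indexed from the minimum x and y of the data, a coarse tile spans tile_size * coarse_tile_factor, and the bounds keys (x_min, x_max, y_min, y_max) are hypothetical names. The real function additionally applies the transformation matrix, processes the data in chunks, and writes Parquet files.

```python
from collections import defaultdict


def assign_tiles(points, tile_size=250, coarse_tile_factor=10):
    """Group (x, y) points into fine tiles and group fine tiles into coarse tiles."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    bounds = {"x_min": min(xs), "x_max": max(xs), "y_min": min(ys), "y_max": max(ys)}

    coarse_size = tile_size * coarse_tile_factor
    fine_tiles = defaultdict(list)   # fine tile index -> points in that tile
    coarse_tiles = defaultdict(set)  # coarse tile index -> fine tiles it contains

    for x, y in points:
        dx, dy = x - bounds["x_min"], y - bounds["y_min"]
        fine = (int(dx // tile_size), int(dy // tile_size))
        coarse = (int(dx // coarse_size), int(dy // coarse_size))
        fine_tiles[fine].append((x, y))
        coarse_tiles[coarse].add(fine)

    return bounds, dict(fine_tiles), dict(coarse_tiles)


demo_points = [(0.0, 0.0), (260.0, 10.0), (2600.0, 300.0)]
bounds, fine_tiles, coarse_tiles = assign_tiles(demo_points)
print(bounds)
print(sorted(fine_tiles))
print(coarse_tiles)
```

Working through coarse tiles first means only the rows inside one coarse region need to be scanned when its fine tiles are written, which is the usual motivation for a two-level scheme.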

pack_image_tiles_to_parquet(pyramid_dir, channel_name, output_path, image_format='.webp', delete_source_tiles=True, max_row_groups_per_file=2000)

Pack all image tiles from a DeepZoom pyramid into chunked parquet files with row groups.

Each zoom level's tiles are stored as row groups, allowing efficient range-based access. The formula for row group index is: row_group_index = sum of tiles in previous zoom levels + tile_x * num_tiles_y + tile_y

For large datasets, tiles are split across multiple parquet files, each containing at most max_row_groups_per_file row groups.
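The row group formula above can be written out as a short sketch. It assumes the per-level grid sizes are known as (num_tiles_x, num_tiles_y) pairs in storage order, and that chunk files are filled sequentially, so that the file number and the position within the file follow from integer division. That second point is an assumption. Both function names are hypothetical.

```python
def row_group_index(tiles_per_level, zoom_level, tile_x, tile_y):
    """Global row group index of one tile.

    tiles_per_level lists (num_tiles_x, num_tiles_y) for each zoom level,
    in the order the levels are stored.
    """
    num_tiles_x, num_tiles_y = tiles_per_level[zoom_level]
    if not (0 <= tile_x < num_tiles_x and 0 <= tile_y < num_tiles_y):
        raise IndexError(f"tile ({tile_x}, {tile_y}) is outside level {zoom_level}")
    # Sum of tiles in all previous zoom levels
    offset = sum(nx * ny for nx, ny in tiles_per_level[:zoom_level])
    return offset + tile_x * num_tiles_y + tile_y


def locate_row_group(index, max_row_groups_per_file=2000):
    """Return (chunk file number, row group within that file).

    Assumption: chunk files are filled one after another.
    """
    return divmod(index, max_row_groups_per_file)


grids = [(1, 1), (2, 1), (4, 2)]  # three zoom levels
index = row_group_index(grids, zoom_level=2, tile_x=3, tile_y=1)
print(index)
print(locate_row_group(index, max_row_groups_per_file=4))
```

Because the index is a pure function of zoom level and tile position, a reader can fetch a tile by row group number without scanning the file.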

Parameters:

Name Type Description Default
pyramid_dir str

Path to the pyramid_images directory.

required
channel_name str

Name of the image channel (e.g., "dapi").

required
output_path str

Path to the output directory (will contain chunk_X.parquet files).

required
image_format str

Image file extension (default ".webp").

'.webp'
delete_source_tiles bool

If True, delete the original tile files after packing.

True
max_row_groups_per_file int

Maximum row groups per file (default 2000).

2000

Returns:

Name Type Description
dict

Image tile metadata including grid info per zoom level and image dimensions.

read_cbg_mtx(base_path, barcodes_name='barcodes', features_name='features', technology=None)

Read the cell-by-gene matrix from the mtx files.

Parameters

base_path : str
    The base path to the directory containing the mtx files.

Returns

cbg : pandas.DataFrame
    A sparse DataFrame with genes as columns and barcodes as rows.

remove_intermediate_files(path_dega_files)

Remove intermediate image files.

Parameters: - path_dega_files: Path to landscape files directory

resolve_xenium_morphology_ome_path(data_dir)

Locate the morphology OME-TIFF for Xenium-class bundles (including Atera WTA preview).

Standard Xenium output uses morphology_focus/morphology_focus_0000.ome.tif. Some v4-compatible and Atera preview bundles use other names under morphology_focus/ or ship morphology.ome.tif at the bundle root.

Resolution order:

1. morphology_focus/morphology_focus_0000.ome.tif (classic Xenium)

2. First morphology_focus/morphology_focus_*.ome.tif (lexicographic sort)

3. First morphology_focus/*.ome.tif if no morphology_focus_* match

4. morphology.ome.tif at bundle root

Parameters

data_dir Path to the outs directory (e.g. containing experiment.xenium).

Returns

Path Path to an existing .ome.tif file.

Raises

FileNotFoundError If no supported morphology TIFF is found.
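The resolution order documented for resolve_xenium_morphology_ome_path can be sketched with pathlib. This is a minimal re-implementation for illustration, not the library's code. find_morphology_ome is a hypothetical name, and the demo builds an empty stand-in bundle in a temporary directory.

```python
import tempfile
from pathlib import Path


def find_morphology_ome(data_dir):
    """Return the first morphology OME-TIFF found, following the documented order."""
    data_dir = Path(data_dir)
    focus_dir = data_dir / "morphology_focus"

    # 1. Classic Xenium file name
    classic = focus_dir / "morphology_focus_0000.ome.tif"
    if classic.is_file():
        return classic

    # 2. and 3. Fall back to other names under morphology_focus/, sorted lexicographically
    if focus_dir.is_dir():
        for pattern in ("morphology_focus_*.ome.tif", "*.ome.tif"):
            matches = sorted(focus_dir.glob(pattern))
            if matches:
                return matches[0]

    # 4. Single morphology image at the bundle root
    root_image = data_dir / "morphology.ome.tif"
    if root_image.is_file():
        return root_image

    raise FileNotFoundError(f"no morphology OME-TIFF found under {data_dir}")


with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "morphology.ome.tif").touch()
    found_root = find_morphology_ome(bundle).name

    focus = bundle / "morphology_focus"
    focus.mkdir()
    (focus / "morphology_focus_0002.ome.tif").touch()
    (focus / "morphology_focus_0001.ome.tif").touch()
    found_focus = find_morphology_ome(bundle).name

print(found_root)
print(found_focus)
```

Once the morphology_focus/ directory contains a match, it takes precedence over the root-level file, as the ordering requires.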

save_landscape_parameters(technology, path_dega_files, image_name='dapi_files', tile_size=1000, image_info=None, image_format='.webp', use_int_index=True, segmentation_approach='default', use_row_groups=False, tile_grid_info=None, image_tile_info=None, trx_chunk_info=None, cell_chunk_info=None, cbg_chunk_info=None)

Saves the landscape parameters to a JSON file.

Parameters:

Name Type Description Default
technology str

The technology used to generate the data.

required
path_dega_files str

Path to the directory where landscape files are stored.

required
image_name str

Name of the image directory. Defaults to "dapi_files".

'dapi_files'
tile_size int

Tile size for the image pyramid. Defaults to 1000.

1000
image_info dict

Additional image metadata. Defaults to None.

None
image_format str

Format of the image files. Defaults to ".webp".

'.webp'
use_int_index bool

If True, use integer names for cell_tile and trx_tile.

True
use_row_groups bool

If True, tiles are stored as row groups. Defaults to False.

False
tile_grid_info dict

Tile grid metadata when using row groups.

None
image_tile_info dict

Image tile metadata from pack_image_tiles_to_parquet.

None
trx_chunk_info dict

Chunk info for transcript parquet files.

None
cell_chunk_info dict

Chunk info for cell segmentation parquet files.

None
cbg_chunk_info dict

Chunk info for CBG parquet files.

None

Returns:

Type Description

None

write_identity_transform(path_dega_files)

Write an identity transform matrix for IST data.

write_xenium_transform(data_dir, path_dega_files, transform_fname='micron_to_image_transform.csv')

Extracts the transformation matrix from the Xenium cells.zarr.zip file and saves it as a CSV file.

Parameters:

Name Type Description Default
data_dir str

Path to the directory containing the Xenium data (e.g., cells.zarr.zip).

required
path_dega_files str

Path to the directory where the transformation matrix CSV will be saved.

required
transform_fname str

Name of the output CSV file. Defaults to "micron_to_image_transform.csv".

'micron_to_image_transform.csv'

Returns:

Type Description

numpy.ndarray: The full transformation matrix extracted from the Xenium cells.zarr.zip file.

Raises:

Type Description
FileNotFoundError

If the cells.zarr.zip file does not exist in the specified data_dir.

KeyError

If the transformation matrix is not found in the Zarr file under the expected path.

Exception

If an unexpected error occurs while processing the Zarr file.