
Clust Module API Reference

This module provides the main Matrix class for hierarchical data clustering and visualization.

AxisEntity

Bases: TypedDict

Describes what entity a clustergram axis represents.

Attributes:

Name Type Description
entity str

The type of entity (cell, gene, nbhd, cluster, etc.)

attr str

The attribute of that entity (name, leiden, custom_column, etc.)
- For cells: 'leiden' means cell clusters, 'name' means individual cells
- For nbhd: 'name' means specific neighborhoods
- For genes: typically 'name'

Examples:

Clustergram with cell clusters on rows (cells grouped by leiden)

{"entity": "cell", "attr": "leiden"}

Clustergram with specific cells on columns

{"entity": "cell", "attr": "name"}

Clustergram with neighborhoods on columns

{"entity": "nbhd", "attr": "name"}

Clustergram with genes on rows

{"entity": "gene", "attr": "name"}

Hextile neighborhoods by cell clusters

row: {"entity": "cell", "attr": "leiden"}
col: {"entity": "hextile", "attr": "nbhd_cluster"}
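The row/column pairing in the last example can be written out as plain dictionaries. This is a minimal sketch, assuming AxisEntity is a TypedDict with the two string keys listed under Attributes; the class below is a local stand-in defined for illustration, not an import from the library.

```python
from typing import TypedDict


class AxisEntity(TypedDict):
    """Local stand-in mirroring the documented keys (not the library class)."""

    entity: str  # e.g. "cell", "gene", "nbhd", "hextile"
    attr: str    # e.g. "name", "leiden", "nbhd_cluster"


# Hextile neighborhoods by cell clusters, as in the last example above.
row: AxisEntity = {"entity": "cell", "attr": "leiden"}
col: AxisEntity = {"entity": "hextile", "attr": "nbhd_cluster"}

# A TypedDict is an ordinary dict at runtime, so normal dict access works.
axes = {"row": row, "col": col}
```

Because the type only constrains static checking, any code that accepts a dict with these two keys accepts an AxisEntity as well.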

EntityType

Bases: Enum

Entity types that can be represented in clustergram rows/columns.

Matrix

High-performance matrix class for single-cell genomics data processing.

Features automatic processing pipeline, hierarchical clustering, and visualization export. Uses intelligent caching for performance with large datasets.

Examples:

Basic usage - applies norm_col='total', norm_row='zscore'

mat = Matrix(adata)
viz_data = mat.cluster()

Custom processing with colors

mat = Matrix(adata, filter_genes=5000, norm_row='qn', global_colors={"high": "red", "low": "blue"})

No processing

mat = Matrix(adata, disable_processing=True)

dat property

Lazy dat structure with intelligent caching.

__init__(data=None, meta_col=None, meta_row=None, col_attr=None, row_attr=None, row_entity='gene', col_entity='cell_cluster', filter_genes=None, norm_col='total', norm_row='zscore', disable_processing=False, global_colors=None, name=None)

Create Matrix with automatic processing unless disabled.

Parameters:

Name Type Description Default
data DataFrame | AnnData | None

DataFrame or AnnData object

None
meta_col DataFrame | None

Column metadata (for DataFrame input)

None
meta_row DataFrame | None

Row metadata (for DataFrame input)

None
col_attr list[str] | None

Column attribute names (categorical or numeric)

None
row_attr list[str] | None

Row attribute names (categorical or numeric)

None
row_entity str | dict | AxisEntity | None

Entity specification for rows. Accepted formats:
- str: Shorthand with implicit attr mapping:
  - "gene" → {"entity": "gene", "attr": "name"}
  - "nbhd" → {"entity": "nbhd", "attr": "name"}
  - "cell" → {"entity": "cell", "attr": "name"}
  - "hextile" → {"entity": "hextile", "attr": "name"}
  - "cell_cluster" or "cluster" → {"entity": "cell", "attr": "leiden"}
- tuple: Compact format, e.g., ("nbhd", "name")
- dict: Full format, e.g., {"entity": "nbhd", "attr": "name"}

'gene'
col_entity str | dict | AxisEntity | None

Entity specification for columns (same formats as row_entity)

'cell_cluster'
filter_genes int | None

Number of top variable genes to keep (None = no filtering)

None
norm_col str | None

Column normalization ('total', 'zscore', 'qn', None)

'total'
norm_row str | None

Row normalization ('total', 'zscore', 'qn', None)

'zscore'
disable_processing bool

Skip automatic processing (default: False)

False
global_colors dict[str, str] | DataFrame | None

Global category color mapping (dict or DataFrame with 'color' column)

None
name str | None

Name for the matrix (default: None)

None

Examples:

mat = Matrix(adata) # Applies norm_col='total', norm_row='zscore'

Custom processing with colors

colors = {"Cancer": "#ff0000", "Normal": "#0000ff"}
mat = Matrix(adata, filter_genes=5000, norm_row='qn', global_colors=colors)

No processing

mat = Matrix(adata, disable_processing=True)

Raw matrix without data

mat = Matrix() # Empty matrix for manual loading

With entity specifications for widget interaction:

Genes (rows) by cell clusters (columns) - typical gene expression heatmap

mat = Matrix(df, row_entity="gene", col_entity="cell_cluster")

Or equivalently with new format:

mat = Matrix(df, row_entity={"entity": "gene", "attr": "name"}, col_entity={"entity": "cell", "attr": "leiden"})

Neighborhoods by cell types

mat = Matrix(df, row_entity={"entity": "cell", "attr": "leiden"}, col_entity={"entity": "nbhd", "attr": "name"})

add_category(axis, name, data)

Add category to metadata.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns)

required
name str

Category name

required
data Series

Category values (must match axis length)

required

add_cats(axis, cat_data)

Add multiple categories to metadata.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum

required
cat_data dict[str, Any]

Dict with category name as key, values as list/Series/dict

required

Examples:

Add multiple categories at once

mat.add_cats('col', {
    'cell_type': ['T-cell', 'B-cell', 'NK-cell'],
    'treatment': ['control', 'treated', 'control'],
})

From existing metadata

mat.add_cats('col', meta_df.to_dict('series'))

clust(dist_type='cosine', linkage_type='average', force=False)

Perform hierarchical clustering.

Parameters:

Name Type Description Default
dist_type DistanceType

Distance metric ('cosine', 'euclidean', 'correlation')

'cosine'
linkage_type LinkageType

Linkage method ('average', 'complete', 'ward')

'average'
force bool

Override size limits for large matrices

False
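To make the default dist_type concrete: cosine distance between two rows is one minus the cosine of the angle between them. The following is a minimal pure-Python sketch of that definition only. The helper name cosine_distance and the toy rows are hypothetical, and how clust() computes distances internally is not stated in this reference.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); helper name is illustrative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


rows = {
    "gene_a": [1.0, 2.0, 3.0],
    "gene_b": [2.0, 4.0, 6.0],   # same direction as gene_a, twice the scale
    "gene_c": [3.0, 0.0, -1.0],  # orthogonal to gene_a (dot product is 0)
}

d_ab = cosine_distance(rows["gene_a"], rows["gene_b"])  # close to 0
d_ac = cosine_distance(rows["gene_a"], rows["gene_c"])  # close to 1
```

Cosine distance ignores overall magnitude, so rows that differ only by a constant positive factor are treated as identical. With 'average' linkage, the distance between two clusters is the mean of all pairwise distances between their members.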

cluster(**cluster_kwargs)

Perform clustering and return visualization data.

Parameters:

Name Type Description Default
**cluster_kwargs Any

Clustering parameters (dist_type, linkage_type, force)

{}

Returns:

Name Type Description
dict dict[str, Any]

Visualization-ready JSON structure

Examples:

mat = Matrix(adata)
viz_data = mat.cluster()  # Use defaults
viz_data = mat.cluster(dist_type='euclidean', linkage_type='ward')

downsample_to(category='leiden', axis='col', propagate_metadata=False)

Downsample data by aggregating categories using scanpy.get.aggregate.

Parameters:

Name Type Description Default
category str

Metadata column to aggregate by

'leiden'
axis AxisInput

Which axis to aggregate ('col'/1/COL for cells, 'row'/0/ROW for genes)

'col'
propagate_metadata bool | list[str]

Whether to propagate other metadata columns to the aggregated result using the modal (most frequent) value per group.
- False: Skip metadata propagation (fast, default)
- True: Propagate all metadata columns (slow for large datasets)
- list[str]: Propagate only specified columns

False
Requires

scanpy for aggregation functionality

Note

Uses scanpy.get.aggregate under the hood for fast mean aggregation. See: https://scanpy.readthedocs.io/en/stable/generated/scanpy.get.aggregate.html
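The two ideas described for downsample_to, mean aggregation per category and modal-value propagation of metadata, can be illustrated without scanpy. The sketch below is a pure-Python stand-in on toy data. The function aggregate, its parameters, and the tie-breaking rule (first value encountered wins) are assumptions for illustration, not the library's implementation.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy data: one expression value per cell, plus two metadata columns.
cells = [
    {"leiden": "0", "expr": 1.0, "sample": "A"},
    {"leiden": "0", "expr": 3.0, "sample": "A"},
    {"leiden": "0", "expr": 5.0, "sample": "B"},
    {"leiden": "1", "expr": 10.0, "sample": "B"},
]


def aggregate(cells, category, value_key, propagate):
    """Mean of value_key per group, plus the modal value of each propagated column."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[category]].append(cell)

    result = {}
    for label, members in groups.items():
        row = {value_key: mean(m[value_key] for m in members)}
        for col in propagate:
            # most_common(1) returns [(value, count)] for the most frequent value
            row[col] = Counter(m[col] for m in members).most_common(1)[0][0]
        result[label] = row
    return result


clusters = aggregate(cells, category="leiden", value_key="expr", propagate=["sample"])
```

Passing an empty list for propagate corresponds to the fast default (propagate_metadata=False), since no per-group counting is done.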

export_viz_json()

Export visualization as JSON dict.

Deprecated since version 0.10: Use export_viz_parquet instead.

export_viz_json_string()

Export visualization as JSON string.

Deprecated since version 0.10: Use export_viz_parquet instead.

export_viz_parquet()

Export visualization using Parquet encoded tables.

export_viz_to_widget(which_viz='viz')

Export visualization for widget.

Deprecated since version 0.10: Use celldega.viz.Clustergram with matrix instead.

filter(axis, by, num)

Filter features by specified metric.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns)

required
by FilterType

Metric ('var' for variance, 'mean' for mean)

required
num int

Number of top features to keep

required
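Keeping the top features by a metric amounts to scoring each row (or column), sorting, and truncating. The sketch below shows this for rows in plain Python. The helper top_rows is hypothetical, and the use of population variance is an assumption: the reference does not say whether the library uses the population or sample estimator.

```python
from statistics import pvariance

matrix = {
    "gene_a": [1.0, 1.0, 1.0, 1.0],    # variance 0
    "gene_b": [0.0, 2.0, 0.0, 2.0],    # variance 1
    "gene_c": [0.0, 10.0, 0.0, 10.0],  # variance 25
}


def top_rows(matrix, by, num):
    """Keep the num rows with the largest variance ('var') or mean ('mean')."""
    metrics = {
        "var": pvariance,
        "mean": lambda values: sum(values) / len(values),
    }
    score = metrics[by]
    ranked = sorted(matrix, key=lambda name: score(matrix[name]), reverse=True)
    return {name: matrix[name] for name in ranked[:num]}


kept = top_rows(matrix, by="var", num=2)
```

Here the constant row gene_a is dropped first, which is the usual motivation for variance filtering before clustering.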

load_adata(adata, col_attr=None, row_attr=None)

Load AnnData object.

Parameters:

Name Type Description Default
adata AnnData

AnnData object (will be transposed to genes x cells)

required

load_df(df, meta_col=None, meta_row=None, col_attr=None, row_attr=None)

Load DataFrame with metadata.

Parameters:

Name Type Description Default
df DataFrame

Data matrix

required
meta_col DataFrame | None

Column metadata (must match df.columns)

None
meta_row DataFrame | None

Row metadata (must match df.index)

None
col_attr list[str] | None

Column attribute names for viz (categorical or numeric)

None
row_attr list[str] | None

Row attribute names for viz (categorical or numeric)

None

make_viz()

Generate visualization data structure.

norm(axis, by)

Normalize data along specified axis.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns)

required
by NormType

Normalization method ('total', 'zscore', 'qn')

required
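For orientation, two of the listed methods can be sketched on a single row or column of values. The formulas below are the common textbook definitions and are assumptions here: the reference does not state which standard deviation estimator 'zscore' uses or what sum 'total' scales to. Quantile normalization ('qn') is omitted.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def total(values):
    """Divide by the sum so the values add up to 1 (the scale is an assumption)."""
    s = sum(values)
    return [v / s for v in values]


row = [2.0, 4.0, 6.0]
row_z = zscore(row)                  # symmetric around 0
col_total = total([1.0, 1.0, 2.0])   # fractions of the column sum
```

With the defaults norm_col='total' and norm_row='zscore', the first makes columns comparable in overall magnitude and the second puts each row on a common scale.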

process(filter_genes=None, norm_col='total', norm_row='zscore')

Apply processing pipeline to the matrix.

Parameters:

Name Type Description Default
filter_genes int | None

Number of top variable genes to keep

None
norm_col str | None

Column normalization method ('total', 'zscore', 'qn', None)

'total'
norm_row str | None

Row normalization method ('total', 'zscore', 'qn', None)

'zscore'

Examples:

mat = Matrix(adata, disable_processing=True)  # Raw data
mat.process(filter_genes=5000, norm_row='qn')  # Custom processing

random_subsample(axis, num, seed=42)

Randomly subsample features.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns)

required
num int

Number of features to sample

required
seed int

Random seed for reproducibility

42
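The role of the seed parameter can be shown with the standard library: the same seed yields the same selection on every call. This is a sketch of the idea only. The function below is a stand-in operating on a list of names, and which random number generator the library uses is not stated in this reference.

```python
import random


def random_subsample(names, num, seed=42):
    """Pick num names without replacement using a private, seeded generator."""
    rng = random.Random(seed)  # local RNG: does not disturb global random state
    return rng.sample(names, num)


cells = [f"cell_{i}" for i in range(100)]
first = random_subsample(cells, num=5)
second = random_subsample(cells, num=5)  # same seed, same selection
```

Using a dedicated generator instead of the module-level functions keeps the result reproducible even when other code draws random numbers in between.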

set_cat_color(axis, cat_index, cat_name, color)

Set color for specific category value in a specific category column.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum

required
cat_index int

Category column index (1-based, like original Network)

required
cat_name str

Category value name to color

required
color str

Hex color string or named color

required
Example

Set color for 'Cancer' in the first column category

mat.set_cat_color('col', 1, 'Cancer', '#ff0000')

set_cat_colors(axis, cat_index, color_mapping)

Set colors for multiple category values in a specific category column.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum

required
cat_index int

Category column index (1-based)

required
color_mapping dict[str, str]

Dict mapping category values to colors

required
Example

Set colors for multiple values in tissue type category

mat.set_cat_colors('col', 1, {
    'Liver': '#00ff00',
    'Brain': '#ffff00',
    'Heart': '#ff00ff',
})

set_global_cat_colors(color_mapping=None)

Set global category color mapping that applies across all categories.

Parameters:

Name Type Description Default
color_mapping dict[str, str] | DataFrame | None

Dict mapping category values to colors, DataFrame with 'color' column, or None to auto-generate

None
Note

If metadata has a 'color' column, those colors will be used automatically.

set_matrix_colors(pos='red', neg='blue')

Set matrix color scheme for positive and negative values.

Parameters:

Name Type Description Default
pos str

Color for positive values (hex or named color)

'red'
neg str

Color for negative values (hex or named color)

'blue'
Example

mat.set_matrix_colors(pos="#ff0000", neg="#0000ff")

subset(axis, by)

Subset data by feature list.

Parameters:

Name Type Description Default
axis AxisInput

'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns)

required
by list[str]

List of feature names to keep

required

to_adata()

Convert to AnnData object.

to_df()

Return DataFrame copy of data.

write_dega_files(path, name=None)

Write Clustergram visualization data to a DegaFiles directory.

This creates a cgm/ subdirectory containing the parquet files needed to load the Clustergram in JavaScript without a Python backend.

Parameters

path : str or Path
    Path to the DegaFiles directory (the same directory used for Landscape and Yearbook data).
name : str, optional
    Name for this Clustergram. If provided, files are saved to cgm/{name}/. If None, uses the matrix's name attribute, or "default" if no name is set.

Examples

mat = Matrix(adata)
mat.clust()
mat.write_dega_files("./my_dega_files", name="skin_cancer_clusters")

JavaScript can then load from:

base_url + '/cgm/skin_cancer_clusters/'

Notes

The following files are created:
- mat.parquet: The matrix data
- row_nodes.parquet: Row node information
- col_nodes.parquet: Column node information
- row_linkage.parquet: Row dendrogram linkage
- col_linkage.parquet: Column dendrogram linkage
- meta.json: Metadata including colors and config
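The directory layout described above can be spelled out as a list of expected paths, which is handy for checking an export after the fact. The helper expected_cgm_files and its matrix_name parameter are hypothetical and not part of the library; the file names and the name fallback order are taken from this section.

```python
from pathlib import PurePosixPath

# File names taken from the Notes above.
CGM_FILES = [
    "mat.parquet",
    "row_nodes.parquet",
    "col_nodes.parquet",
    "row_linkage.parquet",
    "col_linkage.parquet",
    "meta.json",
]


def expected_cgm_files(path, name=None, matrix_name=None):
    """List the files documented for cgm/{name}/ (hypothetical helper)."""
    # Fallback order from the parameter description: name, matrix name, "default".
    resolved = name or matrix_name or "default"
    base = PurePosixPath(path) / "cgm" / resolved
    return [str(base / filename) for filename in CGM_FILES]


files = expected_cgm_files("./my_dega_files", name="skin_cancer_clusters")
```

After calling write_dega_files, each returned path could be checked with pathlib.Path(...).exists() to confirm the export is complete.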

normalize_axis_entity(value)

Normalize an axis entity specification to the AxisEntity format.

Handles backwards compatibility with string-only entity values and supports compact tuple format.

Parameters:

Name Type Description Default
value str | tuple | dict | AxisEntity | None

Entity specification - can be:
- str: Shorthand format with implicit attr (see mapping below)
- tuple: Compact format (entity, attr) e.g., ("nbhd", "name")
- dict/AxisEntity: Full format with entity and attr keys
- None: Returns default {"entity": "gene", "attr": "name"}

required
String Shorthand Mapping

When a string is provided, the following implicit attr values are used:
- "gene" → {"entity": "gene", "attr": "name"}
- "nbhd" → {"entity": "nbhd", "attr": "name"}
- "cell" → {"entity": "cell", "attr": "name"}
- "hextile" → {"entity": "hextile", "attr": "name"}
- "cell_cluster" or "cluster" → {"entity": "cell", "attr": "leiden"}
- any other string → {"entity": , "attr": "name"}

Returns:

Type Description
AxisEntity

AxisEntity with entity and attr keys

Examples:

String shorthand (attr is implicit)

>>> normalize_axis_entity("gene")
{"entity": "gene", "attr": "name"}
>>> normalize_axis_entity("nbhd")
{"entity": "nbhd", "attr": "name"}
>>> normalize_axis_entity("cell_cluster")
{"entity": "cell", "attr": "leiden"}

Tuple format (explicit entity and attr)

>>> normalize_axis_entity(("nbhd", "name"))
{"entity": "nbhd", "attr": "name"}
>>> normalize_axis_entity(("cell", "leiden"))
{"entity": "cell", "attr": "leiden"}

Dict format (most explicit)

>>> normalize_axis_entity({"entity": "cell", "attr": "leiden"})
{"entity": "cell", "attr": "leiden"}
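The rules and examples above can be collected into one small function. This is an illustrative re-implementation written from the documented behavior, not the library's source; the name normalize_axis_entity_sketch is hypothetical, and the handling of unlisted strings (use the string itself as the entity) is an assumption.

```python
# Documented string shorthands and the (entity, attr) pair each one implies.
SHORTHAND = {
    "gene": ("gene", "name"),
    "nbhd": ("nbhd", "name"),
    "cell": ("cell", "name"),
    "hextile": ("hextile", "name"),
    "cell_cluster": ("cell", "leiden"),
    "cluster": ("cell", "leiden"),
}


def normalize_axis_entity_sketch(value):
    """Illustrative re-implementation of the documented rules (not library code)."""
    if value is None:
        return {"entity": "gene", "attr": "name"}
    if isinstance(value, str):
        # Assumption: an unlisted string is used as the entity with attr "name".
        entity, attr = SHORTHAND.get(value, (value, "name"))
        return {"entity": entity, "attr": attr}
    if isinstance(value, tuple):
        entity, attr = value
        return {"entity": entity, "attr": attr}
    # dict / AxisEntity: copy the two documented keys
    return {"entity": value["entity"], "attr": value["attr"]}
```

Normalizing once at the boundary means the rest of the code only ever has to handle the full dict form.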