Clust Module API Reference
This module provides the main Matrix class for hierarchical data clustering and visualization.
AxisEntity
Bases: TypedDict
Describes what entity a clustergram axis represents.
Attributes:
| Name | Type | Description |
|---|---|---|
| entity | str | The type of entity (cell, gene, nbhd, cluster, etc.) |
| attr | str | The attribute of that entity (name, leiden, custom_column, etc.). For cells, 'leiden' means cell clusters and 'name' means individual cells; for nbhd, 'name' means specific neighborhoods; for genes, typically 'name'. |
Examples:
Clustergram with cell clusters on rows (cells grouped by leiden)
{"entity": "cell", "attr": "leiden"}
Clustergram with specific cells on columns
{"entity": "cell", "attr": "name"}
Clustergram with neighborhoods on columns
{"entity": "nbhd", "attr": "name"}
Clustergram with genes on rows
{"entity": "gene", "attr": "name"}
Hextile neighborhoods by cell clusters
row: {"entity": "cell", "attr": "leiden"}
col: {"entity": "hextile", "attr": "nbhd_cluster"}
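The shape described above can be sketched as a `TypedDict`. This is an illustrative definition built only from the two documented keys; the module's actual class may carry extra options or stricter types.

```python
from typing import TypedDict


class AxisEntity(TypedDict):
    """Sketch of the documented shape: an entity type plus one of its attributes."""

    entity: str  # e.g. "cell", "gene", "nbhd"
    attr: str  # e.g. "name", "leiden"


# Rows are cell clusters, columns are hextile neighborhood clusters.
row: AxisEntity = {"entity": "cell", "attr": "leiden"}
col: AxisEntity = {"entity": "hextile", "attr": "nbhd_cluster"}

# At runtime a TypedDict value is a plain dict.
print(type(row).__name__, row["entity"], col["attr"])
```

Because a `TypedDict` is an ordinary dict at runtime, the literal dicts shown in the examples can be passed wherever an AxisEntity is expected.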
EntityType
Bases: Enum
Entity types that can be represented in clustergram rows/columns.
Matrix
High-performance matrix class for single-cell genomics data processing.
Provides an automatic processing pipeline, hierarchical clustering, and visualization export. Uses caching to keep performance acceptable on large datasets.
Examples:
Basic usage - applies norm_col='total', norm_row='zscore'
mat = Matrix(adata)
viz_data = mat.cluster()
Custom processing with colors
mat = Matrix(adata, filter_genes=5000, norm_row='qn', global_colors={"high": "red", "low": "blue"})
No processing
mat = Matrix(adata, disable_processing=True)
dat
property
Lazy dat structure with intelligent caching.
__init__(data=None, meta_col=None, meta_row=None, col_attr=None, row_attr=None, row_entity='gene', col_entity='cell_cluster', filter_genes=None, norm_col='total', norm_row='zscore', disable_processing=False, global_colors=None, name=None)
Create Matrix with automatic processing unless disabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame \| AnnData \| None | DataFrame or AnnData object | None |
| meta_col | DataFrame \| None | Column metadata (for DataFrame input) | None |
| meta_row | DataFrame \| None | Row metadata (for DataFrame input) | None |
| col_attr | list[str] \| None | Column attribute names (categorical or numeric) | None |
| row_attr | list[str] \| None | Row attribute names (categorical or numeric) | None |
| row_entity | str \| dict \| AxisEntity \| None | Entity specification for rows. Accepted formats: str (shorthand with implicit attr): "gene" → {"entity": "gene", "attr": "name"}, "nbhd" → {"entity": "nbhd", "attr": "name"}, "cell" → {"entity": "cell", "attr": "name"}, "hextile" → {"entity": "hextile", "attr": "name"}, "cell_cluster" or "cluster" → {"entity": "cell", "attr": "leiden"}; tuple (compact format), e.g., ("nbhd", "name"); dict (full format), e.g., {"entity": "nbhd", "attr": "name"}. | 'gene' |
| col_entity | str \| dict \| AxisEntity \| None | Entity specification for columns (same formats as row_entity) | 'cell_cluster' |
| filter_genes | int \| None | Number of top variable genes to keep (None = no filtering) | None |
| norm_col | str \| None | Column normalization ('total', 'zscore', 'qn', None) | 'total' |
| norm_row | str \| None | Row normalization ('total', 'zscore', 'qn', None) | 'zscore' |
| disable_processing | bool | Skip automatic processing (default: False) | False |
| global_colors | dict[str, str] \| DataFrame \| None | Global category color mapping (dict or DataFrame with 'color' column) | None |
| name | str \| None | Name for the matrix (default: None) | None |
Examples:
Automatic processing (recommended)
mat = Matrix(adata) # Applies norm_col='total', norm_row='zscore'
Custom processing with colors
colors = {"Cancer": "#ff0000", "Normal": "#0000ff"}
mat = Matrix(adata, filter_genes=5000, norm_row='qn', global_colors=colors)
No processing
mat = Matrix(adata, disable_processing=True)
Raw matrix without data
mat = Matrix() # Empty matrix for manual loading
With entity specifications for widget interaction:
Genes (rows) by cell clusters (columns) - typical gene expression heatmap
mat = Matrix(df, row_entity="gene", col_entity="cell_cluster")
Or equivalently with new format:
mat = Matrix(df, row_entity={"entity": "gene", "attr": "name"}, col_entity={"entity": "cell", "attr": "leiden"})
Cell clusters (rows) by neighborhoods (columns)
mat = Matrix(df, row_entity={"entity": "cell", "attr": "leiden"}, col_entity={"entity": "nbhd", "attr": "name"})
add_category(axis, name, data)
Add category to metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns) | required |
| name | str | Category name | required |
| data | Series | Category values (must match axis length) | required |
add_cats(axis, cat_data)
Add multiple categories to metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum | required |
| cat_data | dict[str, Any] | Dict with category name as key, values as list/Series/dict | required |
Examples:
Add multiple categories at once
mat.add_cats('col', {
    'cell_type': ['T-cell', 'B-cell', 'NK-cell'],
    'treatment': ['control', 'treated', 'control'],
})
From existing metadata
mat.add_cats('col', meta_df.to_dict('series'))
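Since category values must line up with the axis they annotate, it can help to check lengths before calling add_cats. The helper below is a hypothetical pre-check, not part of the module, and the error the library itself raises on a mismatch is not documented here.

```python
def check_category_lengths(axis_labels, cat_data):
    """Raise ValueError unless every category has one value per axis label."""
    expected = len(axis_labels)
    bad = {name: len(values) for name, values in cat_data.items() if len(values) != expected}
    if bad:
        raise ValueError(f"expected {expected} values per category, got {bad}")


columns = ["cell_1", "cell_2", "cell_3"]  # hypothetical column labels
cats = {
    "cell_type": ["T-cell", "B-cell", "NK-cell"],
    "treatment": ["control", "treated", "control"],
}

check_category_lengths(columns, cats)  # passes silently: both lists have 3 values
```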
clust(dist_type='cosine', linkage_type='average', force=False)
Perform hierarchical clustering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dist_type | DistanceType | Distance metric ('cosine', 'euclidean', 'correlation') | 'cosine' |
| linkage_type | LinkageType | Linkage method ('average', 'complete', 'ward') | 'average' |
| force | bool | Override size limits for large matrices | False |
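To make the default settings concrete, here is a minimal pure-Python sketch of agglomerative clustering with cosine distance and average linkage. It is a naive illustration of the technique, not the module's implementation; which library clust uses internally is not stated in this reference.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; undefined (division by zero) for all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_distance(vectors, a, b):
    """Average linkage: mean pairwise distance between the members of clusters a and b."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def average_linkage(vectors):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average_distance(vectors, *pair))
        merges.append((a, b, average_distance(vectors, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Rows 0 and 1 point the same way, as do rows 2 and 3.
rows = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
merges = average_linkage(rows)
for a, b, dist in merges:
    print(a, b, round(dist, 3))
```

Cosine distance ignores vector magnitude, so rows with the same direction merge first regardless of scale. The naive loop recomputes every pairwise distance at each step, which is one reason real implementations impose size limits or use optimized routines.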
cluster(**cluster_kwargs)
Perform clustering and return visualization data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| \*\*cluster_kwargs | Any | Clustering parameters (dist_type, linkage_type, force) | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict[str, Any] | Visualization-ready JSON structure |
Examples:
mat = Matrix(adata)
viz_data = mat.cluster()  # Use defaults
viz_data = mat.cluster(dist_type='euclidean', linkage_type='ward')
downsample_to(category='leiden', axis='col', propagate_metadata=False)
Downsample data by aggregating categories using scanpy.get.aggregate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | str | Metadata column to aggregate by | 'leiden' |
| axis | AxisInput | Which axis to aggregate ('col'/1/COL for cells, 'row'/0/ROW for genes) | 'col' |
| propagate_metadata | bool \| list[str] | Whether to propagate other metadata columns to the aggregated result using the modal (most frequent) value per group. False: skip metadata propagation (fast, default); True: propagate all metadata columns (slow for large datasets); list[str]: propagate only the specified columns. | False |
Requires
scanpy for aggregation functionality
Note
Uses scanpy.get.aggregate under the hood for fast mean aggregation. See: https://scanpy.readthedocs.io/en/stable/generated/scanpy.get.aggregate.html
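The two ideas in downsample_to, mean aggregation per category and modal metadata propagation, can be illustrated without scanpy. The function names and toy data below are hypothetical; the real method delegates aggregation to scanpy.get.aggregate.

```python
from collections import Counter, defaultdict
from statistics import mean


def aggregate_by_category(values, labels):
    """Mean of `values` per label, e.g. one gene's expression averaged per cluster."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: mean(group) for label, group in groups.items()}


def modal_metadata(metadata, labels):
    """Most frequent metadata value per label (ties go to the value seen first)."""
    groups = defaultdict(list)
    for item, label in zip(metadata, labels):
        groups[label].append(item)
    return {label: Counter(group).most_common(1)[0][0] for label, group in groups.items()}


leiden = ["0", "0", "1", "1", "1"]  # cluster label per cell
expression = [1.0, 3.0, 2.0, 4.0, 6.0]  # one gene across five cells
tissue = ["skin", "skin", "liver", "skin", "liver"]  # another metadata column

cluster_means = aggregate_by_category(expression, leiden)
cluster_tissue = modal_metadata(tissue, leiden)
print(cluster_means, cluster_tissue)
```

Five cells collapse to two clusters, and each cluster inherits its most common tissue label. Computing the mode for every metadata column is the extra work that makes propagate_metadata=True slower on large datasets.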
export_viz_json()
Export visualization as JSON dict.
Deprecated since version 0.10: use export_viz_parquet instead.
export_viz_json_string()
Export visualization as JSON string.
Deprecated since version 0.10: use export_viz_parquet instead.
export_viz_parquet()
Export visualization using Parquet encoded tables.
export_viz_to_widget(which_viz='viz')
Export visualization for widget.
Deprecated since version 0.10: use celldega.viz.Clustergram with matrix instead.
filter(axis, by, num)
Filter features by specified metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns) | required |
| by | FilterType | Metric ('var' for variance, 'mean' for mean) | required |
| num | int | Number of top features to keep | required |
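As a rough illustration of filtering by 'var', the sketch below ranks rows by variance and keeps the top num. The helper name and data are hypothetical, and whether the module uses population or sample variance is not stated; for rows of equal length the ranking is the same either way.

```python
from statistics import pvariance


def top_rows_by_variance(matrix, num):
    """Return the names of the `num` rows with the highest variance, highest first."""
    ranked = sorted(matrix, key=lambda name: pvariance(matrix[name]), reverse=True)
    return ranked[:num]


genes = {
    "GeneA": [1.0, 1.0, 1.0, 1.0],  # constant across cells
    "GeneB": [0.0, 4.0, 0.0, 4.0],  # strongly variable
    "GeneC": [1.0, 2.0, 1.0, 2.0],  # mildly variable
}

kept = top_rows_by_variance(genes, 2)
print(kept)
```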
load_adata(adata, col_attr=None, row_attr=None)
Load AnnData object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData object (will be transposed to genes x cells) | required |
load_df(df, meta_col=None, meta_row=None, col_attr=None, row_attr=None)
Load DataFrame with metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Data matrix | required |
| meta_col | DataFrame \| None | Column metadata (must match df.columns) | None |
| meta_row | DataFrame \| None | Row metadata (must match df.index) | None |
| col_attr | list[str] \| None | Column attribute names for viz (categorical or numeric) | None |
| row_attr | list[str] \| None | Row attribute names for viz (categorical or numeric) | None |
make_viz()
Generate visualization data structure.
norm(axis, by)
Normalize data along specified axis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns) | required |
| by | NormType | Normalization method ('total', 'zscore', 'qn') | required |
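The following sketch shows what two of the methods plausibly compute on a single row or column. It assumes 'total' rescales values to sum to 1 and 'zscore' uses the population standard deviation; the module's exact conventions (target sum, degrees of freedom) are not documented here.

```python
from statistics import mean, pstdev


def total_normalize(values):
    """Scale values so they sum to 1 (assumed meaning of 'total')."""
    total = sum(values)
    return [v / total for v in values]


def zscore_normalize(values):
    """Center on the mean and scale to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population std; a constant row would divide by zero
    return [(v - mu) / sigma for v in values]


column = [2.0, 6.0, 12.0]  # e.g. counts for one cell across genes
row = [1.0, 2.0, 3.0]  # e.g. one gene across cells

col_scaled = total_normalize(column)
row_scaled = zscore_normalize(row)
print(col_scaled, row_scaled)
```

Total normalization removes differences in overall magnitude between columns, while z-scoring puts each row on a comparable scale, which is why the defaults apply the former to columns and the latter to rows.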
process(filter_genes=None, norm_col='total', norm_row='zscore')
Apply processing pipeline to the matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filter_genes | int \| None | Number of top variable genes to keep | None |
| norm_col | str \| None | Column normalization method ('total', 'zscore', 'qn', None) | 'total' |
| norm_row | str \| None | Row normalization method ('total', 'zscore', 'qn', None) | 'zscore' |
Examples:
mat = Matrix(adata, disable_processing=True)  # Raw data
mat.process(filter_genes=5000, norm_row='qn')  # Custom processing
random_subsample(axis, num, seed=42)
Randomly subsample features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns) | required |
| num | int | Number of features to sample | required |
| seed | int | Random seed for reproducibility | 42 |
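A fixed seed makes the subsample repeatable, as this small sketch shows. The helper is hypothetical and uses Python's standard random module; the generator the module actually uses may differ, so the same seed will not necessarily select the same features.

```python
import random


def subsample_names(names, num, seed=42):
    """Pick `num` names without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, leaves global random state alone
    chosen = set(rng.sample(names, num))
    return [name for name in names if name in chosen]  # keep original order


cells = [f"cell_{i}" for i in range(10)]

first = subsample_names(cells, 3)
second = subsample_names(cells, 3)
print(first, first == second)
```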
set_cat_color(axis, cat_index, cat_name, color)
Set color for specific category value in a specific category column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum | required |
| cat_index | int | Category column index (1-based, like original Network) | required |
| cat_name | str | Category value name to color | required |
| color | str | Hex color string or named color | required |
Example
Set color for 'Cancer' in the first column category
mat.set_cat_color('col', 1, 'Cancer', '#ff0000')
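Because cat_index is 1-based, it is easy to be off by one when coming from Python's 0-based lists. The sketch below shows the intended lookup with a hypothetical helper and a hypothetical nested dict for storing colors; neither reflects the module's internal storage.

```python
def category_column(category_columns, cat_index):
    """Return the category column for a 1-based cat_index."""
    if not 1 <= cat_index <= len(category_columns):
        raise IndexError(f"cat_index must be between 1 and {len(category_columns)}")
    return category_columns[cat_index - 1]


col_categories = ["tissue", "treatment"]  # hypothetical column categories, in order
colors = {}  # hypothetical store: {column: {value: color}}

column = category_column(col_categories, 1)  # 1 selects the first category
colors.setdefault(column, {})["Cancer"] = "#ff0000"
print(colors)
```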
set_cat_colors(axis, cat_index, color_mapping)
Set colors for multiple category values in a specific category column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum | required |
| cat_index | int | Category column index (1-based) | required |
| color_mapping | dict[str, str] | Dict mapping category values to colors | required |
Example
Set colors for multiple values in tissue type category
mat.set_cat_colors('col', 1, {
    'Liver': '#00ff00',
    'Brain': '#ffff00',
    'Heart': '#ff00ff',
})
set_global_cat_colors(color_mapping=None)
Set global category color mapping that applies across all categories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| color_mapping | dict[str, str] \| DataFrame \| None | Dict mapping category values to colors, DataFrame with 'color' column, or None to auto-generate | None |
Note
If metadata has a 'color' column, those colors will be used automatically.
set_matrix_colors(pos='red', neg='blue')
Set matrix color scheme for positive and negative values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pos | str | Color for positive values (hex or named color) | 'red' |
| neg | str | Color for negative values (hex or named color) | 'blue' |
Example
mat.set_matrix_colors(pos="#ff0000", neg="#0000ff")
subset(axis, by)
Subset data by feature list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | AxisInput | 'row'/'col', 0/1, or Axis enum (0/ROW=rows, 1/COL=columns) | required |
| by | list[str] | List of feature names to keep | required |
to_adata()
Convert to AnnData object.
to_df()
Return DataFrame copy of data.
write_dega_files(path, name=None)
Write Clustergram visualization data to a DegaFiles directory.
This creates a cgm/ subdirectory containing the parquet files needed
to load the Clustergram in JavaScript without a Python backend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the DegaFiles directory (the same directory used for Landscape and Yearbook data). | required |
| name | str \| None | Name for this Clustergram. If provided, files are saved to cgm/{name}/. If None, uses the matrix's name attribute, or "default" if no name is set. | None |
Examples
mat = Matrix(adata)
mat.clust()
mat.write_dega_files("./my_dega_files", name="skin_cancer_clusters")
JavaScript can then load from:
base_url + '/cgm/skin_cancer_clusters/'
Notes
The following files are created:
- mat.parquet: The matrix data
- row_nodes.parquet: Row node information
- col_nodes.parquet: Column node information
- row_linkage.parquet: Row dendrogram linkage
- col_linkage.parquet: Column dendrogram linkage
- meta.json: Metadata including colors and config
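The directory layout and name fallback described above can be sketched with pathlib. The helper functions are hypothetical conveniences for checking an export, not part of the module, and they assume the six files sit directly inside cgm/{name}/.

```python
from pathlib import Path

# File names taken from the list above.
CGM_FILES = (
    "mat.parquet",
    "row_nodes.parquet",
    "col_nodes.parquet",
    "row_linkage.parquet",
    "col_linkage.parquet",
    "meta.json",
)


def expected_cgm_files(path, name=None, matrix_name=None):
    """List the files expected under cgm/{name}/, applying the documented name fallback."""
    if name is None:
        name = matrix_name if matrix_name is not None else "default"
    folder = Path(path) / "cgm" / name
    return [folder / filename for filename in CGM_FILES]


def missing_cgm_files(path, name=None, matrix_name=None):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_cgm_files(path, name, matrix_name) if not p.exists()]


for file_path in expected_cgm_files("./my_dega_files", name="skin_cancer_clusters"):
    print(file_path.as_posix())
```

Calling `missing_cgm_files` after an export gives a quick sanity check before pointing a JavaScript client at the directory.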
normalize_axis_entity(value)
Normalize an axis entity specification to the AxisEntity format.
Handles backwards compatibility with string-only entity values and supports compact tuple format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str \| tuple \| dict \| AxisEntity \| None | Entity specification. One of: str (shorthand format with implicit attr, see mapping below); tuple (compact format (entity, attr)), e.g., ("nbhd", "name"); dict/AxisEntity (full format with entity and attr keys); None (returns the default {"entity": "gene", "attr": "name"}). | required |
String Shorthand Mapping
When a string is provided, the following implicit attr values are used:
- "gene" → {"entity": "gene", "attr": "name"}
- "nbhd" → {"entity": "nbhd", "attr": "name"}
- "cell" → {"entity": "cell", "attr": "name"}
- "hextile" → {"entity": "hextile", "attr": "name"}
- "cell_cluster" or "cluster" → {"entity": "cell", "attr": "leiden"}
- any other string → {"entity":
Returns:
| Type | Description |
|---|---|
| AxisEntity | AxisEntity with entity and attr keys |
Examples:
String shorthand (attr is implicit)
>>> normalize_axis_entity("gene")
{"entity": "gene", "attr": "name"}
>>> normalize_axis_entity("nbhd")
{"entity": "nbhd", "attr": "name"}
>>> normalize_axis_entity("cell_cluster")
{"entity": "cell", "attr": "leiden"}
Tuple format (explicit entity and attr)
>>> normalize_axis_entity(("nbhd", "name"))
{"entity": "nbhd", "attr": "name"}
>>> normalize_axis_entity(("cell", "leiden"))
{"entity": "cell", "attr": "leiden"}
Dict format (most explicit)
>>> normalize_axis_entity({"entity": "cell", "attr": "leiden"})
{"entity": "cell", "attr": "leiden"}
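The documented behavior can be approximated in a few lines. This is a sketch reconstructed from the mapping and examples above, not the module's source; in particular, it refuses strings outside the listed shorthand because their mapping is not given in full in this reference.

```python
# Shorthand strings and the (entity, attr) pairs documented for them.
SHORTHAND = {
    "gene": ("gene", "name"),
    "nbhd": ("nbhd", "name"),
    "cell": ("cell", "name"),
    "hextile": ("hextile", "name"),
    "cell_cluster": ("cell", "leiden"),
    "cluster": ("cell", "leiden"),
}


def normalize_axis_entity_sketch(value):
    """Return {"entity": ..., "attr": ...} for None, str, tuple, or dict input."""
    if value is None:
        return {"entity": "gene", "attr": "name"}  # documented default
    if isinstance(value, str):
        if value not in SHORTHAND:
            # Assumption: the real function handles other strings, but how is not shown.
            raise NotImplementedError(f"no documented mapping for {value!r}")
        entity, attr = SHORTHAND[value]
    elif isinstance(value, tuple):
        entity, attr = value  # compact (entity, attr) format
    else:
        entity, attr = value["entity"], value["attr"]  # full dict format
    return {"entity": entity, "attr": attr}


print(normalize_axis_entity_sketch("cell_cluster"))
print(normalize_axis_entity_sketch(("nbhd", "name")))
```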