Set Module API Reference

The set module provides SetCollection, the set-level Celldega collection. A SetCollection represents groupings that are defined purely as sets of some base element (most commonly cells) and have no intrinsic geometry of their own: clustering results, spatial-domain algorithm outputs (SpaGCN, GraphST, GASTON, Points2Regions), or manual annotations projected back to cells.

Each observation is one set. The defining modality, membership, is a sparse sets × cells incidence matrix, so a set never loses track of exactly which cells belong to it. Where DatasetCollection and NeighborhoodCollection make a derived feature (an expression signature, a geometry) first-class, SetCollection makes membership itself first-class; signatures, population composition, and set-to-set overlap are all derived from it.

import celldega as dega

# Build one SetCollection per clustering "opinion" (the cells define the sets)
clust = dega.set.SetCollection(adata, set_col="leiden", name="leiden")

# Per-set expression signature (pseudobulk). feature_type is only required when
# passing a MuData; for an AnnData it defaults to "gene" -> modality "expression".
clust.calc_signature(adata)
clust.calc_signature(mdata, feature_type="protein")   # protein modality of a MuData

# Per-set cell-type composition (sets x populations)
clust.calc_population(adata, category="cell_type")

# Cross-algorithm comparison: membership IoU between two SetCollections that
# share the same cells (different obs). Rectangular modality on `clust`.
clust_b = dega.set.SetCollection(adata, set_col="spagcn", name="spagcn")
clust.calc_overlap(clust_b)

# Consensus across algorithms: concatenate, self-overlap (square relation),
# make it a clusterable modality, then cut the dendrogram via the Matrix.
combined = dega.set.concat_sets([clust, clust_b])
combined.calc_overlap()                        # -> combined.relations["overlap"]
combined.add_relation_modality("overlap")      # -> combined.mod["overlap_relation"]

clust.write("clusters.h5mu")
loaded = dega.set.SetCollection.read("clusters.h5mu")

Hierarchical clustering of any modality is done with the Matrix / Clustergram classes, and the resulting dendrogram can be cut into flat labels with Matrix.to_cluster / Clustergram.to_cluster (e.g. to define consensus domains or meta-clusters), which you then attach back to the collection's obs.
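As a rough illustration of that last step, the sketch below clusters a square overlap matrix and cuts the dendrogram into flat labels. It uses SciPy's hierarchical clustering as a stand-in for Matrix.to_cluster / Clustergram.to_cluster, whose signatures are not documented on this page. The helper consensus_labels, the example IoU values, and the choice of average linkage on 1 - IoU are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def consensus_labels(iou, n_clusters):
    """Cut an average-linkage dendrogram built on 1 - IoU into flat labels.

    Hypothetical helper: a stand-in for the Matrix/Clustergram cut, not
    Celldega's implementation.
    """
    iou = np.asarray(iou, dtype=float)
    dist = 1.0 - iou                 # high overlap -> small distance
    np.fill_diagonal(dist, 0.0)      # a set is at distance 0 from itself
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Made-up self-overlap among four sets coming from two algorithms
set_ids = ["leiden::0", "leiden::1", "spagcn::0", "spagcn::1"]
iou = np.array([
    [1.0, 0.0, 0.8, 0.1],
    [0.0, 1.0, 0.1, 0.9],
    [0.8, 0.1, 1.0, 0.0],
    [0.1, 0.9, 0.0, 1.0],
])
labels = consensus_labels(iou, n_clusters=2)
consensus = dict(zip(set_ids, labels.tolist()))  # candidate obs column
```

Here leiden::0 and spagcn::0 fall into one consensus group, and leiden::1 and spagcn::1 into the other. The resulting mapping is the kind of flat labeling you would attach back to the collection's obs.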

Set-level Celldega collection objects.

SetCollection

Bases: CelldegaCollection

Set-level Celldega collection backed by a sets x elements membership matrix.

The canonical observation axis is one row per set. The defining modality, membership, is a sparse AnnData with sets as observations and elements (cells) as variables, carrying per-cell spatial coordinates in var when available. Feature spaces (expression signatures) and relations (set-to-set overlap) are derived from this membership.

__init__(adata=None, set_col=None, obs=None, mdata=None, membership=None, name=None, source=None, element_type='cell', meta=None, mod=None, relations=None, provenance=None, uns=None)

Build a set-level collection.

The set observation axis is established in one of three ways: from a pre-built mdata (e.g. via read), from a ready-made membership modality, or, most commonly, by binning cell-level adata over the categorical set_col (one row per unique label), which also constructs the sparse membership modality and tags cell coordinates onto its var.

Parameters:

  • adata (AnnData | None, default None): Cell-level AnnData whose set_col labels define the sets (required when neither obs/membership nor mdata given).
  • set_col (str | None, default None): adata.obs column whose categories become the sets (e.g. "leiden", "spagcn"); recorded as the source algorithm.
  • obs (DataFrame | None, default None): Pre-built set observation table (alternative to adata).
  • mdata (Any | None, default None): Pre-built MuData to wrap (e.g. from read).
  • membership (AnnData | None, default None): Pre-built sets x elements membership modality.
  • name (str | None, default None): Optional collection / algorithm name (e.g. "spagcn").
  • source (str | dict[str, Any] | None, default None): Source descriptor recorded in provenance.
  • element_type (str, default 'cell'): Entity type of the membership var axis; "cell" today, "gene" for a future gene-set library.
  • meta (dict[str, Any] | None, default None): Extra metadata merged into uns["celldega"].
  • mod (dict[str, AnnData] | None, default None): Feature-space modalities to attach up front.
  • relations (dict[str, Any] | None, default None): Square set-by-set matrices for mdata.obsp.
  • provenance (dict[str, Any] | None, default None): Free-form provenance metadata.
  • uns (dict[str, Any] | None, default None): Extra Celldega metadata.

Raises:

  • ValueError: If no construction input (adata + set_col, obs/membership, or mdata) is provided.
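The label-binning path can be pictured with plain NumPy. This is a minimal sketch, not Celldega's implementation: membership_from_labels and the example labels are hypothetical, and a dense array stands in for the sparse membership modality to keep the example short.

```python
import numpy as np


def membership_from_labels(labels):
    """Return (set_ids, matrix) with matrix[i, j] == 1 if cell j is in set i.

    Hypothetical helper illustrating a sets x cells incidence matrix.
    """
    labels = np.asarray(labels)
    # One row per unique label; codes[j] is the row index of cell j's set
    set_ids, codes = np.unique(labels, return_inverse=True)
    matrix = np.zeros((set_ids.size, labels.size), dtype=np.int8)
    matrix[codes, np.arange(labels.size)] = 1
    return set_ids, matrix


# Made-up labels for six cells, standing in for adata.obs[set_col]
labels = ["0", "1", "0", "2", "1", "0"]
set_ids, M = membership_from_labels(labels)
```

With hard cluster labels every column of M sums to 1, because each cell belongs to exactly one set. Overlapping or soft memberships would relax that property.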

calc_overlap(other=None, weights='membership', metric='iou', key='overlap', modality_name=None, var_entity_type='set')

Calculate set-to-set membership overlap (the cross-algorithm comparison engine).

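Computes overlap between this collection's sets and other's sets over their shared element (cell) axis. The pairwise intersection counts come from A.X @ B.X.T, and metric determines whether they are returned as raw counts or as IoU. One engine, two outputs: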

  • other is None (self-overlap, e.g. on a concatenated collection) → a square relation stored in self.relations[key]; convert it to a clusterable modality with add_relation_modality and hierarchically cluster to find consensus sets (Fig 4C-i).
  • other given → a rectangular modality self_sets x other_sets attached to self.mod (e.g. domains vs. manual annotation, Fig 4C-ii).

Parameters:

  • other (SetCollection | None, default None): Another SetCollection sharing the element axis; defaults to self.
  • weights (str, default 'membership'): Membership modality to compare on.
  • metric (str, default 'iou'): "iou" (Jaccard) or "intersection" (raw shared count).
  • key (str, default 'overlap'): Relation key (self-overlap) or default modality stem.
  • modality_name (str | None, default None): Modality key for the cross-collection case.
  • var_entity_type (str, default 'set'): Entity type for the rectangular modality's var.

Returns:

  • ndarray: The dense overlap matrix (also stored as a relation or modality).
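The two metrics can be worked through on small dense matrices. The sketch below assumes binary membership and the standard Jaccard definition (intersection divided by union). The overlap helper and the example matrices are hypothetical, and how Celldega handles empty sets or soft weights is not specified here.

```python
import numpy as np


def overlap(A, B, metric="iou"):
    """Pairwise overlap between the rows of two binary sets x cells matrices.

    Hypothetical helper; A and B must share the same cell (column) axis.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                      # shared-cell counts per set pair
    if metric == "intersection":
        return inter
    if metric != "iou":
        raise ValueError(f"unknown metric: {metric!r}")
    size_a = A.sum(axis=1)[:, None]      # set sizes as a column
    size_b = B.sum(axis=1)[None, :]      # set sizes as a row
    union = size_a + size_b - inter
    # Assumption: a pair of empty sets (union == 0) gets IoU 0
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


# Two partitions of the same six cells
A = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]])
B = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]])
counts = overlap(A, B, metric="intersection")
iou = overlap(A, B)
```

The first set of A shares two of its three cells with the first set of B, so their intersection is 2, their union is 3, and their IoU is 2/3. The result is rectangular (2 x 3), matching the cross-collection case.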

calc_population(data, category='leiden', output='proportion', weights='membership', modality_name='population')

Calculate a set-by-population composition modality.

For each set, counts its member cells per category value (cell type / cluster) into a sets x populations modality — e.g. the cell-type composition of each spatial domain. Computed as membership @ one_hot(category). Mirrors NeighborhoodCollection.calc_population / DatasetCollection.calc_population.

Parameters:

  • data (AnnData | MuData, required): Cell-level AnnData (or MuData) carrying category in obs; cells are aligned to the membership var axis.
  • category (str, default 'leiden'): obs column naming the population/cell-type/cluster.
  • output (str, default 'proportion'): "proportion" (within-set fractions) or "counts".
  • weights (str, default 'membership'): Membership modality to aggregate.
  • modality_name (str, default 'population'): Key for the modality in self.mod.

Returns:

  • None: the modality is attached to self.mod[modality_name].
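The membership @ one_hot(category) product can be sketched with NumPy as follows. The population helper and the toy inputs are hypothetical, and the cells are assumed to be already aligned to the membership column order. The library performs that alignment itself, so it is not shown.

```python
import numpy as np


def population(M, categories, output="proportion"):
    """Return (population_names, sets x populations matrix).

    Hypothetical helper: counts each set's member cells per category value.
    """
    categories = np.asarray(categories)
    pops, codes = np.unique(categories, return_inverse=True)
    one_hot = np.zeros((categories.size, pops.size))
    one_hot[np.arange(categories.size), codes] = 1.0   # cells x populations
    counts = np.asarray(M, dtype=float) @ one_hot      # sets x populations
    if output == "counts":
        return pops, counts
    totals = counts.sum(axis=1, keepdims=True)
    # Within-set fractions; empty sets are left as all zeros
    frac = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return pops, frac


# Two sets over five cells, with a made-up cell-type label per cell
M = np.array([[1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1]])
cell_type = ["T", "B", "T", "B", "B"]
pops, counts = population(M, cell_type, output="counts")
_, prop = population(M, cell_type)
```

The first set holds two T cells and one B cell, so its proportion row is 1/3 B and 2/3 T. The second set is entirely B.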

calc_signature(data, feature_type=None, layer=None, weights='membership', aggregate='mean', normalization='log1p_cpm', modality_name=None)

Calculate and attach a set-by-feature signature (pseudobulk).

Aggregates the per-cell feature matrix of each set's member cells into a sets x features modality, using the stored membership matrix as the aggregation operator. Consistent with DatasetCollection.calc_signature and NeighborhoodCollection.calc_signature — the entity is implied by the instance, so it is not repeated in the name.

feature_type is only needed when data is a MuData (it names the modality to aggregate and labels the output). For a plain AnnData the matrix is unambiguous and feature_type defaults to "gene"; pass a protein AnnData (with feature_type="protein" to label it) for a protein signature, or use layer for an alternative matrix over the same features (raw vs. normalized).

Parameters:

  • data (AnnData | MuData, required): Cell-level AnnData, or a MuData paired with feature_type. Cells are aligned to the membership var axis.
  • feature_type (str | None, default None): Output feature label / MuData modality selector. Required for MuData; optional for AnnData (default "gene").
  • layer (str | None, default None): adata layer to aggregate; None uses adata.X.
  • weights (str, default 'membership'): Membership modality driving aggregation: "membership" (binary, hard assignment) or "weight" (soft/probabilistic).
  • aggregate (str, default 'mean'): "mean" or "sum" across each set's member cells.
  • normalization (str | None, default 'log1p_cpm'): None, "cpm", or "log1p_cpm" per set row.
  • modality_name (str | None, default None): Key for the modality; defaults to "expression" for genes and to feature_type otherwise.

Returns:

  • None: the modality is attached to self.mod.
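Using membership as the aggregation operator amounts to a matrix product between the sets x cells membership and the cells x features matrix. The sketch below is an illustration under stated assumptions, not the library's code: the signature helper is hypothetical, normalization is applied after aggregation, and "cpm" is taken to mean scaling each set row to a total of one million.

```python
import numpy as np


def signature(M, X, aggregate="mean", normalization="log1p_cpm"):
    """Pseudobulk a cells x features matrix X into sets x features.

    Hypothetical helper; M is a binary sets x cells membership matrix.
    """
    M = np.asarray(M, dtype=float)
    X = np.asarray(X, dtype=float)
    sig = M @ X                                   # per-set feature sums
    if aggregate == "mean":
        n_cells = M.sum(axis=1, keepdims=True)
        sig = np.divide(sig, n_cells, out=np.zeros_like(sig), where=n_cells > 0)
    elif aggregate != "sum":
        raise ValueError(f"unknown aggregate: {aggregate!r}")
    if normalization in ("cpm", "log1p_cpm"):
        totals = sig.sum(axis=1, keepdims=True)
        # Assumed meaning of "cpm": each set row rescaled to sum to 1e6
        sig = np.divide(sig * 1e6, totals, out=np.zeros_like(sig), where=totals > 0)
        if normalization == "log1p_cpm":
            sig = np.log1p(sig)
    elif normalization is not None:
        raise ValueError(f"unknown normalization: {normalization!r}")
    return sig


# Two sets over four cells, two features of made-up counts
M = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
X = np.array([[1, 3],
              [3, 1],
              [0, 10],
              [0, 30]])
sums = signature(M, X, aggregate="sum", normalization=None)
means = signature(M, X, aggregate="mean", normalization=None)
cpm = signature(M, X, aggregate="mean", normalization="cpm")
```

Because CPM rescales each row to a fixed total, "mean" and "sum" give the same result once normalization is applied with binary membership. The two only differ when normalization is None.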

to_nbhd(method='points', **kwargs)

Graduate set membership to geometry, returning a NeighborhoodCollection.

For each set, gather its member cells, read their coordinates from the membership.var axis, and materialize geometry: "points" stores the raw MultiPoint (unopinionated); "alpha_shape" / "convex_hull" build a polygon (opinionated). The inverse operation, NeighborhoodCollection.to_set, projects geometry back to cell sets — round-tripping alpha_shape quantifies how faithfully a polygon recovers its defining cells (precision/recall).

TODO(DEGA-487): implement by reusing nbhd.alpha_shape_cell_clusters and constructing a NeighborhoodCollection (lazy import to avoid a cycle).
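The round-trip comparison can be sketched independently of any geometry code. The example assumes the usual definitions (precision is the fraction of recovered cells that were in the defining set; recall is the fraction of defining cells that were recovered). The precision_recall helper and the cell ids are hypothetical, and the metrics Celldega would actually report are not specified on this page.

```python
def precision_recall(defining_cells, recovered_cells):
    """Compare a set's defining cells with the cells a polygon recovers.

    Hypothetical helper using the standard precision/recall definitions.
    """
    defining = set(defining_cells)
    recovered = set(recovered_cells)
    true_pos = len(defining & recovered)
    precision = true_pos / len(recovered) if recovered else 0.0
    recall = true_pos / len(defining) if defining else 0.0
    return precision, recall


# Made-up round trip: the polygon misses c1 and picks up c5 and c6
defining = {"c1", "c2", "c3", "c4"}
recovered = {"c2", "c3", "c4", "c5", "c6"}
precision, recall = precision_recall(defining, recovered)
```

In this example three of the five recovered cells are correct (precision 0.6) and three of the four defining cells are recovered (recall 0.75).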

concat_sets(collections, names=None, weights='membership')

Stack per-algorithm SetCollection objects into one comparison collection.

Unions the element (cell) axis across all inputs, prefixes each set id with its collection name (so spagcn::3 and gaston::5 stay distinct), and vstacks the membership matrices. The result is the input to a self-overlap SetCollection.calc_overlap → add_relation_modality → hierarchical-clustering consensus workflow.

Parameters:

  • collections (list[SetCollection], required): Per-algorithm set collections sharing an element namespace.
  • names (list[str] | None, default None): Optional prefixes; defaults to each collection's name or index.
  • weights (str, default 'membership'): Membership modality to stack.

Returns:

  • SetCollection: A combined SetCollection whose obs carries a set_source column.
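The union, prefix, and vstack steps can be sketched on plain arrays. This is an illustration only: concat_memberships and its tuple-based inputs are hypothetical stand-ins for SetCollection objects, dense arrays replace the sparse membership, and sorting the unioned cell ids is an assumption about column order.

```python
import numpy as np


def concat_memberships(collections):
    """Stack memberships given as (name, set_ids, cell_ids, matrix) tuples.

    Hypothetical helper; returns (ids, sources, all_cells, stacked_matrix).
    """
    # Union of the element (cell) axis across all inputs
    all_cells = sorted(set().union(*(cells for _, _, cells, _ in collections)))
    column = {cell: j for j, cell in enumerate(all_cells)}
    ids, sources, blocks = [], [], []
    for name, set_ids, cells, matrix in collections:
        matrix = np.asarray(matrix)
        # Re-index this collection's columns onto the shared cell axis
        block = np.zeros((matrix.shape[0], len(all_cells)), dtype=matrix.dtype)
        block[:, [column[c] for c in cells]] = matrix
        blocks.append(block)
        ids.extend(f"{name}::{s}" for s in set_ids)   # keep set ids distinct
        sources.extend([name] * len(set_ids))         # analogue of set_source
    return ids, sources, all_cells, np.vstack(blocks)


# Two made-up collections over partially overlapping cells
a = ("spagcn", ["0", "1"], ["c1", "c2", "c3"], [[1, 1, 0], [0, 0, 1]])
b = ("gaston", ["0"], ["c2", "c3", "c4"], [[1, 0, 1]])
ids, sources, all_cells, stacked = concat_memberships([a, b])
```

Both inputs have a set named "0", but the prefixes keep them apart as spagcn::0 and gaston::0. Cells missing from one collection simply appear as zero columns in that collection's rows.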