CelldegaCollection API Reference

A CelldegaCollection is the base class used to build Celldega's dataset-level (DatasetCollection) and neighborhood-level (NeighborhoodCollection) data structures. Celldega collections are typed MuData profiles. AnnData is the unit of a feature space; MuData is the unit of a multimodal Celldega collection.

Motivation

Celldega defines new biological entities — datasets, neighborhoods, and more in the future — which requires both constructing the entity and calculating its feature spaces, neither of which is free for entities above the single-cell level. For single-cell gene expression and spatial data, both come straight off the instrument; for higher-order entities, DatasetCollection and NeighborhoodCollection construct the observation axis and attach the feature modalities themselves.

AnnData is an excellent representation for one observation-by-feature matrix plus aligned annotations, graphs, and metadata. Celldega collections need several independently clusterable feature spaces over the same biological observation axis: genes, populations, image features, morphology features, clinical variables, and derived joint spaces.

MuData provides that collection layer by storing each feature space as its own AnnData modality while preserving shared observation metadata. Celldega adds a thin schema convention on top for biological entity typing, provenance, geometry, and view-linking metadata.

Core Model

A CelldegaCollection is a thin wrapper over a MuData (collection.mdata). Its core accessors are convenience aliases that proxy directly to attributes of the underlying MuData — they are not separate storage:

Concept | Accessor | Is exactly
Canonical observations | collection.obs | collection.mdata.obs
Feature spaces | collection.mod[name] | collection.mdata.mod[name]
Observation relations | collection.relations[name] | collection.mdata.obsp[name]
Celldega metadata | collection.uns | collection.mdata.uns["celldega"]

In particular, collection.relations is collection.mdata.obsp (the same object): collection.relations["x"] is collection.mdata.obsp["x"]. The relations name is just Celldega vocabulary for MuData's obsp ("observation pairwise") store — use whichever you prefer. Relations live in obsp (the shared, collection-level observation axis) rather than inside a single modality's obsp because they are modality-independent properties of the observations themselves; feature-by-feature relations belong in a modality's varp.
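The aliasing described above can be illustrated with a minimal sketch. It does not use Celldega or MuData: FakeMuData and CollectionSketch are hypothetical stand-ins written only to show how property accessors can proxy to a wrapped object without creating separate storage.

```python
class FakeMuData:
    """Hypothetical stand-in for a MuData-like container (illustration only)."""

    def __init__(self):
        self.obs = {}
        self.mod = {}
        self.obsp = {}
        self.uns = {"celldega": {}}


class CollectionSketch:
    """Hypothetical thin wrapper whose accessors proxy to the wrapped container."""

    def __init__(self, mdata):
        self.mdata = mdata

    @property
    def obs(self):
        return self.mdata.obs

    @property
    def mod(self):
        return self.mdata.mod

    @property
    def relations(self):
        # Returns the very same object as mdata.obsp, not a copy.
        return self.mdata.obsp

    @property
    def uns(self):
        # Exposes only the "celldega" namespace of the container's uns.
        return self.mdata.uns["celldega"]


collection = CollectionSketch(FakeMuData())

# Writing through the alias is visible on the underlying container.
collection.relations["similarity"] = [[1.0, 0.2], [0.2, 1.0]]
same_store = collection.relations is collection.mdata.obsp
visible_in_obsp = "similarity" in collection.mdata.obsp
```

Because the accessor returns the underlying store itself, there is nothing to keep in sync between the two names.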

Each modality is a normal AnnData object. Its X is the clusterable matrix and its var table describes the local feature/entity axis. Celldega stores the global row entity type in mdata.uns["celldega"]["obs_entity_type"] and stores modality-local entity types in mdata.mod[name].var["entity_type"]. Higher-order collections do not embed lower-level source objects such as single-cell AnnData. Source data can be linked through lightweight metadata in collection.uns["sources"] and recorded in modality provenance.

obsp is the right native location for graph-like or distance-like observation pairs. When a workflow needs to treat a square relation matrix as AnnData.X for heatmap or Matrix-style clustering, materialize it as a modality:

# make a new modality from a pre-existing relationship
collection.add_relation_modality("similarity")

# view new modality
collection.mod["similarity_relation"]

API

MuData-backed Celldega collection schema objects.

CelldegaCollection

Base Celldega collection profile backed by MuData.

Attributes:

Name Type Description
mdata MuData

The underlying multimodal object.

mod dict[str, AnnData]

MuData modalities. Each modality is a clusterable AnnData feature matrix.

obs DataFrame

Canonical biological observation axis shared by modalities.

relations Any

Global observation-by-observation relations stored in mdata.obsp.

uns dict[str, Any]

Celldega schema metadata stored in mdata.uns["celldega"].

collection_type property

Celldega collection type, such as "dataset" or "neighborhood".

mod property

Named feature modalities.

obs property writable

Canonical collection observation table.

provenance property

Collection-level provenance metadata.

relations property

Global observation-by-observation relations.

This is a named accessor for mdata.obsp (not a separate store): relations are square matrices over the collection's observation axis, shared across all modalities. They live here rather than inside a single modality's obsp because they are properties of the observations themselves (e.g. neighborhood overlap or bordering, derived from geometry) and are modality-independent. Feature-by-feature relations belong in a modality's varp instead.

uns property

Celldega schema metadata namespace.

__init__(obs=None, mod=None, mdata=None, relations=None, provenance=None, uns=None, collection_type=None, obs_entity_type=None)

Build a collection from an observation table, modalities, or a MuData.

Exactly one of three construction paths is taken, in priority order:

  1. mdata given: wrap an existing (e.g. freshly read) MuData; if obs is also given, it replaces the top-level observation table.
  2. mod given (no mdata): each modality is aligned to obs (or, when obs is omitted, to the first modality's own obs) and the modalities become the collection's feature spaces.
  3. neither mdata nor mod given (obs only): an empty collection carrying only the obs axis is built; modalities are attached later via add_mod / calc_* methods.
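The priority order above can be sketched as a small dispatch function. This is an illustration of the documented precedence, not Celldega's constructor: choose_construction_path and its return labels are hypothetical, and treating "given" as "is not None" is an assumption.

```python
def choose_construction_path(obs=None, mod=None, mdata=None):
    """Return a label for the documented construction path (illustrative only)."""
    if mdata is not None:
        # Path 1: an existing MuData wins over everything else.
        return "wrap_mdata"
    if mod is not None:
        # Path 2: modalities are aligned to obs, or to the first modality's obs.
        return "align_mod"
    if obs is not None:
        # Path 3: an empty collection that carries only the observation axis.
        return "obs_only"
    # Matches the documented ValueError when nothing is provided.
    raise ValueError("Provide at least one of obs, mod, or mdata.")


path_when_all_given = choose_construction_path(obs=["n1"], mod={}, mdata=object())
path_when_mod_and_obs = choose_construction_path(obs=["n1"], mod={})
path_when_obs_only = choose_construction_path(obs=["n1"])
```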

Parameters:

Name Type Description Default
obs DataFrame | None

Canonical observation table; its index becomes the collection's observation axis.

None
mod dict[str, AnnData] | None

Named feature-space modalities to attach, each aligned to obs.

None
mdata MuData | None

A pre-built MuData to wrap directly.

None
relations dict[str, spmatrix] | None

Square observation-by-observation matrices stored in mdata.obsp (see the relations property).

None
provenance dict[str, Any] | None

Free-form provenance merged into uns["celldega"]["provenance"].

None
uns dict[str, Any] | None

Extra Celldega metadata merged into uns["celldega"].

None
collection_type str | None

Schema tag such as "dataset" or "neighborhood"; defaults to "collection".

None
obs_entity_type str | None

Biological type of each observation (e.g. "dataset", "neighborhood"), recorded in the metadata.

None

Raises:

Type Description
ValueError

If none of obs, mod, or mdata is provided.

add_mod(key, adata, var_entity_type=None)

Attach a feature-space modality, aligned to the collection axis.

adata is aligned to the collection's obs index via _align_mod_to_obs: observations missing from adata become zero-filled rows, extra observations are dropped, and a sparse X stays sparse. The aligned result is stored under key in mod, and the underlying MuData is refreshed.
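The alignment rule described above (zero-fill missing observations, drop extras, follow the collection's order) can be sketched in plain Python. This is not the library's _align_mod_to_obs; align_rows_to_obs is a hypothetical helper that uses lists instead of AnnData and ignores sparsity.

```python
def align_rows_to_obs(obs_index, row_ids, rows, n_features):
    """Reorder rows to follow obs_index, zero-filling missing ids and dropping extras."""
    by_id = dict(zip(row_ids, rows))
    zero_row = [0.0] * n_features
    # list(...) copies each row so the aligned result does not share row objects.
    return [list(by_id.get(obs_id, zero_row)) for obs_id in obs_index]


obs_index = ["n1", "n2", "n3"]      # the collection's observation axis
row_ids = ["n3", "n1", "extra"]     # ids present in the incoming matrix
rows = [[3.0, 3.5], [1.0, 1.5], [9.0, 9.5]]

aligned = align_rows_to_obs(obs_index, row_ids, rows, n_features=2)
# aligned == [[1.0, 1.5], [0.0, 0.0], [3.0, 3.5]]
```

Here "n2" is absent from the input and becomes a zero row, while "extra" is not on the collection axis and is dropped.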

Parameters:

Name Type Description Default
key str

Modality name (key in collection.mod).

required
adata AnnData

Feature matrix to attach; it need not already match the axis.

required
var_entity_type str | None

If given, written to the stored modality's var["entity_type"]. Used downstream by Matrix to infer axis entities (e.g. "gene" or "cell_population").

None

Returns:

Type Description
AnnData

The aligned AnnData exactly as stored in collection.mod[key].

add_relation_modality(relation_key, key=None, var_entity_type=None)

Materialize a square observation relation as a clusterable modality.

Relations live canonically in mdata.obsp. Use this when a workflow needs the relation as an AnnData.X matrix — e.g. Matrix-style heatmap clustering of an observation-by-observation similarity/distance matrix. The resulting modality is labelled by the observation index on both axes (var carries a related_obs_id column).

Parameters:

Name Type Description Default
relation_key str

Key of the relation in relations (mdata.obsp).

required
key str | None

Name for the new modality; defaults to f"{relation_key}_relation".

None
var_entity_type str | None

Entity type for the modality's var axis; defaults to the collection's obs_entity_type.

None

Returns:

Type Description
AnnData

The stored relation modality AnnData.

Raises:

Type Description
KeyError

If relation_key is not present in relations.

ValueError

If the relation is not square over the observation axis.
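The behaviour documented for add_relation_modality (default key, labelling on both axes, and the two error cases) can be sketched with plain Python containers. This is an assumption-laden illustration rather than the library's code: relation_to_modality is hypothetical, and the returned dict merely stands in for an AnnData.

```python
def relation_to_modality(relations, obs_index, relation_key, key=None):
    """Turn a square relation into a modality-like dict (illustrative only)."""
    if relation_key not in relations:
        raise KeyError(relation_key)
    matrix = relations[relation_key]
    n_obs = len(obs_index)
    if len(matrix) != n_obs or any(len(row) != n_obs for row in matrix):
        raise ValueError("Relation must be square over the observation axis.")
    # Documented default name: f"{relation_key}_relation".
    modality_key = key if key is not None else f"{relation_key}_relation"
    modality = {
        "X": [list(row) for row in matrix],
        "obs_names": list(obs_index),
        # The var axis is labelled by the same observation index.
        "var_names": list(obs_index),
        "related_obs_id": list(obs_index),
    }
    return modality_key, modality


relations = {"similarity": [[1.0, 0.3], [0.3, 1.0]]}
obs_index = ["n1", "n2"]
modality_key, modality = relation_to_modality(relations, obs_index, "similarity")
```

With these inputs modality_key is "similarity_relation", which matches the name used when viewing collection.mod["similarity_relation"].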

read(filename) classmethod

Read a Celldega MuData collection from an .h5mu file.

Parameters:

Name Type Description Default
filename str | Path

Path to the .h5mu file.

required

Returns:

Type Description
CelldegaCollection

An instance of the calling class wrapping the loaded MuData (subclasses restore their own metadata from uns in __init__).

write(filename, **kwargs)

Write the underlying MuData to an .h5mu file.

Parameters:

Name Type Description Default
filename str | Path

Destination path.

required
**kwargs Any

Forwarded to MuData.write.

{}

Note

Only the MuData is persisted. In-memory-only state not stored in mdata — e.g. a NeighborhoodCollection's gdf geometry and memberships — does not round-trip through write/read.