CelldegaCollection API Reference
A CelldegaCollection is the base class used to build Celldega's dataset-level (DatasetCollection) and neighborhood-level (NeighborhoodCollection) data structures. Celldega collections are typed MuData profiles. AnnData is the unit of a feature space; MuData is the unit of a multimodal Celldega collection.
Motivation
Celldega defines new biological entities — datasets, neighborhoods, and more in
the future — which requires both constructing the entity and calculating its
feature spaces, neither of which is free for entities above the single-cell
level. For single-cell gene expression and spatial data, both come straight off
the instrument; for higher-order entities, DatasetCollection and
NeighborhoodCollection construct the observation axis and attach the feature
modalities themselves.
AnnData is an excellent representation for one observation-by-feature matrix plus aligned annotations, graphs, and metadata. Celldega collections need several independently clusterable feature spaces over the same biological observation axis: genes, populations, image features, morphology features, clinical variables, and derived joint spaces.
MuData provides that collection layer by storing each feature space as its own AnnData modality while preserving shared observation metadata. Celldega adds a thin schema convention on top for biological entity typing, provenance, geometry, and view-linking metadata.
Core Model
A CelldegaCollection is a thin wrapper over a MuData (collection.mdata).
Its core accessors are convenience aliases that proxy directly to attributes of
the underlying MuData — they are not separate storage:
| Concept | Accessor | Is exactly |
|---|---|---|
| Canonical observations | collection.obs | collection.mdata.obs |
| Feature spaces | collection.mod[name] | collection.mdata.mod[name] |
| Observation relations | collection.relations[name] | collection.mdata.obsp[name] |
| Celldega metadata | collection.uns | collection.mdata.uns["celldega"] |
In particular, collection.relations is collection.mdata.obsp (the same
object): collection.relations["x"] is collection.mdata.obsp["x"]. The
relations name is just Celldega vocabulary for MuData's obsp ("observation
pairwise") store, so use whichever you prefer. Relations live in the top-level
mdata.obsp (on the shared, collection-level observation axis) rather than inside a single
modality's obsp because they are modality-independent properties of the
observations themselves; feature-by-feature relations belong in a modality's
varp.
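The aliasing described above can be illustrated with a small toy. This is only a sketch of the proxy pattern, not Celldega's implementation: ToyCollection is a hypothetical class, and a SimpleNamespace stands in for a MuData-like container so the example runs without any dependencies.

```python
from types import SimpleNamespace


class ToyCollection:
    """Hypothetical stand-in for a thin wrapper; not Celldega's actual class."""

    def __init__(self, mdata):
        self.mdata = mdata  # the only storage the wrapper owns

    @property
    def obs(self):
        return self.mdata.obs

    @property
    def mod(self):
        return self.mdata.mod

    @property
    def relations(self):
        # Alias for the container-level obsp store, not a copy of it.
        return self.mdata.obsp

    @property
    def uns(self):
        # Celldega metadata lives under a single namespaced key.
        return self.mdata.uns["celldega"]


# SimpleNamespace stands in for a MuData-like container in this sketch.
toy_mdata = SimpleNamespace(
    obs={"n1": {}, "n2": {}},
    mod={},
    obsp={},
    uns={"celldega": {"collection_type": "neighborhood"}},
)
toy = ToyCollection(toy_mdata)

# Writing through the alias is visible on the underlying container.
toy.relations["overlap"] = [[1.0, 0.2], [0.2, 1.0]]
same_object = toy.relations is toy_mdata.obsp
```

Because every accessor returns the underlying object rather than a copy, there is no second store to keep in sync.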
Each modality is a normal AnnData object. Its X is the clusterable matrix and
its var table describes the local feature/entity axis. Celldega stores the
global row entity type in mdata.uns["celldega"]["obs_entity_type"] and stores
modality-local entity types in mdata.mod[name].var["entity_type"].
Higher-order collections do not embed lower-level source objects such as
single-cell AnnData. Source data can be linked through lightweight metadata in
collection.uns["sources"] and recorded in modality provenance.
obsp is the right native location for graph-like or distance-like observation
pairs. When a workflow needs to treat a square relation matrix as AnnData.X
for heatmap or Matrix-style clustering, materialize it as a modality:
```python
# Make a new modality from a pre-existing relation
collection.add_relation_modality("similarity")
# View the new modality
collection.mod["similarity_relation"]
```
API
MuData-backed Celldega collection schema objects.
CelldegaCollection
Base Celldega collection profile backed by MuData.
Celldega defines new biological entities (datasets, neighborhoods, and more
in the future), which requires both constructing the entity and
calculating its feature spaces — neither of which is free for entities
above the single-cell level. For single-cell data both steps come straight
off the instrument; for higher-order entities DatasetCollection and
NeighborhoodCollection build the observation axis and attach the feature
modalities themselves.
Attributes:
| Name | Type | Description |
|---|---|---|
| mdata | MuData | The underlying multimodal object. |
| mod | dict[str, AnnData] | MuData modalities. Each modality is a clusterable AnnData. |
| obs | DataFrame | Canonical biological observation axis shared by modalities. |
| relations | Any | Global observation-by-observation relations stored in mdata.obsp. |
| uns | dict[str, Any] | Celldega schema metadata stored in mdata.uns["celldega"]. |
collection_type (property)
Celldega collection type, such as "dataset" or "neighborhood".
mod (property)
Named feature modalities.
obs (property, writable)
Canonical collection observation table.
provenance (property)
Collection-level provenance metadata.
relations (property)
Global observation-by-observation relations.
This is a named accessor for mdata.obsp (not a separate store):
relations are square matrices over the collection's observation axis,
shared across all modalities. They live here rather than inside a single
modality's obsp because they are properties of the observations
themselves (e.g. neighborhood overlap or bordering, derived from
geometry) and are modality-independent. Feature-by-feature relations
belong in a modality's varp instead.
uns (property)
Celldega schema metadata namespace.
__init__(obs=None, mod=None, mdata=None, relations=None, provenance=None, uns=None, collection_type=None, obs_entity_type=None)
Build a collection from an observation table, modalities, or a MuData.
Exactly one of three construction paths is taken, in priority order:
- mdata given: wrap an existing (e.g. freshly read) MuData; if obs is also given, it replaces the top-level observation table.
- mod given (no mdata): each modality is aligned to obs (or, when obs is omitted, to the first modality's own obs) and the modalities become the collection's feature spaces.
- neither: an empty collection carrying only the obs axis is built; modalities are attached later via add_mod / calc_* methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs | DataFrame \| None | Canonical observation table; its index becomes the collection's observation axis. | None |
| mod | dict[str, AnnData] \| None | Named feature-space modalities to attach, each aligned to obs. | None |
| mdata | MuData \| None | A pre-built MuData to wrap. | None |
| relations | dict[str, spmatrix] \| None | Square observation-by-observation matrices stored in mdata.obsp. | None |
| provenance | dict[str, Any] \| None | Free-form provenance merged into the collection-level provenance metadata. | None |
| uns | dict[str, Any] \| None | Extra Celldega metadata merged into mdata.uns["celldega"]. | None |
| collection_type | str \| None | Schema tag such as "dataset" or "neighborhood". | None |
| obs_entity_type | str \| None | Biological type of each observation, recorded in mdata.uns["celldega"]["obs_entity_type"]. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If none of obs, mod, or mdata is given. |
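The priority order of the three construction paths can be sketched as a small dispatch function. This is a hedged illustration only: choose_construction_path and its return labels are hypothetical names, and the error branch assumes that passing none of obs, mod, or mdata is what triggers the documented ValueError.

```python
def choose_construction_path(obs=None, mod=None, mdata=None):
    """Return which documented path would apply (hypothetical helper)."""
    if mdata is not None:
        # Highest priority: wrap the existing MuData; obs, if given,
        # would replace the top-level observation table.
        return "wrap_mdata"
    if mod is not None:
        # Modalities are aligned to obs, or to the first modality's obs.
        return "align_modalities"
    if obs is not None:
        # Empty collection carrying only the observation axis.
        return "empty_with_obs"
    # Assumption: nothing to build from is an error.
    raise ValueError("expected at least one of obs, mod, or mdata")


path_all = choose_construction_path(obs=["n1"], mod={}, mdata=object())
path_mod = choose_construction_path(obs=["n1"], mod={})
path_obs = choose_construction_path(obs=["n1"])
```

Checking mdata first means a freshly read file always wins over any modalities passed alongside it.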
add_mod(key, adata, var_entity_type=None)
Attach a feature-space modality, aligned to the collection axis.
adata is aligned to the collection's obs index via _align_mod_to_obs:
observations missing from adata become zero-filled rows, extras are dropped,
and sparse X stays sparse. The aligned modality is stored under key in mod,
and the underlying MuData is refreshed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Modality name (key in mod). | required |
| adata | AnnData | Feature matrix to attach; it need not already match the axis. | required |
| var_entity_type | str \| None | If given, written to the stored modality's var["entity_type"]. | None |

Returns:
| Type | Description |
|---|---|
| AnnData | The aligned AnnData. |
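The alignment rule (zero-fill missing observations, drop extras) can be sketched for the dense case with NumPy and pandas. This is not the body of _align_mod_to_obs: align_rows is a hypothetical helper, it works on a bare array rather than an AnnData, and it does not cover the sparse path.

```python
import numpy as np
import pandas as pd


def align_rows(X, row_ids, target_ids):
    """Reorder dense rows of X to target_ids; zero-fill missing, drop extras."""
    # Position of each target id in row_ids, or -1 where it is missing.
    positions = pd.Index(row_ids).get_indexer(target_ids)
    aligned = np.zeros((len(target_ids), X.shape[1]), dtype=X.dtype)
    present = positions >= 0
    aligned[present] = X[positions[present]]
    return aligned


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
row_ids = ["n2", "n1", "extra"]   # "extra" is not on the collection axis
target_ids = ["n1", "n2", "n3"]   # "n3" has no row in X

aligned = align_rows(X, row_ids, target_ids)
```

Here the row for "extra" is dropped, the rows for "n1" and "n2" are reordered to match the target axis, and "n3" becomes a row of zeros.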
add_relation_modality(relation_key, key=None, var_entity_type=None)
Materialize a square observation relation as a clusterable modality.
Relations live canonically in mdata.obsp. Use this when a workflow
needs the relation as an AnnData.X matrix — e.g. Matrix-style heatmap
clustering of an observation-by-observation similarity/distance matrix.
The resulting modality is labelled by the observation index on both axes
(var carries a related_obs_id column).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| relation_key | str | Key of the relation in mdata.obsp. | required |
| key | str \| None | Name for the new modality; defaults to the relation key with a _relation suffix (e.g. "similarity_relation"). | None |
| var_entity_type | str \| None | Entity type for the modality's var axis. | None |

Returns:
| Type | Description |
|---|---|
| AnnData | The stored relation modality. |

Raises:
| Type | Description |
|---|---|
| KeyError | If relation_key is not present in mdata.obsp. |
| ValueError | If the relation is not square over the observation axis. |
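To make the shape of a relation modality concrete, the sketch below builds the two pieces described above from a square matrix: an observation-by-observation table labelled on both axes, and a var table carrying a related_obs_id column. It uses plain pandas objects in place of an AnnData, and relation_to_frames is a hypothetical helper, not part of Celldega.

```python
import numpy as np
import pandas as pd


def relation_to_frames(relation, obs_ids):
    """Build X and var tables from a square relation (hypothetical helper)."""
    relation = np.asarray(relation)
    n = len(obs_ids)
    # The relation must be square over the observation axis.
    if relation.shape != (n, n):
        raise ValueError(
            f"relation has shape {relation.shape}, expected {(n, n)}"
        )
    # Rows and columns are both labelled by the observation index.
    X = pd.DataFrame(relation, index=obs_ids, columns=obs_ids)
    # Each feature (column) points back at the observation it represents.
    var = pd.DataFrame({"related_obs_id": obs_ids}, index=obs_ids)
    return X, var


obs_ids = ["n1", "n2"]
similarity = [[1.0, 0.4], [0.4, 1.0]]
X_rel, var_rel = relation_to_frames(similarity, obs_ids)
```

Keeping related_obs_id in var lets downstream clustering map each column back to a row of the collection's observation table.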
read(filename) (classmethod)
Read a Celldega MuData collection from an .h5mu file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str \| Path | Path to the .h5mu file. | required |

Returns:
| Type | Description |
|---|---|
| CelldegaCollection | An instance of the calling class wrapping the loaded MuData (subclasses restore their own metadata from it). |
write(filename, **kwargs)
Write the underlying MuData to an .h5mu file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str \| Path | Destination path. | required |
| **kwargs | Any | Forwarded to the underlying MuData write call. | {} |
Note
Only the MuData is persisted. In-memory-only state not stored in
mdata — e.g. a NeighborhoodCollection's gdf geometry and
memberships — does not round-trip through write/read.