Optimus supports data processing for the Human Cell Atlas (HCA) Data Coordination Platform (DCP). Learn more about the DCP at the HCA Data Portal.
All DCP Projects processed with Optimus have matrices containing the standard metrics and counts detailed in the Optimus Count Matrix Overview, but also have additional post-processing to combine project data and incorporate DCP-curated metadata.
This section details matrices produced for DCP 2.0, which includes matrices processed with Optimus v4.1.7 and later. The DCP is currently reprocessing data generated with earlier Optimus versions and will deprecate previous matrices once reprocessing is complete.
DCP project matrices combine project data from individual library preparations that contain the same species, developmental age, and sequencing technology.
Key differences between the standard Optimus Loom matrix described in the Optimus Count Matrix Overview and the DCP Project matrix include:
- Combined project data in the DCP matrix
- Filtering: to reduce file size, DCP project matrices are in sparse format and minimally filtered so that only cells with 100 molecules or more are retained.
- DCP-curated metadata in the Loom global attributes: see table below.
- input_ids in the global attributes: all input_ids representing each library preparation in the matrix are added as a comma-separated string.
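The filtering rule in the list above can be illustrated with a minimal sketch. It is not the DCP post-processing code, which is not shown here: the `filter_cells` helper, the dictionary-based sparse layout, and the gene names are all assumptions made for illustration.

```python
def filter_cells(counts_by_cell, min_molecules=100):
    """Return only the cells whose total molecule count is >= min_molecules."""
    return {
        barcode: gene_counts
        for barcode, gene_counts in counts_by_cell.items()
        if sum(gene_counts.values()) >= min_molecules
    }


# Hypothetical sparse counts: only non-zero genes are stored for each cell.
example_counts = {
    "GGACAAGAGTGCGTGA": {"GeneA": 80, "GeneB": 20},  # 100 molecules: kept
    "GATCGATCACCAGGTC": {"GeneA": 99},               # 99 molecules: dropped
    "AGCGGTCAGGGCTTGA": {"GeneB": 150, "GeneC": 5},  # 155 molecules: kept
}

kept = filter_cells(example_counts)
print(sorted(kept))
```

The threshold is inclusive, matching "100 molecules or more" in the text.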
DCP project matrices contain DCP-curated metadata in the Loom global attributes, which may be useful when exploring the data and linking it back to the project metadata.
Read more about each metadata field in the DCP Metadata Dictionary.
|Metadata Attribute Name in Count Matrix|Metadata Description|
||species information; human or mouse|
||technology used for library preparation, i.e., 10x or SS2|
||metadata values for |
||metadata values for |
||string describing the DCP-curated metadata field used for input_id: |
||string describing the DCP-curated metadata field used for input_name: |
To create the DCP project matrices, Loom outputs from individual 10x library preparations, each with its own input_id, are combined into a single Loom file.
Since DCP project matrices often contain combined data from multiple library preparations, the project matrix cell barcodes are modified so that they are unique for each library preparation, allowing the barcodes to be used by downstream community tools like Cumulus and Seurat.
In the standard Optimus matrix, cell barcodes are listed in both the Loom columns
CellID and the
For DCP projects, however, the cell barcodes in the cell_names column are modified so that each cell barcode belonging to an individual library preparation is unique. This is done by adding a numerical suffix to the barcodes that corresponds to the input_id for the library preparation from which the cell barcodes came.
The input_ids are listed in the matrix global attributes, and their order serves as a zero-based index for the cell barcode suffix.
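As a rough illustration of the suffix scheme, the sketch below combines barcodes from several library preparations and makes them unique. The `combine_barcodes` function, the `lib-a`/`lib-b` identifiers, and the choice to sort the input_ids are assumptions for this example, not the DCP implementation.

```python
def combine_barcodes(barcodes_by_input_id):
    """Build an input_id string and suffixed, unique cell names.

    Assumption: input_ids are placed in sorted order; any fixed order works
    as long as the suffix is the position of the input_id in that order.
    """
    input_ids = sorted(barcodes_by_input_id)
    cell_names = [
        f"{barcode}-{index}"
        for index, input_id in enumerate(input_ids)
        for barcode in barcodes_by_input_id[input_id]
    ]
    return ", ".join(input_ids), cell_names


# Placeholder input_ids; the same barcode appears in both libraries.
libraries = {
    "lib-b": ["AACACGTAGTGTTTGC", "GGACAAGAGTGCGTGA"],
    "lib-a": ["GGACAAGAGTGCGTGA"],
}

input_id_attr, cell_names = combine_barcodes(libraries)
print(input_id_attr)
print(cell_names)
```

The shared barcode ends up as two distinct cell names because each copy carries the index of its own library preparation.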
Let's use loompy to look at the global attribute input_id for a DCP project matrix (Loom format):
>>> import loompy
>>> ds = loompy.connect("project_matrix.loom", mode="r")  # hypothetical file name
>>> ds.attrs.input_id
'166c1b1a-ad9c-4476-a4ec-8b52eb5032c7, 22b7da3d-a301-433e-99e1-e67266c1ee8b, 337a48c5-e363-45aa-886f-ccd4425edc2b, 40630e8b-c3a3-4813-b1e4-b156637c5cc3, 58d703d1-d366-42d0-af44-a3bb836838a5, 70c8d647-7984-4d03-912a-f2437aa1ba4f, 7c86cf30-4284-4a0d-817f-6047560c05c3, 8ef7aca4-be00-4c03-8576-1b2eff4ce7af, ae0cfa6e-e7cb-4a88-9f89-1c44abaa2291, cbd23025-b1bf-4e9e-a297-ddab4a217b76, df049da4-3d20-4da7-a1d7-7d6e8f7740ff, e17bf5ea-788b-4756-a008-a07aec091e10'
Notice the attribute's value is a string of comma- and space-separated UUIDs.
Each of these UUIDs represents one library preparation. This matrix contains data from 12 library preparations total.
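To work with the individual UUIDs, the string can be split into a list. This is a minimal sketch that uses the attribute value shown in this example as a literal, so it runs without a Loom file; the variable names are illustrative.

```python
# The input_id global attribute from the example, pasted as a literal.
input_id_attr = (
    "166c1b1a-ad9c-4476-a4ec-8b52eb5032c7, 22b7da3d-a301-433e-99e1-e67266c1ee8b, "
    "337a48c5-e363-45aa-886f-ccd4425edc2b, 40630e8b-c3a3-4813-b1e4-b156637c5cc3, "
    "58d703d1-d366-42d0-af44-a3bb836838a5, 70c8d647-7984-4d03-912a-f2437aa1ba4f, "
    "7c86cf30-4284-4a0d-817f-6047560c05c3, 8ef7aca4-be00-4c03-8576-1b2eff4ce7af, "
    "ae0cfa6e-e7cb-4a88-9f89-1c44abaa2291, cbd23025-b1bf-4e9e-a297-ddab4a217b76, "
    "df049da4-3d20-4da7-a1d7-7d6e8f7740ff, e17bf5ea-788b-4756-a008-a07aec091e10"
)

# Split on commas and strip the space that follows each comma.
input_ids = [uuid.strip() for uuid in input_id_attr.split(",")]

print(len(input_ids))  # number of library preparations in the matrix
print(input_ids[0])    # first library preparation UUID
```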
Now let's look at the cell_names column attribute, which contains the unique cell barcodes:
>>> ds.ca.cell_names
array(['GGACAAGAGTGCGTGA-0', 'GATCGATCACCAGGTC-0', 'AGCGGTCAGGGCTTGA-0', ..., 'GTACGTAAGCTATGCT-11', 'CAGAATCTCTGAGTGT-11', 'AACACGTAGTGTTTGC-11'], dtype=object)
The suffix appended to the barcodes in the cell_names column is the zero-based index of the input_id UUID to which the cell barcodes belong.
For example, cell barcodes with a "-0" suffix belong to the library preparation represented by the first UUID, 166c1b1a-ad9c-4476-a4ec-8b52eb5032c7, whereas cell barcodes with a "-11" suffix belong to the 12th and last UUID in the list, e17bf5ea-788b-4756-a008-a07aec091e10.
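The lookup from a suffixed cell name back to its library preparation can be sketched as follows. The helper names and the three placeholder input_ids are assumptions for illustration; with a real matrix, the list would come from splitting the input_id global attribute.

```python
def split_cell_name(cell_name):
    """Split 'BARCODE-N' into the original barcode and the integer suffix N."""
    barcode, _, suffix = cell_name.rpartition("-")
    return barcode, int(suffix)


def library_for_cell(cell_name, input_ids):
    """Return the input_id whose position matches the cell name's suffix."""
    _, index = split_cell_name(cell_name)
    return input_ids[index]


# Placeholder input_ids standing in for the UUIDs of a real project matrix.
input_ids = ["lib-a", "lib-b", "lib-c"]

print(split_cell_name("GTACGTAAGCTATGCT-11"))
print(library_for_cell("GGACAAGAGTGCGTGA-0", input_ids))
print(library_for_cell("AACACGTAGTGTTTGC-2", input_ids))
```

Splitting on the last hyphen keeps the sketch safe even if a barcode itself were to contain a hyphen.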
While the project matrices contain some project metadata (listed in the table above), additional useful metadata is available in the project's metadata manifest, a TSV file containing all of a project's metadata, including donor and disease state information.
In addition to the global attribute input_id, each project matrix has an input_id column that can be useful for mapping matrix data back to the DCP metadata manifest.
The values listed in the input_id column match the library preparation UUID in the metadata manifest column
Read more about the metadata manifest in the DCP Exploring Projects guide.
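One way to use the input_id column is to index the manifest by library preparation UUID and look up each cell's row. The sketch below is illustrative only: the manifest column names (`library_uuid`, `donor_id`, `disease`) and the donor and disease values are made-up placeholders, not the real manifest schema or project data, and the per-cell list stands in for the Loom input_id column.

```python
import csv
import io

# Tiny stand-in for a metadata manifest TSV. Column names and the donor and
# disease values are hypothetical placeholders.
MANIFEST_TSV = (
    "library_uuid\tdonor_id\tdisease\n"
    "166c1b1a-ad9c-4476-a4ec-8b52eb5032c7\tdonor_1\tnormal\n"
    "22b7da3d-a301-433e-99e1-e67266c1ee8b\tdonor_2\tnormal\n"
)


def index_manifest(tsv_text, key_column):
    """Map each value of key_column to its manifest row.

    If a key appears on several rows, this sketch keeps the last one.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[key_column]: row for row in reader}


manifest_by_library = index_manifest(MANIFEST_TSV, "library_uuid")

# Stand-in for the per-cell input_id column of a project matrix.
cell_input_ids = [
    "166c1b1a-ad9c-4476-a4ec-8b52eb5032c7",
    "166c1b1a-ad9c-4476-a4ec-8b52eb5032c7",
    "22b7da3d-a301-433e-99e1-e67266c1ee8b",
]

donor_per_cell = [manifest_by_library[i]["donor_id"] for i in cell_input_ids]
print(donor_per_cell)
```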
Explore HCA Project matrices in Terra
Contributor matrices contain data analyzed and provided by the original project contributors. While they vary in format and content from project to project, they often include cell type annotations and additional metadata such as donor information and cell barcodes.
If the contributor matrix contains donor metadata that matches a field in the project metadata manifest, the matrix can be linked to the DCP-generated project matrix in a two-step process.
First, map the DCP matrix to the metadata manifest using the Loom's input_id column; this column contains the same library preparation/donor IDs as the project metadata manifest's
Second, map the contributor matrix to the metadata manifest using the contributor matrix column that matches the metadata manifest.
Contributor matrices might contain a column of cell barcodes for each library preparation/donor. These barcodes should match the non-unique barcodes listed in the DCP project matrix, with the exception of cells that might have been filtered out of the Loom matrix due to low UMI counts. The Loom's non-unique barcodes are listed in the
For example code showing how to link a contributor matrix to a DCP project matrix, see the Matrix_matching Jupyter Notebook.
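Independently of that notebook, the two-step link can be sketched in plain Python. Everything here is an assumption for illustration: the `link_matrices` function, the record layout, the placeholder input_ids, and the donor and cell type values do not come from a real project or from the notebook.

```python
def link_matrices(dcp_cells, donor_by_input_id, contributor_rows):
    """Attach contributor annotations to DCP cells via donor and barcode."""
    # Step 2 index: contributor rows keyed by (donor, non-unique barcode).
    contributor_index = {
        (row["donor"], row["barcode"]): row for row in contributor_rows
    }
    linked = []
    for cell in dcp_cells:
        # Step 1: DCP matrix to manifest through the input_id column.
        donor = donor_by_input_id.get(cell["input_id"])
        # Step 2: manifest to contributor matrix through the shared donor field.
        match = contributor_index.get((donor, cell["barcode"]))
        linked.append({
            **cell,
            "donor": donor,
            "cell_type": match["cell_type"] if match else None,
        })
    return linked


# Hypothetical inputs: manifest lookup, DCP cells, and contributor annotations.
donor_by_input_id = {"lib-a": "donor_1", "lib-b": "donor_2"}
dcp_cells = [
    {"input_id": "lib-a", "barcode": "GGACAAGAGTGCGTGA"},
    {"input_id": "lib-b", "barcode": "GGACAAGAGTGCGTGA"},
    {"input_id": "lib-b", "barcode": "AACACGTAGTGTTTGC"},
]
contributor_rows = [
    {"donor": "donor_1", "barcode": "GGACAAGAGTGCGTGA", "cell_type": "T cell"},
    {"donor": "donor_2", "barcode": "GGACAAGAGTGCGTGA", "cell_type": "B cell"},
]

linked = link_matrices(dcp_cells, donor_by_input_id, contributor_rows)
for record in linked:
    print(record)
```

Keying on the donor together with the barcode matters because the same non-unique barcode can occur in more than one library preparation. Cells without a contributor annotation keep `None` as their cell type.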
If you have any questions related to the contributor matrix and content, reach out to the individual project contributors listed on the Project page.