
Optimus Count Matrix Overview

The Optimus pipeline's default count matrix output is a Loom file, an HDF5 file generated using Loompy v.3.0.6.

The matrix contains global attributes describing the parameters used to generate counts for single-cell or single-nucleus data (Table 1). It also contains UMI-corrected counts and multiple metrics for individual cells (the columns of the matrix; Table 2) and individual genes (the rows of the matrix; Table 3).
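
Because rows are genes and columns are cells, every row attribute should hold one value per gene and every column attribute one value per cell. The sketch below checks that invariant on plain Python data. `LoomSummary` is a hypothetical helper, not part of Optimus or loompy; with loompy you would typically read the same pieces from `ds.shape`, `ds.ra`, `ds.ca`, and `ds.attrs` on a connection opened with `loompy.connect`.

```python
from dataclasses import dataclass, field


@dataclass
class LoomSummary:
    """Hypothetical in-memory stand-in for the parts of an Optimus Loom file."""

    shape: tuple[int, int]  # (n_genes, n_cells): rows are genes, columns are cells
    global_attrs: dict[str, str] = field(default_factory=dict)  # Table 1
    row_attrs: dict[str, list] = field(default_factory=dict)  # gene metrics, Table 3
    col_attrs: dict[str, list] = field(default_factory=dict)  # cell metrics, Table 2

    def validate(self) -> list[str]:
        """Return layout problems; an empty list means every attribute fits the matrix."""
        n_genes, n_cells = self.shape
        problems = []
        for name, values in self.row_attrs.items():
            if len(values) != n_genes:
                problems.append(f"row attribute {name!r}: {len(values)} values, expected {n_genes}")
        for name, values in self.col_attrs.items():
            if len(values) != n_cells:
                problems.append(f"column attribute {name!r}: {len(values)} values, expected {n_cells}")
        return problems


# Usage: a toy 3-gene x 2-cell layout.
summary = LoomSummary(
    shape=(3, 2),
    global_attrs={"expression_data_type": "exonic"},
    row_attrs={"Gene": ["GeneA", "GeneB", "GeneC"]},
    col_attrs={"CellID": ["AAACCTGA", "TTTGGTTC"]},
)
print(summary.validate())  # []
```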

The tables below document these metrics, list which tools generate them, and define them.

Additional Matrix Processing for Consortia

Loom files generated by Optimus for consortia, such as the Human Cell Atlas (HCA) or the BRAIN Initiative Cell Census Network (BICCN), may have additional processing steps. Read the Consortia Processing Overview for details on consortia-specific matrix changes.

Table 1. Global Attributes

Global attributes in the Loom file apply to the whole file rather than to any specific row or column. The Optimus Loom file contains the following global attributes.

| Attribute | Details |
| --- | --- |
| optimus_output_schema_version | String with the Loom file spec version |
| expression_data_type | String describing whether the pipeline counts exonic or whole-transcript (exonic and intronic) reads. For single-cell mode (counting_mode = sc_rna), the value is "exonic"; for single-nucleus mode (counting_mode = sn_rna), the value is "whole_transcript". |
| input_id | The sample or cell ID listed in the pipeline configuration file. This can be any string, but we recommend keeping it consistent with any sample metadata. |
| input_name | Optional string that can be used to further describe the input |
| input_id_metadata_field | Optional string that describes, when applicable, the metadata field containing the input_id |
| input_name_metadata_field | Optional string that describes, when applicable, the metadata field containing the input_name |
| pipeline_version | String describing the Optimus version |
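
Table 1 ties expression_data_type to the counting mode, so a quick sanity check is to compare the two. The helper names below are hypothetical, and the set of "required" attributes is an assumption based only on which rows Table 1 does not describe as optional.

```python
# Mapping documented in Table 1: counting_mode -> expression_data_type.
EXPRESSION_DATA_TYPE = {
    "sc_rna": "exonic",
    "sn_rna": "whole_transcript",
}

# Assumption: attributes Table 1 does not describe as optional.
REQUIRED_GLOBAL_ATTRS = (
    "optimus_output_schema_version",
    "expression_data_type",
    "input_id",
    "pipeline_version",
)


def expected_expression_data_type(counting_mode: str) -> str:
    """Return the expression_data_type Table 1 documents for a counting mode."""
    try:
        return EXPRESSION_DATA_TYPE[counting_mode]
    except KeyError:
        raise ValueError(f"unknown counting_mode: {counting_mode!r}") from None


def missing_required_attrs(attrs: dict) -> list[str]:
    """List the assumed-required global attributes that are absent from attrs."""
    return [name for name in REQUIRED_GLOBAL_ATTRS if name not in attrs]


def matches_counting_mode(attrs: dict, counting_mode: str) -> bool:
    """True if the file's expression_data_type agrees with the mode it was run in."""
    return attrs.get("expression_data_type") == expected_expression_data_type(counting_mode)


attrs = {"input_id": "sample_1", "expression_data_type": "whole_transcript"}
print(matches_counting_mode(attrs, "sn_rna"))  # True
print(missing_required_attrs(attrs))  # ['optimus_output_schema_version', 'pipeline_version']
```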

Table 2. Column Attributes (Cell Metrics)

| Cell metric | Program | Details |
| --- | --- | --- |
| CellID | SC Tools | The unique identifier for each cell, based on cell barcodes; identical to cell_names. |
| cell_names | SC Tools | The unique identifier for each cell, based on cell barcodes; identical to CellID. |
| input_id | Provided as pipeline input | The sample or cell ID listed in the pipeline configuration file. This can be any string, but we recommend keeping it consistent with any sample metadata. |
| n_reads | SC Tools | The number of reads associated with this entity. See Metrics Definitions. |
| noise_reads | SC Tools | The number of reads that 10x Genomics Cell Ranger categorizes as "noise", such as reads containing long polymers or high numbers of N (ambiguous) nucleotides. See Metrics Definitions. |
| perfect_molecule_barcodes | SC Tools | The number of reads with molecule barcodes that have no errors. See Metrics Definitions. |
| n_mitochondrial_genes | SC Tools | The number of mitochondrial genes detected in this cell. See Metrics Definitions. |
| n_mitochondrial_molecules | SC Tools | The number of molecules from mitochondrial genes detected in this cell. See Metrics Definitions. |
| pct_mitochondrial_molecules | SC Tools | The percentage of molecules from mitochondrial genes detected in this cell. See Metrics Definitions. |
| reads_mapped_exonic | SC Tools | The number of reads for this entity that are mapped to exons. See Metrics Definitions. |
| reads_mapped_intronic | SC Tools | The number of reads for this entity that are mapped to introns. See Metrics Definitions. |
| reads_mapped_utr | SC Tools | The number of reads for this entity that are mapped to 3' untranslated regions (UTRs). See Metrics Definitions. |
| reads_mapped_uniquely | SC Tools | The number of reads mapped to a single unambiguous location in the genome. See Metrics Definitions. |
| reads_mapped_multiple | SC Tools | The number of reads mapped to multiple genomic positions with equal confidence. See Metrics Definitions. |
| duplicate_reads | SC Tools | The number of reads that are duplicates (see README.md for the definition of a duplicate). See Metrics Definitions. |
| spliced_reads | SC Tools | The number of reads that overlap splicing junctions. See Metrics Definitions. |
| antisense_reads | SC Tools | The number of reads that are mapped to the antisense strand instead of the transcribed strand. See Metrics Definitions. |
| molecule_barcode_fraction_bases_above_30_mean | SC Tools | The average fraction of bases in molecule barcodes that receive quality scores greater than 30, across the reads of this entity. See Metrics Definitions. |
| molecule_barcode_fraction_bases_above_30_variance | SC Tools | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30, across the reads of this entity. See Metrics Definitions. |
| genomic_reads_fraction_bases_quality_above_30_mean | SC Tools | The average fraction of bases in the genomic read that receive quality scores greater than 30, across the reads of this entity (included for comparison with 10x Cell Ranger count). See Metrics Definitions. |
| genomic_reads_fraction_bases_quality_above_30_variance | SC Tools | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30, across the reads of this entity (included for comparison with 10x Cell Ranger count). See Metrics Definitions. |
| genomic_read_quality_mean | SC Tools | The average quality of Illumina base calls in the genomic reads corresponding to this entity. See Metrics Definitions. |
| genomic_read_quality_variance | SC Tools | The variance in quality of Illumina base calls in the genomic reads corresponding to this entity. See Metrics Definitions. |
| n_molecules | SC Tools | The number of molecules corresponding to this entity. See README.md for the definition of a molecule. See Metrics Definitions. |
| n_fragments | SC Tools | The number of fragments corresponding to this entity. See README.md for the definition of a fragment. See Metrics Definitions. |
| reads_per_fragment | SC Tools | The average number of reads associated with each fragment in this entity. See Metrics Definitions. |
| fragments_per_molecule | SC Tools | The average number of fragments associated with each molecule in this entity. See Metrics Definitions. |
| fragments_with_single_read_evidence | SC Tools | The number of fragments associated with this entity that are observed by only one read. See Metrics Definitions. |
| molecules_with_single_read_evidence | SC Tools | The number of molecules associated with this entity that are observed by only one read. See Metrics Definitions. |
| perfect_cell_barcodes | SC Tools | The number of reads whose cell barcodes contain no errors. See Metrics Definitions. |
| reads_mapped_intergenic | SC Tools | The number of reads mapped to an intergenic region for this cell. See Metrics Definitions. |
| reads_mapped_too_many_loci | SC Tools | The number of reads that were mapped to too many loci across the genome and, as a consequence, are reported as unmapped by the aligner. See Metrics Definitions. |
| cell_barcode_fraction_bases_above_30_variance | SC Tools | The variance of the fraction of Illumina base calls in the cell barcode sequence with quality scores greater than 30, across molecules. See Metrics Definitions. |
| cell_barcode_fraction_bases_above_30_mean | SC Tools | The average fraction of Illumina base calls in the cell barcode sequences with quality scores greater than 30, across molecules. See Metrics Definitions. |
| n_genes | SC Tools | The number of genes detected in this cell. See Metrics Definitions. |
| genes_detected_multiple_observations | SC Tools | The number of genes that are observed by more than one read in this cell. See Metrics Definitions. |
| reads_unmapped | SC Tools | The number of reads that are non-transcriptomic. |
| emptydrops_FDR | DropletUtils | The false discovery rate (FDR) for being a non-empty droplet. For single-cell data, the value is "NA" if the task cannot detect the knee point inflection. This column is not included for data run in sn_rna mode. |
| emptydrops_IsCell | DropletUtils | Binarized call of cell or background, based on a predefined FDR cutoff. For single-cell data, the value is "NA" if the task cannot detect the knee point inflection. This column is not included for data run in sn_rna mode. |
| emptydrops_Limited | DropletUtils | Indicates whether a lower p-value could be obtained by increasing the number of iterations. For single-cell data, the value is "NA" if the task cannot detect the knee point inflection. This column is not included for data run in sn_rna mode. |
| emptydrops_LogProb | DropletUtils | The log-probability of observing the barcode’s count vector under the null model. For single-cell data, the value is "NA" if the task cannot detect the knee point inflection. This column is not included for data run in sn_rna mode. |
| emptydrops_PValue | DropletUtils | Numeric; the Monte Carlo p-value against the null model. For single-cell data, the value is "NA" if the task cannot detect the knee point inflection. This column is not included for data run in sn_rna mode. |
| emptydrops_Total | DropletUtils | Numeric; the total read counts for each barcode. For single-cell data, the value is "NA" if the task cannot detect the knee point inflection. This column is not included for data run in sn_rna mode. |
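
Because the emptyDrops columns can hold "NA" in single-cell mode and are absent in sn_rna mode, code that filters on them should handle both cases. A minimal sketch, assuming emptydrops_IsCell values arrive as booleans, bytes, or strings such as "TRUE", "FALSE", or "NA" (check how your own file encodes them); `parse_is_cell` and `select_cells` are hypothetical helpers.

```python
import math
from typing import Optional


def parse_is_cell(value) -> Optional[bool]:
    """Convert one emptydrops_IsCell value to True/False, or None for "NA"."""
    if isinstance(value, bool):
        return value
    if isinstance(value, bytes):  # HDF5 strings may be read back as bytes
        value = value.decode()
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip().upper()
    if text in ("TRUE", "T"):
        return True
    if text in ("FALSE", "F"):
        return False
    if text in ("NA", "NAN", ""):
        return None
    raise ValueError(f"unrecognized emptydrops_IsCell value: {value!r}")


def select_cells(col_attrs: dict, keep_na: bool = False) -> list[int]:
    """Return column indices of barcodes that emptyDrops called as cells.

    Barcodes whose call is "NA" (knee point not detected) are kept only if keep_na is True.
    """
    if "emptydrops_IsCell" not in col_attrs:
        # The emptydrops_* columns are not written for sn_rna runs.
        raise KeyError("emptydrops_IsCell not found; was this file run in sn_rna mode?")
    calls = [parse_is_cell(v) for v in col_attrs["emptydrops_IsCell"]]
    return [i for i, call in enumerate(calls) if call or (call is None and keep_na)]


col_attrs = {"emptydrops_IsCell": [True, "FALSE", "NA", "TRUE", b"False"]}
print(select_cells(col_attrs))  # [0, 3]
print(select_cells(col_attrs, keep_na=True))  # [0, 2, 3]
```

Treating "NA" as its own state, rather than silently coercing it to False, keeps the decision about knee-detection failures explicit.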

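Several cell metrics in Table 2 are averages or percentages of other columns, so they can be recomputed as a consistency check. A minimal sketch, assuming pct_mitochondrial_molecules is expressed relative to n_molecules and that each "average per" metric is the plain quotient of the corresponding totals; verify this against the Metrics Definitions before relying on it. `derived_cell_metrics` is a hypothetical helper.

```python
import math


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def derived_cell_metrics(cell: dict) -> dict:
    """Recompute ratio-style metrics for one cell from its count-style metrics."""
    return {
        # Assumed: percentage of this cell's molecules that come from mitochondrial genes.
        "pct_mitochondrial_molecules": 100 * safe_ratio(
            cell["n_mitochondrial_molecules"], cell["n_molecules"]
        ),
        # Average reads per fragment and fragments per molecule.
        "reads_per_fragment": safe_ratio(cell["n_reads"], cell["n_fragments"]),
        "fragments_per_molecule": safe_ratio(cell["n_fragments"], cell["n_molecules"]),
    }


cell = {"n_reads": 300, "n_fragments": 100, "n_molecules": 40, "n_mitochondrial_molecules": 5}
print(derived_cell_metrics(cell))
# {'pct_mitochondrial_molecules': 12.5, 'reads_per_fragment': 3.0, 'fragments_per_molecule': 2.5}
```

Returning NaN for empty barcodes mirrors the "NA" convention used elsewhere in the file instead of failing on cells with no molecules.
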
Table 3. Row Attributes (Gene Metrics)

| Gene metric | Program | Details |
| --- | --- | --- |
| ensembl_ids | GENCODE GTF | The gene_id listed in the GENCODE GTF. |
| Gene | GENCODE GTF | The unique gene_name provided in the GENCODE GTF; identical to the gene_names attribute. |
| gene_names | GENCODE GTF | The unique gene_name provided in the GENCODE GTF; identical to the Gene attribute. |
| n_reads | SC Tools | The number of reads associated with this entity. See Metrics Definitions. |
| noise_reads | SC Tools | The number of reads that 10x Genomics Cell Ranger categorizes as "noise", such as reads containing long polymers or high numbers of N (ambiguous) nucleotides. See Metrics Definitions. |
| perfect_molecule_barcodes | SC Tools | The number of reads with molecule barcodes that have no errors. See Metrics Definitions. |
| reads_mapped_exonic | SC Tools | The number of reads for this entity that are mapped to exons. See Metrics Definitions. |
| reads_mapped_intronic | SC Tools | The number of reads for this entity that are mapped to introns. See Metrics Definitions. |
| reads_mapped_utr | SC Tools | The number of reads for this entity that are mapped to 3' untranslated regions (UTRs). See Metrics Definitions. |
| reads_mapped_uniquely | SC Tools | The number of reads mapped to a single unambiguous location in the genome. See Metrics Definitions. |
| reads_mapped_multiple | SC Tools | The number of reads mapped to multiple genomic positions with equal confidence. See Metrics Definitions. |
| duplicate_reads | SC Tools | The number of reads that are duplicates (see README.md for the definition of a duplicate). See Metrics Definitions. |
| spliced_reads | SC Tools | The number of reads that overlap splicing junctions. See Metrics Definitions. |
| antisense_reads | SC Tools | The number of reads that are mapped to the antisense strand instead of the transcribed strand. See Metrics Definitions. |
| molecule_barcode_fraction_bases_above_30_mean | SC Tools | The average fraction of bases in molecule barcodes that receive quality scores greater than 30, across the reads of this entity. See Metrics Definitions. |
| molecule_barcode_fraction_bases_above_30_variance | SC Tools | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30, across the reads of this entity. See Metrics Definitions. |
| genomic_reads_fraction_bases_quality_above_30_mean | SC Tools | The average fraction of bases in the genomic read that receive quality scores greater than 30, across the reads of this entity (included for comparison with 10x Cell Ranger count). See Metrics Definitions. |
| genomic_reads_fraction_bases_quality_above_30_variance | SC Tools | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30, across the reads of this entity (included for comparison with 10x Cell Ranger count). See Metrics Definitions. |
| genomic_read_quality_mean | SC Tools | The average quality of Illumina base calls in the genomic reads corresponding to this entity. See Metrics Definitions. |
| genomic_read_quality_variance | SC Tools | The variance in quality of Illumina base calls in the genomic reads corresponding to this entity. See Metrics Definitions. |
| n_molecules | SC Tools | The number of molecules corresponding to this entity. See README.md for the definition of a molecule. See Metrics Definitions. |
| n_fragments | SC Tools | The number of fragments corresponding to this entity. See README.md for the definition of a fragment. See Metrics Definitions. |
| reads_per_molecule | SC Tools | The average number of reads associated with each molecule in this entity. See Metrics Definitions. |
| reads_per_fragment | SC Tools | The average number of reads associated with each fragment in this entity. See Metrics Definitions. |
| fragments_per_molecule | SC Tools | The average number of fragments associated with each molecule in this entity. See Metrics Definitions. |
| fragments_with_single_read_evidence | SC Tools | The number of fragments associated with this entity that are observed by only one read. See Metrics Definitions. |
| molecules_with_single_read_evidence | SC Tools | The number of molecules associated with this entity that are observed by only one read. See Metrics Definitions. |
| number_cells_detected_multiple | SC Tools | The number of cells in which more than one read of this gene is observed. See Metrics Definitions. |
| number_cells_expressing | SC Tools | The number of cells in which this gene is detected. See Metrics Definitions. |
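
The two cell-count gene metrics at the end of Table 3 reduce to counting cells above a read threshold. The sketch below assumes a hypothetical mapping from cell barcode to the number of reads of one gene observed in that cell; SC Tools derives the real values from the aligned reads, so treat this only as an illustration of the definitions.

```python
from collections.abc import Mapping


def gene_cell_counts(reads_per_cell: Mapping[str, int]) -> dict:
    """Summarize one gene's per-cell read counts.

    reads_per_cell is a hypothetical input mapping each cell barcode to the
    number of reads of this gene observed in that cell.
    """
    counts = list(reads_per_cell.values())
    return {
        # Cells in which the gene is detected at all.
        "number_cells_expressing": sum(1 for n in counts if n >= 1),
        # Cells in which the gene is observed by more than one read.
        "number_cells_detected_multiple": sum(1 for n in counts if n > 1),
    }


print(gene_cell_counts({"AAAC": 0, "CCCG": 1, "GGGT": 4, "TTTA": 2}))
# {'number_cells_expressing': 3, 'number_cells_detected_multiple': 2}
```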