Optimus Library-level metrics

The following table describes the library level metrics of the produced by the Optimus workflow. These are calcuated using custom python scripts available in the warp-tools repository. The Optimus workflow aligns files in shards to parallelize computationally intensive steps. This results in multiple matrix market files and shard-level library metrics.

To produce the library-level metrics here, the combined_mtx.py script combines all the shard-level matrix market files into one raw mtx file. Then, STARsolo is run to filter this matrix to only those barcodes that meet STARsolo's criteria of cells (using the Emptydrops_CR parameter). This matrix is then used as input during h5ad generation, and metrics are calculated from the final h5ad using the custom add_library_tso_doublets.py script.

Metric	Description
nhash_id	The first line of of the metrics CSV echos the NHash ID if specified in the workflow run
number_of_reads	Total number of reads.
sequencing_saturation	Proportion of unique molecular identifiers (UMIs) observed relative to the total number of possible UMIs.
fraction_of_unique_reads_mapped_to_genome	Fraction of unique reads that map to the genome.
fraction_of_unique_and_multiple_reads_mapped_to_genome	Fraction of both unique and multiple reads that map to the genome.
fraction_of_reads_with_Q30_bases_in_rna	Fraction of reads with base quality score ≥ Q30 in RNA sequences.
fraction_of_reads_with_Q30_bases_in_cb_and_umi	Fraction of reads with base quality score ≥ Q30 in cell barcode (CB) and unique molecular identifier (UMI).
fraction_of_reads_with_valid_barcodes	Fraction of reads with valid cell barcodes.
reads_mapped_antisense_to_gene	Number of reads mapped antisense to gene regions.
reads_mapped_confidently_exonic	Number of reads mapped confidently to exonic regions.
reads_mapped_confidently_to_genome	Number of reads mapped confidently to the genome.
reads_mapped_confidently_to_intronic_regions	Number of reads mapped confidently to intronic regions.
reads_mapped_confidently_to_transcriptome	Number of reads mapped confidently to the transcriptome.
estimated_cells	Estimated number of cells from STARsolo using the Emptydops_CR parameter.
umis_in_cells	Total number of unique molecular identifiers (UMIs) in cells.
mean_umi_per_cell	Average number of UMIs per cell.
median_umi_per_cell	Median number of UMIs per cell.
unique_reads_in_cells_mapped_to_gene	Number of unique reads in cells mapped to genes.
fraction_of_unique_reads_in_cells	Fraction of unique reads in cells.
mean_reads_per_cell	Average number of reads per cell.
median_reads_per_cell	Median number of reads per cell.
mean_gene_per_cell	Average number of genes per cell.
median_gene_per_cell	Median number of genes per cell.
total_genes_unique_detected	Total number of unique genes detected.
percent_target	Percentage of target cells. Calculated as: estimated_number_of_cells / barcoded_cell_sample_number_of_expected_cells
percent_intronic_reads	Percentage of intronic reads. Calculated as: reads_mapped_confidently_to_intronic_regions / number_of_reads
percent_doublets	Percentage of cells flagged as doublets based on doublet scores calculated from a modified DoubletFinder algorithm.
keeper_mean_reads_per_cell	Mean reads per cell for cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3.
keeper_median_genes	Median genes per cell for cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3>.
keeper_cells	Number of cells with >1500 genes or nuclei with >1000 genes, and doublet score < 0.3.
percent_keeper	Percentage of keeper cells. Calculated as: keeper_cells / estimated_cells
percent_usable	Percentage of usable cells. Calculated as: keeper_cells / expected_cells
frac_tso	Fraction of reads containing TSO sequence. Calculated as the number of reads that have 20 bp or more of TSO Sequence clipped from 5' end/ total number of reads.