
Optimus Library-level metrics

The following table describes the library-level metrics produced by the Optimus workflow. These metrics are calculated using custom Python scripts available in the warp-tools repository. Because the Optimus workflow aligns files in shards to parallelize computationally intensive steps, it produces multiple Matrix Market files and shard-level library metrics.

To produce the library-level metrics described here, the combined_mtx.py script first combines all the shard-level Matrix Market files into one raw mtx file. STARsolo then filters this matrix, keeping only the barcodes that meet its cell-calling criteria (using the EmptyDrops_CR parameter). The filtered matrix is used as input for h5ad generation, and the metrics are calculated from the final h5ad file using the custom add_library_tso_doublets.py script.
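The shard-merging step can be pictured with the minimal sketch below. It is not the code in combined_mtx.py; it assumes every shard is an integer, coordinate-format Matrix Market file with the same dimensions and the same gene and barcode index space, so counts at the same (row, column) position are simply summed. The function names read_coordinate_mtx, merge_shards, and write_coordinate_mtx are hypothetical.

```python
# Hypothetical sketch, not combined_mtx.py: merge shard-level Matrix Market
# coordinate files, assuming all shards share the same shape, the same
# gene/barcode index space, and integer counts.
from collections import defaultdict
from typing import Dict, Iterable, TextIO, Tuple

Shape = Tuple[int, int]
Entries = Dict[Tuple[int, int], int]


def read_coordinate_mtx(handle: TextIO) -> Tuple[Shape, Entries]:
    """Parse a coordinate-format Matrix Market file into (shape, entries)."""
    # Drop the %%MatrixMarket banner, '%' comment lines, and blank lines.
    lines = (ln.strip() for ln in handle)
    body = [ln for ln in lines if ln and not ln.startswith("%")]
    n_rows, n_cols, _n_entries = (int(x) for x in body[0].split())
    entries: Entries = defaultdict(int)
    for line in body[1:]:
        row, col, value = line.split()
        # Accumulate in case a coordinate appears more than once in a shard.
        entries[(int(row), int(col))] += int(value)
    return (n_rows, n_cols), dict(entries)


def merge_shards(handles: Iterable[TextIO]) -> Tuple[Shape, Entries]:
    """Sum counts at identical (row, col) positions across all shards."""
    shape = None
    merged: Entries = defaultdict(int)
    for handle in handles:
        shard_shape, entries = read_coordinate_mtx(handle)
        if shape is None:
            shape = shard_shape
        elif shard_shape != shape:
            raise ValueError(f"shard shape {shard_shape} does not match {shape}")
        for key, value in entries.items():
            merged[key] += value
    if shape is None:
        raise ValueError("no shards were provided")
    return shape, dict(merged)


def write_coordinate_mtx(handle: TextIO, shape: Shape, entries: Entries) -> None:
    """Write 1-based integer coordinate entries, sorted by (row, col)."""
    handle.write("%%MatrixMarket matrix coordinate integer general\n")
    handle.write(f"{shape[0]} {shape[1]} {len(entries)}\n")
    for (row, col), value in sorted(entries.items()):
        handle.write(f"{row} {col} {value}\n")
```

Summing is a safe merge whether the shards hold disjoint barcodes or overlapping ones, as long as every shard's indices refer to the same gene and barcode lists.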

| Metric | Description |
| --- | --- |
| nhash_id | The first line of the metrics CSV echoes the NHash ID, if one is specified in the workflow run. |
| number_of_reads | Total number of reads. |
| sequencing_saturation | Sequencing saturation as reported by STARsolo: the fraction of reads whose UMI was already observed, i.e., 1 - (number of unique UMIs / number of reads). |
| fraction_of_unique_reads_mapped_to_genome | Fraction of reads that map uniquely to the genome. |
| fraction_of_unique_and_multiple_reads_mapped_to_genome | Fraction of reads that map to the genome, either uniquely or to multiple loci. |
| fraction_of_reads_with_Q30_bases_in_rna | Fraction of reads with base quality score ≥ Q30 in the RNA read. |
| fraction_of_reads_with_Q30_bases_in_cb_and_umi | Fraction of reads with base quality score ≥ Q30 in the cell barcode (CB) and unique molecular identifier (UMI). |
| fraction_of_reads_with_valid_barcodes | Fraction of reads with valid cell barcodes. |
| reads_mapped_antisense_to_gene | Number of reads mapped antisense to gene regions. |
| reads_mapped_confidently_exonic | Number of reads mapped confidently to exonic regions. |
| reads_mapped_confidently_to_genome | Number of reads mapped confidently to the genome. |
| reads_mapped_confidently_to_intronic_regions | Number of reads mapped confidently to intronic regions. |
| reads_mapped_confidently_to_transcriptome | Number of reads mapped confidently to the transcriptome. |
| estimated_cells | Estimated number of cells, as called by STARsolo using the EmptyDrops_CR parameter. |
| umis_in_cells | Total number of UMIs in cells. |
| mean_umi_per_cell | Average number of UMIs per cell. |
| median_umi_per_cell | Median number of UMIs per cell. |
| unique_reads_in_cells_mapped_to_gene | Number of unique reads in cells mapped to genes. |
| fraction_of_unique_reads_in_cells | Fraction of unique reads in cells. |
| mean_reads_per_cell | Average number of reads per cell. |
| median_reads_per_cell | Median number of reads per cell. |
| mean_gene_per_cell | Average number of genes per cell. |
| median_gene_per_cell | Median number of genes per cell. |
| total_genes_unique_detected | Total number of unique genes detected. |
| percent_target | Estimated cells as a percentage of the expected (target) number of cells. Calculated as: estimated_number_of_cells / barcoded_cell_sample_number_of_expected_cells |
| percent_intronic_reads | Percentage of intronic reads. Calculated as: reads_mapped_confidently_to_intronic_regions / number_of_reads |
| percent_doublets | Percentage of cells flagged as doublets, based on doublet scores calculated with a modified DoubletFinder algorithm. |
| keeper_mean_reads_per_cell | Mean reads per cell for keeper cells, i.e., cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3. |
| keeper_median_genes | Median genes per cell for keeper cells, i.e., cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3. |
| keeper_cells | Number of keeper cells: cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3. |
| percent_keeper | Percentage of estimated cells that are keeper cells. Calculated as: keeper_cells / estimated_cells |
| percent_usable | Percentage of usable cells. Calculated as: keeper_cells / expected_cells |
| frac_tso | Fraction of reads containing TSO sequence. Calculated as: (number of reads with 20 bp or more of TSO sequence clipped from the 5' end) / total number of reads |
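The keeper and TSO metrics can be sketched as follows. This is a hypothetical illustration, not the code in add_library_tso_doublets.py: the function names and input shapes are assumptions, ratios are returned as fractions to match the formulas in the table, and any 5' soft clip of at least 20 bp is treated as TSO sequence without checking the clipped bases against the actual TSO. The 5' end is taken from the first CIGAR operation for forward-strand reads and the last one for reverse-strand reads, following the SAM convention of storing reverse-strand reads reverse-complemented.

```python
# Hypothetical sketch, not add_library_tso_doublets.py. Thresholds come from
# the metrics table; function names and input shapes are assumptions.
import re
from statistics import mean, median
from typing import Iterable, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_keeper(n_genes: int, doublet_score: float, is_nucleus: bool) -> bool:
    """>1500 genes for cells (>1000 for nuclei) and doublet score < 0.3."""
    min_genes = 1000 if is_nucleus else 1500
    return n_genes > min_genes and doublet_score < 0.3


def keeper_metrics(cells: Iterable[Tuple[int, int, float]], estimated_cells: int,
                   expected_cells: int, is_nucleus: bool = False) -> dict:
    """cells: one (n_reads, n_genes, doublet_score) tuple per called cell."""
    keepers = [c for c in cells if is_keeper(c[1], c[2], is_nucleus)]
    n = len(keepers)
    return {
        "keeper_cells": n,
        "keeper_mean_reads_per_cell": mean(c[0] for c in keepers) if n else 0.0,
        "keeper_median_genes": median(c[1] for c in keepers) if n else 0.0,
        # Fractions, matching the table's formulas (no x100 scaling).
        "percent_keeper": n / estimated_cells,
        "percent_usable": n / expected_cells,
    }


def five_prime_soft_clip(cigar: str, is_reverse: bool) -> int:
    """Soft-clip length at the read's 5' end.

    Reverse-strand reads have their 5' end at the last CIGAR operation;
    hard clips (H) are skipped so the adjacent soft clip (S) is found.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0
    length, op = ops[-1] if is_reverse else ops[0]
    return length if op == "S" else 0


def frac_tso(reads: Iterable[Tuple[str, bool]], min_clip: int = 20) -> float:
    """Fraction of (cigar, is_reverse) reads with >= min_clip bp soft-clipped at 5'."""
    reads = list(reads)
    if not reads:
        return 0.0
    clipped = sum(1 for cigar, rev in reads
                  if five_prime_soft_clip(cigar, rev) >= min_clip)
    return clipped / len(reads)
```

For example, with four called cells of which two pass the keeper filter, keeper_metrics(cells, estimated_cells=4, expected_cells=5) reports percent_keeper as 0.5 and percent_usable as 0.4.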