Optimus Library-level metrics
The following table describes the library-level metrics produced by the Optimus workflow. These are calculated using custom Python scripts available in the warp-tools repository. The Optimus workflow aligns files in shards to parallelize computationally intensive steps. This results in multiple matrix market files and shard-level library metrics.
To produce the library-level metrics listed here, the combined_mtx.py script combines all the shard-level matrix market files into one raw mtx file. Then, STARsolo is run to filter this matrix to only those barcodes that meet STARsolo's criteria for cells (using the Emptydrops_CR parameter). The filtered matrix is then used as input during h5ad generation, and metrics are calculated from the final h5ad using the custom add_library_tso_doublets.py script.
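The combining step described above can be pictured with a minimal sketch. This is not the actual combined_mtx.py script; it only assumes that every shard shares the same gene list, that each shard carries its own barcode list, and that counts for a barcode seen in more than one shard are summed. The `combine_shards` function and the in-memory `(barcodes, entries)` representation are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def combine_shards(shards):
    """Merge shard-level sparse count matrices into one (illustrative only).

    Each shard is a (barcodes, entries) pair, where entries maps
    (gene_index, shard_local_barcode_index) -> count. All shards are
    assumed to share the same gene (feature) ordering.
    """
    combined_barcodes = []        # global barcode order
    barcode_to_col = {}           # barcode -> global column index
    combined = defaultdict(int)   # (gene_index, global_col) -> count

    for barcodes, entries in shards:
        # Register any barcode not seen in an earlier shard.
        for barcode in barcodes:
            if barcode not in barcode_to_col:
                barcode_to_col[barcode] = len(combined_barcodes)
                combined_barcodes.append(barcode)
        # Re-index each entry to the global column and sum counts.
        for (gene, local_col), count in entries.items():
            global_col = barcode_to_col[barcodes[local_col]]
            combined[(gene, global_col)] += count

    return combined_barcodes, dict(combined)


# Two toy shards that share one barcode ("AAAG").
shard_a = (["AAAC", "AAAG"], {(0, 0): 3, (1, 1): 2})
shard_b = (["AAAT", "AAAG"], {(0, 0): 1, (1, 1): 4})
barcodes, matrix = combine_shards([shard_a, shard_b])
```

In the real workflow the combined matrix is written back out as a matrix market file and then filtered by STARsolo; the sketch stops at the merge.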
Metric | Description |
---|---|
nhash_id | The first line of the metrics CSV echoes the NHash ID, if one is specified in the workflow run. |
number_of_reads | Total number of reads. |
sequencing_saturation | Proportion of unique molecular identifiers (UMIs) observed relative to the total number of possible UMIs. |
fraction_of_unique_reads_mapped_to_genome | Fraction of unique reads that map to the genome. |
fraction_of_unique_and_multiple_reads_mapped_to_genome | Fraction of both unique and multiple reads that map to the genome. |
fraction_of_reads_with_Q30_bases_in_rna | Fraction of reads with base quality score ≥ Q30 in RNA sequences. |
fraction_of_reads_with_Q30_bases_in_cb_and_umi | Fraction of reads with base quality score ≥ Q30 in cell barcode (CB) and unique molecular identifier (UMI). |
fraction_of_reads_with_valid_barcodes | Fraction of reads with valid cell barcodes. |
reads_mapped_antisense_to_gene | Number of reads mapped antisense to gene regions. |
reads_mapped_confidently_exonic | Number of reads mapped confidently to exonic regions. |
reads_mapped_confidently_to_genome | Number of reads mapped confidently to the genome. |
reads_mapped_confidently_to_intronic_regions | Number of reads mapped confidently to intronic regions. |
reads_mapped_confidently_to_transcriptome | Number of reads mapped confidently to the transcriptome. |
estimated_cells | Estimated number of cells from STARsolo using the Emptydrops_CR parameter. |
umis_in_cells | Total number of unique molecular identifiers (UMIs) in cells. |
mean_umi_per_cell | Average number of UMIs per cell. |
median_umi_per_cell | Median number of UMIs per cell. |
unique_reads_in_cells_mapped_to_gene | Number of unique reads in cells mapped to genes. |
fraction_of_unique_reads_in_cells | Fraction of unique reads in cells. |
mean_reads_per_cell | Average number of reads per cell. |
median_reads_per_cell | Median number of reads per cell. |
mean_gene_per_cell | Average number of genes per cell. |
median_gene_per_cell | Median number of genes per cell. |
total_genes_unique_detected | Total number of unique genes detected. |
percent_target | Percentage of target cells. Calculated as: estimated_number_of_cells / barcoded_cell_sample_number_of_expected_cells |
percent_intronic_reads | Percentage of intronic reads. Calculated as: reads_mapped_confidently_to_intronic_regions / number_of_reads |
percent_doublets | Percentage of cells flagged as doublets based on doublet scores calculated from a modified DoubletFinder algorithm. |
keeper_mean_reads_per_cell | Mean reads per cell for cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3. |
keeper_median_genes | Median genes per cell for cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3. |
keeper_cells | Number of cells with >1500 genes or nuclei with >1000 genes, and doublet_score < 0.3. |
percent_keeper | Percentage of keeper cells. Calculated as: keeper_cells / estimated_cells |
percent_usable | Percentage of usable cells. Calculated as: keeper_cells / expected_cells |
frac_tso | Fraction of reads containing TSO sequence. Calculated as: number of reads with 20 bp or more of TSO sequence clipped from the 5' end / total number of reads. |
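
The keeper-based metrics in the table can be illustrated with a small sketch. This is not the add_library_tso_doublets.py implementation; it follows the formulas as written in the table, and the function name `summarize_keepers`, the per-cell dictionary keys, and the toy values are all assumptions made for illustration. The gene threshold is passed in as `min_genes` (1500 for cells, 1000 for nuclei, per the table), and the ratios are returned exactly as the table states them, without scaling by 100.

```python
from statistics import mean, median


def summarize_keepers(cells, estimated_cells, expected_cells,
                      min_genes=1500, max_doublet_score=0.3):
    """Compute keeper metrics from per-cell records (illustrative only).

    cells: list of dicts with hypothetical keys "n_genes", "n_reads",
    and "doublet_score". estimated_cells and expected_cells are assumed
    to be non-zero.
    """
    # A keeper has more than min_genes genes and a doublet score below the cutoff.
    keepers = [
        c for c in cells
        if c["n_genes"] > min_genes and c["doublet_score"] < max_doublet_score
    ]
    n_keepers = len(keepers)
    return {
        "keeper_cells": n_keepers,
        "keeper_mean_reads_per_cell":
            mean(c["n_reads"] for c in keepers) if keepers else 0.0,
        "keeper_median_genes":
            median(c["n_genes"] for c in keepers) if keepers else 0.0,
        # Ratios follow the table's formulas literally.
        "percent_keeper": n_keepers / estimated_cells,
        "percent_usable": n_keepers / expected_cells,
        "percent_target": estimated_cells / expected_cells,
    }


# Toy input: four called cells, five expected cells.
toy_cells = [
    {"n_genes": 2000, "n_reads": 5000, "doublet_score": 0.10},
    {"n_genes": 1800, "n_reads": 3000, "doublet_score": 0.20},
    {"n_genes": 1200, "n_reads": 2500, "doublet_score": 0.05},  # too few genes
    {"n_genes": 2500, "n_reads": 8000, "doublet_score": 0.60},  # likely doublet
]
summary = summarize_keepers(toy_cells, estimated_cells=4, expected_cells=5)
```

For nuclei data, the same function would be called with `min_genes=1000`.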