STAR Aligner Metrics

The STAR aligner produces multiple text files containing library-level summary metrics, cell-level metrics, and UMI metrics. The Optimus workflow compresses these files into a single TAR. These outputs come directly from the aligner, produced as different batches of the data are analyzed in parallel.
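As a minimal sketch of how the files inside such a TAR might be listed, the Python snippet below uses the standard-library tarfile module. The archive is built in memory so that the example runs on its own; the member names (example_features.txt, example_summary.txt) and the helper list_tar_members are hypothetical and do not reflect the names Optimus actually uses.

```python
import io
import tarfile

def list_tar_members(fileobj):
    """Return the names of the regular files inside a TAR archive."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]

# Build a tiny in-memory archive so the sketch runs without real workflow output.
# The member names are placeholders, not the names Optimus produces.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for name in ("example_features.txt", "example_summary.txt"):
        payload = b"placeholder\n"
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

member_names = list_tar_members(buffer)
print(member_names)
```

With a real output, the same function could be given an open file handle to the TAR instead of the in-memory buffer.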

The STAR aligner metrics are supplemental to the library-level metrics CSV that is also produced by Optimus. Several of the values in the library metrics are calculated directly from the STAR aligner metrics.

The following sections describe these outputs.

Align Features Metrics

The Align Features text file contains library-level metrics produced by the STARsolo alignment. The metrics detail how reads align to genomic features during single-cell RNA-seq analysis, and indicate either how well reads map to specific features or why they were not counted. For example, noUnmapped is the number of unmapped reads. noNoFeature counts reads that were aligned but did not map to any feature, such as an exon or gene. MultiFeature counts reads that were aligned to multiple features. yesWLmatch and yesCellBarcodes track how well reads match the barcode whitelist, an important step in identifying valid cell barcodes and demultiplexing the single-cell RNA-seq data.

Each metric in the table below gives insight into a different stage of read processing, from barcode matching to gene feature mapping, so that you can assess the quality and accuracy of the alignment step in the pipeline.

Metric	Description
noUnmapped	Number of unmapped reads.
noNoFeature	Number of reads not mapped to a feature.
MultiFeature	Number of reads aligned to multiple features.
subMultiFeatureMultiGenomic	Number of reads mapping to multiple genomic loci and multiple features.
noTooManyWLmatches	Number of reads not counted because their barcoded pair has too many matches to the whitelist.
noMMtoWLwithoutExact	Number of reads not counted because their barcoded pair has mismatches to the whitelist and no other reads support that barcode.
yesWLmatch	Number of reads whose barcoded pair has a match to the whitelist.
yessubWLmatchExact	Number of reads with cell barcode exactly matched to the whitelist (a subset of yesWLmatch).
yessubWLmatch_UniqueFeature	Number of reads matched to the whitelist and to a unique feature (a subset of yesWLmatch).
yesCellBarcodes	Number of reads associated with a valid cell barcode.
yesUMIs	Number of reads associated with a valid UMI.
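As a minimal sketch of how these metrics might be read programmatically, the Python snippet below parses an Align Features file into a dictionary and computes one ratio from it. It assumes that each line holds a metric name followed by an integer count, separated by whitespace; that layout, the function name parse_align_features, and the sample counts are assumptions made for illustration, so check them against your own file.

```python
import io

def parse_align_features(handle):
    """Read 'name count' lines into a dict of metric name -> integer count."""
    metrics = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines or lines that do not fit the assumed layout
        name, value = fields
        metrics[name] = int(value)
    return metrics

# Made-up counts, for illustration only.
sample = io.StringIO(
    "noUnmapped 100\n"
    "noNoFeature 250\n"
    "MultiFeature 50\n"
    "yesWLmatch 600\n"
    "yesCellBarcodes 580\n"
    "yesUMIs 400\n"
)
align_metrics = parse_align_features(sample)

# Share of whitelist-matched reads that are associated with a valid cell barcode.
cell_fraction = align_metrics["yesCellBarcodes"] / align_metrics["yesWLmatch"]
print(f"{cell_fraction:.3f}")
```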

Cell Read Metrics

The Cell Read Metrics text file provides cell barcode-level information about the reads. For instance, cbMatch counts the reads that matched the cell barcode, and cbPerfect counts the reads with a perfect match to a cell barcode. cbMMunique and cbMMmultiple count reads whose barcodes match, with mismatches, to one barcode or to multiple barcodes, respectively. genomeU and genomeM count reads mapped to one locus or to multiple loci in the genome, respectively. exonic and intronic count reads mapping to annotated exons or introns, which helps distinguish between gene regions in the analysis.

These metrics are important for assessing the quality of individual cell barcodes.

Metric	Description
CB	Cell barcode
cbMatch	Number of reads that matched the cell barcode.
cbPerfect	Number of perfect matches on cell barcode.
cbMMunique	Number of reads with cell barcodes that map with mismatches to one barcode in the passlist.
cbMMmultiple	Number of reads with cell barcodes that map with mismatches to multiple barcodes in the passlist.
genomeU	Number of reads mapping to one locus in the genome.
genomeM	Number of reads mapping to multiple loci in the genome.
featureU	Number of reads mapping to one feature (Gene, GeneFull, etc).
featureM	Number of reads mapping to multiple features.
exonic	Number of reads mapping to annotated exons.
intronic	Number of reads mapping to annotated introns; these are only calculated for --soloFeatures GeneFull_Ex50pAS and/or GeneFull_ExonOverIntron.
exonicAS	Number of reads mapping antisense to annotated exons.
intronicAS	Number of reads mapping antisense to annotated introns; these are only calculated for --soloFeatures GeneFull_Ex50pAS.
mito	Number of reads mapping to the mitochondrial genome.
countedU	Number of unique-gene reads whose UMIs contributed to counts in the matrix.mtx (reads with valid CB/UMI/gene).
countedM	Number of multi-gene reads whose UMIs contributed to counts in the matrix.mtx.
nUMIunique	Total number of counted UMIs for unique-gene reads.
nGenesUnique	Number of genes for unique-gene reads.
nUMImulti	Total number of counted UMIs for multi-gene reads.
nGenesMulti	Number of genes for multi-gene reads.
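The per-barcode columns above lend themselves to simple quality checks. The Python sketch below computes, for each barcode, the fraction of matched reads that map to the mitochondrial genome (mito divided by cbMatch). It assumes a tab-separated file with a header row that uses the column names from the table; that layout, the helper mito_fraction_per_barcode, the choice of denominator, the 0.2 threshold, and the sample rows are all illustrative assumptions.

```python
import csv
import io

def mito_fraction_per_barcode(handle):
    """Return {barcode: mito / cbMatch} for barcodes with at least one matched read."""
    reader = csv.DictReader(handle, delimiter="\t")
    fractions = {}
    for row in reader:
        matched = int(row["cbMatch"])
        if matched == 0:
            continue  # avoid dividing by zero for barcodes without matched reads
        fractions[row["CB"]] = int(row["mito"]) / matched
    return fractions

# Made-up rows with only the columns this sketch needs.
sample = io.StringIO(
    "CB\tcbMatch\tmito\n"
    "AAAC\t200\t10\n"
    "CCGT\t50\t25\n"
    "GGTA\t0\t0\n"
)
mito_fractions = mito_fraction_per_barcode(sample)

# Flag barcodes whose mitochondrial fraction exceeds an arbitrary example threshold.
high_mito = [cb for cb, fraction in mito_fractions.items() if fraction > 0.2]
print(high_mito)
```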

Summary.txt

The summary text file contains additional library-level metrics produced by the STARsolo aligner. Number of Reads reflects the total reads processed, and Reads With Valid Barcodes indicates what fraction of reads matched the barcode whitelist. Sequencing Saturation shows the completeness of sequencing; higher values indicate that additional sequencing would capture fewer new UMIs. Q30 Bases in CB+UMI and Q30 Bases in RNA read give insight into sequencing quality by reporting the fraction of bases with high-quality base calls. Other key metrics, such as Reads Mapped to Genome: Unique+Multiple and Estimated Number of Cells, show how well reads were mapped to the genome and how many cells were identified. Together, these summary metrics help you assess the overall quality and completeness of single-cell RNA-seq data and serve as a checkpoint for deciding whether the data is suitable for further analysis.

Metric	Description
Number of Reads	Number of reads in the library.
Reads With Valid Barcodes	Fraction of reads with valid barcodes.
Sequencing Saturation	Proportion of unique molecular identifiers (UMIs) that have been sequenced at least once compared to the total number of possible UMIs in the sample; calculated as: 1-(yesUMIs/yessubWLmatch_UniqueFeature).
Q30 Bases in CB+UMI	Fraction of bases in the cell barcode and UMI read with a quality score of at least 30.
Q30 Bases in RNA read	Fraction of bases in the RNA read with a quality score of at least 30.
Reads Mapped to Genome: Unique+Multiple	Fraction of unique and multimapped reads that mapped to the genome.
Reads Mapped to Genome: Unique	Fraction of unique reads that mapped to the genome.
Reads Mapped to genes: Unique+Multiple	Fraction of reads that mapped to genes as defined by the --soloFeatures parameter.
Reads Mapped to Genes: Unique	Fraction of unique reads that mapped to genes.
Estimated Number of Cells	Number of barcodes that STARsolo flagged as cells based on UMIs.
Unique Reads in Cells Mapped to genes	Total number of unique reads that mapped to genes across all cells.
Fraction of Unique Reads in Cells	Fraction of unique reads across all cells.
Mean Reads per Cell	Mean number of reads per cell.
Median Reads per Cell	Median number of reads per cell.
UMIs in Cells	Total number of UMIs across all cells.
Mean UMI per Cell	Mean number of UMIs per cell.
Median UMI per Cell	Median number of UMIs per cell.
Mean Genes per Cell	Mean number of genes expressed per cell.
Median Genes per Cell	Median number of genes per cell.
Total Genes Detected	Total number of genes detected in the overall library.
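The Sequencing Saturation formula in the table, 1-(yesUMIs/yessubWLmatch_UniqueFeature), can be worked through with a short Python sketch. The function name sequencing_saturation and the input counts are hypothetical; the two inputs correspond to the yesUMIs and yessubWLmatch_UniqueFeature values from the Align Features metrics.

```python
def sequencing_saturation(yes_umis, unique_feature_reads):
    """Compute 1 - (yesUMIs / yessubWLmatch_UniqueFeature)."""
    if unique_feature_reads <= 0:
        raise ValueError("yessubWLmatch_UniqueFeature must be positive")
    return 1 - yes_umis / unique_feature_reads

# Made-up counts: 400 UMIs observed among 1000 whitelist-matched, unique-feature reads.
saturation = sequencing_saturation(yes_umis=400, unique_feature_reads=1000)
print(f"{saturation:.2f}")
```

In this example, 400 UMIs among 1000 reads gives a saturation of 0.6, meaning most reads re-sampled a UMI that had already been seen.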

UMI per cell

The UMI per cell text file lists the UMI count for every barcode. It contains two columns. The first column contains the number of UMIs for each barcode entry. The second column indicates whether the barcode was flagged as a cell: 1 indicates that it passed the filtering criteria to be considered a cell, and 0 indicates that it did not.
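The two-column layout described above can be summarized with a few lines of Python. This is a sketch under assumptions: it treats the columns as whitespace-separated with no header row, and the helper summarize_umi_per_cell and the sample values are invented for illustration.

```python
import io
import statistics

def summarize_umi_per_cell(handle):
    """Return (total barcode count, list of UMI counts for barcodes flagged as cells)."""
    n_barcodes = 0
    cell_umis = []
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines or lines that do not fit the assumed layout
        umi_count, is_cell = int(fields[0]), int(fields[1])
        n_barcodes += 1
        if is_cell == 1:
            cell_umis.append(umi_count)
    return n_barcodes, cell_umis

# Made-up rows: UMI count, then the cell flag (1 = cell, 0 = not a cell).
sample = io.StringIO("5000 1\n3200 1\n40 0\n12 0\n")
n_barcodes, cell_umis = summarize_umi_per_cell(sample)
median_umis = statistics.median(cell_umis)
print(n_barcodes, len(cell_umis), median_umis)
```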
