STAR Aligner Metrics

The STAR aligner produces multiple text files containing library-level summary metrics, cell-level metrics, and UMI metrics. The Optimus workflow compresses these files into a single TAR. These outputs are taken directly from the aligner, which analyzes different batches of the data in parallel.

The STAR aligner metrics are supplemental to the library-level metrics CSV that is also produced by Optimus. Several of the values in the library metrics are calculated directly from the STAR aligner metrics.
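Because the metrics files are bundled in a single TAR, one convenient way to inspect them is to read every member into memory. The following is a minimal sketch using Python's standard `tarfile` module. The function name `read_text_members` is hypothetical, and no member names are assumed because they depend on the Optimus run.

```python
import io
import tarfile


def read_text_members(tar_bytes):
    """Return {member name: decoded text} for every regular file in a TAR archive."""
    contents = {}
    # Mode "r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            extracted = archive.extractfile(member)
            contents[member.name] = extracted.read().decode("utf-8")
    return contents
```

To use it on a file on disk, read the archive as bytes first, for example `read_text_members(open(path, "rb").read())`, where `path` points to the TAR produced by your workflow run.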

The following sections describe these outputs.

Align Features Metrics

The align features text file contains library-level metrics produced by the STARsolo aligner that describe how reads were assigned to genomic features during single-cell RNA-seq analysis. These metrics indicate whether reads mapped to specific genomic features or why they failed to. For example, noUnmapped is the number of unmapped reads, noNoFeature is the number of reads that were aligned but did not map to any feature such as an exon or gene, and MultiFeature is the number of reads that aligned to multiple features. yesWLmatch and yesCellBarcodes track how well reads match the barcode whitelist, an important step in identifying valid cell barcodes and demultiplexing the single-cell RNA-seq data.

Each metric in the table gives insight into a different stage of read alignment, from barcode matching to gene feature mapping, allowing you to assess the quality and accuracy of the alignment step in the pipeline.

| Metric name | Description |
| --- | --- |
| noUnmapped | Number of unmapped reads. |
| noNoFeature | Number of reads not mapped to a feature. |
| MultiFeature | Number of reads aligned to multiple features. |
| subMultiFeatureMultiGenomic | Number of reads mapping to multiple genomic loci and multiple features. |
| noTooManyWLmatches | Number of reads not counted because their barcoded pair has too many matches to the whitelist. |
| noMMtoWLwithoutExact | Number of reads not counted because their barcoded pair has mismatches to the whitelist and there are no more reads supporting that barcode. |
| yesWLmatch | Number of reads whose barcoded pair has a match to the whitelist. |
| yessubWLmatchExact | Number of reads with cell barcode exactly matched to the whitelist (a subset of yesWLmatch). |
| yessubWLmatch_UniqueFeature | Number of reads matched to the whitelist and a unique feature (a subset of yesWLmatch). |
| yesCellBarcodes | Number of reads associated with a valid cell barcode. |
| yesUMIs | Number of reads associated with a valid UMI. |
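As an illustration of how these counts can be used together, the sketch below parses the align features text and computes the share of whitelist-matched reads whose barcode matched exactly (yessubWLmatchExact divided by yesWLmatch). It assumes each line holds a metric name followed by an integer, separated by whitespace; that layout, the function names, and the example numbers are assumptions for illustration, not actual pipeline output.

```python
def parse_feature_stats(text):
    """Parse 'name value' lines into a dict of integer counts."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a metric line has exactly two fields, a name and an integer
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            metrics[name] = int(value)
        except ValueError:
            continue  # skip lines whose second field is not an integer
    return metrics


def exact_whitelist_fraction(metrics):
    """Fraction of whitelist-matched reads whose barcode matched exactly."""
    matched = metrics["yesWLmatch"]
    if matched == 0:
        return 0.0
    return metrics["yessubWLmatchExact"] / matched


# Illustrative values only, not real pipeline output
example = """
noUnmapped 120
yesWLmatch 1000
yessubWLmatchExact 950
"""
stats = parse_feature_stats(example)
print(exact_whitelist_fraction(stats))
```

A low fraction would suggest that many barcodes were only rescued through mismatch correction, which may be worth checking against the expected chemistry and whitelist.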

Cell Read Metrics

The cell read metrics text file provides cell barcode-level information about the reads. For instance, cbMatch counts the reads that matched a cell barcode and cbPerfect counts the reads whose barcode matched perfectly, while cbMMunique and cbMMmultiple count reads whose barcodes match, with mismatches, to one or to multiple barcodes, respectively. genomeU and genomeM count reads mapped to one or to multiple loci in the genome, respectively. exonic and intronic count reads mapping to annotated exons or introns, which helps distinguish between gene regions in the analysis.

These metrics are important for assessing the quality of individual cell barcodes.

| Metric | Description |
| --- | --- |
| CB | Cell barcode. |
| cbMatch | Number of reads that matched the cell barcode. |
| cbPerfect | Number of perfect matches on cell barcode. |
| cbMMunique | Number of reads with cell barcodes that map with mismatches to one barcode in the passlist. |
| cbMMmultiple | Number of reads with cell barcodes that map with mismatches to multiple barcodes in the passlist. |
| genomeU | Number of reads mapping to one locus in the genome. |
| genomeM | Number of reads mapping to multiple loci in the genome. |
| featureU | Number of reads mapping to one feature (Gene, GeneFull, etc). |
| featureM | Number of reads mapping to multiple features. |
| exonic | Number of reads mapping to annotated exons. |
| intronic | Number of reads mapping to annotated introns; these are only calculated for --soloFeatures GeneFull_Ex50pAS and/or GeneFull_ExonOverIntron. |
| exonicAS | Number of reads mapping antisense to annotated exons. |
| intronicAS | Number of reads mapping antisense to annotated introns; these are only calculated for --soloFeatures GeneFull_Ex50pAS. |
| mito | Number of reads mapping to the mitochondrial genome. |
| countedU | Number of unique-gene reads whose UMIs contributed to counts in the matrix.mtx (reads with valid CB/UMI/gene). |
| countedM | Number of multi-gene reads whose UMIs contributed to counts in the matrix.mtx. |
| nUMIunique | Total number of counted UMIs for unique-gene reads. |
| nGenesUnique | Number of genes for unique-gene reads. |
| nUMImulti | Total number of counted UMIs for multi-gene reads. |
| nGenesMulti | Number of genes for multi-gene reads. |
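To show how the per-barcode columns can be combined, the following sketch computes cbPerfect divided by cbMatch for each cell barcode. It assumes the file is tab-delimited with a header row that uses the column names in the table above; the delimiter, the function name `perfect_barcode_fractions`, and the example rows are assumptions for illustration.

```python
import csv
import io


def perfect_barcode_fractions(text):
    """Map each cell barcode to cbPerfect / cbMatch."""
    # Assumption: tab-delimited text whose header row names the columns
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fractions = {}
    for row in reader:
        matched = int(row["cbMatch"])
        if matched == 0:
            continue  # avoid dividing by zero for barcodes with no matched reads
        fractions[row["CB"]] = int(row["cbPerfect"]) / matched
    return fractions


# Illustrative rows only, not real pipeline output
example = "CB\tcbMatch\tcbPerfect\nAAAC\t200\t190\nTTTG\t0\t0\n"
print(perfect_barcode_fractions(example))
```

The same pattern extends to other ratios, such as exonic versus intronic reads, by reading additional columns from each row.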

Summary.txt

The summary text file contains additional library-level metrics produced by the STARsolo aligner. Number of Reads reflects the total reads processed, and Reads With Valid Barcodes indicates the fraction of reads that matched the barcode whitelist. Sequencing Saturation indicates how completely the library has been sequenced; higher values mean that additional reads are less likely to reveal new UMIs. Q30 Bases in CB+UMI and Q30 Bases in RNA read give insight into sequencing quality by reporting the fraction of bases with high-quality base calls. Other key metrics, such as Reads Mapped to Genome: Unique+Multiple and Estimated Number of Cells, show how well reads mapped to the genome and how many cells were identified. Together, these summary metrics help users assess the overall quality and completeness of their single-cell RNA-seq data and serve as a checkpoint for deciding whether the data is suitable for further analysis.

| Metric | Description |
| --- | --- |
| Number of Reads | Number of reads in the library. |
| Reads With Valid Barcodes | Fraction of reads with valid barcodes. |
| Sequencing Saturation | Proportion of unique molecular identifiers (UMIs) that have been sequenced at least once compared to the total number of possible UMIs in the sample; calculated as 1 - (yesUMIs / yessubWLmatch_UniqueFeature). |
| Q30 Bases in CB+UMI | Fraction of bases with a quality score of at least 30 in the cell barcode and UMI read. |
| Q30 Bases in RNA read | Fraction of bases with a quality score of at least 30 in the RNA read. |
| Reads Mapped to Genome: Unique+Multiple | Fraction of unique and multimapped reads that mapped to the genome. |
| Reads Mapped to Genome: Unique | Fraction of unique reads that mapped to the genome. |
| Reads Mapped to genes: Unique+Multiple | Fraction of reads that mapped to genes as defined by the --soloFeatures parameter. |
| Reads Mapped to Genes: Unique | Fraction of unique reads that mapped to genes. |
| Estimated Number of Cells | Number of barcodes that STARsolo flagged as cells based on UMIs. |
| Unique Reads in Cells Mapped to genes | Total number of unique reads that mapped to genes across all cells. |
| Fraction of Unique Reads in Cells | Fraction of unique reads across all cells. |
| Mean Reads per Cell | Mean number of reads per cell. |
| Median Reads per Cell | Median number of reads per cell. |
| UMIs in Cells | Total number of UMIs across all cells. |
| Mean UMI per Cell | Mean number of UMIs per cell. |
| Median UMI per Cell | Median number of UMIs per cell. |
| Mean Genes per Cell | Mean number of genes expressed per cell. |
| Median Genes per Cell | Median number of genes per cell. |
| Total Genes Detected | Total number of genes detected in the overall library. |
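The sequencing saturation formula given in the table, 1 - (yesUMIs / yessubWLmatch_UniqueFeature), can be written out as a small function. This is a sketch of that arithmetic only; the function and argument names are hypothetical, and the two inputs are the yesUMIs and yessubWLmatch_UniqueFeature counts from the align features metrics.

```python
def sequencing_saturation(yes_umis, unique_feature_reads):
    """Compute 1 - (yesUMIs / yessubWLmatch_UniqueFeature)."""
    if unique_feature_reads <= 0:
        raise ValueError("yessubWLmatch_UniqueFeature must be positive")
    return 1.0 - yes_umis / unique_feature_reads


# Illustrative counts only: 250 UMIs observed among 1000 qualifying reads
print(sequencing_saturation(250, 1000))
```

With these illustrative counts, three quarters of the qualifying reads re-observe a UMI that was already seen, so the saturation is 0.75.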

UMI per cell

The UMI per cell text file lists the UMI count for every barcode. It contains two columns. The first column contains the number of UMIs for each barcode entry. The second column indicates whether the barcode was flagged as a cell: 1 indicates that it passed the filtering criteria to be considered a cell, and 0 indicates that it did not.
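The two-column layout described above can be summarized with a few lines of Python. The sketch below assumes the columns are separated by whitespace and that the file has no header row; both are assumptions, as are the function name `summarize_umi_per_cell` and the example values.

```python
from statistics import median


def summarize_umi_per_cell(text):
    """Count barcodes and cells, and report the median UMI count among cells."""
    cell_umis = []
    n_barcodes = 0
    for line in text.splitlines():
        fields = line.split()
        # Assumption: each data line is "<UMI count> <0 or 1>"
        if len(fields) != 2:
            continue
        umi_count, is_cell = int(fields[0]), int(fields[1])
        n_barcodes += 1
        if is_cell == 1:
            cell_umis.append(umi_count)
    return {
        "barcodes": n_barcodes,
        "cells": len(cell_umis),
        "median_umis_in_cells": median(cell_umis) if cell_umis else None,
    }


# Illustrative values only, not real pipeline output
example = "100 1\n50 1\n3 0\n"
print(summarize_umi_per_cell(example))
```

The number of barcodes flagged with 1 can then be compared with the Estimated Number of Cells reported in the summary metrics.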