Skip to main content

Slide-seq Count Matrix Overview

The Slide-seq pipeline's default count matrix output is a Loom file generated using Loompy v.3.0.6.

It contains the raw bead-by-gene counts, which vary depending on the workflow's count_exons parameter. By default, count_exons is set to true and the output Loom will contain whole-gene counts with exon counts in an additional layer.

If the workflow is run with count_exons set to false, the output Loom file will contain whole-gene counts. Running the workflow in this configuration will cause the Loom matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.

You can determine which type of counts are in the Loom by looking at the global attribute expression_data_type (see Table 1 below).

The matrix also contains multiple metrics for both individual bead barcodes (the columns of the matrix; Table 2) and individual genes (the rows of the matrix; Table 3).

Table 1. Global attributes

The global attributes in the Loom apply to the whole file, not any specific part.

AttributeDetails
CreationDateDate the Loom file was created.
LOOM_SPEC_VERSIONLoom file spec version used during creation of the Loom file.
expression_data_typeString describing if the pipeline counted whole transcript (exonic and intronic) or only exonic reads determined by the value of the count_exons parameter. By default, count_exons is true and expression_data_type is whole_transcript; if count_exons is false then expression_data_type is exonic.
input_idThe input_id provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata.
optimus_output_schema_versionLoom file spec version used during creation of the Loom file.
pipeline_versionVersion of the Slide-seq pipeline used to generate the Loom file.

Table 2. Column attributes (bead barcode metrics)

The bead barcode metrics below are computed using TagSort from the warp-tools repository, with the exception of input_id which is an input to the pipeline.

Bead Barcode MetricsDetails
CellIDThe unique identifier for each bead based on bead barcodes; identical to cell_names.
antisense_readsThe number of reads that are mapped to the antisense strand instead of the transcribed strand.
cell_barcode_fraction_bases_above_30_meanThe average fraction of base calls for the bead barcode sequences that are greater than 30, across molecules.
cell_barcode_fraction_bases_above_30_varianceThe variance of the fraction of base calls for the bead barcode sequences that are greater than 30, across molecules.
cell_namesThe unique identifier for each bead based on bead barcodes; identical to CellID.
fragments_per_moleculeThe average number of fragments associated with each molecule in this entity.
fragments_with_single_read_evidenceThe number of fragments associated with this entity that are observed by only one read.
genes_detected_multiple_observationsThe number of genes that are observed by more than one read in this entity.
genomic_read_quality_meanAverage quality of base calls in the genomic reads corresponding to this entity.
genomic_read_quality_varianceVariance in quality of base calls in the genomic reads corresponding to this entity.
genomic_reads_fraction_bases_quality_above_30_meanThe average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
genomic_reads_fraction_bases_quality_above_30_varianceThe variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
input_idThe input_id provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata.
molecule_barcode_fraction_bases_above_30_meanThe average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
molecule_barcode_fraction_bases_above_30_varianceThe variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
molecules_with_single_read_evidenceThe number of molecules associated with this entity that are observed by only one read.
n_fragmentsNumber of fragments corresponding to this entity.
n_genesThe number of genes detected by this bead.
n_mitochondrial_genesThe number of mitochondrial genes detected by this bead.
n_mitochondrial_moleculesThe number of molecules from mitochondrial genes detected for this bead.
n_moleculesNumber of molecules corresponding to this entity (only reflects reads with CB and UB tags).
n_readsThe number of reads associated with this entity. n_reads, like all metrics, are calculated from the Optimus output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files.
noise_readsNumber of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides.
pct_mitochondrial_moleculesThe percentage of molecules from mitochondrial genes detected for this bead.
perfect_cell_barcodesThe number of reads whose bead barcodes contain no errors.
perfect_molecule_barcodesThe number of reads whose molecule barcodes contain no errors.
reads_mapped_multipleThe number of reads mapped to multiple genomic positions with equal confidence.
reads_mapped_too_many_lociThe number of reads that were mapped to too many loci across the genome and as a consequence, are reported unmapped by the aligner.
reads_mapped_uniquelyThe number of reads mapped to a single unambiguous location in the genome.
reads_per_fragmentThe average number of reads associated with each fragment in this entity.
spliced_readsThe number of reads that overlap splicing junctions.

Table 3. Row attributes (gene metrics)

The gene metrics below are computed using TagSort from the warp-tools repository except where specified.

Gene MetricsDetails
GeneThe unique gene_name provided in the GENCODE GTF; identical to the gene_names attribute.
antisense_readsThe number of reads that are mapped to the antisense strand instead of the transcribed strand.
ensembl_idsThe gene_id provided in the GENCODE GTF.
fragments_per_moleculeThe average number of fragments associated with each molecule in this entity.
fragments_with_single_read_evidenceThe number of fragments associated with this entity that are observed by only one read.
gene_namesThe unique gene_name provided in the GENCODE GTF; identical to the Gene attribute.
genomic_read_quality_meanAverage quality of base calls in the genomic reads corresponding to this entity.
genomic_read_quality_varianceVariance in quality of base calls in the genomic reads corresponding to this entity.
genomic_reads_fraction_bases_quality_above_30_meanThe average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
genomic_reads_fraction_bases_quality_above_30_varianceThe variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
molecule_barcode_fraction_bases_above_30_meanThe average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
molecule_barcode_fraction_bases_above_30_varianceThe variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
molecules_with_single_read_evidenceThe number of molecules associated with this entity that are observed by only one read.
n_fragmentsNumber of fragments corresponding to this entity.
n_moleculesNumber of molecules corresponding to this entity (only reflects reads with CB and UB tags).
n_readsThe number of reads associated with this entity. n_reads, like all metrics, are calculated from the Optimus output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files.
noise_readsThe number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides.
number_cells_detected_multipleThe number of bead barcodes which observe more than one read of this gene.
number_cells_expressingThe number of bead barcodes that detect this gene.
perfect_molecule_barcodesThe number of reads with molecule barcodes that have no errors.
reads_mapped_multipleThe number of reads mapped to multiple genomic positions with equal confidence.
reads_mapped_uniquelyThe number of reads mapped to a single unambiguous location in the genome.
reads_per_fragmentThe average number of reads associated with each fragment in this entity.
reads_per_moleculeThe average number of reads associated with each molecule in this entity.
spliced_readsThe number of reads that overlap splicing junctions.