
Slide-seq Count Matrix Overview

Warning: The Loom matrix is deprecated and the default matrix is now h5ad.

The Slide-seq pipeline's default count matrix output is an h5ad file generated using AnnData.

It contains the raw bead-by-gene counts, which vary depending on the workflow's count_exons parameter. By default, count_exons is set to true and the output h5ad file will contain whole-gene counts with exon counts in an additional layer.

If the workflow is run with count_exons set to false, the output h5ad file will contain whole-gene counts. Running the workflow in this configuration will cause the h5ad matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.

To determine which type of counts the h5ad file contains, check the expression_data_type key in the unstructured metadata (the anndata.uns property of the matrix; see Table 1 below).
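The check described above can be sketched in Python. This is a minimal illustration, not part of the pipeline: the helper names describe_count_type and load_uns are hypothetical, the example dictionary stands in for a real anndata.uns mapping, and the two recognized values are the ones listed in Table 1.

```python
from typing import Mapping


def describe_count_type(uns: Mapping) -> str:
    """Describe the counts in the matrix based on the expression_data_type key."""
    value = uns.get("expression_data_type")
    if value == "whole_transcript":
        return "whole-transcript (exonic and intronic) counts"
    if value == "exonic":
        return "exonic counts only"
    return f"unrecognized expression_data_type: {value!r}"


def load_uns(h5ad_path: str) -> Mapping:
    """Read the unstructured metadata from an h5ad file (requires anndata)."""
    import anndata  # imported lazily so the rest of the sketch runs without it

    return anndata.read_h5ad(h5ad_path).uns


# Stand-in for the uns mapping of a real file; the values are made up.
example_uns = {"expression_data_type": "whole_transcript", "input_id": "sample_1"}
print(describe_count_type(example_uns))
```

With a real output file, describe_count_type(load_uns(path)) would report the same information, where path points to the pipeline's h5ad output.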

The matrix also contains multiple metrics for both individual bead barcodes (the anndata.obs property of the matrix; Table 2) and individual genes (the anndata.var property of the matrix; Table 3).
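As a rough sketch of how these parts of the matrix can be inspected, the function below collects the bead metrics, gene metrics, layers, and global attributes from an AnnData-like object. The function name summarize_matrix is hypothetical, and the stand-in object (including its layer name) is a placeholder rather than real pipeline output; with anndata installed, an object returned by anndata.read_h5ad can be passed in its place.

```python
from types import SimpleNamespace


def summarize_matrix(adata) -> dict:
    """Collect the main parts of an AnnData-like object into a plain dict."""
    return {
        "n_beads": adata.n_obs,                     # rows of obs: bead barcodes
        "n_genes": adata.n_vars,                    # rows of var: genes
        "bead_metrics": list(adata.obs.columns),    # Table 2 metrics
        "gene_metrics": list(adata.var.columns),    # Table 3 metrics
        "layers": list(adata.layers.keys()),        # additional count layers
        "global_attributes": sorted(adata.uns.keys()),  # Table 1 attributes
    }


# Stand-in object with the same attribute names as AnnData; all values are
# placeholders, and "example_layer" is not the pipeline's actual layer name.
fake_adata = SimpleNamespace(
    n_obs=2,
    n_vars=3,
    obs=SimpleNamespace(columns=["n_reads", "n_molecules"]),
    var=SimpleNamespace(columns=["n_reads", "number_cells_expressing"]),
    layers={"example_layer": None},
    uns={"pipeline_version": "x.y.z", "expression_data_type": "whole_transcript"},
)

summary = summarize_matrix(fake_adata)
for key, value in summary.items():
    print(f"{key}: {value}")
```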

Table 1. Global attributes

The global attributes (unstructured metadata) in the h5ad file apply to the whole file, not to any specific part.

| Attribute | Details |
| --- | --- |
| expression_data_type | String describing whether the pipeline counted whole-transcript (exonic and intronic) reads or only exonic reads, as determined by the value of the count_exons parameter. By default, count_exons is true and expression_data_type is whole_transcript; if count_exons is false, expression_data_type is exonic. |
| input_id | The input_id provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata. |
| optimus_output_schema_version | h5ad file spec version used during creation of the h5ad file. |
| pipeline_version | Version of the Slide-seq pipeline used to generate the h5ad file. |

Table 2. Column attributes (bead barcode metrics)

The bead barcode metrics below are computed using TagSort from the warp-tools repository, with the exception of input_id which is an input to the pipeline.

| Bead Barcode Metrics | Details |
| --- | --- |
| cell_names | The unique identifier for each bead based on bead barcodes; identical to CellID. |
| CellID | The unique identifier for each bead based on bead barcodes; identical to cell_names. |
| n_reads | The number of reads associated with this entity. n_reads, like all metrics, is calculated from the Slide-seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (Hamming distance of 1). These CB-corrected reads are the input to the STAR aligner, which applies CB correction again. For this reason, almost all reads in the aligned BAM have a CB tag and a UB tag. Therefore, n_reads represents CB-corrected reads, not all reads in the input FASTQ files. |
| noise_reads | The number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. |
| perfect_molecule_barcodes | The number of reads whose molecule barcodes contain no errors. |
| reads_mapped_exonic | The number of unique reads counted as exon; counted when the BAM file's sF tag is 1 or 3 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_exonic_as | The number of reads counted as exon in the antisense direction; counted when the BAM file's sF tag is 2 or 4 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_intronic | The number of reads counted as intron; counted when the BAM file's sF tag is 5 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_intronic_as | The number of reads counted as intron in the antisense direction; counted when the BAM file's sF tag is 6 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_uniquely | The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. |
| reads_mapped_multiple | The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. |
| duplicate_reads | The number of duplicate reads. |
| spliced_reads | The number of reads that overlap splicing junctions. |
| antisense_reads | The number of reads that are mapped to the antisense strand instead of the transcribed strand. |
| n_molecules | The number of molecules corresponding to this entity (only reflects reads with CB and UB tags). |
| n_fragments | The number of fragments corresponding to this entity. |
| fragments_with_single_read_evidence | The number of fragments associated with this entity that are observed by only one read. |
| molecules_with_single_read_evidence | The number of molecules associated with this entity that are observed by only one read. |
| perfect_cell_barcodes | The number of reads whose bead barcodes contain no errors. |
| reads_mapped_intergenic | The number of reads counted as intergenic; counted when the BAM file's sF tag is 7 and the NH:i tag is 1. |
| reads_unmapped | The total number of reads that are unmapped; counted when the BAM file's sF tag is 0. |
| reads_mapped_too_many_loci | The number of reads that were mapped to too many loci across the genome and, as a consequence, are reported as unmapped by the aligner. |
| n_genes | The number of genes detected by this bead. |
| genes_detected_multiple_observations | The number of genes that are observed by more than one read in this entity. |
| molecule_barcode_fraction_bases_above_30_mean | The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. |
| molecule_barcode_fraction_bases_above_30_variance | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. |
| genomic_reads_fraction_bases_quality_above_30_mean | The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. |
| genomic_reads_fraction_bases_quality_above_30_variance | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. |
| genomic_read_quality_mean | The average quality of base calls in the genomic reads corresponding to this entity. |
| genomic_read_quality_variance | The variance in quality of base calls in the genomic reads corresponding to this entity. |
| reads_per_molecule | The average number of reads associated with each molecule in this entity. |
| reads_per_fragment | The average number of reads associated with each fragment in this entity. |
| fragments_per_molecule | The average number of fragments associated with each molecule in this entity. |
| cell_barcode_fraction_bases_above_30_mean | The average fraction of base calls in the bead barcode sequences that have quality scores greater than 30, across molecules. |
| cell_barcode_fraction_bases_above_30_variance | The variance of the fraction of base calls in the bead barcode sequences that have quality scores greater than 30, across molecules. |
| n_mitochondrial_genes | The number of mitochondrial genes detected by this bead. |
| n_mitochondrial_molecules | The number of molecules from mitochondrial genes detected for this bead. |
| pct_mitochondrial_molecules | The percentage of molecules from mitochondrial genes detected for this bead. |
| input_id | The input_id provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata. |
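The sF and NH:i tag rules listed in Table 2 can be summarized as a small lookup. The sketch below only restates those rules for illustration; it is not TagSort's implementation, the function name classify_read and the is_mitochondrial flag are hypothetical, and the example reads are made up.

```python
from collections import Counter
from typing import Optional

# sF tag value -> metric name, as listed in Table 2.
SF_TO_METRIC = {
    1: "reads_mapped_exonic",
    3: "reads_mapped_exonic",
    2: "reads_mapped_exonic_as",
    4: "reads_mapped_exonic_as",
    5: "reads_mapped_intronic",
    6: "reads_mapped_intronic_as",
    7: "reads_mapped_intergenic",
}

# Metrics whose descriptions state that mitochondrial reads are excluded.
MITO_EXCLUDED = {
    "reads_mapped_exonic",
    "reads_mapped_exonic_as",
    "reads_mapped_intronic",
    "reads_mapped_intronic_as",
}


def classify_read(sf: int, nh: int, is_mitochondrial: bool = False) -> Optional[str]:
    """Return the Table 2 metric a read would count toward, or None."""
    if sf == 0:
        return "reads_unmapped"  # the table lists no NH:i condition for sF 0
    if nh != 1:
        return None  # the remaining categories require NH:i to be 1
    metric = SF_TO_METRIC.get(sf)
    if metric in MITO_EXCLUDED and is_mitochondrial:
        return None
    return metric


# Made-up reads as (sF, NH, is_mitochondrial) tuples.
example_reads = [
    (1, 1, False), (3, 1, False), (2, 1, False),
    (5, 1, True), (7, 1, False), (0, 0, False), (1, 2, False),
]
tally = Counter(
    metric for metric in (classify_read(*read) for read in example_reads) if metric
)
print(dict(tally))
```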

Table 3. Row attributes (gene metrics)

The gene metrics below are computed using TagSort from the warp-tools repository except where specified.

| Gene Metrics | Details |
| --- | --- |
| gene_names | The unique gene_name provided in the GENCODE GTF; identical to the Gene attribute. |
| ensembl_ids | The gene_id provided in the GENCODE GTF. |
| Gene | The unique gene_name provided in the GENCODE GTF; identical to the gene_names attribute. |
| n_reads | The number of reads associated with this entity. n_reads, like all metrics, is calculated from the Slide-seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (Hamming distance of 1). These CB-corrected reads are the input to the STAR aligner, which applies CB correction again. For this reason, almost all reads in the aligned BAM have a CB tag and a UB tag. Therefore, n_reads represents CB-corrected reads, not all reads in the input FASTQ files. |
| noise_reads | The number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. |
| perfect_molecule_barcodes | The number of reads with molecule barcodes that have no errors. |
| reads_mapped_exonic | The number of unique reads counted as exon; counted when the BAM file's sF tag is 1 or 3 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_exonic_as | The number of reads counted as exon in the antisense direction; counted when the BAM file's sF tag is 2 or 4 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_intronic | The number of reads counted as intron; counted when the BAM file's sF tag is 5 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_intronic_as | The number of reads counted as intron in the antisense direction; counted when the BAM file's sF tag is 6 and the NH:i tag is 1; mitochondrial reads are excluded. |
| reads_mapped_uniquely | The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. |
| reads_mapped_multiple | The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. |
| duplicate_reads | The number of duplicate reads. |
| spliced_reads | The number of reads that overlap splicing junctions. |
| antisense_reads | The number of reads that are mapped to the antisense strand instead of the transcribed strand. |
| molecule_barcode_fraction_bases_above_30_mean | The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. |
| molecule_barcode_fraction_bases_above_30_variance | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. |
| genomic_reads_fraction_bases_quality_above_30_mean | The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. |
| genomic_reads_fraction_bases_quality_above_30_variance | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. |
| genomic_read_quality_mean | The average quality of base calls in the genomic reads corresponding to this entity. |
| genomic_read_quality_variance | The variance in quality of base calls in the genomic reads corresponding to this entity. |
| n_molecules | The number of molecules corresponding to this entity (only reflects reads with CB and UB tags). |
| n_fragments | The number of fragments corresponding to this entity. |
| reads_per_molecule | The average number of reads associated with each molecule in this entity. |
| reads_per_fragment | The average number of reads associated with each fragment in this entity. |
| fragments_per_molecule | The average number of fragments associated with each molecule in this entity. |
| fragments_with_single_read_evidence | The number of fragments associated with this entity that are observed by only one read. |
| molecules_with_single_read_evidence | The number of molecules associated with this entity that are observed by only one read. |
| number_cells_detected_multiple | The number of bead barcodes that observe more than one read of this gene. |
| number_cells_expressing | The number of bead barcodes that detect this gene. |
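To make the difference between number_cells_expressing and number_cells_detected_multiple concrete, the sketch below computes both for a single gene from per-bead read counts. It assumes that a bead "detects" a gene when it has at least one read for it; TagSort's exact definition may differ. The function name and the bead barcodes are hypothetical.

```python
from typing import Dict


def bead_detection_metrics(reads_per_bead: Dict[str, int]) -> Dict[str, int]:
    """Count beads that detect a gene and beads that observe it more than once."""
    # Assumption: "detect" means at least one read of the gene on that bead.
    expressing = sum(1 for n in reads_per_bead.values() if n >= 1)
    # More than one read of the gene on that bead.
    multiple = sum(1 for n in reads_per_bead.values() if n > 1)
    return {
        "number_cells_expressing": expressing,
        "number_cells_detected_multiple": multiple,
    }


# Made-up read counts for one gene, keyed by made-up bead barcodes.
example_counts = {"AAACCTGA": 3, "AAACGGGT": 1, "AAAGATGC": 0, "AAATGCCA": 2}
metrics = bead_detection_metrics(example_counts)
print(metrics)
```

By construction, number_cells_detected_multiple can never exceed number_cells_expressing under this assumption.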

Definitions

  • Bead Barcode: Short nucleotide sequence used to label and distinguish which reads come from each unique bead, allowing for tracking of many beads simultaneously.