Slide-seq Count Matrix Overview

Danger: The Loom matrix is deprecated and the default matrix is now h5ad.

The Slide-seq pipeline's default count matrix output is an h5ad file generated using AnnData.

It contains the raw bead-by-gene counts, which vary depending on the workflow's count_exons parameter. By default, count_exons is set to true and the output h5ad file will contain whole-gene counts with exon counts in an additional layer.

If the workflow is run with count_exons set to false, the output h5ad file will contain whole-gene counts. Running the workflow in this configuration will cause the h5ad matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.

You can determine which type of counts the h5ad file contains by checking the expression_data_type key in the unstructured metadata (the anndata.uns property of the matrix; see Table 1 below).
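The lookup described above can be sketched as follows. This is a minimal illustration, not part of the pipeline: the helper name `describe_count_type`, the file name in the docstring, and the stand-in dictionary are hypothetical, and the two recognized values are taken from Table 1. The function accepts any mapping, so it works on the `uns` property of an AnnData object as well as on a plain dictionary.

```python
from typing import Mapping

# Values documented in Table 1 for the `expression_data_type` key.
KNOWN_TYPES = {
    "whole_transcript": "exonic and intronic reads were counted",
    "exonic": "only exonic reads were counted",
}


def describe_count_type(uns: Mapping[str, object]) -> str:
    """Return a readable description of the counts stored in the matrix.

    `uns` is the unstructured metadata mapping, for example `adata.uns`
    after `adata = anndata.read_h5ad("sample.h5ad")` (hypothetical file name).
    """
    data_type = uns.get("expression_data_type")
    if data_type is None:
        raise KeyError("expression_data_type not found in unstructured metadata")
    description = KNOWN_TYPES.get(str(data_type), "unrecognized value")
    return f"{data_type}: {description}"


# Stand-in for adata.uns so the example runs without an h5ad file.
example_uns = {"expression_data_type": "whole_transcript", "input_id": "sample_1"}
print(describe_count_type(example_uns))
```

Keeping the function independent of the AnnData package makes it easy to check against a small dictionary before pointing it at a real file.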

The matrix also contains multiple metrics for both individual bead barcodes (the anndata.obs property of the matrix; Table 2) and individual genes (the anndata.var property of the matrix; Table 3).

Table 1. Global attributes

The global attributes (unstructured metadata) in the h5ad apply to the whole file, not to any specific part.

Attribute	Details
`expression_data_type`	String describing if the pipeline counted whole transcript (exonic and intronic) or only exonic reads determined by the value of the `count_exons` parameter. By default, `count_exons` is `true` and `expression_data_type` is `whole_transcript`; if `count_exons` is `false` then `expression_data_type` is `exonic`.
`input_id`	The `input_id` provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata.
`optimus_output_schema_version`	h5ad file spec version used during creation of the h5ad file.
`pipeline_version`	Version of the Slide-seq pipeline used to generate the h5ad file.

Table 2. Column attributes (bead barcode metrics)

The bead barcode metrics below are computed using TagSort from the warp-tools repository, with the exception of input_id, which is an input to the pipeline.

Bead Barcode Metrics	Details
`cell_names`	The unique identifier for each bead based on bead barcodes; identical to `CellID`.
`CellID`	The unique identifier for each bead based on bead barcodes; identical to `cell_names`.
`n_reads`	The number of reads associated with this entity. Like all metrics, `n_reads` is calculated from the Slide-seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist, allowing a Hamming distance of 1. These CB-corrected reads are the input to the STAR aligner, which applies CB correction again. As a result, almost all reads in the aligned BAM have a CB tag and a UB tag, so `n_reads` represents CB-corrected reads, not all reads in the input FASTQ files.
`noise_reads`	The number of reads that are categorized by 10x Genomics Cell Ranger as "noise", meaning long polymers or reads with high numbers of N (ambiguous) nucleotides.
`perfect_molecule_barcodes`	The number of reads whose molecule barcodes contain no errors.
`reads_mapped_exonic`	The number of unique reads counted as exon; counted when the BAM file's `sF` tag is assigned to a `1` or `3` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_exonic_as`	The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_intronic`	The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_intronic_as`	The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_uniquely`	The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded.
`reads_mapped_multiple`	The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded.
`duplicate_reads`	The number of duplicate reads.
`spliced_reads`	The number of reads that overlap splicing junctions.
`antisense_reads`	The number of reads that are mapped to the antisense strand instead of the transcribed strand.
`n_molecules`	Number of molecules corresponding to this entity (only reflects reads with CB and UB tags).
`n_fragments`	Number of fragments corresponding to this entity.
`fragments_with_single_read_evidence`	The number of fragments associated with this entity that are observed by only one read.
`molecules_with_single_read_evidence`	The number of molecules associated with this entity that are observed by only one read.
`perfect_cell_barcodes`	The number of reads whose bead barcodes contain no errors.
`reads_mapped_intergenic`	The number of reads counted as intergenic; counted when the BAM file's `sF` tag is assigned to a `7` and the `NH:i` tag is `1`.
`reads_unmapped`	The total number of reads that are unmapped; counted when the BAM file's `sF` tag is `0`.
`reads_mapped_too_many_loci`	The number of reads that were mapped to too many loci across the genome and as a consequence, are reported unmapped by the aligner.
`n_genes`	The number of genes detected by this bead.
`genes_detected_multiple_observations`	The number of genes that are observed by more than one read in this entity.
`molecule_barcode_fraction_bases_above_30_mean`	The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
`molecule_barcode_fraction_bases_above_30_variance`	The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
`genomic_reads_fraction_bases_quality_above_30_mean`	The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
`genomic_reads_fraction_bases_quality_above_30_variance`	The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
`genomic_read_quality_mean`	Average quality of base calls in the genomic reads corresponding to this entity.
`genomic_read_quality_variance`	Variance in quality of base calls in the genomic reads corresponding to this entity.
`reads_per_molecule`	The average number of reads associated with each molecule in this entity.
`reads_per_fragment`	The average number of reads associated with each fragment in this entity.
`fragments_per_molecule`	The average number of fragments associated with each molecule in this entity.
`cell_barcode_fraction_bases_above_30_mean`	The average fraction of bases in the bead barcode sequences that receive quality scores greater than 30, across molecules.
`cell_barcode_fraction_bases_above_30_variance`	The variance in the fraction of bases in the bead barcode sequences that receive quality scores greater than 30, across molecules.
`n_mitochondrial_genes`	The number of mitochondrial genes detected by this bead.
`n_mitochondrial_molecules`	The number of molecules from mitochondrial genes detected for this bead.
`pct_mitochondrial_molecules`	The percentage of molecules from mitochondrial genes detected for this bead.
`input_id`	The `input_id` provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata.
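The `sF` and `NH:i` rules listed in Table 2 can be summarized in a short sketch. This is an illustration of the table only, not TagSort's implementation: the function name `metric_for_read`, the `is_mitochondrial` flag, and the toy reads are hypothetical. Reads that match none of the listed rules return `None`.

```python
from collections import Counter
from typing import Optional

# sF tag values and the metric each one feeds, as listed in Table 2.
SF_TO_METRIC = {
    1: "reads_mapped_exonic",
    3: "reads_mapped_exonic",
    2: "reads_mapped_exonic_as",
    4: "reads_mapped_exonic_as",
    5: "reads_mapped_intronic",
    6: "reads_mapped_intronic_as",
    7: "reads_mapped_intergenic",
}

# Metrics that Table 2 describes as excluding mitochondrial reads.
EXCLUDES_MITO = {
    "reads_mapped_exonic",
    "reads_mapped_exonic_as",
    "reads_mapped_intronic",
    "reads_mapped_intronic_as",
}


def metric_for_read(sf: int, nh: int, is_mitochondrial: bool = False) -> Optional[str]:
    """Return the Table 2 metric a read contributes to, or None."""
    if sf == 0:
        return "reads_unmapped"
    if nh != 1:
        return None  # these categories require NH:i to be 1
    metric = SF_TO_METRIC.get(sf)
    if metric in EXCLUDES_MITO and is_mitochondrial:
        return None
    return metric


# Toy reads as (sF, NH, is_mitochondrial) tuples; values are made up.
toy_reads = [(1, 1, False), (3, 1, True), (5, 1, False), (7, 1, False), (0, 0, False)]
tally = Counter(m for m in (metric_for_read(*r) for r in toy_reads) if m is not None)
print(dict(tally))
```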

Table 3. Row attributes (gene metrics)

The gene metrics below are computed using TagSort from the warp-tools repository except where specified.

Gene Metrics	Details
`gene_names`	The unique `gene_name` provided in the GENCODE GTF; identical to the `Gene` attribute.
`ensembl_ids`	The `gene_id` provided in the GENCODE GTF.
`Gene`	The unique `gene_name` provided in the GENCODE GTF; identical to the `gene_names` attribute.
`n_reads`	The number of reads associated with this entity. Like all metrics, `n_reads` is calculated from the Slide-seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist, allowing a Hamming distance of 1. These CB-corrected reads are the input to the STAR aligner, which applies CB correction again. As a result, almost all reads in the aligned BAM have a CB tag and a UB tag, so `n_reads` represents CB-corrected reads, not all reads in the input FASTQ files.
`noise_reads`	The number of reads that are categorized by 10x Genomics Cell Ranger as "noise", meaning long polymers or reads with high numbers of N (ambiguous) nucleotides.
`perfect_molecule_barcodes`	The number of reads with molecule barcodes that have no errors.
`reads_mapped_exonic`	The number of unique reads counted as exon; counted when the BAM file's `sF` tag is assigned to a `1` or `3` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_exonic_as`	The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_intronic`	The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_intronic_as`	The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`; mitochondrial reads are excluded.
`reads_mapped_uniquely`	The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded.
`reads_mapped_multiple`	The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded.
`duplicate_reads`	The number of duplicate reads.
`spliced_reads`	The number of reads that overlap splicing junctions.
`antisense_reads`	The number of reads that are mapped to the antisense strand instead of the transcribed strand.
`molecule_barcode_fraction_bases_above_30_mean`	The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
`molecule_barcode_fraction_bases_above_30_variance`	The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.
`genomic_reads_fraction_bases_quality_above_30_mean`	The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
`genomic_reads_fraction_bases_quality_above_30_variance`	The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity.
`genomic_read_quality_mean`	Average quality of base calls in the genomic reads corresponding to this entity.
`genomic_read_quality_variance`	Variance in quality of base calls in the genomic reads corresponding to this entity.
`n_molecules`	Number of molecules corresponding to this entity (only reflects reads with CB and UB tags).
`n_fragments`	Number of fragments corresponding to this entity.
`reads_per_molecule`	The average number of reads associated with each molecule in this entity.
`reads_per_fragment`	The average number of reads associated with each fragment in this entity.
`fragments_per_molecule`	The average number of fragments associated with each molecule in this entity.
`fragments_with_single_read_evidence`	The number of fragments associated with this entity that are observed by only one read.
`molecules_with_single_read_evidence`	The number of molecules associated with this entity that are observed by only one read.
`number_cells_detected_multiple`	The number of bead barcodes which observe more than one read of this gene.
`number_cells_expressing`	The number of bead barcodes that detect this gene.
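As a rough illustration of the last two gene metrics, the sketch below counts beads for a single gene from per-bead read counts. It is one reading of the definitions in Table 3, not TagSort's actual code: the function name `gene_detection_counts`, the input layout, and the toy barcodes are hypothetical.

```python
from typing import Dict, Tuple


def gene_detection_counts(reads_per_bead: Dict[str, int]) -> Tuple[int, int]:
    """Summarize how many beads detect one gene.

    `reads_per_bead` maps a bead barcode to the number of reads of the gene
    observed on that bead. Returns a pair:
    (number_cells_expressing, number_cells_detected_multiple).
    """
    # Beads with at least one read of the gene.
    expressing = sum(1 for n in reads_per_bead.values() if n > 0)
    # Beads with more than one read of the gene.
    multiple = sum(1 for n in reads_per_bead.values() if n > 1)
    return expressing, multiple


# Toy input: three beads, one of which has no reads of the gene.
toy_counts = {"AAACGT": 3, "AAAGTC": 1, "AAATCG": 0}
print(gene_detection_counts(toy_counts))  # (2, 1)
```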

Definitions

Bead Barcode: Short nucleotide sequence used to label and distinguish which reads come from each unique bead, allowing for tracking of many beads simultaneously.
