ATAC Count Matrix Overview

The ATAC pipeline's default count matrix output is an h5ad file generated using SnapATAC2 and AnnData.

The h5ad file contains unstructured metadata (h5ad.uns; Table 1) as well as per-barcode quality metrics (h5ad.obs; Table 2). When ATAC is run as part of the Multiome pipeline, h5ad.obs also contains the equivalent gene expression barcode for each ATAC barcode. Raw fragments are stored in h5ad.obsm['insertion']. For more information, see the import_data function in the SnapATAC2 documentation.

The h5ad file does not contain per-gene metrics, meaning the variables/features data frame (h5ad.var) is empty.
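
The layout described above can be checked directly once the file is loaded. The following is a minimal sketch, assuming the anndata Python package is installed. The path `atac.h5ad` and the helper names `summarize_atac_h5ad` and `load_and_summarize` are hypothetical and are not part of the pipeline.

```python
def summarize_atac_h5ad(adata):
    """Return a small summary of an AnnData-like object's layout.

    Only the attributes n_obs, n_vars, uns, obs.columns and obsm are read.
    """
    obs_columns = list(adata.obs.columns)
    return {
        "n_barcodes": adata.n_obs,             # one row of h5ad.obs per ATAC barcode
        "n_features": adata.n_vars,            # h5ad.var is described as empty
        "uns_keys": sorted(adata.uns.keys()),  # global attributes (Table 1)
        "obs_columns": obs_columns,            # cell metrics (Table 2)
        "obsm_keys": sorted(adata.obsm.keys()),
        # gex_barcodes is only expected for Multiome runs
        "has_gex_barcodes": "gex_barcodes" in obs_columns,
    }


def load_and_summarize(path="atac.h5ad"):
    """Load an h5ad file from a hypothetical path and summarize it."""
    import anndata as ad  # imported here so the helper above has no dependencies

    adata = ad.read_h5ad(path)
    return summarize_atac_h5ad(adata)
```

Calling `load_and_summarize("atac.h5ad")` on a pipeline output should list `reference_sequences` under `uns_keys` and the Table 2 metrics under `obs_columns`, if the file matches the description in this overview.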

Table 1. Global attributes

The global attributes (unstructured metadata) in the h5ad apply to the file as a whole rather than to any specific barcode.

| Attribute | Program | Details |
| --- | --- | --- |
| reference_sequences | SnapATAC2 | Data frame containing the chromosome sizes for the genome build (e.g., hg38); created using the chrom_sizes pipeline input. |

Table 2. Cell metrics

| Cell Metrics | Program | Details |
| --- | --- | --- |
| tsse | SnapATAC2 | Transcription start site enrichment (TSSe) score; lower scores suggest poor data quality. Learn more about TSSe in the Definitions section below. |
| n_fragment | SnapATAC2 | Number of unique fragments corresponding to the ATAC cell barcode. Fragments are stored in the h5ad.obsm property of the output h5ad file. Learn more about cell barcodes and fragments in the Definitions section below. |
| frac_dup | SnapATAC2 | Fraction of reads associated with the cell barcode that are duplicates. |
| frac_mito | SnapATAC2 | Fraction of reads associated with the cell barcode that are mitochondrial. |
| gex_barcodes | AnnData | Gene expression barcode associated with each ATAC cell barcode. This column is only produced when ATAC is run as part of the Multiome pipeline. |
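
The per-barcode metrics in Table 2 are commonly used to flag low-quality barcodes. The sketch below shows one way to do that on plain Python records, for example those produced by `adata.obs.to_dict("index")` on a loaded file. The function name `filter_barcodes` and all threshold values are illustrative assumptions, not recommendations from the pipeline; appropriate cutoffs depend on the dataset.

```python
def filter_barcodes(obs_records, min_tsse=5.0, min_fragments=1000, max_frac_mito=0.1):
    """Return the barcodes whose metrics pass all three (hypothetical) thresholds.

    obs_records maps each ATAC barcode to a dict holding at least the
    'tsse', 'n_fragment' and 'frac_mito' metrics from Table 2.
    """
    kept = []
    for barcode, metrics in obs_records.items():
        passes = (
            metrics["tsse"] >= min_tsse                  # enough signal at TSSs
            and metrics["n_fragment"] >= min_fragments   # enough unique fragments
            and metrics["frac_mito"] <= max_frac_mito    # limited mitochondrial reads
        )
        if passes:
            kept.append(barcode)
    return kept


# Toy records with made-up values, for illustration only.
example = {
    "AAAC": {"tsse": 12.0, "n_fragment": 5000, "frac_mito": 0.02},
    "CCGT": {"tsse": 2.5, "n_fragment": 8000, "frac_mito": 0.01},
    "GGTA": {"tsse": 9.0, "n_fragment": 300, "frac_mito": 0.05},
}
passing = filter_barcodes(example)
```

With the toy values above, only the first barcode passes: the second fails the TSSe cutoff and the third has too few fragments.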

Definitions

  • Cell Barcode: Short nucleotide sequence used to label and distinguish which reads come from each unique cell, allowing for tracking of many cells simultaneously.
  • Fragment: A sequenced DNA segment, typically represented by an aligned read pair, that maps to a specific location on the reference genome.
  • Transcription Start Site Enrichment (TSSe): A common quality control metric in ATAC-seq data, indicating increased accessibility around the transcription start sites of genes. High TSSe suggests successful capture of relevant genomic features, while low TSSe may signal data quality or processing issues.
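
To make the TSSe definition more concrete, the following sketch computes a simplified, ENCODE-style enrichment score: the signal at the center of an aggregate profile around transcription start sites, divided by the average signal in the flanking ends. This is an illustration under stated assumptions, not SnapATAC2's exact implementation; the function name `tss_enrichment` and the `flank` parameter are hypothetical.

```python
def tss_enrichment(profile, flank=100):
    """Simplified TSS enrichment score.

    profile: per-base insertion counts aggregated over all TSSs, covering a
             window centered on the TSS (position len(profile) // 2).
    flank:   number of bases at each end of the window used as background.
    """
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile is too short for the requested flank size")

    # Background: mean signal over the two flanking ends of the window.
    background = (sum(profile[:flank]) + sum(profile[-flank:])) / (2 * flank)
    if background == 0:
        return float("nan")  # enrichment is undefined without background signal

    # Enrichment: signal at the window center relative to the background.
    center = profile[len(profile) // 2]
    return center / background


# Toy profile: flat background of 1 with a peak of 10 at the center.
toy_profile = [1] * 5 + [10] + [1] * 5
score = tss_enrichment(toy_profile, flank=2)
```

A flat profile yields a score near 1, while a pronounced peak at the TSS yields a higher score, which matches the interpretation that low TSSe may signal data quality issues.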