|Pipeline Version||Date Updated||Documentation Author||Questions or Feedback|
|Multiome v3.0.0||December, 2023||Kaylee Mathews||Please file GitHub issues in warp or contact the WARP Pipeline Development team|
Introduction to the Multiome workflow
Multiome is an open-source, cloud-optimized pipeline developed in collaboration with members of the BRAIN Initiative (BICCN and BICAN Sequencing Working Group) and SCORCH (see Acknowledgements below). It supports the processing of 10x 3' single-cell and single-nucleus gene expression (GEX) and chromatin accessibility (ATAC) data generated with the 10x Genomics Multiome assay.
The GEX component corrects cell barcodes (CBs) and Unique Molecular Identifiers (UMIs), aligns reads to the genome, calculates per-barcode and per-gene quality metrics, and produces a raw cell-by-gene count matrix.
The ATAC component corrects CBs, aligns reads to the genome, calculates per-barcode quality metrics, and produces a fragment file.
The wrapper WDL is available in the WARP repository (see the code here).
The following table provides a quick glance at the Multiome pipeline features:
|Pipeline features||Description||Source|
|Assay type||10x single cell or single nucleus gene expression (GEX) and ATAC||10x Genomics|
|Overall workflow||Barcode correction, read alignment, gene and fragment quantification|
|Workflow language||WDL 1.0||openWDL|
|Genomic Reference Sequence||GRCh38 human genome primary sequence||GENCODE human reference files|
|Gene annotation reference (GTF)||Reference containing gene annotations||GENCODE human GTF|
|Aligners||STARsolo (GEX), BWA-mem2 (ATAC)||Kaminow et al. 2021, Vasimuddin et al. 2019|
|Transcript and fragment quantification||STARsolo (GEX), SnapATAC2 (ATAC)||Kaminow et al. 2021, SnapATAC2|
|Data input file format||File format in which sequencing data is provided||FASTQ|
|Data output file format||File formats in which Multiome output is provided||BAM and h5ad|
To discover and search releases, use the WARP command-line tool Wreleaser.
If you’re running a Multiome workflow version prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP releases page (see the source code folder).
Multiome can be deployed using Cromwell, a GA4GH compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in Terra, a cloud-based analysis platform. The Multiome public workspace on Terra contains the Multiome workflow, workflow configuration, required reference data and other inputs, and example testing data.
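As a minimal sketch of a local Cromwell deployment, the snippet below assembles the standard `cromwell run` command line. The file names `cromwell.jar`, `Multiome.wdl`, and `multiome_inputs.json` are placeholders rather than paths taken from this documentation. Because the wrapper WDL calls subworkflows, Cromwell also has to be able to resolve those imports; that step is not shown here.

```python
import shlex
from pathlib import Path


def build_cromwell_command(cromwell_jar, workflow_wdl, inputs_json):
    """Return the argument list for a local `cromwell run` invocation."""
    return [
        "java", "-jar", str(cromwell_jar),
        "run", str(workflow_wdl),
        "--inputs", str(inputs_json),
    ]


# Hypothetical local paths; substitute the locations on your own system.
command = build_cromwell_command(
    Path("cromwell.jar"),
    Path("Multiome.wdl"),
    Path("multiome_inputs.json"),
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the placeholder paths point at real files.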
|Input name||Description||Type|
|input_id||Unique identifier describing the biological sample or replicate that corresponds with the FASTQ files; can be a human-readable name or UUID.||String|
|annotations_gtf||GTF file containing gene annotations used for GEX cell metric calculation and ATAC fragment metrics; must match the GTF used to build the STAR aligner reference.||File|
|gex_r1_fastq||Array of read 1 FASTQ files representing a single GEX 10x library.||Array[File]|
|gex_r2_fastq||Array of read 2 FASTQ files representing a single GEX 10x library.||Array[File]|
|gex_i1_fastq||Optional array of index FASTQ files representing a single GEX 10x library; multiplexed samples are not currently supported, but the file may be passed to the pipeline.||Array[File]|
|tar_star_reference||TAR file containing a species-specific reference genome and GTF for Optimus (GEX) pipeline.||File|
|ref_genome_fasta||Genome FASTA file used for building the indices.||File|
|mt_genes||Optional file for the Optimus (GEX) pipeline containing mitochondrial gene names used for metric calculation; default assumes 'mt' prefix in GTF (case insensitive).||File|
|counting_mode||Optional string that determines whether the Optimus (GEX) pipeline should be run in single-cell mode (sc_rna) or single-nucleus mode (sn_rna); default is "sn_rna".||String|
|tenx_chemistry_version||Optional integer for the Optimus (GEX) pipeline specifying the 10x version chemistry the data was generated with; validated by examination of the first read 1 FASTQ file read structure; default is "3".||Integer|
|emptydrops_lower||Optional UMI threshold for the Optimus (GEX) pipeline that the EmptyDrops tool should consider when determining cells; data below the threshold is not removed; default is "100".||Integer|
|force_no_check||Optional boolean for the Optimus (GEX) pipeline indicating whether the pipeline should skip its input checks; default is "false".||Boolean|
|ignore_r1_read_length||Optional boolean for the Optimus (GEX) pipeline indicating if the pipeline should ignore barcode chemistry check; if "true", the workflow will not ensure the ||Boolean|
|star_strand_mode||Optional string for the Optimus (GEX) pipeline for performing STARsolo alignment on forward stranded, reverse stranded, or unstranded data; default is "Forward".||String|
|count_exons||Optional boolean for the Optimus (GEX) pipeline indicating if the workflow should calculate exon counts when in single-nucleus (sn_rna) mode; if "true" in sc_rna mode, the workflow will return an error; default is "false".||Boolean|
|gex_whitelist||Optional file containing the list of valid barcodes for 10x multiome GEX data; default is "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_gex.txt".||File|
|atac_r1_fastq||Array of read 1 paired-end FASTQ files representing a single 10x multiome ATAC library.||Array[File]|
|atac_r2_fastq||Array of barcode FASTQ files representing a single 10x multiome ATAC library.||Array[File]|
|atac_r3_fastq||Array of read 2 paired-end FASTQ files representing a single 10x multiome ATAC library.||Array[File]|
|tar_bwa_reference||TAR file containing the reference index files for BWA-mem2 alignment for the ATAC pipeline.||File|
|chrom_sizes||File containing the genome chromosome sizes; used to calculate ATAC fragment file metrics.||File|
|adapter_seq_read1||Optional string describing the adapter sequence for ATAC read 1 paired-end reads to be used during adapter trimming with Cutadapt; default is "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG".||String|
|adapter_seq_read3||Optional string describing the adapter sequence for ATAC read 2 paired-end reads to be used during adapter trimming with Cutadapt; default is "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG".||String|
|atac_whitelist||Optional file containing the list of valid barcodes for 10x multiome ATAC data; default is "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_atac.txt".||File|
|run_cellbender||Optional boolean used to determine if the Optimus (GEX) pipeline should run CellBender on the output gene expression h5ad file, ||Boolean|
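The input table can be turned into an inputs JSON along the following lines. This is a sketch under stated assumptions: keys are namespaced as `Multiome.<input name>` (the usual Cromwell convention, not confirmed by this page), the required list is simply the rows not marked "Optional" above, and every `gs://my-bucket/...` path is made up for illustration. The helper name `build_inputs` is hypothetical.

```python
import json

# Assumption: inputs are namespaced as "<workflow name>.<input name>".
WORKFLOW_NAME = "Multiome"

# Rows of the input table that are not described as optional.
REQUIRED_INPUTS = [
    "input_id", "annotations_gtf",
    "gex_r1_fastq", "gex_r2_fastq", "tar_star_reference", "ref_genome_fasta",
    "atac_r1_fastq", "atac_r2_fastq", "atac_r3_fastq",
    "tar_bwa_reference", "chrom_sizes",
]


def build_inputs(values):
    """Validate a flat dict of input values and namespace its keys."""
    missing = [name for name in REQUIRED_INPUTS if name not in values]
    if missing:
        raise ValueError(f"missing required inputs: {', '.join(missing)}")

    mode = values.get("counting_mode", "sn_rna")  # documented default
    if mode not in ("sc_rna", "sn_rna"):
        raise ValueError(f"counting_mode must be sc_rna or sn_rna, got {mode!r}")

    # The table states that count_exons=true in sc_rna mode returns an error.
    if values.get("count_exons", False) and mode == "sc_rna":
        raise ValueError("count_exons can only be true in sn_rna mode")

    return {f"{WORKFLOW_NAME}.{name}": value for name, value in values.items()}


# Hypothetical sample name and bucket paths, for illustration only.
example = {
    "input_id": "sample_1",
    "annotations_gtf": "gs://my-bucket/ref/genes.gtf",
    "gex_r1_fastq": ["gs://my-bucket/gex/sample_1_R1.fastq.gz"],
    "gex_r2_fastq": ["gs://my-bucket/gex/sample_1_R2.fastq.gz"],
    "tar_star_reference": "gs://my-bucket/ref/star_reference.tar",
    "ref_genome_fasta": "gs://my-bucket/ref/genome.fa",
    "atac_r1_fastq": ["gs://my-bucket/atac/sample_1_R1.fastq.gz"],
    "atac_r2_fastq": ["gs://my-bucket/atac/sample_1_R2.fastq.gz"],
    "atac_r3_fastq": ["gs://my-bucket/atac/sample_1_R3.fastq.gz"],
    "tar_bwa_reference": "gs://my-bucket/ref/bwa_reference.tar",
    "chrom_sizes": "gs://my-bucket/ref/chrom.sizes",
    "counting_mode": "sn_rna",
}

inputs_json = json.dumps(build_inputs(example), indent=2)
print(inputs_json)
```

Checking the `counting_mode` and `count_exons` combination before submission catches the one incompatibility the table documents, without waiting for the workflow to fail.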
Sample inputs for analyses in a Terra Workspace
The Multiome pipeline is currently available on the cloud-based platform Terra. After registering, you can access the Multiome public workspace, which is preloaded with instructions and sample data. Please view the Support Center for more information on using the Terra platform.
The Multiome workflow calls two WARP subworkflows, one external subworkflow (optional), and an additional task, which are described briefly in the table below. For more details on each subworkflow and task, see the documentation and WDL scripts linked in the table.
|Subworkflow/Task||Software||Description|
|ATAC (WDL and documentation)||fastqprocess, bwa-mem2, SnapATAC2||Workflow used to analyze 10x single-cell ATAC data.|
|Optimus (WDL and documentation)||fastqprocess, STARsolo, EmptyDrops||Workflow used to analyze 10x single-cell GEX data.|
|JoinMultiomeBarcodes as JoinBarcodes (WDL)||Python3||Task that adds an extra column to the Optimus metrics |
|CellBender.run_cellbender_remove_background_gpu as CellBender (WDL)||CellBender||Optional task that runs the |
|Output variable name||Filename, if applicable||Output format and description|
|multiome_pipeline_version_out||N.A.||String describing the version of the Multiome pipeline used.|
|bam_aligned_output_atac||BAM file containing aligned reads from ATAC workflow.|
|fragment_file_atac||Sorted and bgzipped TSV file containing fragment start and stop coordinates per barcode. The columns are "Chromosome", "Start", "Stop", "ATAC Barcode", "Number of reads", and "GEX Barcode".|
|fragment_file_index||tabix index file for the fragment file.|
|snap_metrics_atac||h5ad (Anndata) file containing per-barcode metrics from SnapATAC2. Also contains the equivalent gene expression barcode for each ATAC barcode in the |
|genomic_reference_version_gex||File containing the Genome build, source and GTF annotation version.|
|bam_gex||BAM file containing aligned reads from Optimus workflow.|
|matrix_gex||NPZ file containing raw gene by cell counts.|
|matrix_row_index_gex||NPY file containing the row indices.|
|matrix_col_index_gex||NPY file containing the column indices.|
|cell_metrics_gex||CSV file containing the per-cell (barcode) metrics.|
|gene_metrics_gex||CSV file containing the per-gene metrics.|
|cell_calls_gex||TSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode.|
|h5ad_output_file_gex||h5ad (Anndata) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. Also contains equivalent ATAC barcode for each gene expression barcode in the |
|cell_barcodes_csv||Optional output produced when |
|checkpoint_file||Optional output produced when |
|h5_array||Optional output produced when |
|html_report_array||Optional output produced when |
|log||Optional output produced when |
|metrics_csv_array||Optional output produced when |
|output_directory||Optional output produced when |
|summary_pdf||Optional output produced when |
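The six-column layout described for `fragment_file_atac` can be read with the Python standard library, since bgzip output is gzip-compatible. The sketch below assumes the file is tab-separated in the column order listed above; the short column keys and the sample records are invented for illustration and are compressed in memory to stand in for a real fragment file.

```python
import csv
import gzip
import io
from collections import Counter

# Column order as described for fragment_file_atac; the key names are ours.
FRAGMENT_COLUMNS = [
    "chromosome", "start", "stop", "atac_barcode", "n_reads", "gex_barcode",
]


def read_fragments(handle):
    """Yield one dict per fragment from a tab-separated text handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any comment lines
        if len(row) != len(FRAGMENT_COLUMNS):
            raise ValueError(f"expected {len(FRAGMENT_COLUMNS)} columns, got {len(row)}")
        record = dict(zip(FRAGMENT_COLUMNS, row))
        for key in ("start", "stop", "n_reads"):
            record[key] = int(record[key])
        yield record


# Made-up records, gzip-compressed in memory as a stand-in for the real file.
sample_tsv = (
    "chr1\t100\t250\tAAAC\t2\tTTTG\n"
    "chr1\t300\t480\tAAAC\t1\tTTTG\n"
    "chr2\t50\t120\tCCCA\t3\tGGGT\n"
)
compressed = gzip.compress(sample_tsv.encode())

with gzip.open(io.BytesIO(compressed), "rt", newline="") as handle:
    fragments = list(read_fragments(handle))

# Count fragments per ATAC barcode and map each ATAC barcode to its GEX barcode.
fragments_per_barcode = Counter(f["atac_barcode"] for f in fragments)
atac_to_gex = {f["atac_barcode"]: f["gex_barcode"] for f in fragments}

print(fragments_per_barcode)
print(atac_to_gex)
```

For a real output, replace the in-memory buffer with the path to the fragment file in `gzip.open`. For region queries, the accompanying tabix index is the better tool, because this loop scans the whole file.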
Versioning and testing
Citing the Multiome Pipeline
Please identify the pipeline in your methods section using the Multiome Pipeline's SciCrunch resource identifier.
- Ex: Multiome Pipeline (RRID:SCR_024217)
This pipeline is supported by the BRAIN Initiative (BICCN and BICAN).
If your organization also uses this pipeline, we would like to list you! Please reach out to us by contacting the WARP Pipeline Development team.
We are immensely grateful to the members of the BRAIN Initiative (BICAN Sequencing Working Group) and SCORCH for their invaluable and exceptional contributions to this pipeline. Our heartfelt appreciation goes to Alex Dobin, Aparna Bhaduri, Alec Wysoker, Anish Chakka, Brian Herb, Daofeng Li, Fenna Krienen, Guo-Long Zuo, Jeff Goldy, Kai Zhang, Khalid Shakir, Bo Li, Mariano Gabitto, Michael DeBerardine, Mengyi Song, Melissa Goldman, Nelson Johansen, James Nemesh, and Theresa Hodges for their unwavering dedication and remarkable efforts.
Please help us make our tools better by contacting the WARP Pipeline Development team for pipeline-related suggestions or questions.