Smart-seq2 Multi-Sample Overview
| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| --- | --- | --- | --- |
| MultiSampleSmartSeq2_v2.2.1 | May 2021 | Elizabeth Kiernan | Please file GitHub issues in WARP or contact the WARP team |
The Smart-seq2 Multi-Sample (Multi-SS2) Pipeline is a wrapper around the Smart-seq2 Single Sample pipeline. It was developed by the Data Coordination Platform of the Human Cell Atlas to process single-cell RNAseq (scRNAseq) data generated by Smart-seq2 assays. The workflow processes multiple cells by importing the Smart-seq2 Single Sample workflow, running it once for each cell (sample), and then merging the resulting per-cell Loom matrices into a single Loom matrix containing raw counts and TPMs.
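The scatter-then-merge structure described above can be pictured with a short Python sketch. It is illustrative only: the actual pipeline is a WDL workflow run by an execution engine, and `run_single_sample` is a hypothetical stand-in for one invocation of the Smart-seq2 Single Sample workflow. The sketch runs that function once per cell and returns the per-cell results in input order, ready for the merge step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

R = TypeVar("R")


def scatter_samples(
    input_ids: Sequence[str],
    fastq1_input_files: Sequence[str],
    fastq2_input_files: Optional[Sequence[str]],
    run_single_sample: Callable[[str, str, Optional[str]], R],  # hypothetical per-cell step
    max_workers: int = 4,
) -> List[R]:
    """Run the per-cell function once per sample; results keep the order of input_ids."""
    # Single-end runs have no read2 files, so each cell is paired with None instead.
    if fastq2_input_files is None:
        read2: Sequence[Optional[str]] = [None] * len(input_ids)
    else:
        read2 = fastq2_input_files
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in submission order even though cells run concurrently.
        return list(pool.map(run_single_sample, input_ids, fastq1_input_files, read2))
```

Passing `None` for `fastq2_input_files` models a single-end run; the returned list is what a merge step would then combine into one matrix.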
Full details about the Smart-seq2 Pipeline are available in the Smart-seq2 Single Sample Overview on GitHub.
The Multi-SS2 workflow can also be run in Terra, a cloud-based analysis platform. The Terra Smart-seq2 public workspace contains the Smart-seq2 workflow, workflow configurations, required reference data and other inputs, and example testing data.
Check out the Smart-seq2 Publication Methods to get started!
There are two example configuration (JSON) files available for testing the Multi-SS2 workflow. Both examples are also preloaded in the Terra Smart-seq2 public workspace.
- human_single_example.json: Configurations for an example single-end human dataset consisting of two samples (cells)
- mouse_paired_example.json: Configurations for an example paired-end mouse dataset consisting of two samples (cells)
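A configuration file like these examples is a JSON object that maps input names to values. The Python sketch below assembles the sample-level part of one, using input names from the input table in the next section. It assumes the Cromwell convention of prefixing each input with the workflow name, taken here to be `MultiSampleSmartSeq2` (from the pipeline version string); the `gs://my-bucket/...` paths are placeholders, and the reference inputs and paired-end flag are deliberately left out, so the result is not a complete configuration.

```python
import json
from typing import Dict, List, Optional

# Assumption: the workflow name used as the key prefix in the inputs JSON.
WORKFLOW_NAME = "MultiSampleSmartSeq2"


def build_sample_inputs(
    batch_id: str,
    input_ids: List[str],
    fastq1_input_files: List[str],
    fastq2_input_files: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Return sample-level inputs only; reference inputs must be added separately."""
    inputs: Dict[str, object] = {
        f"{WORKFLOW_NAME}.batch_id": batch_id,
        f"{WORKFLOW_NAME}.input_ids": input_ids,
        f"{WORKFLOW_NAME}.fastq1_input_files": fastq1_input_files,
    }
    # Read2 files are only supplied for paired-end runs.
    if fastq2_input_files is not None:
        inputs[f"{WORKFLOW_NAME}.fastq2_input_files"] = fastq2_input_files
    return inputs


if __name__ == "__main__":
    config = build_sample_inputs(
        batch_id="example_batch",
        input_ids=["cell_A", "cell_B"],
        fastq1_input_files=["gs://my-bucket/cell_A_R1.fastq.gz", "gs://my-bucket/cell_B_R1.fastq.gz"],
        fastq2_input_files=["gs://my-bucket/cell_A_R2.fastq.gz", "gs://my-bucket/cell_B_R2.fastq.gz"],
    )
    print(json.dumps(config, indent=2))
```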
Sample and Reference Inputs
The workflow’s sample inputs are listed in the table below.
The workflow processes both single- and paired-end samples; however, these samples cannot be mixed in the same run.
| Input name | Input Description | Input Type |
| --- | --- | --- |
| fastq1_input_files | Cloud locations for each read1 file | Array of strings |
| fastq2_input_files | Optional cloud locations for each read2 file if running paired-end samples | Array of strings |
| input_ids | Unique identifiers or names for each cell; can be a UUID or human-readable name | Array of strings |
| input_names | Optional unique identifiers/names to further describe each cell. If | String |
| batch_id | Identifier for the batch of multiple samples | String |
| batch_name | Optional string to describe the batch or biological sample | String |
| input_name_metadata_field | Optional input describing, when applicable, the metadata field containing the input_names | String |
| input_id_metadata_field | Optional string describing, when applicable, the metadata field containing the input_ids | String |
|  | Optional project identifier; usually a number | String |
|  | Optional project identifier; usually a human-readable name | String |
|  | Optional description of the sequencing method or approach | String |
|  | Optional description of the organ from which the cells were derived | String |
|  | Optional description of the species from which the cells were derived | String |
|  | Boolean for whether samples are paired-end or not | Boolean |
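Because single- and paired-end samples cannot be mixed and the per-cell inputs are supplied as parallel arrays, it can help to check a configuration before submitting it. The function below is a hypothetical pre-flight check, not part of the pipeline. It assumes that the i-th entries of `input_ids`, `fastq1_input_files`, and `fastq2_input_files` all describe the same cell, and that `input_ids` must be unique, as the table states.

```python
from typing import List, Optional, Sequence


def validate_sample_inputs(
    input_ids: Sequence[str],
    fastq1_input_files: Sequence[str],
    fastq2_input_files: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no problems were found."""
    problems: List[str] = []
    # Assumption: the arrays are aligned by position, one entry per cell.
    if len(input_ids) != len(fastq1_input_files):
        problems.append(
            f"{len(input_ids)} input_ids but {len(fastq1_input_files)} fastq1_input_files"
        )
    if len(set(input_ids)) != len(input_ids):
        problems.append("input_ids are not unique")
    if fastq2_input_files:
        # Paired-end run: every cell needs its own read2 file, because single-
        # and paired-end samples cannot be mixed in the same run.
        if len(fastq2_input_files) != len(fastq1_input_files):
            problems.append(
                f"{len(fastq1_input_files)} fastq1_input_files but "
                f"{len(fastq2_input_files)} fastq2_input_files"
            )
        if any(not path for path in fastq2_input_files):
            problems.append("some cells have no read2 file (mixed single/paired-end)")
    return problems


if __name__ == "__main__":
    print(validate_sample_inputs(["cell_A", "cell_B"], ["a_R1.fastq.gz", "b_R1.fastq.gz"]))  # prints []
```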
The reference inputs are identical to those specified in the "Additional Reference Inputs" section of the Smart-seq2 Single Sample Overview.
Smart-seq2 Multi-Sample Task Summary
The Multi-SS2 Pipeline calls two tasks:
| Output file name | Output Description | Output Type |
| --- | --- | --- |
| bam_files | An array of genome-aligned BAM files (one for each sample) generated with HISAT2 | Array |
| bam_index_files | An array of BAM index files generated with HISAT2 | Array |
| loom_output | A single Loom cell-by-gene matrix containing raw counts and TPMs for every cell | File |
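For reference, TPM (transcripts per million) values follow the standard definition below, where c_i is the number of reads assigned to gene or transcript i and l_i is its length. Which estimated counts and (effective) lengths are used is determined by the quantification step of the Smart-seq2 Single Sample workflow, so treat this as the general formula rather than the pipeline's exact computation.

```latex
\[
\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \times 10^6
\]
```

By construction, the TPM values within a single cell sum to 10^6, which makes them comparable across cells with different sequencing depths in a way that raw counts are not.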
The final Loom matrix is an aggregate of all the individual Loom matrices generated using the Smart-seq2 Single Sample workflow.
The aggregated Loom filename contains the batch_id prefix, which is the string specified in the input configuration. The batch_id is also set as a global attribute in the Loom.
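The aggregation step can be pictured as stacking per-cell gene vectors into one cell-by-gene matrix. The pure-Python sketch below is illustrative only: the real pipeline reads and writes Loom files, whereas here each per-cell result is a hypothetical dict mapping gene ID to (raw count, TPM). Genes absent from a cell are filled with 0, and the `.loom` suffix after the batch_id prefix is an assumption.

```python
from typing import Dict, Mapping, Sequence, Tuple

# Hypothetical per-cell result: gene ID -> (raw count, TPM).
CellResult = Mapping[str, Tuple[float, float]]


def aggregate_cells(
    batch_id: str,
    input_ids: Sequence[str],
    per_cell: Sequence[CellResult],
) -> Dict[str, object]:
    """Stack per-cell results into one cell-by-gene layout (rows = cells, columns = genes)."""
    if len(input_ids) != len(per_cell):
        raise ValueError("need exactly one per-cell result per input_id")
    # A sorted union of gene IDs gives every row the same column order.
    genes = sorted(set().union(*(cell.keys() for cell in per_cell)))
    # Genes missing from a cell's result are filled with 0 (an assumption of this sketch).
    counts = [[cell.get(g, (0.0, 0.0))[0] for g in genes] for cell in per_cell]
    tpm = [[cell.get(g, (0.0, 0.0))[1] for g in genes] for cell in per_cell]
    return {
        "filename": f"{batch_id}.loom",  # batch_id prefix; the ".loom" suffix is assumed
        "global_attrs": {"batch_id": batch_id},  # batch_id kept as a file-level attribute
        "cell_ids": list(input_ids),
        "gene_ids": genes,
        "counts": counts,
        "tpm": tpm,
    }
```

A real implementation would write these arrays to a Loom file rather than return them in a dict; the sketch only shows how rows, columns, and the batch-level attribute relate.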
Both the individual sample Loom files and individual BAM files are described in the Smart-seq2 Single Sample Overview.
Please note that we have deprecated the previously used Zarr array output. The pipeline now uses the Loom file format as the default output.
The Multi-SS2 Pipeline has been validated for processing human and mouse, stranded or unstranded, paired- or single-end, and plate- or Fluidigm-based Smart-seq2 datasets (see links to validation reports in the table below).

| Workflow Configuration | Link to Report |
| --- | --- |
| Human and mouse single-end | Report |
| Human stranded Fluidigm | Report |
Release information for the Multi-SS2 Pipeline can be found in the changelog. Please note that any major changes to the Smart-seq2 pipeline will be documented in the Smart-seq2 Single Sample changelog.
Citing the Smart-seq2 Multi-Sample Pipeline
Please identify the pipeline in your methods section using the Smart-seq2 Multi-Sample Pipeline's SciCrunch resource identifier.
- Ex: Smart-seq2 Multi-Sample Pipeline (RRID:SCR_018920)
This pipeline is supported and used by the Human Cell Atlas (HCA) project.
If your organization also uses this pipeline, we would love to list you! Please reach out to us by contacting the WARP team.
Please help us make our tools better by contacting the WARP team for pipeline-related suggestions or questions.