Smart-seq2 Multi-Sample Overview
9/12/2014
We are deprecating the Smart-seq2 Multi-Sample Pipeline. Although the code will continue to be available, we are no longer supporting it. For an alternative, see the Smart-seq2 Single Nucleus Multi Sample workflow.
Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
---|---|---|---|
MultiSampleSmartSeq2_v2.2.21 | December, 2023 | Elizabeth Kiernan | Please file an issue in WARP. |
Introduction
The Smart-seq2 Multi-Sample (Multi-SS2) Pipeline is a wrapper around the Smart-seq2 Single Sample pipeline. It was developed by the Data Coordination Platform of the Human Cell Atlas to process single-cell RNAseq (scRNAseq) data generated by Smart-seq2 assays. The workflow processes multiple cells by importing and running the Smart-seq2 Single Sample workflow for each cell (sample) and then merging the resulting per-cell Loom matrices into a single Loom matrix containing raw counts and TPMs.
Full details about the Smart-seq2 Pipeline can be read in the Smart-seq2 Single Sample Overview in GitHub.
The Multi-SS2 workflow can also be run in Terra, a cloud-based analysis platform. The Terra Smart-seq2 public workspace contains the Smart-seq2 workflow, workflow configurations, required reference data and other inputs, and example testing data.
Check out the Smart-seq2 Publication Methods to get started!
Inputs
There are two example configuration (JSON) files available for testing the Multi-SS2 workflow. Both examples are also preloaded in the Terra Smart-seq2 public workspace.
- human_single_example.json: Configurations for an example single-end human dataset consisting of two samples (cells)
- mouse_paired_example.json: Configurations for an example paired-end mouse dataset consisting of two samples (cells)
Sample and Reference Inputs
The workflow’s sample inputs are listed in the table below. Reference inputs are identical to those specified in the Smart-seq2 Single Sample Overview.
The workflow processes both single- and paired-end samples; however, the two types cannot be mixed in the same run.
Input name | Input Description | Input Type |
---|---|---|
fastq1_input_files | Cloud locations for each read1 file | Array of strings |
fastq2_input_files | Optional cloud locations for each read2 file if running paired-end samples | Array of strings |
input_ids | Unique identifiers or names for each cell; can be a UUID or human-readable name | Array of strings |
input_names | Optional unique identifiers/names to further describe each cell. If an input_id is a UUID, the corresponding input_name could be used as a human-readable identifier | Array of strings |
batch_id | Identifier for the batch of multiple samples | String |
batch_name | Optional string to describe the batch or biological sample | String |
input_name_metadata_field | Optional string describing, when applicable, the metadata field containing the input_names | String |
input_id_metadata_field | Optional string describing, when applicable, the metadata field containing the input_ids | String |
project_id | Optional project identifier; usually a number | String |
project_name | Optional project identifier; usually a human-readable name | String |
library | Optional description of the sequencing method or approach | String |
organ | Optional description of the organ from which the cells were derived | String |
species | Optional description of the species from which the cells were derived | String |
paired_end | Boolean for whether samples are paired-end or not | Boolean |
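As an illustration of how the sample inputs fit together, here is a minimal Python sketch that assembles an inputs dictionary and checks the constraints described above (one identifier per read1 file, and no mixing of single- and paired-end samples). The `MultiSampleSmartSeq2.` key prefix, the `paired_end` key spelling, the helper name `build_inputs`, and the `gs://example-bucket/...` paths are assumptions made for this example; consult the example JSON files for the authoritative keys.

```python
import json


def build_inputs(fastq1, input_ids, batch_id, fastq2=None,
                 prefix="MultiSampleSmartSeq2"):
    """Assemble a workflow inputs dictionary and check basic consistency.

    `prefix` and the `paired_end` key spelling are assumptions for this sketch.
    """
    if len(fastq1) != len(input_ids):
        raise ValueError("fastq1_input_files and input_ids must have the same length")
    if len(set(input_ids)) != len(input_ids):
        raise ValueError("input_ids must be unique")

    # A run is either entirely paired-end or entirely single-end.
    paired = fastq2 is not None
    if paired and len(fastq2) != len(fastq1):
        raise ValueError("paired-end runs need one read2 file per read1 file")

    inputs = {
        f"{prefix}.fastq1_input_files": list(fastq1),
        f"{prefix}.input_ids": list(input_ids),
        f"{prefix}.batch_id": batch_id,
        f"{prefix}.paired_end": paired,
    }
    if paired:
        inputs[f"{prefix}.fastq2_input_files"] = list(fastq2)
    return inputs


# Hypothetical two-cell, paired-end batch.
example = build_inputs(
    fastq1=["gs://example-bucket/cell_a_R1.fastq.gz",
            "gs://example-bucket/cell_b_R1.fastq.gz"],
    fastq2=["gs://example-bucket/cell_a_R2.fastq.gz",
            "gs://example-bucket/cell_b_R2.fastq.gz"],
    input_ids=["cell_a", "cell_b"],
    batch_id="example_batch",
)
print(json.dumps(example, indent=2))
```

Omitting `fastq2` produces a single-end configuration with `paired_end` set to false and no read2 entry.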
Additional Input
The reference inputs are identical to those specified in the "Additional Reference Inputs" section of the Smart-seq2 Single Sample Overview.
Smart-seq2 Multi-Sample Task Summary
The Multi-SS2 Pipeline calls two tasks:
1) SmartSeq2SingleSample: a task that runs the Smart-seq2 Single Sample workflow
2) SmartSeq2PlateAggregation: the wrapper pipeline that aggregates the results
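The two-step structure is a scatter-gather pattern: one single-sample run per cell, followed by one aggregation step. The Python sketch below only illustrates that control flow. The functions `run_single_sample`, `aggregate_plate`, and `run_multi_sample` are hypothetical stand-ins and do not reproduce the pipeline's WDL tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def run_single_sample(input_id, fastq1, fastq2=None):
    """Hypothetical stand-in for one Smart-seq2 Single Sample run."""
    return {"input_id": input_id, "paired_end": fastq2 is not None}


def aggregate_plate(per_cell_results):
    """Hypothetical stand-in for the aggregation step."""
    return {
        "n_cells": len(per_cell_results),
        "cell_ids": [r["input_id"] for r in per_cell_results],
    }


def run_multi_sample(input_ids, fastq1_files, fastq2_files=None):
    # Single-end runs have no read2 files, so pad with None.
    if fastq2_files is None:
        fastq2_files = [None] * len(input_ids)
    # Scatter: one independent run per cell; map() preserves input order.
    with ThreadPoolExecutor() as pool:
        per_cell = list(pool.map(run_single_sample,
                                 input_ids, fastq1_files, fastq2_files))
    # Gather: combine the per-cell results into one batch-level result.
    return aggregate_plate(per_cell)


summary = run_multi_sample(["cell_a", "cell_b"],
                           ["cell_a_R1.fastq.gz", "cell_b_R1.fastq.gz"])
print(summary)
```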
Outputs
Output file name | Output Description | Output Type |
---|---|---|
bam_files | An array of genome-aligned BAM files (one for each sample) generated with HISAT2 | Array |
bam_index_files | An array of BAM index files generated with HISAT2 | Array |
loom_output | A single Loom cell-by-gene matrix containing raw counts and TPMs for every cell | File |
The final Loom matrix is an aggregate of all the individual Loom matrices generated using the Smart-seq2 Single Sample workflow.
The aggregated Loom filename contains the batch_id prefix, which is the string specified in the input configuration. The batch_id is also set as a global attribute in the Loom.
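To make the aggregation concrete, the following Python sketch stacks per-cell results into one cell-by-gene table with a raw-count layer and a TPM layer, and records the batch_id both in a file name and as a global attribute. It uses plain Python structures rather than a Loom library, and the function name `aggregate_cells`, the record layout, and the `<batch_id>.loom` file name are assumptions for illustration, not the pipeline's actual implementation or on-disk layout.

```python
def aggregate_cells(per_cell, batch_id):
    """Combine per-cell results into one cell-by-gene table per layer."""
    if not per_cell:
        raise ValueError("need at least one cell")

    # All cells are assumed to share the same gene list (same reference).
    genes = per_cell[0]["genes"]
    for cell in per_cell:
        if cell["genes"] != genes:
            raise ValueError(f"gene list mismatch for {cell['input_id']}")

    return {
        "filename": f"{batch_id}.loom",      # hypothetical file name
        "attrs": {"batch_id": batch_id},     # global attribute
        "cell_ids": [c["input_id"] for c in per_cell],
        "genes": list(genes),
        # Rows are cells, columns are genes.
        "counts": [list(c["counts"]) for c in per_cell],
        "tpm": [list(c["tpm"]) for c in per_cell],
    }


# Two hypothetical cells measured over the same three genes.
cells = [
    {"input_id": "cell_a", "genes": ["g1", "g2", "g3"],
     "counts": [5, 0, 2], "tpm": [10.0, 0.0, 4.0]},
    {"input_id": "cell_b", "genes": ["g1", "g2", "g3"],
     "counts": [1, 3, 0], "tpm": [2.5, 7.5, 0.0]},
]
merged = aggregate_cells(cells, batch_id="example_batch")
print(merged["filename"], merged["cell_ids"], merged["counts"])
```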
Both the individual sample Loom files and individual BAM files are described in the Smart-seq2 Single Sample Overview.
Please note that we have deprecated the previously used Zarr array output. The pipeline now uses the Loom file format as the default output.
Validation
The Multi-SS2 Pipeline has been validated for processing human and mouse, stranded or unstranded, paired- or single-end, and plate- or Fluidigm-based Smart-seq2 datasets (see links to validation reports in the table below).
Workflow Configuration | Link to Report |
---|---|
Mouse paired-end | Report |
Human and mouse single-end | Report |
Human stranded Fluidigm | Report |
Versioning
Release information for the Multi-SS2 Pipeline can be found in the changelog. Please note that any major changes to the Smart-seq2 pipeline will be documented in the Smart-seq2 Single Sample changelog.
Citing the Smart-seq2 Multi-Sample Pipeline
If you use the Smart-seq2 Multi-Sample Pipeline in your research, please identify the pipeline in your methods section using the Smart-seq2 Multi-Sample SciCrunch resource identifier.
- Ex: Smart-seq2 Multi-Sample Pipeline (RRID:SCR_018920)
Please also consider citing our preprint:
Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1
Consortia Support
This pipeline is supported and used by the Human Cell Atlas (HCA) project.
If your organization also uses this pipeline, we would love to list you! Please reach out to us by filing an issue in WARP.
Have Suggestions?
Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.