
Slide-tags Overview

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
|---|---|---|---|
| v1.0.0 | May 2025 | WARP Pipelines | Please file an issue in WARP |

Figure: Slide-tags pipeline diagram

Introduction to the Slide-tags Pipeline

The Slide-tags Pipeline is an open-source, cloud-optimized workflow for processing spatial transcriptomics data. It supports data derived from spatially barcoded sequencing technologies, including Slide-tags-based single-nucleus profiling. The pipeline processes raw sequencing data into spatially resolved gene expression matrices by aligning reads, assigning spatial positions to cells, and quantifying transcripts.

This pipeline integrates multiple processing steps, including barcode extraction, spatial alignment, transcript counting, and output generation in formats compatible with community tools. The pipeline calls three subworkflows: the Optimus workflow for gene expression data, the SpatialCount workflow for spatial barcode processing, and the Positioning workflow for integrating both data types.

The Optimus workflow (GEX) corrects cell barcodes and Unique Molecular Identifiers (UMIs), aligns reads to the genome, calculates per-barcode and per-gene quality metrics, and produces a cell-by-gene count matrix.
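Barcode correction of this kind is commonly done by matching each observed barcode against the whitelist while tolerating one mismatch. The minimal Python sketch below illustrates that idea under an assumed rule: a barcode is corrected only when exactly one whitelist entry lies at Hamming distance 1. It is not STARsolo's implementation, and `correct_barcode` is a hypothetical name.

```python
from typing import Optional, Set

BASES = "ACGTN"

def correct_barcode(barcode: str, whitelist: Set[str]) -> Optional[str]:
    """Return the whitelist barcode for `barcode`, or None if no unique match exists."""
    if barcode in whitelist:
        return barcode  # exact match: keep as-is
    candidates = set()
    for i, observed in enumerate(barcode):
        for base in BASES:
            if base == observed:
                continue
            # Substitute one base and check whether the variant is whitelisted
            variant = barcode[:i] + base + barcode[i + 1:]
            if variant in whitelist:
                candidates.add(variant)
    # Correct only when exactly one whitelist barcode is one mismatch away
    return candidates.pop() if len(candidates) == 1 else None


whitelist = {"AAAA", "AAAT", "CCCC"}
print(correct_barcode("CCCG", whitelist))  # CCCC
print(correct_barcode("AAAG", whitelist))  # None (ambiguous: AAAA or AAAT)
```

Rejecting ambiguous barcodes rather than picking one arbitrarily avoids assigning reads to the wrong cell.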

The SpatialCount workflow processes spatial barcoding data from FASTQ files to generate a cell-by-bead count matrix of spatial barcodes, which is later used to locate each cell within the tissue.

The Positioning workflow integrates the gene expression data with spatial information to generate coordinate files and visualizations, producing a Seurat object for downstream spatial analysis.
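As a rough illustration of how spatial barcode counts can be turned into a cell position, the sketch below averages the coordinates of the beads detected in one cell, weighted by their spatial UMI counts. This is a simplified assumption, not the algorithm in positioning.R (the Macosko Lab scripts may filter or cluster beads before estimating a position), and `weighted_centroid` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

def weighted_centroid(
    bead_counts: Dict[str, int],
    bead_coords: Dict[str, Tuple[float, float]],
) -> Optional[Tuple[float, float]]:
    """UMI-weighted mean of bead coordinates for one cell; None if no usable beads."""
    total = 0
    sum_x = 0.0
    sum_y = 0.0
    for bead, count in bead_counts.items():
        if count <= 0 or bead not in bead_coords:
            continue  # ignore beads without counts or without puck coordinates
        x, y = bead_coords[bead]
        sum_x += count * x
        sum_y += count * y
        total += count
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


coords = {"b1": (0.0, 0.0), "b2": (10.0, 0.0), "b3": (0.0, 10.0)}
print(weighted_centroid({"b1": 1, "b2": 1, "b3": 2}, coords))  # (2.5, 5.0)
```

A plain weighted mean is sensitive to stray beads far from the nucleus, which is one reason a real implementation would likely filter beads first.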

Quickstart Table

| Pipeline Features | Description | Source |
|---|---|---|
| Assay type | Spatial transcriptomics using Slide-tags | Macosko Lab |
| Overall workflow | Barcode extraction, spatial positioning, transcript quantification | Original code available from GitHub; WDL workflow available in WARP. |
| Workflow language | WDL | openWDL |
| Sub-workflows | Optimus (gene expression), SpatialCount (spatial barcoding), Positioning (integration) | Imported from separate WDL scripts |
| Genomic reference sequence | STAR reference genome provided as a TAR file | Referenced as input parameter |
| Gene annotation reference | GTF file containing gene annotations | Referenced as input parameter |
| Aligners | STARsolo | STAR aligner |
| Data input file format | File format in which sequencing data is provided | FASTQ and CSV |
| Data output file format | Output formats for downstream analysis | HDF5, Seurat and CSV |

Set-up

Installation

To download the latest Slide-tags release, see the release tags prefixed with "Slide-tags" on the WARP releases page. All Slide-tags pipeline releases are documented in the Slide-tags changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow manager. It can also be run on cloud-based analysis platforms such as Terra.

Inputs

The pipeline requires a JSON-formatted configuration file specifying the input parameters. Required inputs include:

  • Raw paired-end gene expression (GEX) FASTQ files containing GEX reads
  • Raw paired-end spatial FASTQ files containing spatial barcode reads
  • Puck files containing the spatial coordinates of bead centroids
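A puck file can be loaded into a barcode-to-coordinate lookup for downstream positioning. The sketch below assumes a CSV with a header row and one row per bead, using hypothetical column names `sb` (spatial barcode), `x`, and `y`; the actual column layout may differ, so adjust the column arguments to match your puck files. `read_puck` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

def read_puck(
    handle: TextIO,
    barcode_col: str = "sb",  # hypothetical column names; adjust to your files
    x_col: str = "x",
    y_col: str = "y",
) -> Dict[str, Tuple[float, float]]:
    """Map each bead's spatial barcode to its (x, y) centroid."""
    coords: Dict[str, Tuple[float, float]] = {}
    for row in csv.DictReader(handle):
        barcode = row[barcode_col].strip()
        if barcode in coords:
            raise ValueError(f"duplicate bead barcode: {barcode}")
        coords[barcode] = (float(row[x_col]), float(row[y_col]))
    return coords


example = io.StringIO("sb,x,y\nAACCGGTT,10.5,20.0\nTTGGCCAA,30.0,40.25\n")
puck = read_puck(example)
print(puck["TTGGCCAA"])  # (30.0, 40.25)
```

For a file on disk, pass `open(path, newline="")` instead of the in-memory example.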

The following general input variables are used by both the Optimus and the spatial/positioning components of the pipeline.

| Input Variables | Description | Format |
|---|---|---|
| input_id | Unique input identifier | String |
| docker | Docker image used for the workflow | String |

Input variables for the spatial and positioning components of the Slide-tags pipeline are listed below.

| Input Variables | Description | Format |
|---|---|---|
| spatial_fastq | Array of paths to spatial FASTQ files. Requires at least one complete R1 and R2 pair. Each filename must include R1 or R2 to distinguish read pairs. The full directory is scanned, matching R1 and R2 files; an error is raised if any pair is incomplete. | Array[String] |
| pucks | Array of paths to puck files | Array[String] |
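The R1/R2 pairing rule for spatial_fastq can be sketched as follows. This is a hypothetical Python illustration, not the pipeline's code: it takes an explicit list of paths rather than scanning a directory, and it assumes the last "R1" or "R2" token in each filename identifies the read. `pair_fastqs` is a hypothetical name.

```python
import posixpath
import re
from typing import Dict, List, Tuple

READ_TAG = re.compile(r"R([12])")

def pair_fastqs(paths: List[str]) -> List[Tuple[str, str]]:
    """Group FASTQ paths into (R1, R2) pairs; raise if any pair is incomplete."""
    groups: Dict[str, Dict[str, str]] = {}
    for path in paths:
        directory, name = posixpath.split(path)
        tags = list(READ_TAG.finditer(name))
        if not tags:
            raise ValueError(f"filename has no R1/R2 tag: {path}")
        tag = tags[-1]  # assumption: the last R1/R2 token names the read
        # Files of the same pair share everything except the read tag
        key = posixpath.join(directory, name[:tag.start()] + "R?" + name[tag.end():])
        read = "R" + tag.group(1)
        slot = groups.setdefault(key, {})
        if read in slot:
            raise ValueError(f"duplicate {read} file for {key}")
        slot[read] = path
    pairs = []
    for key in sorted(groups):
        slot = groups[key]
        if "R1" not in slot or "R2" not in slot:
            raise ValueError(f"incomplete R1/R2 pair: {key}")
        pairs.append((slot["R1"], slot["R2"]))
    if not pairs:
        raise ValueError("at least one complete R1/R2 pair is required")
    return pairs


print(pair_fastqs([
    "gs://bucket/spatial/s_L001_R2_001.fastq.gz",
    "gs://bucket/spatial/s_L001_R1_001.fastq.gz",
]))
```

Failing on the first incomplete pair mirrors the behavior described in the table: a missing mate is treated as an error rather than silently dropped.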

Optimus input variables can be found below.

| Input Variables | Description | Format |
|---|---|---|
| gex_r1_fastq | Array of FASTQ files for R1 reads | Array[File] |
| gex_r2_fastq | Array of FASTQ files for R2 reads | Array[File] |
| gex_i1_fastq | Optional array of FASTQ files for I1 index reads | Array[File]? |
| tar_star_reference | STAR reference genome in TAR format | File |
| annotations_gtf | Gene annotation file in GTF format | File |
| gex_whitelist | Whitelist file of valid cell barcodes | File |
| cloud_provider | Cloud provider for computing resources | String |
| expected_cells | Expected number of cells in the dataset | Int |
| counting_mode | Counting mode (e.g., sn_rna) | String |
| tenx_chemistry_version | Version of the 10x Genomics chemistry used | Int |
| emptydrops_lower | Lower threshold for EmptyDrops filtering | Int |
| force_no_check | Flag to disable sanity checks | Boolean |
| ignore_r1_read_length | Flag to skip the R1 read-length check | Boolean |
| star_strand_mode | Strand mode setting for STAR alignment | String |
| count_exons | Flag to enable exon counting | Boolean |
| soloMultiMappers | Optional STARsolo setting for handling multi-mapped reads (e.g., EM, Uniform, Rescue, or PropUnique) | String? |
| gex_nhash_id | Optional NHash identifier for the gene expression data | String? |
| mt_genes | Optional file listing mitochondrial genes | File? |

Example input configurations can be found in the test_inputs folder of the GitHub repository.
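Cromwell input JSON files use keys of the form `WorkflowName.input_name`. The sketch below builds such a file in Python; the workflow name `SlideTags` and every path are placeholders (assumptions, not values from the repository), so check the WDL and the test_inputs examples for the actual workflow name and required values.

```python
import json

WORKFLOW_NAME = "SlideTags"  # assumption: confirm against the workflow name in the WDL

def build_inputs(values: dict) -> str:
    """Prefix each input with the workflow name, as Cromwell input JSON expects."""
    qualified = {f"{WORKFLOW_NAME}.{name}": value for name, value in values.items()}
    return json.dumps(qualified, indent=2, sort_keys=True)


# All values below are placeholders, not real files.
inputs_json = build_inputs({
    "input_id": "example_sample",
    "spatial_fastq": [
        "gs://example-bucket/spatial/sample_R1_001.fastq.gz",
        "gs://example-bucket/spatial/sample_R2_001.fastq.gz",
    ],
    "pucks": ["gs://example-bucket/pucks/example_puck.csv"],
    "gex_r1_fastq": ["gs://example-bucket/gex/sample_R1_001.fastq.gz"],
    "gex_r2_fastq": ["gs://example-bucket/gex/sample_R2_001.fastq.gz"],
    "tar_star_reference": "gs://example-bucket/reference/star.tar",
    "annotations_gtf": "gs://example-bucket/reference/annotation.gtf",
    "gex_whitelist": "gs://example-bucket/reference/whitelist.txt",
})
print(inputs_json)
```

Saving the printed JSON as a file gives a starting point that can be completed with the remaining inputs from the tables above.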

Slide-tags Pipeline Tasks and Tools

The workflow consists of three main steps, each implemented as a separate WDL task or subworkflow:

| Task | Tool | Description |
|---|---|---|
| Optimus | STARsolo | Gene quantification subworkflow that aligns reads to a reference genome and produces a count matrix. Read more in the Optimus Overview. |
| spatial_count | Custom Julia script developed by the Macosko Lab | Extracts spatial barcodes, corrects barcode sequencing errors, maps reads to spatial barcodes, stores unique (cell, UMI, spatial barcode) triplets in a count matrix, and calculates quality control metrics. Produces an H5 output. |
| positioning | Custom R scripts developed by the Macosko Lab: positioning.R, helpers.R, and run-positioning.R | Takes rna_paths (paths to the filtered cell-by-gene count matrix, UMI counts, and intronic metrics) as input; extracts cell barcodes, calculates log-transformed UMI counts, and determines mitochondrial gene percentages. Performs data normalization, PCA, clustering, and UMAP embedding for visualization, and produces quality metrics and graphs. Assigns cell barcodes to spatial barcode coordinates. |
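The deduplication step in spatial_count can be illustrated in a few lines. In this Python sketch (the actual task is a Julia script), reads are assumed to be already barcode-corrected `(cell, UMI, spatial barcode)` tuples; repeated triplets are collapsed so each molecule is counted once per cell and bead. `count_spatial` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def count_spatial(reads: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Collapse (cell, UMI, spatial barcode) triplets, then count unique UMIs per cell and bead."""
    unique_triplets = set(reads)  # duplicate reads of one molecule collapse to one triplet
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for cell, _umi, bead in unique_triplets:
        matrix[cell][bead] += 1
    return dict(matrix)


reads = [
    ("cellA", "UMI1", "bead1"),
    ("cellA", "UMI1", "bead1"),  # duplicate read, counted once
    ("cellA", "UMI2", "bead1"),
    ("cellA", "UMI3", "bead2"),
    ("cellB", "UMI1", "bead1"),
]
matrix = count_spatial(reads)
print(matrix["cellA"]["bead1"], matrix["cellA"]["bead2"])  # 2 1
```

The nested dictionary here stands in for the sparse cell-by-bead matrix that the task stores in its H5 output.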

The spatial_count and positioning tasks use scripts from the Macosko Lab Pipelines repository, modified for streamlined output handling. Docker images for running these scripts are maintained in the warp-tools repository under slide-tags.
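The per-cell metrics computed by the positioning task can be approximated as shown below. This Python sketch is not the R code used by the pipeline: it assumes log10 for the log transform and identifies mitochondrial genes either from an explicit set (similar in spirit to the optional mt_genes input) or, by default, from an `MT-` name prefix. `cell_qc` is a hypothetical helper.

```python
import math
from typing import Dict, Optional, Set, Tuple

def cell_qc(
    counts: Dict[str, Dict[str, int]],
    mt_genes: Optional[Set[str]] = None,
    mt_prefix: str = "MT-",
) -> Dict[str, Tuple[float, float]]:
    """Return {cell: (log10 total UMIs, percent mitochondrial UMIs)}."""
    def is_mito(gene: str) -> bool:
        if mt_genes is not None:
            return gene in mt_genes  # explicit gene list takes precedence
        return gene.upper().startswith(mt_prefix.upper())  # e.g. MT-CO1 or mt-Co1

    qc = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue  # skip empty barcodes; log10(0) is undefined
        mito = sum(n for gene, n in genes.items() if is_mito(gene))
        qc[cell] = (math.log10(total), 100.0 * mito / total)
    return qc


counts = {"cell1": {"ACTB": 60, "MT-CO1": 30, "MT-ND1": 10}, "cell2": {"ACTB": 10}}
print(cell_qc(counts))  # {'cell1': (2.0, 40.0), 'cell2': (1.0, 0.0)}
```

Matching the prefix case-insensitively lets the same default cover human (`MT-`) and mouse (`mt-`) gene symbols.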

Outputs

Optimus outputs

| Output Variable | File Name | Description | Format |
|---|---|---|---|
| optimus_genomic_reference_version | <reference_version>.txt | File containing the genome build, source, and GTF annotation version. | TXT |
| optimus_bam | <input_id>.bam | BAM file containing aligned reads from the Optimus workflow. | BAM |
| optimus_matrix | <input_id>_gex_sparse_counts.npz | NPZ file containing raw gene-by-cell counts. | NPZ |
| optimus_matrix_row_index | <input_id>_gex_sparse_counts_row_index.npy | NPY file containing the row indices. | NPY |
| optimus_matrix_col_index | <input_id>_gex_sparse_counts_col_index.npy | NPY file containing the column indices. | NPY |
| optimus_cell_metrics | <input_id>_gex.cell_metrics.csv.gz | CSV file containing the per-cell (barcode) metrics. | Compressed CSV |
| optimus_gene_metrics | <input_id>.gene_metrics.csv.gz | CSV file containing the per-gene metrics. | Compressed CSV |
| optimus_cell_calls | <input_id>.emptyDrops.csv | CSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode. | CSV |
| optimus_h5ad_output_file | <input_id>.h5ad | h5ad (AnnData) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. See the Optimus Count Matrix Overview for more details. | H5AD |
| optimus_multimappers_EM_matrix | UniqueAndMult-EM.mtx | Optional output produced when soloMultiMappers is "EM"; see STARsolo documentation for more information. | MTX |
| optimus_multimappers_Uniform_matrix | UniqueAndMult-Uniform.mtx | Optional output produced when soloMultiMappers is "Uniform"; see STARsolo documentation for more information. | MTX |
| optimus_multimappers_Rescue_matrix | UniqueAndMult-Rescue.mtx | Optional output produced when soloMultiMappers is "Rescue"; see STARsolo documentation for more information. | MTX |
| optimus_multimappers_PropUnique_matrix | UniqueAndMult-PropUnique.mtx | Optional output produced when soloMultiMappers is "PropUnique"; see STARsolo documentation for more information. | MTX |
| optimus_aligner_metrics | <input_id>.star_metrics.tar | TAR file containing per-barcode metrics (CellReads.stats) produced by the STARsolo aligner in the GEX pipeline. | TAR |
| optimus_library_metrics | <input_id>_gex_<gex_nhash_id>_library_metrics.csv | Optional CSV file containing all library-level metrics calculated with STARsolo for gene expression data. | CSV |
| optimus_mtx_files | <input_id>_gex.mtx_files.tar | TAR file with STARsolo matrix market files (barcodes.tsv, features.tsv, and matrix.mtx). | TAR |
| cb_cell_barcodes_csv | <cell_csv> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_checkpoint_file | <ckpt_file> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_h5_array | <h5_array> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_html_report_array | <report_array> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_log | <log> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_metrics_csv_array | <metrics_array> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_output_directory | <output_dir> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |
| cb_summary_pdf | <pdf> | Optional output produced when run_cellbender is "true"; see CellBender documentation and GitHub repository for more information. | |

Output variables for the spatial and positioning components of the Slide-tags pipeline

For more details regarding these output variables, please refer to the README in the Slide-tags directory of the Macosko Lab repository.

| Output Variable | File Name | Description | Format |
|---|---|---|---|
| spatial_output_h5 | <input_id>_SBcounts.h5 | H5 file containing the cell-by-bead matrix and spatial barcode information. | H5 |
| spatial_output_log | <input_id>_spatial-count.log | Standard output of the spatial_count task. | TXT |
| positioning_seurat_qs | <input_id>_seurat.qs | Seurat object with processed spatial transcriptomics data. | QS (Seurat object) |
| positioning_coords_csv | <input_id>_coords.csv | Spatial coordinates for called cells. | CSV |
| positioning_coords2_csv | <input_id>_coords2.csv | Alternate or refined spatial coordinates. | CSV |
| positioning_summary_pdf | <input_id>_summary.pdf | QC summary report with plots and metrics. | PDF |
| positioning_intermediates | <input_id>_intermediates.tar.gz | Contains the spatial barcode matrix, cell barcode whitelist, and spatial metadata. | Compressed TAR |
| positioning_log | <input_id>_positioning.log | Standard output of the positioning task. | TXT |
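To check that a run produced every spatial and positioning output listed above, the file-name templates can be expanded for a given input_id. The templates below are transcribed from the table; actual names may differ between pipeline versions, and both helper functions are hypothetical.

```python
from typing import Dict, Iterable, List

# Transcribed from the output table; "{input_id}" is replaced per sample.
OUTPUT_TEMPLATES = {
    "spatial_output_h5": "{input_id}_SBcounts.h5",
    "spatial_output_log": "{input_id}_spatial-count.log",
    "positioning_seurat_qs": "{input_id}_seurat.qs",
    "positioning_coords_csv": "{input_id}_coords.csv",
    "positioning_coords2_csv": "{input_id}_coords2.csv",
    "positioning_summary_pdf": "{input_id}_summary.pdf",
    "positioning_intermediates": "{input_id}_intermediates.tar.gz",
    "positioning_log": "{input_id}_positioning.log",
}

def expected_outputs(input_id: str) -> Dict[str, str]:
    """Map each output variable to its expected file name for this input_id."""
    return {var: tpl.format(input_id=input_id) for var, tpl in OUTPUT_TEMPLATES.items()}

def missing_outputs(input_id: str, present_files: Iterable[str]) -> List[str]:
    """List output variables whose expected file is not among present_files."""
    present = set(present_files)
    return [var for var, name in expected_outputs(input_id).items() if name not in present]


print(missing_outputs("sample1", ["sample1_SBcounts.h5", "sample1_seurat.qs"]))
```

The list of present files could come from a local directory listing or a cloud bucket listing; the helper only compares names.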

Versioning

All releases of the pipeline are documented in the repository’s changelog.

Citing the Slide-tags Pipeline

If you use the Slide-tags Pipeline in your research, please cite the original sources:

Please also consider citing our preprint:

Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1

Acknowledgements

We are immensely grateful to Matthew Shabet and the Macosko Lab for developing these analyses, for generously giving their time to make these scripts FAIR, and for the many hours spent working with the WARP team to incorporate the scripts into WDL.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.