Slide-tags Overview

Pipeline Version	Date Updated	Documentation Author	Questions or Feedback
v1.0.0	May, 2025	WARP Pipelines	Please file an issue in WARP

Slide-tags_diagram

Introduction to the Slide-tags Pipeline

The Slide-tags Pipeline is an open-source, cloud-optimized workflow for processing spatial transcriptomics data. It supports data derived from spatially barcoded sequencing technologies, including Slide-tags-based single-molecule profiling. The pipeline processes raw sequencing data into spatially resolved gene expression matrices, ensuring accurate alignment, spatial positioning, and quantification.

This pipeline integrates multiple processing steps, including barcode extraction, spatial alignment, transcript counting, and output generation in formats compatible with community tools. The pipeline calls three subworkflows: the Optimus workflow for gene expression data, the SpatialCount workflow for spatial barcode processing, and the Positioning workflow for integrating both data types.

The Optimus workflow (GEX) corrects cell barcodes and Unique Molecular Identifiers (UMIs), aligns reads to the genome, calculates per-barcode and per-gene quality metrics, and produces a cell-by-gene count matrix.

The SpatialCount workflow processes spatial barcoding data from FASTQ files to generate spatial barcode counts that represent transcript locations within the tissue.

The Positioning workflow integrates the gene expression data with spatial information to generate coordinate files and visualizations, producing a Seurat object for downstream spatial analysis.

Quickstart Table

Pipeline Features	Description	Source
Assay type	Spatial transcriptomics using Slide-tags	Macosko Lab
Overall workflow	Barcode extraction, spatial positioning, transcript quantification	Original code available from GitHub; WDL workflow available in WARP.
Workflow language	WDL	openWDL
Sub-workflows	Optimus (gene expression), SpatialCount (spatial barcoding), Positioning (integration)	Imported from separate WDL scripts
Genomic Reference Sequence	STAR reference genome provided as tar file	Referenced as input parameter
Gene annotation reference	GTF file containing gene annotations	Referenced as input parameter
Aligners	STARsolo	STAR aligner
Data input file format	File format in which sequencing data is provided	FASTQ and CSV
Data output file format	Output formats for downstream analysis	HDF5, Seurat and CSV

Set-up

Installation

To download the latest Slide-tags release, see the release tags prefixed with "Slide-tags" on the WARP releases page. All Slide-tags pipeline releases are documented in the Slide-tags changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow manager. Additionally, it can be run in cloud-based analysis platforms such as Terra.

Inputs

The pipeline requires JSON-formatted configuration files detailing input parameters. Required inputs include:

Raw paired-end GEX FASTQ files containing sequencing GEX reads
Raw paired-end Spatial FASTQ files containing spatial reads
Pucks files contains spatial coordinates of bead centroids

General input variables used by both the Optimus and spatial/positioning components of the pipeline.

Input Variables	Description	Format
input_id	Unique input identifier	String
docker	Docker image used for the workflow	String

Input variables for the spatial and positioning components of the Slide-Tags pipeline can be found below.

Input Variables	Description	Format
spatial_fastq	Array of paths to spatial FASTQ files. Requires at least one complete R1 and R2 pair. Each filename must include R1 or R2 to distinguish read pairs. The full directory is scanned, matching R1 and R2 files; an error is raised if any pair is incomplete	Array[String]
pucks	Array of paths to puck files	Array[String]

Optimus input variables can be found below.

Input Variables	Description	Format
gex_r1_fastq	Array of FASTQ files for R1 reads	Array[File]
gex_r2_fastq	Array of FASTQ files for R2 reads	Array[File]
gex_i1_fastq	Optional FASTQ files for I1 index reads	Array[File]?
tar_star_reference	Reference genome in a TAR format for STAR align	File
annotations_gtf	Gene annotation file in GTF format	File
gex_whitelist	Whitelist file for cell barcodes	File
cloud_provider	Cloud provider for computing resources	String
expected_cells	Expected number of cells in the dataset	Int
counting_mode	Counting mode (e.g., snRNA)	String
tenx_chemistry_version	Version of 10X chemistry used	Int
emptydrops_lower	Lower threshold for EmptyDrops filtering	Int
force_no_check	Flag to disable sanity checks	Boolean
ignore_r1_read_length	Ignore length check for R1 reads	Boolean
star_strand_mode	Strand mode setting for STAR alignment	String
count_exons	Flag to enable exon counting	Boolean
soloMultiMappers	Optional setting for handling multi-mapped reads	String?
gex_nhash_id	Optional NHash identifier for gene expression	String?
mt_genes	Optional file listing mitochondrial genes	File?

Example input configurations can be found in the test_inputs folder of the GitHub repository.

Slide-tags Pipeline Tasks and Tools

The workflow is composed of several key steps, implemented in separate WDL tasks:

Task	Tool	Description
Optimus	STARsolo	Gene quantification subworkflow that aligns reads to a reference genome and produces a count matrix. Read more in the Optimus Overview.
spatial_count	Custom Julia script developed by the Macosko lab	Extracts spatial barcodes, performs barcode sequencing error correction, maps reads to spatial barcodes and stores unique (cell, UMI, barcode) triplets in a count matrix, and calculates quality control metrics. Produces an h5 output.
positioning	Custom R scripts for developed by the Macosko lab; includes positioning.R, helpers.R, and run-positioning.R	Takes in the rna_paths (path to the filtered cell by gene count matrix, UMI counts, and the intronic metrics) to extract cell barcodes, calculates log-transformed UMI counts, and determines mitochondrial gene percentages. Performs data normalization, PCA, clustering, and UMAP embedding for visualization and produces quality metrics and graphs. Assigns cell barcodes to spatial barcode coordinates.

Each of these tasks utilizes scripts from the Macosko Lab Pipelines repository, modified for streamlined output handling. Dockers for running these scripts are maintained in the warp-tools repository under slide-tags.

Outputs

Optimus outputs

Output Variable	File Name	Description	Format
optimus_genomic_reference_version	`<reference_version>.txt`	File containing the Genome build, source and GTF annotation version.	TXT
optimus_bam	`<input_id>.bam`	BAM file containing aligned reads from Optimus workflow.	BAM
optimus_matrix	`<input_id>_gex_sparse_counts.npz`	NPZ file containing raw gene by cell counts.	NPZ
optimus_matrix_row_index	`<input_id>_gex_sparse_counts_row_index.npy`	NPY file containing the row indices.	NPY
optimus_matrix_col_index	`<input_id>_gex_sparse_counts_col_index.npy`	NPY file containing the column indices.	NPY
optimus_cell_metrics	`<input_id>_gex.cell_metrics.csv.gz`	CSV file containing the per-cell (barcode) metrics.	Compressed CSV
optimus_gene_metrics	`<input_id>.gene_metrics.csv.gz`	CSV file containing the per-gene metrics.	Compressed CSV
optimus_cell_calls	`<input_id>.emptyDrops.csv`	TSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode.	CSV
optimus_h5ad_output_file	`<input_id>.h5ad`	h5ad (Anndata) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. See the Optimus Count Matrix Overview for more details.	H5AD
optimus_multimappers_EM_matrix	`UniqueAndMult-EM.mtx`	Optional output produced when `soloMultiMappers` is "EM"; see STARsolo documentation for more information.	MTX
optimus_multimappers_Uniform_matrix	`UniqueAndMult-Uniform.mtx`	Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo documentation for more information.	MTX
optimus_multimappers_Rescue_matrix	`UniqueAndMult-Rescue.mtx`	Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo documentation for more information.	MTX
optimus_multimappers_PropUnique_matrix	`UniqueAndMult-PropUnique.mtx`	Optional output produced when `soloMultiMappers` is "PropUnique"; see STARsolo documentation for more information.
optimus_aligner_metrics	`<input_id>.star_metrics.tar`	Text file containing per barcode metrics (`CellReads.stats`) produced by the GEX pipeline STARsolo aligner.	TAR
optimus_library_metrics	`<input_id>_gex_<gex_nhash_id>_library_metrics.csv`	Optional CSV file containing all library-level metrics calculated with STARsolo for gene expression data.	CSV
optimus_mtx_files	`<input_id>_gex.mtx_files.tar`	TAR file with STARsolo matrix market files (barcodes.tsv, features.tsv, and matrix.mtx)	TAR
cb_cell_barcodes_csv	`<cell_csv>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_checkpoint_file	`<ckpt_file>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_h5_array	`<h5_array>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_html_report_array	`<report_array>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_log	`<log>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_metrics_csv_array	`<metrics_array>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_output_directory	`<output_dir>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.
cb_summary_pdf	`<pdf>`	Optional output produced when `run_cellbender` is "true"; see CellBender documentation and GitHub repository for more information.

Output variables for the spatial and positioning components of the Slide-Tags pipeline

For more details regarding the output variables, please refer to the README in the Slide-Tags directoty in the Macosko Lab repository.

Output Variable	File Name	Description	Format
spatial_output_h5	`<input_id>_SBcounts.h5`	h5 file containing cell by bead matrix and spatial barcode information.	H5
spatial_output_log	`<input_id>_spatial-count.log`	Standard output of the spatial barcodes task.	TXT
positioning_seurat_qs	`<input_id>_seurat.qs`	Seurat object with processed spatial transcriptomics data.	SEURAT
positioning_coords_csv	`<input_id>_coords.csv`	Spatial coordinates for called cells.	CSV
positioning_coords2_csv	`<input_id>_coords2.csv`	Alternate or refined spatial coordinates.	CSV
positioning_summary_pdf	`<input_id>_summary.pdf`	QC summary report with plots and metrics.	PDF
positioning_intermediates	`<input_id>_intermediates.tar.gz`	Contains spatial barcodes matrix, cell barcode whitelist, and spatial metadata.	TAR
positioning_log	`<input_id>_positioning.log`	Standard output of the positioning task.	TXT

Versioning

All releases of the pipeline are documented in the repository’s changelog.

Citing the Slide-tags Pipeline

If you use the Slide-tags Pipeline in your research, please cite the original sources:

Macosko Lab Pipelines: https://github.com/MacoskoLab/Macosko-Pipelines/tree/main/slide-tags

Please also consider citing our preprint:

Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1

Acknowledgements

We are immensely grateful Matthew Shabet and the Macosko Lab for development of these analsyes, for their generous time making these scripts FAIR, and for the many hours working with the WARP team to incoporate the scripts into WDL.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.

Slide-tags Overview

Introduction to the Slide-tags Pipeline​

Quickstart Table​

Set-up​

Installation​

Inputs​

Slide-tags Pipeline Tasks and Tools​

Outputs​

Optimus outputs​

Output variables for the spatial and positioning components of the Slide-Tags pipeline​

Versioning​

Citing the Slide-tags Pipeline​

Acknowledgements​

Feedback​