Skip to main content

Welcome to WARP

WARP (WDL Analysis Research Pipelines) repository is a collection of cloud-optimized pipelines for processing biological data from the Broad Institute Data Sciences Platform and collaborators.

The contents of this repository are open source and released under the BSD 3-Clause license.

WARP Overview

WARP pipelines provide robust, standardized data analysis for the Broad Institute Genomics Platform and large consortia like the Human Cell Atlas and the BRAIN Initiative. You can count on WARP for rigorously scientifically validated, high scale, reproducible and open source pipelines.

Our pipelines are written as “workflows” using the Workflow Description Language (WDL) and they process a broad spectrum of “omic” and array-related datasets (see the overview table below).

Pipeline CategoryData Types
Germline Variant DiscoveryGenomes, Exomes
Genotyping ArraysVariant discovery, Chip validation, Joint array analysis
Single-cell/nuclei TranscriptomicsDroplet based (10x Genomics), Smartseq2
Single-cell EpigenomicsSingle nuclei ATAC-seq, Single nuclei MethylC-seq
Joint GenotypingGenomes, Exomes
Somatic Alignment (beta)Exomes
tip

Try our pipelines in Terra, a platform for collaborative cloud analysis! Learn how in the Using WARP section.

All versioned and released pipelines are in one of the three pipelines subdirectories: broad (pipelines for the Broad Institute’s Genomics Platform), cemba (pipelines for the BRAIN Initiative) or skylab (pipelines for the BRAIN Initiative and Human Cell Atlas Project).

Each pipeline directory hosts a main workflow WDL that includes a pipeline version number and a corresponding changelog file.

Workflows may call additional WDLs, referred to as tasks, that are located in the tasks directory.

Pipelines that are in progress or have not yet been validated are in the beta-pipelines folder.

Dockers and custom tools maintained in warp-tools repository

Each WARP workflow uses Docker images that contain the necessary software for the workflow's commands. All Docker images, build scripts for Docker images, and custom tools are maintained in a separate repository, warp-tools.

Using WARP

There are three ways to use WARP pipelines:

1. Download the workflow and run on a WDL-compatible execution engine

WDL workflows run on multiple systems, including Cromwell, miniWDL, and dxWDL (see the openwdl documentation).

To run a pipeline’s latest release, first navigate to WARP releases page, search for your pipeline’s tag, and download the pipeline’s assets (the WDL workflow, the JSON, and the ZIP with accompanying dependencies; see Optimus example below).

optimus_release

You can also discover and search releases using the WARP command-line tool Wreleaser.

After downloading the pipeline’s assets, launch the workflow following your execution engine’s instructions.

2. Run the pipeline in Terra

Several WARP pipelines are available in public workspaces on the Terra cloud platform. These workspaces include both the WDL workflow and downsampled data so that you can test the pipeline at low-cost.

If you are new to Terra, you can get started by registering with your Google account and visiting Terra Support. After registration, search for WARP-related workspaces with the “warp-pipelines” tag.

Terra_warp

To test the pipeline, clone (make a copy of) the workspace following the instructions in this Terra Support guide.

3. Run or export the pipeline from Dockstore

Dockstore is a GA4GH compliant open platform for sharing Docker-based tools like WDL workflows. You can find WARP pipelines in Dockstore and run them on the Dockstore platform or export them to other platforms (including Terra).

To view all available pipelines, just search “warp” in the Dockstore search and browse the workflow list. See Dockstore documentation for details on launching the workflow.

Dockstore

WARP Versioning and Releasing

Pipelines in WARP are versioned semantically to support reproducibility in scientific analysis and provide clearer analysis provenance. Version numbers allow researchers to confirm their data has all been processed in a compatible way. Semantic versioning gives immediate insight into the compatibility of pipeline outputs. Read more about versioning and releasing in WARP.

To discover and search releases, use the WARP command-line tool Wreleaser.

Testing in WARP

Each pipeline in WARP has accompanying continuous integration tests that run on each pull request (PR). These tests help ensure that no unexpected changes are made to each pipeline and confirm that each affected pipeline is tested with any changes to shared code. To support rapid development iteration, only the pipelines affected by a PR are tested and PRs to the develop branch run “plumbing” tests using small or downsampled inputs. When the staging branch is promoted to master, the updated pipelines will be tested more rigorously on a larger selection of data that covers more scientific test cases. Read more about our testing process.

Feedback

WARP is always evolving! Please file an issue in WARP with suggestions, feedback, or questions. We are always excited to discuss cloud data processing, provenance and reproducibility in scientific analysis, new pipeline features, or potential collaborations. Don’t hesitate to reach out!

Our planned upcoming improvements include:

  1. A unified testing infrastructure that eases the overhead for contribution
  2. Full contribution guidance
  3. Continued additions of pipeline documentation
  4. Pre-written methods sections and DOIs to enable easy publication citations
  5. More pipelines: bulk RNAseq, SlideSeq, updates to joint genotyping

Citing WARP

When citing WARP, please use the following:

Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1

Acknowledgements

WARP is maintained by the Broad Institute Data Sciences Platform (DSP) in collaboration with partner organizations. The Lantern Pipelines team maintains the repository with invaluable scientific oversight and pipeline contributions from the DSP Methods group as well as the HCA and BRAIN Initiative Analysis Working Groups. We thank the DSP Customer Delivery team for their help with user-, documentation-, and Terra- support. WARP pipelines have been made in collaboration with or informed by scientists across many institutions, including: labs at the Broad Institute, the European Bioinformatics Institute, Chan Zuckerburg Initiative, NY Genome Center, University of California Santa Cruz, Berkeley, and San Diego, the Allen Institute, Johns Hopkins Medical Institute, and the Baylor College of Medicine.