Runtime environments
The GATK-SV pipeline consists of workflows implemented in the Workflow Description Language (WDL) and is built for use on the Google Cloud Platform (GCP).
Terra (recommended)
To facilitate ease of use, security, and collaboration, GATK-SV is available on the Terra platform. Users should clone pre-configured Terra workspaces as a starting point for running GATK-SV:
- Single-sample workspace
- Joint calling workspace (TODO)
These workspaces are actively maintained by the development team and will be updated with critical fixes and major releases.
Cromwell (advanced)
Advanced users and developers who wish to run GATK-SV on a dedicated Cromwell server using GCP as a backend should refer to the Advanced Guides section.
Alternative backends (not supported)
Use of other backends (e.g. AWS or on-prem HPC clusters) is not currently supported. However, contributions from the community to improve portability between backends will be considered on a case-by-case basis. We ask contributors to please adhere to the following guidelines when submitting issues and pull requests:
- Code changes must be functionally equivalent on GCP backends, i.e. not result in changed output
- Increases to cost and runtime on GCP backends should be minimal
- Avoid adding new inputs and tasks to workflows. Simpler changes are more likely to be approved, e.g. small in-line changes to scripts or WDL task command sections
- Avoid introducing new code paths, e.g. conditional statements
- Additional backend-specific scripts, workflows, tests, and Dockerfiles will not be approved
- Changes to Dockerfiles may require extensive testing before approval
We still encourage members of the community to adapt GATK-SV for non-GCP backends and share code on forked repositories. Here are some considerations:
- Refer to Cromwell's documentation for configuration instructions.
- The handling and ordering of `glob` commands may differ between platforms.
- Shell commands that are potentially destructive to input files (e.g. `rm`, `mv`, `tabix`) can cause unexpected behavior on shared filesystems. Enabling copy localization may help to more closely replicate the behavior on GCP.
- For clusters that do not support Docker, Singularity is an alternative. See Cromwell documentation on Singularity.
- The GATK-SV pipeline takes advantage of the massive parallelization possible in the cloud. Local backends may not have the resources to execute all of the workflows. Workflows that use fewer resources or that are less parallelized may be more successful. For instance, some users have been able to run GatherSampleEvidence on a SLURM cluster.
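
Two of the considerations above, glob ordering and destructive commands on shared filesystems, can be illustrated with a minimal Python sketch. It is not part of GATK-SV: the helper names `sorted_glob` and `working_copy` and the file names are hypothetical and chosen only for illustration. The sketch assumes a script that lists files itself and shows one way to make the ordering explicit, plus one way to avoid modifying an input in place when inputs are linked rather than copied.

```python
import glob
import os
import shutil
import tempfile


def sorted_glob(pattern):
    """Return glob matches in lexicographic order.

    glob.glob() makes no ordering guarantee, so sorting explicitly keeps
    the result independent of the underlying filesystem.
    """
    return sorted(glob.glob(pattern))


def working_copy(input_path, work_dir):
    """Copy an input file into work_dir and return the path of the copy.

    Modifying or deleting the copy leaves the original input untouched,
    which matters when inputs are hard-linked or soft-linked.
    """
    os.makedirs(work_dir, exist_ok=True)
    return shutil.copy2(input_path, work_dir)


if __name__ == "__main__":
    # Hypothetical demo files in a scratch directory.
    scratch = tempfile.mkdtemp()
    for name in ["shard_b.txt", "shard_a.txt", "shard_c.txt"]:
        with open(os.path.join(scratch, name), "w") as handle:
            handle.write(name + "\n")

    shards = sorted_glob(os.path.join(scratch, "shard_*.txt"))
    print([os.path.basename(path) for path in shards])

    # Destructive step applied to the copy only; the original survives.
    copy_path = working_copy(shards[0], os.path.join(scratch, "work"))
    os.remove(copy_path)
    print(os.path.exists(shards[0]))

    shutil.rmtree(scratch)
```

Note that the sort is lexicographic, so names with unpadded numbers (for example `shard_10` and `shard_2`) will not come out in numeric order; zero-padding shard indices avoids that.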