Runtime Environments

The GATK-SV pipeline consists of workflows and reference data that together orchestrate the analysis of input data. A successful execution therefore requires running the workflows on both the reference and the input data.

Currently supported backends: GCP

GATK-SV has been tested only on the Google Cloud Platform (GCP); therefore, we are unable to provide specific guidance or support for other execution platforms, including HPC clusters and AWS.

Alternative backends

Contributions from the community to improve portability between backends will be considered on a case-by-case basis. We ask contributors to adhere to the following guidelines when submitting issues and pull requests:

  1. Code changes must be functionally equivalent on GCP backends, i.e. they must not change the output
  2. Increases to cost and runtime on GCP backends should be minimal
  3. Avoid adding new inputs and tasks to workflows. Simpler changes are more likely to be approved, e.g. small in-line changes to scripts or WDL task command sections
  4. Avoid introducing new code paths, e.g. conditional statements
  5. Additional backend-specific scripts, workflows, tests, and Dockerfiles will not be approved
  6. Changes to Dockerfiles may require extensive testing before approval
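
One way to gather evidence for guideline 1 is to run the same workflow before and after a change and compare the outputs. The following is a minimal sketch in Python, not part of GATK-SV; the function names and the directory layout (one directory of outputs per run) are assumptions made for illustration. It compares files byte for byte, so compressed files whose decompressed content is identical can still be reported as different; treat a mismatch as a prompt for closer inspection rather than proof of a functional change.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_outputs(baseline_dir: Path, candidate_dir: Path) -> list[str]:
    """List relative paths that differ in content or exist in only one directory.

    Both arguments are hypothetical: directories holding the outputs of a run
    before the change (baseline) and after the change (candidate).
    """
    def digests(root: Path) -> dict[str, str]:
        return {
            p.relative_to(root).as_posix(): file_digest(p)
            for p in root.rglob("*")
            if p.is_file()
        }

    base, cand = digests(baseline_dir), digests(candidate_dir)
    # A path is reported if its digest differs or it is missing on one side.
    return sorted(k for k in base.keys() | cand.keys() if base.get(k) != cand.get(k))
```

An empty list from `differing_outputs` means every output file matched byte for byte between the two runs.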

We still encourage members of the community to adapt GATK-SV for non-GCP backends and share code on forked repositories. Here are some considerations:

  • Refer to Cromwell's documentation for configuration instructions.

  • The handling and ordering of glob commands may differ between platforms.

  • Shell commands that are potentially destructive to input files (e.g. rm, mv, tabix) can cause unexpected behavior on shared filesystems. Enabling copy localization may help to more closely replicate the behavior on GCP.

  • For clusters that do not support Docker, Singularity is an alternative. See Cromwell documentation on Singularity.

  • The GATK-SV pipeline takes advantage of the massive parallelization possible in the cloud. Local backends may not have the resources to execute all of the workflows. Workflows that use fewer resources or that are less parallelized may be more successful. For instance, some users have been able to run GatherSampleEvidence on a SLURM cluster.
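
Two of the considerations above, glob ordering and destructive commands on shared filesystems, can be illustrated with a short sketch. It is written in Python rather than WDL and is not taken from GATK-SV; the helper names `sorted_glob` and `localize_by_copy` are hypothetical. The first helper imposes an explicit order on glob results instead of relying on whatever the platform returns, and the second mimics copy localization by working on a private copy of an input.

```python
import glob
import shutil
from pathlib import Path


def sorted_glob(pattern: str) -> list[str]:
    """Expand a glob pattern and return the matches in lexicographic order.

    glob.glob returns matches in an arbitrary, filesystem-dependent order,
    so sorting makes downstream behavior reproducible across platforms.
    """
    return sorted(glob.glob(pattern))


def localize_by_copy(src: Path, work_dir: Path) -> Path:
    """Copy an input into a private working directory and return the copy's path.

    Commands that rename, delete, or write next to the file then act on the
    copy, leaving the original on the shared filesystem untouched.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    dest = work_dir / src.name
    shutil.copy2(src, dest)
    return dest
```

Copying costs extra disk space and time compared with linking, which is the trade-off to weigh when enabling copy localization on a cluster.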