Building inputs
Each workflow of the GATK-SV pipeline takes its own set of input arguments. How you configure them depends on the platform you use to run the pipeline: Terra workspaces if you are running on Terra (user-friendly), or JSON files if you are running on Cromwell (for development and advanced use cases). For each workflow, we provide example configurations that you can use either to set up your own Terra workspace or to test with sample data. Run the following commands to generate these example inputs.
-
Clone GATK-SV (you may skip this step if you have already done so).
git clone https://github.com/broadinstitute/gatk-sv && cd gatk-sv
-
Create test inputs.
bash scripts/inputs/build_default_inputs.sh -d .
Running this command generates test inputs in gatk-sv/inputs/build with the following structure.
inputs/build
├── NA12878
│   ├── terra
│   └── test
├── NA19240
│   └── test
├── hgdp
│   └── test
└── ref_panel_1kg
    ├── terra
    └── test
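To check what the build produced, a short Python sketch like the one below can group the generated files by directory. It assumes the inputs are written as .json files under inputs/build, as in the layout above; the function name summarize_build is hypothetical and not part of GATK-SV.

```python
from collections import defaultdict
from pathlib import Path


def summarize_build(build_dir):
    """Map each subdirectory (e.g. 'NA12878/test') to its sorted JSON file names."""
    root = Path(build_dir)
    summary = defaultdict(list)
    # rglob walks the tree recursively; sorting keeps the result deterministic.
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root)
        summary[rel.parent.as_posix()].append(rel.name)
    return dict(summary)
```

Calling summarize_build("inputs/build") from the repository root returns a dictionary keyed by directory, which makes it easy to spot a directory that ended up with no JSON files.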
Building inputs for specific use-cases (Advanced)
Build for batched workflows
python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/test/GATKSVPipelineSingleSample \
inputs/build/NA19240/test \
-a '{ "test_batch" : "ref_panel_1kg" }'
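The value passed to -a is a JSON object written as a single shell argument, which is easy to misquote. As a sketch, the same command can be assembled from Python so that json.dumps handles the quoting. It assumes the script is invoked from the repository root with the positional arguments shown above; the helper name build_inputs_command and its parameter names are hypothetical.

```python
import json
import shlex
import sys


def build_inputs_command(values_dir, template_dir, out_dir, a_value):
    """Return the argument list for scripts/inputs/build_inputs.py.

    a_value is the dictionary that is serialized and passed to -a.
    """
    return [
        sys.executable,
        "scripts/inputs/build_inputs.py",
        values_dir,
        template_dir,
        out_dir,
        "-a",
        json.dumps(a_value),
    ]


cmd = build_inputs_command(
    "inputs/values",
    "inputs/templates/test/GATKSVPipelineSingleSample",
    "inputs/build/NA19240/test",
    {"test_batch": "ref_panel_1kg"},
)
# Print the shell-quoted command for inspection instead of running it.
print(shlex.join(cmd))
```

To execute the command rather than print it, the list can be passed to subprocess.run(cmd, check=True).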
Generating a reference panel
This section only applies to the single-sample mode.
New reference panels can be generated from a single run of the GATKSVPipelineBatch workflow.
If using a Cromwell server, we recommend copying the outputs to a permanent location by adding the following options to the workflow configuration file:
"final_workflow_outputs_dir" : "gs://my-outputs-bucket",
"use_relative_output_paths": false,
Here is an example of how to generate workflow input JSON files from GATKSVPipelineBatch workflow metadata:
-
Get metadata from Cromshell.
cromshell -t60 metadata 38c65ca4-2a07-4805-86b6-214696075fef > metadata.json
-
Run the script.
python scripts/inputs/create_test_batch.py \
--execution-bucket gs://my-exec-bucket \
--final-workflow-outputs-dir gs://my-outputs-bucket \
metadata.json \
> inputs/values/my_ref_panel.json
-
Build test files for batched workflows (Google Cloud project ID required).
python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/test \
inputs/build/my_ref_panel/test \
-a '{ "test_batch" : "ref_panel_1kg" }'
-
Build test files for the single-sample workflow.
python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/test/GATKSVPipelineSingleSample \
inputs/build/NA19240/test_my_ref_panel \
-a '{ "single_sample" : "test_single_sample_NA19240", "ref_panel" : "my_ref_panel" }'
-
Build files for a Terra workspace.
python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/terra_workspaces/single_sample \
inputs/build/NA12878/terra_my_ref_panel \
-a '{ "single_sample" : "test_single_sample_NA12878", "ref_panel" : "my_ref_panel" }'
Note that the inputs to GATKSVPipelineBatch may be used as resources for the reference panel and therefore should also be in a permanent location.
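Before passing metadata.json to create_test_batch.py, it may help to confirm that the file describes a finished run of the expected workflow. The sketch below is a sanity check only. It assumes the file follows the usual Cromwell metadata layout with top-level "status" and "workflowName" fields; the function name check_metadata is hypothetical, and the check is not something the GATK-SV scripts are known to require.

```python
import json


def check_metadata(path, expected_workflow="GATKSVPipelineBatch"):
    """Return a list of problems found in a Cromwell metadata file (empty if none)."""
    with open(path) as handle:
        metadata = json.load(handle)

    problems = []
    # Assumed Cromwell fields: "status" and "workflowName" at the top level.
    status = metadata.get("status")
    if status != "Succeeded":
        problems.append(f"workflow status is {status!r}, expected 'Succeeded'")
    name = metadata.get("workflowName")
    if name != expected_workflow:
        problems.append(f"workflow name is {name!r}, expected {expected_workflow!r}")
    return problems
```

An empty list from check_metadata("metadata.json") suggests the metadata comes from a completed GATKSVPipelineBatch run; any returned messages point to what to inspect first.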