GatherSampleEvidence
Runs raw evidence collection (PE/SR/RD/SD) on each sample and performs SV discovery with the following callers: Manta, Wham, and Scramble. For guidance on pre-filtering prior to GatherSampleEvidence, refer to the Input data section.
MELT is no longer supported as a raw caller. Please see SV/CNV callers for more information.
The following diagram illustrates the recommended invocation order:
Inputs
bam_or_cram_file
An indexed BAM or CRAM file aligned to hg38. See input data requirements.
sample_id
Identifier string for the sample. Refer to the sample ID requirements for specifications of allowable sample IDs. IDs that do not meet these requirements may lead to errors.
Optional is_dragen_3_7_8
Default: detect automatically from the BAM/CRAM header. The header check can be skipped by setting this parameter when it is known whether the BAM/CRAM was aligned with Dragen v3.7.8. If this is true and Scramble is configured to run, soft-clipped reads at sites called by Scramble in the original alignments will be realigned with BWA for re-calling with Scramble.
Optional collect_coverage
Default: true. Collect read depth.
Optional collect_pesr
Default: true. Collect paired-end (PE), split-read (SR), and site depth (SD) evidence.
Optional manta_docker
Manta docker image. If provided, runs the Manta tool.
Optional melt_docker
MELT docker image. If provided, runs the MELT tool.
Optional scramble_docker
Scramble docker image. If provided, runs the Scramble tool.
Optional wham_docker
Wham docker image. If provided, runs the Wham tool.
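Because each caller runs only when its docker image is provided, the set of callers can be chosen when the inputs file is assembled. The following is a minimal sketch, not an official template. It assumes that input keys are qualified with the workflow name GatherSampleEvidence, as is conventional for WDL inputs JSON files. The sample ID, file path, and image names are hypothetical placeholders, and a real run needs additional inputs that are not shown here.

```python
import json

# Assumption: inputs JSON keys are prefixed with the workflow name.
WORKFLOW = "GatherSampleEvidence"


def build_inputs(sample_id, bam_or_cram_file, caller_dockers):
    """Assemble an inputs mapping from the parameters documented above.

    Callers whose docker image is None or empty are left out, so they do not run.
    """
    inputs = {
        f"{WORKFLOW}.sample_id": sample_id,
        f"{WORKFLOW}.bam_or_cram_file": bam_or_cram_file,
    }
    for caller, image in caller_dockers.items():
        if image:  # providing the image is what enables the caller
            inputs[f"{WORKFLOW}.{caller}_docker"] = image
    return inputs


inputs = build_inputs(
    "sample_001",                      # hypothetical sample ID
    "gs://my-bucket/sample_001.cram",  # hypothetical path
    {
        "manta": "example.org/manta:tag",  # hypothetical image names
        "wham": "example.org/wham:tag",
        "scramble": None,                  # omitted, so Scramble does not run
    },
)
print(json.dumps(inputs, indent=2))
```

In this sketch, leaving a caller out of the mapping or setting it to None has the same effect: no docker key is written and the caller is skipped.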
Optional reference_bwa_*
BWA-mem index files. Required only if running Scramble and the input reads were aligned with Dragen v3.7.8.
Optional scramble_alignment_score_cutoff
Default: 60 for Dragen v3.7.8 and 90 otherwise. Minimum alignment score for consensus sequences against the MEI reference in the Scramble tool. The default value is set automatically depending on the aligner. Can be overridden to tune sensitivity.
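The aligner-dependent default described above can be sketched as follows. This is an illustration of the documented behavior only, not the workflow's actual implementation. The function name and the override argument are hypothetical.

```python
def scramble_alignment_score_cutoff(is_dragen_3_7_8, override=None):
    """Pick the Scramble alignment score cutoff.

    An explicit override always wins. Otherwise the documented defaults
    apply: 60 for Dragen v3.7.8 alignments and 90 for any other aligner.
    """
    if override is not None:
        return override
    return 60 if is_dragen_3_7_8 else 90


print(scramble_alignment_score_cutoff(True))       # Dragen v3.7.8 default
print(scramble_alignment_score_cutoff(False))      # default for other aligners
print(scramble_alignment_score_cutoff(False, 75))  # user-supplied override
```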
Optional scramble_percent_align_cutoff
Default: 70. Minimum alignment percent for consensus sequences against the MEI reference in the Scramble tool. Can be overridden to tune sensitivity.
Optional scramble_min_clipped_reads_fraction
Default: 0.22. Minimum number of soft-clipped reads required for site cluster identification in the Scramble tool, as a fraction of average read depth. Can be overridden to tune sensitivity.
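To make the scaling concrete, the sketch below converts scramble_min_clipped_reads_fraction into a read-count threshold for a given average depth. It is illustrative only: the function name is hypothetical, and the use of an inclusive comparison (at least the threshold) is an assumption, since the exact rounding and comparison used by the workflow are not specified here.

```python
def passes_clipped_read_threshold(n_clipped_reads, average_depth, fraction=0.22):
    """Return True if a site has enough soft-clipped reads.

    The threshold is the fraction multiplied by the average read depth.
    Assumption: a site passes when its count is at least the threshold.
    """
    threshold = fraction * average_depth
    return n_clipped_reads >= threshold


# At an average depth of 30x the default fraction gives a threshold of
# about 6.6 reads, so 7 clipped reads pass and 6 do not.
print(passes_clipped_read_threshold(7, 30))
print(passes_clipped_read_threshold(6, 30))
```

Raising the fraction makes site identification stricter; lowering it makes Scramble more sensitive at the cost of more candidate sites.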
Advanced parameters
Optional run_localize_reads
Default: false. Copy input alignment files to the execution bucket before localizing to subsequent tasks. This may be desirable when BAM/CRAM files are stored in a requester-pays bucket or in another region to avoid egress charges.
Enabling run_localize_reads can incur high storage costs. If using, make sure to clean up execution directories after the workflow finishes running.
Optional run_module_metrics
Default: true. Calculate QC metrics for the sample. If true, primary_contigs_fai must also be provided, and optionally the baseline_*_vcf inputs to run comparisons.
Optional move_bam_or_cram_files
Default: false. Uses mv instead of cp when operating on local CRAM/BAM files in some tasks. This can result in some performance improvement.
Do not use move_bam_or_cram_files if running with a local backend or shared filesystem, as it may cause loss of input data.
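Several of the parameters above depend on one another. A pre-submission check along the following lines can catch the documented combinations before a run starts. This is a hedged sketch rather than part of the workflow: the function name, the unqualified parameter keys, and the shared_filesystem flag are hypothetical, and the Dragen status is passed in explicitly instead of being read from the BAM/CRAM header.

```python
def check_inputs(inputs, is_dragen_3_7_8=False, shared_filesystem=False):
    """Return a list of problems found in a dict of documented parameters."""
    problems = []

    # Scramble on Dragen v3.7.8 alignments needs the BWA index files.
    runs_scramble = bool(inputs.get("scramble_docker"))
    has_bwa_index = any(key.startswith("reference_bwa_") for key in inputs)
    if runs_scramble and is_dragen_3_7_8 and not has_bwa_index:
        problems.append("reference_bwa_* is required for Scramble with Dragen v3.7.8")

    # run_module_metrics defaults to true and then needs primary_contigs_fai.
    if inputs.get("run_module_metrics", True) and not inputs.get("primary_contigs_fai"):
        problems.append("primary_contigs_fai is required when run_module_metrics is true")

    # Moving files on a local backend or shared filesystem can destroy inputs.
    if inputs.get("move_bam_or_cram_files", False) and shared_filesystem:
        problems.append("move_bam_or_cram_files is unsafe on a local or shared filesystem")

    return problems


example = {"scramble_docker": "example.org/scramble:tag", "run_module_metrics": False}
for problem in check_inputs(example, is_dragen_3_7_8=True):
    print(problem)
```

Returning a list instead of raising on the first problem lets all issues be reported in one pass.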
Outputs
Optional manta_vcf
VCF containing variants called by Manta. Enabled by providing manta_docker.
Optional melt_vcf
VCF containing variants called by MELT. Enabled by providing melt_docker.
Optional scramble_vcf
VCF containing variants called by Scramble. Enabled by providing scramble_docker.
Optional wham_vcf
VCF containing variants called by Wham. Enabled by providing wham_docker.
Optional coverage_counts
Binned read counts collected by GATK-CollectReadCounts (*.counts.tsv.gz). Enabled with collect_coverage.
Optional pesr_disc
Discordant read pairs collected by GATK-CollectSVEvidence (*.pe.txt.gz). Enabled with collect_pesr.
Optional pesr_split
Split read positions collected by GATK-CollectSVEvidence (*.sr.txt.gz). Enabled with collect_pesr.
Optional pesr_sd
Site depth counts collected by GATK-CollectSVEvidence (*.sd.txt.gz). Enabled with collect_pesr.
Optional sample_metrics_files
Sample metrics for QC. Enabled with run_module_metrics.
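Since every output is optional and tied to a specific input, it can help to work out which outputs to expect for a given configuration. The sketch below encodes the input-to-output relationships listed above, using the documented defaults for the boolean flags. The function name and the unqualified parameter keys are hypothetical.

```python
def expected_outputs(inputs):
    """List the outputs that the documented inputs would enable."""
    outputs = []

    # Each caller VCF is produced only when that caller's docker image is given.
    for caller in ("manta", "melt", "scramble", "wham"):
        if inputs.get(f"{caller}_docker"):
            outputs.append(f"{caller}_vcf")

    # Evidence collection and metrics are on by default.
    if inputs.get("collect_coverage", True):
        outputs.append("coverage_counts")
    if inputs.get("collect_pesr", True):
        outputs.extend(["pesr_disc", "pesr_split", "pesr_sd"])
    if inputs.get("run_module_metrics", True):
        outputs.append("sample_metrics_files")

    return outputs


# Manta only, with PE/SR/SD collection switched off.
print(expected_outputs({"manta_docker": "example.org/manta:tag", "collect_pesr": False}))
```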