GatherSampleEvidence

WDL source code

Runs raw evidence collection (PE/SR/RD/SD) on each sample and performs SV discovery with the following callers: Manta, Wham, and Scramble. For guidance on pre-filtering prior to GatherSampleEvidence, refer to the Input data section.

note

MELT is no longer supported as a raw caller. Please see SV/CNV callers for more information.

The following diagram illustrates the recommended invocation order:

Inputs

bam_or_cram_file

An indexed BAM or CRAM file aligned to hg38. See input data requirements.

sample_id

Identifier string for the sample. Refer to the sample ID requirements for specifications of allowable sample IDs. IDs that do not meet these requirements may lead to errors.

Optional is_dragen_3_7_8

Default: detect automatically from the BAM/CRAM header. The header check can be skipped by setting this parameter when it is already known whether the BAM/CRAM was aligned with Dragen v3.7.8. If this is true and Scramble is configured to run, soft-clipped reads at sites called by Scramble in the original alignments will be realigned with BWA and the sites re-called with Scramble.

Optional collect_coverage

Default: true. Collect read depth.

Optional collect_pesr

Default: true. Collect paired-end (PE), split-read (SR), and site depth (SD) evidence.

Optional manta_docker

Manta docker image. If provided, runs the Manta tool.

Optional melt_docker

MELT docker image. If provided, runs the MELT tool.

Optional scramble_docker

Scramble docker image. If provided, runs the Scramble tool.

Optional wham_docker

Wham docker image. If provided, runs the Wham tool.

Optional reference_bwa_*

BWA-MEM index files. Required only if running Scramble and the input reads were aligned with Dragen v3.7.8.

Optional scramble_alignment_score_cutoff

Default: 60 for Dragen v3.7.8 and 90 otherwise. Minimum alignment score for consensus sequences against the MEI reference in the Scramble tool. The default value is set automatically depending on the aligner. Can be overridden to tune sensitivity.

Optional scramble_percent_align_cutoff

Default: 70. Minimum alignment percent for consensus sequences against the MEI reference in the Scramble tool. Can be overridden to tune sensitivity.

Optional scramble_min_clipped_reads_fraction

Default: 0.22. Minimum number of soft-clipped reads required for site cluster identification in the Scramble tool, as a fraction of average read depth. Can be overridden to tune sensitivity.
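
The way the Scramble defaults described above interact can be sketched in a few lines of Python. This is an illustration only, not the workflow's implementation: the function name, the `mean_depth` argument, and the `has_bwa_reference` flag are hypothetical, and how the workflow itself measures average read depth or rounds the resulting read count is not specified here.

```python
from typing import Optional


def resolve_scramble_settings(
    is_dragen_3_7_8: bool,
    mean_depth: float,
    alignment_score_cutoff: Optional[int] = None,
    percent_align_cutoff: int = 70,
    min_clipped_reads_fraction: float = 0.22,
    has_bwa_reference: bool = False,
) -> dict:
    """Hypothetical helper mirroring the documented Scramble defaults."""
    # reference_bwa_* is documented as required only for Scramble on
    # Dragen v3.7.8 alignments, because soft-clipped reads are realigned.
    if is_dragen_3_7_8 and not has_bwa_reference:
        raise ValueError(
            "reference_bwa_* inputs are required when running Scramble "
            "on Dragen v3.7.8 alignments"
        )
    # Documented default: 60 for Dragen v3.7.8, 90 otherwise.
    if alignment_score_cutoff is None:
        alignment_score_cutoff = 60 if is_dragen_3_7_8 else 90
    return {
        "alignment_score_cutoff": alignment_score_cutoff,
        "percent_align_cutoff": percent_align_cutoff,
        # The fraction is relative to average read depth; no rounding is
        # applied here because the workflow's rounding is not documented.
        "min_clipped_reads": min_clipped_reads_fraction * mean_depth,
    }


settings = resolve_scramble_settings(is_dragen_3_7_8=False, mean_depth=30.0)
print(settings["alignment_score_cutoff"])
```

Passing an explicit `alignment_score_cutoff` overrides the aligner-dependent default, which corresponds to tuning sensitivity as described above.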

Advanced parameters

Optional run_localize_reads

Default: false. Copy input alignment files to the execution bucket before localizing to subsequent tasks. This may be desirable when BAM/CRAM files are stored in a requester-pays bucket or in another region to avoid egress charges.

warning

Enabling run_localize_reads can incur high storage costs. If using, make sure to clean up execution directories after the workflow finishes running.

Optional run_module_metrics

Default: true. Calculate QC metrics for the sample. If true, primary_contigs_fai must also be provided, and optionally the baseline_*_vcf inputs to run comparisons.

Optional move_bam_or_cram_files

Default: false. Uses mv instead of cp when operating on local CRAM/BAM files in some tasks. This can result in some performance improvement.

warning

Do not use move_bam_or_cram_files if running with a local backend or shared filesystem, as it may cause loss of input data.
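
As a rough sketch of how the inputs above might be assembled, the following Python snippet builds an inputs dictionary and checks one documented dependency: `run_module_metrics` defaults to true and needs `primary_contigs_fai`. The `build_inputs` helper, the file paths, and the sample ID are hypothetical, and the `GatherSampleEvidence.` key prefix assumes the workflow name matches this page's title and that inputs are supplied in the usual `WorkflowName.input_name` JSON form.

```python
import json

# Assumed workflow name, taken from the page title.
WORKFLOW = "GatherSampleEvidence"


def build_inputs(sample_id: str, bam_or_cram_file: str, **optional) -> dict:
    """Hypothetical helper: assemble workflow-qualified input keys."""
    inputs = {"sample_id": sample_id, "bam_or_cram_file": bam_or_cram_file}
    inputs.update(optional)
    # run_module_metrics is documented as defaulting to true and
    # requiring primary_contigs_fai when enabled.
    if inputs.get("run_module_metrics", True) and "primary_contigs_fai" not in inputs:
        raise ValueError(
            "primary_contigs_fai must be provided when run_module_metrics is true"
        )
    return {f"{WORKFLOW}.{name}": value for name, value in inputs.items()}


# Placeholder values for illustration only.
example = build_inputs(
    sample_id="sample_01",
    bam_or_cram_file="/path/to/sample_01.cram",
    primary_contigs_fai="/path/to/primary_contigs.fai",
    collect_coverage=True,
    collect_pesr=True,
)
print(json.dumps(example, indent=2))
```

Setting `run_module_metrics=False` skips the check, since the QC metrics inputs are then not needed.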

Outputs

Optional manta_vcf

VCF containing variants called by Manta. Enabled by providing manta_docker.

Optional melt_vcf

VCF containing variants called by MELT. Enabled by providing melt_docker.

Optional scramble_vcf

VCF containing variants called by Scramble. Enabled by providing scramble_docker.

Optional wham_vcf

VCF containing variants called by Wham. Enabled by providing wham_docker.

Optional coverage_counts

Binned read counts collected by GATK-CollectReadCounts (*.counts.tsv.gz). Enabled with collect_coverage.

Optional pesr_disc

Discordant read pairs collected by GATK-CollectSVEvidence (*.pe.txt.gz). Enabled with collect_pesr.

Optional pesr_split

Split read positions collected by GATK-CollectSVEvidence (*.sr.txt.gz). Enabled with collect_pesr.

Optional pesr_sd

Site depth counts collected by GATK-CollectSVEvidence (*.sd.txt.gz). Enabled with collect_pesr.

Optional sample_metrics_files

Sample metrics for QC. Enabled with run_module_metrics.
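
The relationship between inputs and optional outputs listed above can be summarized with a small Python sketch. The `expected_outputs` function is hypothetical and only restates the enabling conditions documented on this page; it does not inspect or run the workflow.

```python
def expected_outputs(inputs: dict) -> list:
    """Hypothetical helper: list outputs enabled by a set of inputs."""
    outputs = []
    # Each caller VCF is enabled by providing the matching docker image.
    # melt_docker is included because it is listed above, although MELT
    # is noted as no longer supported as a raw caller.
    for caller in ("manta", "melt", "scramble", "wham"):
        if inputs.get(f"{caller}_docker"):
            outputs.append(f"{caller}_vcf")
    # collect_coverage, collect_pesr and run_module_metrics default to true.
    if inputs.get("collect_coverage", True):
        outputs.append("coverage_counts")
    if inputs.get("collect_pesr", True):
        outputs.extend(["pesr_disc", "pesr_split", "pesr_sd"])
    if inputs.get("run_module_metrics", True):
        outputs.append("sample_metrics_files")
    return outputs


# Placeholder image name for illustration only.
print(expected_outputs({"manta_docker": "example/manta:tag", "collect_pesr": False}))
# prints ['manta_vcf', 'coverage_counts', 'sample_metrics_files']
```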