CombineBatches

WDL source code

Merges variants across multiple batches. Variant merging uses methods and criteria similar to those in ClusterBatch, but additionally requires the samples genotyped as non-reference to match sufficiently between the variants being merged.

[Diagram: recommended invocation order.]

Inputs

Info: All per-batch array inputs must list the batches in the same order. For example, the i-th entry of pesr_vcfs, depth_vcfs, etc. must belong to the i-th batch in the batches array.
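The ordering requirement above can be checked before launching the workflow. The following is a minimal sketch, not part of the pipeline: the function name check_batch_arrays and the example paths are hypothetical, and the order check relies on the assumption that each file path contains its batch identifier, which may not hold for your file naming.

```python
# Per-batch array inputs named in this document.
PER_BATCH_ARRAYS = [
    "pesr_vcfs",
    "depth_vcfs",
    "raw_sr_bothside_pass_files",
    "raw_sr_background_fail_files",
]


def check_batch_arrays(inputs):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    batches = inputs["batches"]
    for name in PER_BATCH_ARRAYS:
        values = inputs[name]
        # Every per-batch array needs exactly one entry per batch.
        if len(values) != len(batches):
            problems.append(
                f"{name}: expected {len(batches)} entries, found {len(values)}"
            )
            continue
        # Heuristic (assumption): each path mentions its batch identifier.
        for batch, path in zip(batches, values):
            if batch not in path:
                problems.append(f"{name}: '{path}' does not mention batch '{batch}'")
    return problems


# Hypothetical example in which depth_vcfs is listed in the wrong order.
example = {
    "batches": ["batch_a", "batch_b"],
    "pesr_vcfs": ["gs://bucket/batch_a.pesr.vcf.gz", "gs://bucket/batch_b.pesr.vcf.gz"],
    "depth_vcfs": ["gs://bucket/batch_b.depth.vcf.gz", "gs://bucket/batch_a.depth.vcf.gz"],
    "raw_sr_bothside_pass_files": ["gs://bucket/batch_a.pass.txt", "gs://bucket/batch_b.pass.txt"],
    "raw_sr_background_fail_files": ["gs://bucket/batch_a.fail.txt", "gs://bucket/batch_b.fail.txt"],
}

problems = check_batch_arrays(example)
for problem in problems:
    print(problem)
```

Only the length check is reliable in general. The path-based order check is a convenience that catches swapped entries when batch identifiers appear in file names.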

cohort_name

Cohort name. The guidelines outlined in the sample ID requirements section apply here.

batches

Array of batch identifiers. Each identifier should match the batch name used in GatherBatchEvidence. Order must match that of the other per-batch arrays, such as depth_vcfs.

Optional merge_vcfs

Default: false. If true, merge contig-sharded VCFs into one genome-wide VCF. This output is provided for convenience; it cannot be used as input to downstream workflows.

pesr_vcfs

Array of genotyped PE/SR caller variants for all batches, generated in GenotypeBatch.

depth_vcfs

Array of re-genotyped depth caller variants for all batches, generated in RegenotypeCNVs.

raw_sr_bothside_pass_files

Array of variant lists with bothside SR support for all batches, generated in GenotypeBatch.

raw_sr_background_fail_files

Array of variant lists with low SR signal-to-noise ratio for all batches, generated in GenotypeBatch.

localize_shard_size

Shard size for parallel computations. Decreasing this parameter may help reduce run time.

min_sr_background_fail_batches

Minimum fraction of batches in which a given variant must have high SR background in order to be assigned the HIGH_SR_BACKGROUND flag. Most users should leave this at the default value.
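The thresholding idea can be illustrated with a small sketch. This is not the workflow's implementation: the function name flag_high_sr_background is hypothetical, the sketch assumes variant IDs are already shared across batches, the inclusive comparison (>=) is an assumption, and the value 0.5 is an arbitrary example rather than the default.

```python
def flag_high_sr_background(fail_lists, min_fraction):
    """Return IDs of variants that fail SR background in enough batches.

    fail_lists: one collection of variant IDs per batch.
    min_fraction: minimum fraction of batches required to flag a variant.
    """
    n_batches = len(fail_lists)
    if n_batches == 0:
        return set()
    counts = {}
    for ids in fail_lists:
        # Count each variant at most once per batch.
        for variant_id in set(ids):
            counts[variant_id] = counts.get(variant_id, 0) + 1
    # Assumption: the threshold is inclusive.
    return {vid for vid, n in counts.items() if n / n_batches >= min_fraction}


# Hypothetical lists for four batches.
fail_lists = [{"v1", "v2"}, {"v1"}, {"v1", "v3"}, {"v2"}]
flagged = flag_high_sr_background(fail_lists, min_fraction=0.5)
print(sorted(flagged))  # ['v1', 'v2']
```

Here v1 fails in 3 of 4 batches and v2 in 2 of 4, so both reach the example threshold, while v3 (1 of 4) does not.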

Optional use_hail

Default: false. Use Hail for VCF concatenation. Use this only for projects with more than 50,000 samples. If enabled, gcs_project must also be provided. Does not work on Terra.

Optional gcs_project

Google Cloud project ID. Required only if enabling use_hail.
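The dependency between use_hail and gcs_project can be enforced when assembling an inputs file. The sketch below is illustrative only: the helper build_optional_inputs is hypothetical, the project ID is a placeholder, and the "CombineBatches." key prefix assumes the usual WDL inputs JSON convention of prefixing each input with the workflow name.

```python
import json


def build_optional_inputs(merge_vcfs=False, use_hail=False, gcs_project=None,
                          workflow="CombineBatches"):
    """Build the optional-input portion of an inputs JSON as a dict."""
    # gcs_project is required only when use_hail is enabled.
    if use_hail and not gcs_project:
        raise ValueError("gcs_project must be provided when use_hail is true")
    inputs = {
        f"{workflow}.merge_vcfs": merge_vcfs,
        f"{workflow}.use_hail": use_hail,
    }
    if gcs_project:
        inputs[f"{workflow}.gcs_project"] = gcs_project
    return inputs


# Placeholder project ID for illustration.
optional_inputs = build_optional_inputs(use_hail=True, gcs_project="my-gcs-project")
print(json.dumps(optional_inputs, indent=2))
```

Failing early on a missing gcs_project avoids submitting a run that cannot complete.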

Outputs

combined_vcfs

Array of contig-sharded VCFs of combined PE/SR and depth calls.

cluster_bothside_pass_lists

Array of contig-sharded bothside SR support variant lists.

cluster_background_fail_lists

Array of contig-sharded high SR background variant lists.

combine_batches_merged_vcf

Genome-wide VCF of combined PE/SR and depth calls. Only generated if merge_vcfs is true.