
GenotypeComplexVariants

WDL source code

Genotypes, filters, and classifies putative complex variants using depth evidence.

[Diagram: recommended invocation order]

Inputs

info

Several batch-level input arrays must share the same order. Specifically, the order of the batches array must match that of depth_vcfs, bincov_files, depth_gt_rd_sep_files, and median_coverage_files.
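One way to keep these arrays aligned is to derive all of them from a single per-batch mapping, so that index i in every array refers to the same batch. The following is a minimal sketch under that assumption, not part of the workflow itself. The helper build_batch_inputs, the batch identifiers, and the gs://example-bucket paths are hypothetical; only the input names come from the list above.

```python
import json

# Batch-level arrays that must follow the order of `batches`
# (input names taken from the Inputs section).
BATCH_ALIGNED_INPUTS = (
    "depth_vcfs",
    "bincov_files",
    "depth_gt_rd_sep_files",
    "median_coverage_files",
)


def build_batch_inputs(per_batch):
    """Build order-aligned arrays from a {batch_id: {input_name: path}} mapping.

    Hypothetical helper: every array is derived from the same ordered list
    of batch identifiers, so the arrays cannot drift out of order.
    """
    batches = list(per_batch)  # dicts preserve insertion order
    inputs = {"batches": batches}
    for name in BATCH_ALIGNED_INPUTS:
        missing = [b for b in batches if name not in per_batch[b]]
        if missing:
            raise ValueError(f"{name} is missing for batches: {missing}")
        inputs[name] = [per_batch[b][name] for b in batches]
    return inputs


# Hypothetical batch identifiers and file paths, for illustration only.
example_per_batch = {
    batch: {
        name: f"gs://example-bucket/{batch}/{name}.txt"
        for name in BATCH_ALIGNED_INPUTS
    }
    for batch in ("batch_01", "batch_02")
}

inputs = build_batch_inputs(example_per_batch)
print(json.dumps(inputs, indent=2))
```

The resulting dictionary can then be merged with the remaining inputs (cohort_name, ped_file, complex_resolve_vcfs, and so on) when assembling the workflow's inputs file.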

cohort_name

Cohort name. The guidelines outlined in the sample ID requirements section apply here.

batches

Array of batch identifiers. Each identifier should match the batch name used in GatherBatchEvidence.

ped_file

Family structures and sex assignments determined in EvidenceQC. See PED file format.

depth_vcfs

Array of re-genotyped depth caller variants for all batches, generated in RegenotypeCNVs. Must match order of batches.

Optional merge_vcfs

Default: false. If true, merge contig-sharded VCFs into one genome-wide VCF. The merged VCF is provided for convenience only and cannot be used as input to downstream workflows.

Optional localize_shard_size

Default: 50000. Shard size for parallel computations. Decreasing this parameter may help reduce run time.

complex_resolve_vcfs

Array of contig-sharded VCFs containing putative complex variants, generated in ResolveComplexVariants.

bincov_files

Array of RD evidence files for all batches from GatherBatchEvidence. Must match order of batches.

depth_gt_rd_sep_files

Array of "depth_depth" genotype cutoff files (depth evidence for depth-based calls) generated in GenotypeBatch. Order must match that of batches.

median_coverage_files

Array of median coverage tables for all batches from GatherBatchEvidence. Order must match that of batches.

Optional use_hail

Default: false. If true, use Hail for VCF concatenation. This should only be used for projects with over 50k samples. If enabled, gcs_project must also be provided. Does not work on Terra.

Optional gcs_project

Google Cloud project ID. Required only if enabling use_hail.
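The dependency between use_hail and gcs_project can be checked before submitting a run. The sketch below is illustrative only; check_hail_options is a hypothetical helper and the project ID is a placeholder. It simply encodes the rule stated above: gcs_project is required whenever use_hail is true.

```python
def check_hail_options(use_hail=False, gcs_project=None):
    """Return the Hail-related optional inputs, enforcing the documented rule.

    Hypothetical pre-submission check: raises if use_hail is enabled
    without a gcs_project.
    """
    if use_hail and not gcs_project:
        raise ValueError("gcs_project must be provided when use_hail is true")
    options = {"use_hail": use_hail}
    if gcs_project:
        options["gcs_project"] = gcs_project
    return options


# Default configuration: Hail disabled, no project ID needed.
default_options = check_hail_options()

# Hail enabled with a placeholder project ID.
hail_options = check_hail_options(use_hail=True, gcs_project="example-project-id")
```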

Outputs

complex_genotype_vcfs

Array of contig-sharded VCFs containing fully resolved and genotyped complex variants.

complex_genotype_merged_vcf

Genome-wide output VCF. Only generated if merge_vcfs is true.