RefineComplexVariants

WDL source code

Refines complex SVs and translocations, and filters them based on a reassessment of discordant read pair and read depth evidence.

The following diagram illustrates the recommended invocation order:

Inputs

All per-batch array inputs must list the batches in the same order. For example, the order of the batch_name_list array must match that of batch_sample_lists, PE_metrics, Depth_DEL_beds, and Depth_DUP_beds.

vcf

Input VCF, generated in CleanVcf.

prefix

Prefix for output VCF, such as the cohort name. May be alphanumeric with underscores.
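
The naming rule above can be checked locally before submitting a run. The following is a minimal sketch, assuming that "alphanumeric with underscores" means ASCII letters, digits, and underscores only; the helper name `is_valid_prefix` is hypothetical and is not part of the workflow.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
PREFIX_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the whole string consists of allowed characters."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None


for candidate in ["my_cohort_2024", "my-cohort", ""]:
    print(repr(candidate), is_valid_prefix(candidate))
```

This is only a pre-submission convenience check; the workflow itself may apply different or additional rules.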

batch_name_list

Array of batch names. These should be the same batch names used in GatherBatchEvidence.

batch_sample_lists

Array of sample ID lists for all batches, generated in FilterBatch. Order must match batch_name_list.

PE_metrics

Array of PE metrics files for all batches, generated in GatherBatchEvidence. Order must match batch_name_list.

Depth_DEL_beds, Depth_DUP_beds

Arrays of raw DEL and DUP depth calls for all batches, generated in GatherBatchEvidence. Order must match batch_name_list.
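
Because every per-batch array must follow the order of batch_name_list, one way to avoid mismatches is to derive all of the arrays from a single list of per-batch records. The sketch below illustrates that idea. It assumes the common Cromwell convention of prefixing input keys with the workflow name, and every batch name and file path in it is a hypothetical placeholder.

```python
import json

# Hypothetical per-batch records; every name and path below is a placeholder.
BATCHES = [
    {
        "name": "batch_01",
        "samples": "gs://example-bucket/batch_01.samples.txt",
        "pe": "gs://example-bucket/batch_01.pe.txt.gz",
        "del_bed": "gs://example-bucket/batch_01.DEL.bed.gz",
        "dup_bed": "gs://example-bucket/batch_01.DUP.bed.gz",
    },
    {
        "name": "batch_02",
        "samples": "gs://example-bucket/batch_02.samples.txt",
        "pe": "gs://example-bucket/batch_02.pe.txt.gz",
        "del_bed": "gs://example-bucket/batch_02.DEL.bed.gz",
        "dup_bed": "gs://example-bucket/batch_02.DUP.bed.gz",
    },
]

# Maps each workflow array input to the record field that fills it.
ARRAY_FIELDS = {
    "batch_name_list": "name",
    "batch_sample_lists": "samples",
    "PE_metrics": "pe",
    "Depth_DEL_beds": "del_bed",
    "Depth_DUP_beds": "dup_bed",
}


def build_batch_inputs(batches, workflow="RefineComplexVariants"):
    """Build all per-batch arrays from one list so their orders cannot diverge."""
    return {
        f"{workflow}.{key}": [batch[field] for batch in batches]
        for key, field in ARRAY_FIELDS.items()
    }


batch_inputs = build_batch_inputs(BATCHES)
print(json.dumps(batch_inputs, indent=2))
```

The resulting dictionary can be merged with the remaining inputs (vcf, prefix, n_per_split, and any optional parameters) before it is written to an inputs JSON file.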

n_per_split

Shard size for parallel computations. Decreasing this parameter may help reduce run time.
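
To get a feel for how this parameter affects parallelism, the sketch below estimates the number of shards for a given shard size. It assumes that n_per_split is the maximum number of variant records per shard, which is an interpretation of "shard size" rather than something stated here; the function name and the record counts are hypothetical.

```python
import math


def estimated_shard_count(n_records: int, n_per_split: int) -> int:
    """Estimate how many shards are needed, assuming n_per_split records per shard."""
    if n_per_split <= 0:
        raise ValueError("n_per_split must be a positive integer")
    return math.ceil(n_records / n_per_split)


# Hypothetical example: halving the shard size doubles the number of shards.
print(estimated_shard_count(10_000, 5_000))
print(estimated_shard_count(10_000, 2_500))
```

Smaller shards mean more tasks running in parallel, each with less work, which is why decreasing the value may shorten wall-clock time.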

Optional min_pe_cpx

Default: 3. Minimum PE read count for complex variants (CPX).

Optional min_pe_ctx

Default: 3. Minimum PE read count for translocations (CTX).

Optional use_hail

Default: false. Use Hail for VCF concatenation. This should only be used for projects with over 50k samples. If enabled, gcs_project must also be provided. Does not work on Terra.

Optional gcs_project

Google Cloud project ID. Required only if enabling use_hail.
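
The dependency between use_hail and gcs_project can be verified before launching a run. The following is a minimal sketch of such a check on an inputs dictionary, assuming input keys are prefixed with the workflow name as is conventional for Cromwell inputs JSON; the helper name `check_hail_inputs` and the project ID are hypothetical.

```python
def check_hail_inputs(inputs: dict, workflow: str = "RefineComplexVariants") -> None:
    """Raise ValueError if use_hail is enabled without a gcs_project."""
    use_hail = inputs.get(f"{workflow}.use_hail", False)  # documented default: false
    gcs_project = inputs.get(f"{workflow}.gcs_project")
    if use_hail and not gcs_project:
        raise ValueError("use_hail is enabled, so gcs_project must also be provided")


# Passes: Hail is enabled and a (placeholder) project ID is supplied.
check_hail_inputs({
    "RefineComplexVariants.use_hail": True,
    "RefineComplexVariants.gcs_project": "my-example-project",
})

# Passes: Hail is left at its default, so no project ID is needed.
check_hail_inputs({})
```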

Outputs

cpx_refined_vcf

Output VCF.

cpx_evidences

Supplementary output table of complex variant evidence.