GenerateBatchMetrics

WDL source code

Analyzes variants for read depth (RD), B-allele frequency (BAF), paired-end (PE), and split-read (SR) evidence and creates a table of raw and statistical metrics. These results are used to assess variant quality in FilterBatch and for SR-based breakpoint refinement.

Modified tests are applied to common variants (carrier frequency at least 50%) and results are emitted in a separate table.

The following diagram illustrates the recommended invocation order:

Inputs

`batch`

An identifier for the batch. Should match the name used in GatherBatchEvidence.

`*_vcf`

Clustered VCFs from ClusterBatch.

`baf_metrics`

Merged BAF evidence file from GatherBatchEvidence.

`discfile`

Merged PE evidence file from GatherBatchEvidence.

`coveragefile`

Merged RD evidence file from GatherBatchEvidence.

`splitfile`

Merged SR evidence file from GatherBatchEvidence.

`medianfile`

Merged median coverage table from GatherBatchEvidence.

`*_split_size`

Variants per shard for each evidence testing subworkflow. Reduce defaults to increase parallelism if the workflow is too slow.
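As a rough illustration of the trade-off, the sketch below estimates how many shards a given split size produces. It assumes variants are divided into consecutive shards of at most `split_size` records each; the function name `shard_count` and the variant counts are hypothetical and are not taken from the workflow.

```python
import math


def shard_count(n_variants: int, split_size: int) -> int:
    """Estimate the number of shards for n_variants at split_size variants per shard.

    Assumption: shards are filled to split_size, with one smaller final shard
    for any remainder. This is an illustration, not the workflow's own code.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return math.ceil(n_variants / split_size)


# Halving the split size roughly doubles the number of parallel shards.
for size in (1000, 500, 250):
    print(size, shard_count(10_000, size))
```

Smaller shards finish sooner individually but add per-task overhead, so the best value depends on the backend and the number of variants in the batch.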

`ped_file`

Family structures and sex assignments determined in EvidenceQC. See PED file format.

Optional `outlier_sample_ids`

A file of sample IDs, one per line, to exclude from the generation of batch metrics. The list can be based on outlier samples identified after EvidenceQC that are still retained in the cohort. If provided, the workflow excludes outlier samples from the called samples when calculating metrics at a given site, as long as non-outlier samples are also called at that site. It does the same for the set of background samples considered in the metric calculations for a given site. These outlier samples are not removed from joint calling in downstream modules.
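The exclusion rule can be sketched as follows. This is one reading of the description above, not the workflow's actual implementation: it assumes outliers are dropped from a sample set only when at least one non-outlier sample remains in that set, and that the same rule is applied independently to the called and background sets. The helper name `drop_outliers` and the sample IDs are hypothetical.

```python
from typing import Iterable, List, Set


def drop_outliers(samples: Iterable[str], outliers: Set[str]) -> List[str]:
    """Remove outlier samples unless that would leave the set empty.

    Assumption: if every sample in the set is an outlier, the set is
    returned unchanged so the site can still be evaluated.
    """
    samples = list(samples)
    kept = [s for s in samples if s not in outliers]
    return kept if kept else samples


outliers = {"sample_C"}  # hypothetical contents of the outlier_sample_ids file

# Site called in one outlier and one non-outlier sample: the outlier is dropped.
print(drop_outliers(["sample_A", "sample_C"], outliers))

# Site called only in an outlier sample: the sample is retained.
print(drop_outliers(["sample_C"], outliers))

# The same rule applied to the background samples for the site.
print(drop_outliers(["sample_B", "sample_C", "sample_D"], outliers))
```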

Outputs

`metrics`

TSV of variant metrics (excluding common variants).

`metrics_common`

TSV of common variant metrics (>50% carrier frequency).
