GenotypeBatch
Genotypes a batch of samples across all variants in the cohort. Note that while the preceding step MergeBatchSites is a "cohort-level" module, genotyping is performed on one batch of samples at a time.
In brief, genotyping is performed by first training variant metric cutoffs on sites with clear evidence signatures, and then genotypes and genotype qualities are assigned based on parametric models tuned with these cutoffs. This is performed separately for PE/SR calls and depth-based calls.
The following diagram illustrates the recommended invocation order:
Inputs
A number of inputs to this module are only used in single-sample mode and therefore omitted here. In addition, some inputs marked as optional in the WDL are required for joint calling.
batch
An identifier for the batch. Should match the name used in GatherBatchEvidence.
batch_pesr_vcf
Batch PE/SR caller variants after filtering, generated in FilterBatch.
batch_depth_vcf
Batch depth caller variants after filtering, generated in FilterBatch.
cohort_pesr_vcf
Merged PE/SR caller variants for the cohort, generated in MergeBatchSites.
cohort_depth_vcf
Merged depth caller variants for the cohort, generated in MergeBatchSites.
n_per_split
Records per shard when scattering variants. Decrease to increase parallelism if the workflow is running slowly.
coveragefile
Merged RD evidence file from GatherBatchEvidence.
medianfile
Merged median coverage table from GatherBatchEvidence.
rf_cutoffs
Genotyping cutoffs trained with the random forest filtering model from FilterBatch.
seed_cutoffs
See here.
n_RD_genotype_bins
Number of depth genotyping bins. Most users should leave this at the default value.
discfile
Merged PE evidence file from GatherBatchEvidence.
reference_build
Reference build version. Only "hg38" is supported.
sr_median_hom_ins
Median normalized split read counts of homozygous insertions. Most users should leave this at the default value.
sr_hom_cutoff_multiplier
Cutoff multiplier for split read counts of homozygous insertions. Most users should leave this at the default value.
Outputs
sr_bothside_pass
List of variant IDs with split reads found on both sides of the breakpoint.
sr_background_fail
List of variant IDs exhibiting low signal-to-noise ratio for split read evidence.
trained_PE_metrics
PE evidence genotyping metrics file.
trained_SR_metrics
SR evidence genotyping metrics file.
trained_genotype_*_*_sepcutoff
Trained genotyping cutoffs for variants called by PESR or depth when supported by PESR or depth evidence (4 total).
genotyped_depth_vcf
Genotyped depth call VCF.
genotyped_pesr_vcf
Genotyped PE/SR call VCF.
regeno_coverage_medians
Coverage metrics for downstream CNV regenotyping.