FilterGenotypes
Filters genotypes using the GQ model with recalibrated quality scores. The output VCF contains the HIGH_NCR filter status, which is assigned to variants whose proportion of no-call genotypes exceeds a threshold. This status is also applied to variants whose genotypes were already filtered in the input VCF.
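As a rough illustration of the HIGH_NCR idea, the sketch below computes a per-variant no-call rate and compares it to a cutoff. It is not the module's implementation: the function names, the genotype string representation, and the strict `>` comparison are assumptions made for this example only.

```python
from typing import Optional, Sequence


def no_call_rate(genotypes: Sequence[Optional[str]]) -> float:
    """Return the fraction of genotypes that are no-calls.

    A genotype counts as a no-call if it is None or if every allele
    is '.', for example './.', '.|.', or '.'. (Assumed representation.)
    """
    if not genotypes:
        return 0.0
    no_calls = 0
    for gt in genotypes:
        if gt is None:
            no_calls += 1
            continue
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            no_calls += 1
    return no_calls / len(genotypes)


def is_high_ncr(genotypes: Sequence[Optional[str]], cutoff: float = 0.05) -> bool:
    """Flag a variant whose no-call rate exceeds the cutoff.

    Assumption: the comparison is strict, so a cutoff of 1 can never be
    exceeded and effectively disables the filter.
    """
    return no_call_rate(genotypes) > cutoff


# Hypothetical genotypes for one variant across four samples.
example = ["0/1", "./.", "0/0", "0/0"]
flagged = is_high_ncr(example)
not_flagged = is_high_ncr(example, cutoff=1.0)
```

With one no-call among four samples, the rate is 0.25, so the variant is flagged at the default cutoff and left unflagged when the cutoff is 1.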
The following diagram illustrates the recommended invocation order:
QC recommendations
We strongly recommend performing call set QC after this module. By default, QC plotting is enabled with the run_qc argument. Users should carefully inspect the main plots from the main_vcf_qc_tarball. Please see the MainVcfQc module documentation for more information on interpreting these plots and recommended QC criteria.
Inputs
vcf
Input VCF with recalibrated scores generated from ScoreGenotypes.
Optional output_prefix
Default: use input VCF filename. Prefix for the output VCF, such as the cohort name. May be alphanumeric with underscores.
ploidy_table
Table of sample ploidies generated in JoinRawCalls.
sl_cutoff_table
Argument to the SL filtering script that sets the scaled logit (SL) cutoffs used for filtering. Overridden by optimized_sl_cutoff_table.
Optional optimized_sl_cutoff_table
Output of the SL optimization script, used to set SL cutoffs for filtering in a more truth-aware manner. Overrides sl_cutoff_table if provided.
Optional no_call_rate_cutoff
Default: 0.05. Threshold fraction of samples that must have no-call genotypes in order to filter a variant. Set to 1 to disable.
Optional sl_filter_args
Arguments for the SL filtering script.
Optional run_qc
Default: true. Enable running MainVcfQc automatically. By default, filtered variants will be excluded from the plots.
Optional filter_vcf_records_per_shard
Default: 20000. Shard size for scattered GQ recalibration tasks. Decrease this if those steps are running slowly.
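The inputs above could be assembled into an inputs file along the following lines. This is a minimal sketch that assumes the module is run as a WDL workflow named `FilterGenotypes` with inputs keyed as `FilterGenotypes.<input_name>`. The helper function and all file paths are hypothetical placeholders.

```python
import json
from typing import Any, Dict, Optional


def build_inputs(
    vcf: str,
    ploidy_table: str,
    sl_cutoff_table: str,
    output_prefix: Optional[str] = None,
    optimized_sl_cutoff_table: Optional[str] = None,
    no_call_rate_cutoff: Optional[float] = None,
    run_qc: Optional[bool] = None,
    workflow: str = "FilterGenotypes",  # assumed workflow name
) -> Dict[str, Any]:
    """Build an inputs dictionary, omitting optional values left unset
    so that the documented defaults apply."""
    values = {
        "vcf": vcf,
        "ploidy_table": ploidy_table,
        "sl_cutoff_table": sl_cutoff_table,
        "output_prefix": output_prefix,
        "optimized_sl_cutoff_table": optimized_sl_cutoff_table,
        "no_call_rate_cutoff": no_call_rate_cutoff,
        "run_qc": run_qc,
    }
    return {
        f"{workflow}.{name}": value
        for name, value in values.items()
        if value is not None
    }


# Hypothetical paths; replace with real outputs from upstream modules.
inputs = build_inputs(
    vcf="cohort.scored.vcf.gz",
    ploidy_table="cohort.ploidy.tsv",
    sl_cutoff_table="sl_cutoffs.tsv",
    output_prefix="my_cohort",
    no_call_rate_cutoff=0.05,
)
inputs_json = json.dumps(inputs, indent=2, sort_keys=True)
```

Leaving `run_qc` unset keeps the documented default of true, so QC plots would still be produced.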
Outputs
filtered_vcf
Filtered VCF.
Optional main_vcf_qc_tarball
QC plots generated with MainVcfQc. Only generated if run_qc is enabled.