BenchmarkVCFs
Benchmark
- description
- A workflow to calculate sensitivity and precision of a germline variant calling pipeline by comparing a 'call' vcf produced by the pipeline to a gold standard 'truth' vcf. Allows for stratification based on interval lists, bed files, or variant types defined according to GATK SelectVariants. Borrowed and adapted from the Broad Institute's Hydrogen/Palantir repo, courtesy of Michael Gatzen (https://github.com/broadinstitute/palantir-workflows/tree/mg_benchmark_compare/BenchmarkVCFs ; permalink: https://github.com/broadinstitute/palantir-workflows/blob/0bf48efc6de818364993e46d89591a035cfd80c7/BenchmarkVCFs/BenchmarkVCFs.wdl).
Inputs
Required
confidenceInterval
(File, required); description: confidence interval for truth set (can be bed or picard interval_list)
evalLabel
(String, required); description: label to identify vcf to be evaluated
evalVcf
(File, required); description: vcf to be evaluated
evalVcfIndex
(File, required); description: vcf index for evalVcf
ref_map_file
(File, required); description: table indicating reference sequence and auxiliary file locations
truthLabel
(String, required); description: label by which to identify truth set
truthVcf
(File, required); description: truth vcf against which to evaluate
truthVcfIndex
(File, required)
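As an illustration, the required inputs could be assembled into an inputs JSON along the following lines. This is a minimal sketch, not the workflow's own tooling: it assumes the Cromwell-style convention of prefixing each key with the workflow name (taken here to be `Benchmark`), and every path and label is a hypothetical placeholder.

```python
import json

# Assumption: keys are prefixed with the workflow name, as in Cromwell-style inputs JSON.
WORKFLOW = "Benchmark"

# The eight required inputs; all values are hypothetical placeholders.
required = {
    "confidenceInterval": "truth_confidence.bed",
    "evalLabel": "my_pipeline",
    "evalVcf": "eval.vcf.gz",
    "evalVcfIndex": "eval.vcf.gz.tbi",
    "ref_map_file": "ref_map.tsv",
    "truthLabel": "gold_standard",
    "truthVcf": "truth.vcf.gz",
    "truthVcfIndex": "truth.vcf.gz.tbi",
}


def build_inputs(values, workflow=WORKFLOW):
    """Return a dict whose keys are '<workflow>.<input name>'."""
    return {f"{workflow}.{name}": value for name, value in values.items()}


inputs = build_inputs(required)
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

Optional inputs can be added to the same dictionary before serializing; inputs left out fall back to the defaults declared by the workflow.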
Optional
analysisRegion
(String?); description: if provided (gatk format, single interval e.g., 'chr20', or 'chr20:1-10') all the analysis will be performed only within the region.
annotationNames
(Array[String]?); description: Annotation arguments to GATK (-A argument, multiple OK)
dummyInputForTerraCallCaching
(String?); description: When running on Terra, use workspace.name as this input to ensure that all tasks will only cache hit to runs in your own workspace. This will prevent call caching from failing with 'Cache Miss (10 failed copy attempts)'. Outside of Terra this can be left empty. This dummy input is only needed for tasks that have no inputs specific to the sample being run (such as CreateIntervalList which does not take in any sample data).
evalBam
(File?); description: bam file containing the reads that generated the evalVcf
evalBamLabel
(String?); description: label to use for the evalBam in IGV
gatkJarForAnnotation
(File?); description: GATK jar that can calculate necessary annotations for jexl Selections when using VCFEval.
hapMap
(File?); description: reference haplotype map for CrosscheckFingerprints
jexlVariantSelectors
(Array[String]?); description: variant types to select over (defined by jexl fed to GATK SelectVariants)
preemptible
(Int?)
threadsVcfEval
(Int?)
truthBam
(File?); description: bam file containing the reads that generated the truthVcf
truthBamLabel
(String?); description: label to use for the truthBam in IGV
variantSelectorLabels
(Array[String]?); description: labels by which to identify variant selectors (must be same length as jexlVariantSelectors)
vcfScoreField
(String?); description: Have vcfEval use this field for making the roc-plot. If this is an info field (like VQSLOD) it should be provided as INFO.VQSLOD, otherwise it is assumed to be a format field.
CheckForVariantsEval.memoryMaybe
(Int?)
CheckForVariantsTruth.memoryMaybe
(Int?)
ConfidenceConvertIntervals.memoryMaybe
(Int?)
CountUNKVcfEval.memoryMaybe
(Int?)
EvalIndelLengthVcfEval.memoryMaybe
(Int?)
EvalSelectorVcfEval.memoryMaybe
(Int?)
Match.memoryMaybe
(Int?)
StandardVcfEval.memUser
(String?)
StratConvertIntervals.memoryMaybe
(Int?)
Defaults
doIndelLengthStratification
(Boolean, default=true); description: whether or not to perform stratification by indel length
enableRefOverlap
(Boolean, default=false)
gatkTag
(String, default="4.0.11.0"); description: version of gatk docker to use. Defaults to 4.0.11.0
passingOnly
(Boolean, default=true); description: Have vcfEval only consider the passing variants
requireMatchingGenotypes
(Boolean, default=true); description: whether to require genotypes to match in order to be a true positive
stratIntervals
(Array[File], default=[]); description: intervals for stratification (can be picard interval_list or bed format)
stratLabels
(Array[String], default=[]); description: labels by which to identify stratification intervals (must be same length as stratIntervals)
truthIsSitesOnlyVcf
(Boolean, default=false); description: whether the truth VCF is a sites-only VCF file without any sample information
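Two pairs of array inputs must have matching lengths: stratLabels with stratIntervals, and variantSelectorLabels with jexlVariantSelectors. A pre-submission check for this could look like the sketch below. The helper names `check_paired` and `validate` are hypothetical, the file names are placeholders, and the check is not part of the workflow itself.

```python
# Pairs of (values input, labels input) that must have the same length.
PAIRS = [
    ("stratIntervals", "stratLabels"),
    ("jexlVariantSelectors", "variantSelectorLabels"),
]


def check_paired(inputs, values_key, labels_key):
    """Return (label, value) pairs; raise ValueError if the lengths differ.

    A missing or null input is treated as an empty array.
    """
    values = inputs.get(values_key) or []
    labels = inputs.get(labels_key) or []
    if len(values) != len(labels):
        raise ValueError(
            f"{labels_key} has {len(labels)} entries but "
            f"{values_key} has {len(values)}"
        )
    return list(zip(labels, values))


def validate(inputs):
    """Check every pair in PAIRS and return the label/value pairs per input."""
    return {
        values_key: check_paired(inputs, values_key, labels_key)
        for values_key, labels_key in PAIRS
    }


# Hypothetical example: two stratification intervals, no jexl selectors.
example = {
    "stratIntervals": ["high_confidence.bed", "exome.interval_list"],
    "stratLabels": ["HighConf", "Exome"],
}
paired = validate(example)
```

Running the check before submission catches a length mismatch locally, before any workflow tasks are started.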
Outputs
summary
(File)
snpPrecision
(Float)
indelPrecision
(Float)
snpRecall
(Float)
indelRecall
(Float)
snpF1Score
(Float)
indelF1Score
(Float)
snpRocs
(Array[File?])
nonSnpRocs
(Array[File?])
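For reference, the precision, recall (sensitivity), and F1 outputs follow the standard definitions based on true-positive, false-positive, and false-negative counts. The sketch below shows those definitions only. It is not taken from the workflow's code, the counts are hypothetical, and returning 0.0 for an empty denominator is a choice made for this example.

```python
def precision(tp, fp):
    """Fraction of called variants that are true: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of truth variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical SNP counts, for illustration only.
tp, fp, fn = 90, 10, 30
snp_precision = precision(tp, fp)
snp_recall = recall(tp, fn)
snp_f1 = f1_score(snp_precision, snp_recall)
print(snp_precision, snp_recall, snp_f1)
```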