BenchmarkVCFs

Benchmark

description
A workflow to calculate sensitivity and precision of a germline variant calling pipeline by comparing a 'call' vcf produced by the pipeline to a gold standard 'truth' vcf. Allows for stratification based on interval lists, bed files, or variant types defined according to GATK SelectVariants. Borrowed and adapted from the Broad Institute's Hydrogen/Palantir repo, courtesy of Michael Gatzen (https://github.com/broadinstitute/palantir-workflows/tree/mg_benchmark_compare/BenchmarkVCFs ; permalink: https://github.com/broadinstitute/palantir-workflows/blob/0bf48efc6de818364993e46d89591a035cfd80c7/BenchmarkVCFs/BenchmarkVCFs.wdl).

Inputs

Required

  • confidenceInterval (File, required); description: confidence interval for truth set (can be bed or picard interval_list)
  • evalLabel (String, required); description: label to identify vcf to be evaluated
  • evalVcf (File, required); description: vcf to be evaluated
  • evalVcfIndex (File, required); description: vcf index for evalVcf
  • ref_map_file (File, required); description: table indicating reference sequence and auxiliary file locations
  • truthLabel (String, required); description: label by which to identify truth set
  • truthVcf (File, required); description: truth vcf against which to evaluate
  • truthVcfIndex (File, required)
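As a rough illustration, the required inputs above could be collected into an inputs JSON file with a short Python script. This is only a sketch: every file path and label value is a hypothetical placeholder, and the "Benchmark." key prefix assumes the workflow is named Benchmark, as the heading above suggests. Check both against the WDL before use.

```python
import json

# Names of the required inputs, taken from the list above.
REQUIRED = [
    "confidenceInterval", "evalLabel", "evalVcf", "evalVcfIndex",
    "ref_map_file", "truthLabel", "truthVcf", "truthVcfIndex",
]

def build_required_inputs(values, prefix="Benchmark"):
    """Return a dict keyed as '<prefix>.<input>', failing on missing inputs.

    The default prefix is an assumption about the workflow name.
    """
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    return {f"{prefix}.{name}": values[name] for name in REQUIRED}

# Hypothetical placeholder values; replace with real paths or cloud URIs.
example_values = {
    "confidenceInterval": "truth/confidence_regions.bed",
    "evalLabel": "my_pipeline",
    "evalVcf": "calls/sample.vcf.gz",
    "evalVcfIndex": "calls/sample.vcf.gz.tbi",
    "ref_map_file": "reference/ref_map.tsv",
    "truthLabel": "gold_standard",
    "truthVcf": "truth/truth.vcf.gz",
    "truthVcfIndex": "truth/truth.vcf.gz.tbi",
}

inputs = build_required_inputs(example_values)
print(json.dumps(inputs, indent=2))
```

The printed JSON can be saved to a file and passed to whichever workflow runner is in use. Optional inputs can be added to the same dictionary with the same prefix.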

Optional

  • analysisRegion (String?); description: if provided (gatk format, single interval, e.g., 'chr20' or 'chr20:1-10'), all analysis will be performed only within this region.
  • annotationNames (Array[String]?); description: Annotation arguments to GATK (-A argument, multiple OK)
  • dummyInputForTerraCallCaching (String?); description: When running on Terra, use workspace.name as this input to ensure that all tasks will only cache hit to runs in your own workspace. This will prevent call caching from failing with 'Cache Miss (10 failed copy attempts)'. Outside of Terra this can be left empty. This dummy input is only needed for tasks that have no inputs specific to the sample being run (such as CreateIntervalList which does not take in any sample data).
  • evalBam (File?); description: bam file containing the reads that generated the evalVcf
  • evalBamLabel (String?); description: label to use for the evalBam in IGV
  • gatkJarForAnnotation (File?); description: GATK jar that can calculate necessary annotations for jexl Selections when using VCFEval.
  • hapMap (File?); description: reference haplotype map for CrosscheckFingerprints
  • jexlVariantSelectors (Array[String]?); description: variant types to select over (defined by jexl fed to GATK SelectVariants)
  • preemptible (Int?)
  • threadsVcfEval (Int?)
  • truthBam (File?); description: bam file containing the reads that generated the truthVcf
  • truthBamLabel (String?); description: label to use for the truthBam in IGV
  • variantSelectorLabels (Array[String]?); description: labels by which to identify variant selectors (must be same length as jexlVariantSelectors)
  • vcfScoreField (String?); description: Have vcfEval use this field for making the roc-plot. If this is an info field (like VQSLOD) it should be provided as INFO.VQSLOD, otherwise it is assumed to be a format field.
  • CheckForVariantsEval.memoryMaybe (Int?)
  • CheckForVariantsTruth.memoryMaybe (Int?)
  • ConfidenceConvertIntervals.memoryMaybe (Int?)
  • CountUNKVcfEval.memoryMaybe (Int?)
  • EvalIndelLengthVcfEval.memoryMaybe (Int?)
  • EvalSelectorVcfEval.memoryMaybe (Int?)
  • Match.memoryMaybe (Int?)
  • StandardVcfEval.memUser (String?)
  • StratConvertIntervals.memoryMaybe (Int?)

Defaults

  • doIndelLengthStratification (Boolean, default=true); description: whether or not to perform stratification by indel length
  • enableRefOverlap (Boolean, default=false)
  • gatkTag (String, default="4.0.11.0"); description: version of gatk docker to use. Defaults to 4.0.11.0
  • passingOnly (Boolean, default=true); description: Have vcfEval only consider the passing variants
  • requireMatchingGenotypes (Boolean, default=true); description: whether to require genotypes to match in order to be a true positive
  • stratIntervals (Array[File], default=[]); description: intervals for stratification (can be picard interval_list or bed format)
  • stratLabels (Array[String], default=[]); description: labels by which to identify stratification intervals (must be same length as stratIntervals)
  • truthIsSitesOnlyVcf (Boolean, default=false); description: whether the truth VCF is a sites-only VCF file without any sample information
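Two pairs of inputs must have matching lengths: stratIntervals with stratLabels, and jexlVariantSelectors with variantSelectorLabels. A small pre-flight check along the following lines could catch a mismatch before submitting a run. It is a sketch, not part of the workflow: the function name and the example values (file names, labels, and the JEXL expression) are hypothetical, and the keys are written without any workflow prefix.

```python
# Pairs of array inputs that the descriptions above say must match in length.
PAIRED_INPUTS = [
    ("stratIntervals", "stratLabels"),
    ("jexlVariantSelectors", "variantSelectorLabels"),
]

def check_paired_lengths(inputs):
    """Return a list of human-readable problems; an empty list means OK.

    An absent or null input is treated as an empty array.
    """
    problems = []
    for values_key, labels_key in PAIRED_INPUTS:
        values = inputs.get(values_key) or []
        labels = inputs.get(labels_key) or []
        if len(values) != len(labels):
            problems.append(
                f"{values_key} has {len(values)} entries but "
                f"{labels_key} has {len(labels)}"
            )
    return problems

# Hypothetical example: two stratification intervals but only one label.
example = {
    "stratIntervals": ["strat/high_gc.bed", "strat/low_complexity.bed"],
    "stratLabels": ["high_gc"],
    "jexlVariantSelectors": ["QD > 2.0"],
    "variantSelectorLabels": ["qd_above_2"],
}

for problem in check_paired_lengths(example):
    print("input error:", problem)
```

With the example above, only the stratification pair is reported, because the selector pair has one entry on each side.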

Outputs

  • summary (File)
  • snpPrecision (Float)
  • indelPrecision (Float)
  • snpRecall (Float)
  • indelRecall (Float)
  • snpF1Score (Float)
  • indelF1Score (Float)
  • snpRocs (Array[File?])
  • nonSnpRocs (Array[File?])
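For reference, the F1 outputs can be sanity-checked against the precision and recall outputs. The sketch below assumes the workflow reports the standard F1 score, the harmonic mean of precision and recall; that is the usual definition, but it is not confirmed here against the WDL. The function name and the numbers are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only, not results from any real run.
snp_precision = 0.5
snp_recall = 0.5
print(f1_score(snp_precision, snp_recall))
```

Because the harmonic mean is dominated by the smaller of its two arguments, a pipeline with high precision but poor recall (or the reverse) still gets a low F1 score.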

Dot Diagram

BenchmarkVCFs