CompareVcfBenchmarks

Description
This workflow creates a table that facilitates comparing precision and sensitivity across different configurations (e.g. pipeline versions, chemistry changes) of samples vs. truth, based on results obtained with the BenchmarkVCFs workflow. After this workflow has run, you can export the generated CSV table into a Google Sheets spreadsheet with automatic formatting using the ExportToGoogleSheets Colab Notebook (https://github.com/broadinstitute/palantir-workflows/blob/mg_benchmark_compare/BenchmarkVCFs/README_CompareBenchmarks.md).

Inputs

Required

  • benchmark_summaries (Array[File], required): The output summary.tsv files from the BenchmarkVCFs workflow.
  • configurations (Array[String], required): The labels for the different configurations that should be compared to each other.
  • sample_ids (Array[String], required): The names of one or more different samples. The comparisons will be made between configurations for each sample individually.
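
As an illustration of how the three required inputs fit together, here is a minimal Python sketch that assembles an inputs JSON. It assumes the three arrays are parallel, with one entry per benchmark summary file, and that input keys are prefixed with the workflow name as is conventional for WDL inputs files. The bucket paths, configuration labels and sample names are hypothetical.

```python
import json

# Hypothetical runs: (summary file, configuration label, sample id).
# Paths and labels are made up for illustration.
runs = [
    ("gs://my-bucket/v1/NA12878/summary.tsv", "v1", "NA12878"),
    ("gs://my-bucket/v2/NA12878/summary.tsv", "v2", "NA12878"),
    ("gs://my-bucket/v1/NA24385/summary.tsv", "v1", "NA24385"),
    ("gs://my-bucket/v2/NA24385/summary.tsv", "v2", "NA24385"),
]


def build_inputs(runs, workflow_name="CompareVcfBenchmarks"):
    """Turn (summary, configuration, sample_id) tuples into parallel input arrays.

    Assumption: the workflow expects one configuration label and one sample id
    per summary file, in the same order as the files.
    """
    if not runs:
        raise ValueError("at least one benchmark summary is required")
    summaries, configurations, sample_ids = (list(col) for col in zip(*runs))
    return {
        f"{workflow_name}.benchmark_summaries": summaries,
        f"{workflow_name}.configurations": configurations,
        f"{workflow_name}.sample_ids": sample_ids,
    }


inputs = build_inputs(runs)
print(json.dumps(inputs, indent=2))
```

Keeping each run as a single tuple makes it harder for the three arrays to drift out of alignment when runs are added or removed.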

Optional

  • deltas (Array[Int]?): If specified, the output table will contain columns that compare the results of different configurations to each other. Each delta column is defined by two consecutive entries in the array, each a zero-based index referring to a configuration. If, for example, there are three configurations A, B and C and you want to compare B to A and C to A, provide the following data: [1, 0, 2, 0]. You will want to set order_of_configurations in this case to make sure that the indices refer to the intended configurations.
  • mem_gb (Int?): Optional input overriding the default memory.
  • order_of_configurations (Array[String]?): This input determines the order of the configurations in the resulting table. As with order_of_samples, each configuration only has to be specified once, not once for each input VCF. If not specified, the order will be determined by the input files.
  • order_of_samples (Array[String]?): If multiple different sample names are provided, you can specify the order of those samples in the resulting table. Each sample name only has to be specified once, not once for each input VCF. If not specified, the order will be determined by the input files.
  • preemptible (Int?): Optional input overriding the default number of preemptible attempts.
  • stratifiers (Array[String]?): This input requires the same labels as the stratLabels input that has been passed to BenchmarkVCFs.
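
To make the deltas encoding concrete, the following Python sketch reads the flat index array as consecutive pairs and resolves each index against order_of_configurations. The helper delta_pairs is hypothetical and not part of the workflow. It only shows which two configurations each delta column refers to, not how the workflow computes the difference.

```python
def delta_pairs(deltas, order_of_configurations):
    """Resolve a flat list of zero-based indices into pairs of configuration names.

    Hypothetical helper for checking a deltas value before submitting it.
    """
    if len(deltas) % 2 != 0:
        raise ValueError("deltas must contain an even number of indices")
    pairs = []
    for i in range(0, len(deltas), 2):
        first, second = deltas[i], deltas[i + 1]
        for idx in (first, second):
            if not 0 <= idx < len(order_of_configurations):
                raise IndexError(f"index {idx} does not refer to a configuration")
        pairs.append((order_of_configurations[first], order_of_configurations[second]))
    return pairs


# The example from the deltas description: compare B to A and C to A.
pairs = delta_pairs([1, 0, 2, 0], ["A", "B", "C"])
print(pairs)  # [('B', 'A'), ('C', 'A')]
```

Running a check like this before launching the workflow catches an odd-length array or an out-of-range index early.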

Defaults

  • generate_gc_plots (Boolean, default=false): If set to true, GC plots will be generated (see the gc_plots output).
  • include_counts (Boolean, default=true): If set to false, the resulting metrics will be Sensitivity, Precision and F-Measure only. If set to true, the output table will also include the number of TP, FP and FN variants for each stratifier.
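
For reference, the relationship between the counts and the three metrics can be sketched in Python using the standard definitions. This is an illustration, not a description of how the workflow obtains its values, and it assumes that F-Measure means the harmonic mean of precision and sensitivity (F1). The function name benchmark_metrics and the example counts are hypothetical.

```python
def benchmark_metrics(tp, fp, fn):
    """Compute precision, sensitivity and F-measure from variant counts.

    Standard definitions; F-measure is assumed to be F1.
    Undefined ratios (zero denominator) are returned as NaN.
    """
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) > 0 else nan
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else nan
    # Comparisons involving NaN are False, so NaN inputs yield a NaN F-measure.
    if precision + sensitivity > 0:
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f_measure = nan
    return {"precision": precision, "sensitivity": sensitivity, "f_measure": f_measure}


# Made-up counts for one stratifier: 90 TP, 10 FP, 30 FN.
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
print(metrics)
```

With the counts included in the table, ratios like these can be rechecked or aggregated across stratifiers, which is not possible from the ratios alone.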

Outputs

  • comparison_csv (File)
  • raw_data (File)
  • gc_plots (Array[File]?)

Dot Diagram

CompareVcfBenchmarks