SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS

SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VQSR

author
Jonn Smith
description
This workflow implements a modified version of the joint calling pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline uses VETS instead of VQSR.

Inputs

Required

  • gcs_out_root_dir (String, required): GCS Bucket into which to finalize outputs.
  • gvcf_indices (Array[File], required): Array of GVCF index files for the files in gvcfs. Order should correspond to that in gvcfs.
  • gvcfs (Array[File], required): Array of GVCF files to use as inputs for joint calling.
  • indel_is_calibration (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
  • indel_is_training (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
  • indel_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as either calibration or training using indel_is_training and indel_is_calibration.
  • indel_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
  • indel_known_reference_variants_index (Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
  • prefix (String, required): Prefix to use for output files.
  • ref_map_file (File, required): Reference map file indicating reference sequence and auxiliary file locations.
  • snp_is_calibration (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
  • snp_is_training (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
  • snp_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as either calibration or training using snp_is_training and snp_is_calibration.
  • snp_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
  • snp_known_reference_variants_index (Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
  • SplitContigToIntervals.prefix (String, required): Prefix to use for output files.
  • SplitContigToIntervals.ref_fasta (File, required)
  • SplitContigToIntervals.ref_fasta_fai (File, required)
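
Several of the required inputs above are parallel arrays whose order must correspond (for example gvcfs and gvcf_indices). A minimal Python sketch of a pre-submission length check follows. The helper name check_parallel_arrays, the grouping constant, and the example file names are hypothetical and are not part of the workflow; the check only compares lengths and cannot confirm that the order actually matches.

```python
# Hypothetical pre-submission check; not part of the workflow itself.
# Each tuple lists input names that the documentation says must be in matching order.
PARALLEL_GROUPS = [
    ("gvcfs", "gvcf_indices"),
    (
        "snp_known_reference_variants",
        "snp_known_reference_variants_index",
        "snp_known_reference_variants_identifier",
        "snp_is_training",
        "snp_is_calibration",
    ),
    (
        "indel_known_reference_variants",
        "indel_known_reference_variants_index",
        "indel_known_reference_variants_identifier",
        "indel_is_training",
        "indel_is_calibration",
    ),
]


def check_parallel_arrays(inputs):
    """Return one error message per group whose arrays differ in length."""
    errors = []
    for group in PARALLEL_GROUPS:
        lengths = {name: len(inputs[name]) for name in group}
        if len(set(lengths.values())) > 1:
            errors.append(f"length mismatch: {lengths}")
    return errors


# Illustrative values only; the file names are made up.
example = {
    "gvcfs": ["s1.g.vcf.gz", "s2.g.vcf.gz"],
    "gvcf_indices": ["s1.g.vcf.gz.tbi"],  # one index missing on purpose
    "snp_known_reference_variants": ["known_snps.vcf.gz"],
    "snp_known_reference_variants_index": ["known_snps.vcf.gz.tbi"],
    "snp_known_reference_variants_identifier": ["known_snps"],
    "snp_is_training": [True],
    "snp_is_calibration": [True],
    "indel_known_reference_variants": ["known_indels.vcf.gz"],
    "indel_known_reference_variants_index": ["known_indels.vcf.gz.tbi"],
    "indel_known_reference_variants_identifier": ["known_indels"],
    "indel_is_training": [True],
    "indel_is_calibration": [False],
}

problems = check_parallel_arrays(example)
for message in problems:
    print(message)
```

Running a check like this before submission catches a missing index or label early, rather than after the workflow has started.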

Optional

  • annotation_bed_file_annotation_names (Array[String]?): Array of names/FILTER column entries to use for each given file in annotation_bed_files. Order should correspond to annotation_bed_files.
  • annotation_bed_file_indexes (Array[File]?): Array of bed indexes for annotation_bed_files. Order should correspond to annotation_bed_files.
  • annotation_bed_files (Array[File]?): Array of bed files to use to FILTER/annotate variants in the output file. Annotations will be placed in the FILTER column, effectively filtering variants that overlap these regions.
  • AnnotateVcfRegions.runtime_attr_override (RuntimeAttr?)
  • CreateSampleNameMap.runtime_attr_override (RuntimeAttr?)
  • ExtractIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • ExtractSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • FinalizeVETSTBI.name (String?)
  • FinalizeVETSTBI.runtime_attr_override (RuntimeAttr?)
  • FinalizeVETSVCF.name (String?)
  • FinalizeVETSVCF.runtime_attr_override (RuntimeAttr?)
  • GatherRescoredVcfs.runtime_attr_override (RuntimeAttr?)
  • GatherSitesOnlyVCFs.runtime_attr_override (RuntimeAttr?)
  • GatherVcfs.runtime_attr_override (RuntimeAttr?)
  • GenomicsDbImport.runtime_attr_override (RuntimeAttr?)
  • GenotypeGVCFs.input_gvcf_index (File?)
  • GenotypeGVCFs.runtime_attr_override (RuntimeAttr?)
  • MakeChrIntervalList.runtime_attr_override (RuntimeAttr?)
  • MakeSitesOnlyVCF.runtime_attr_override (RuntimeAttr?)
  • ScoreIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • ScoreSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • SplitContigToIntervals.runtime_attr_override (RuntimeAttr?)
  • TrainIndelVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
  • TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
  • TrainSnpVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
  • TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)

Defaults

  • indel_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - calibration-sensitivity threshold for INDELs. INDEL variants scoring below the score that corresponds to this sensitivity will be filtered.
  • indel_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
  • indel_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreIndelVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
  • snp_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - calibration-sensitivity threshold for SNPs. SNP variants scoring below the score that corresponds to this sensitivity will be filtered.
  • snp_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
  • snp_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
  • CreateSampleNameMap.background_sample_gvcfs (Array[File], default=[])
  • GenomicsDbImport.batch_size (Int, default=100)
  • GenotypeGVCFs.batch_size (Int, default=100)
  • MakeChrIntervalList.filter (Array[String], default=['random', 'chrUn', 'decoy', 'alt', 'HLA', 'EBV'])
  • SplitContigToIntervals.size (Int, default=200000)
  • TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
  • TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
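
Any of the defaults above can be overridden in the inputs file. The sketch below assumes the workflow is launched with an engine that accepts Cromwell-style inputs JSON, where every key is prefixed with the workflow name. The workflow name is taken from this page's title and should be verified against the WDL; the helper qualify_inputs and the override values are illustrative, not recommendations.

```python
import json


def qualify_inputs(workflow_name, values):
    """Prefix each input name with the workflow name (Cromwell-style inputs JSON)."""
    return {f"{workflow_name}.{key}": value for key, value in values.items()}


# Assumption: name copied from this page's title; confirm it matches the WDL.
WORKFLOW = "SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS"

# Illustrative overrides only; the documented defaults are noted per line.
overrides = {
    "snp_calibration_sensitivity": 0.98,    # documented default: 0.99
    "indel_calibration_sensitivity": 0.98,  # documented default: 0.99
    "SplitContigToIntervals.size": 100000,  # documented default: 200000
    "GenomicsDbImport.batch_size": 50,      # documented default: 100
}

qualified = qualify_inputs(WORKFLOW, overrides)
print(json.dumps(qualified, indent=2, sort_keys=True))
```

Task-level inputs such as GenomicsDbImport.batch_size keep their task prefix and gain the workflow name in front, so the dotted names listed on this page can be reused directly as the keys of the overrides dictionary.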

Outputs

  • joint_recalibrated_vcf (File)
  • joint_recalibrated_vcf_tbi (File)

Dot Diagram

SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS