SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS

SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VQSR

author: Jonn Smith
description: This workflow implements a modified version of the joint calling pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline uses VETS instead of VQSR.

Inputs

Required

gcs_out_root_dir (String, required): GCS Bucket into which to finalize outputs.
gvcf_indices (Array[File], required): Array of GVCF index files for the files in gvcfs. Order should correspond to that in gvcfs.
gvcfs (Array[File], required): Array of GVCF files to use as inputs for joint calling.
indel_is_calibration (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
indel_is_training (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
indel_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as either calibration or training using indel_is_training and indel_is_calibration.
indel_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
indel_known_reference_variants_index (Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
prefix (String, required): Prefix to use for output files.
ref_map_file (File, required): Reference map file indicating reference sequence and auxiliary file locations.
snp_is_calibration (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
snp_is_training (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
snp_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as either calibration or training using snp_is_training and snp_is_calibration.
snp_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
snp_known_reference_variants_index (Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
SplitContigToIntervals.prefix (String, required): Prefix to use for output files.
SplitContigToIntervals.ref_fasta (File, required)
SplitContigToIntervals.ref_fasta_fai (File, required)
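
Several of the required inputs above are parallel arrays whose order must correspond (for example gvcfs and gvcf_indices). The following is a minimal sketch, not part of the workflow itself, of how an inputs JSON could be assembled and checked before submission. The helper names, the bucket paths, and the workflow name passed to to_inputs_json are all hypothetical; the sketch assumes the usual convention of prefixing each input key with the workflow name.

```python
import json

# Inputs whose entries must line up index-by-index, per the descriptions above.
PARALLEL_INPUT_GROUPS = [
    ("gvcfs", "gvcf_indices"),
    ("snp_known_reference_variants", "snp_known_reference_variants_index",
     "snp_known_reference_variants_identifier", "snp_is_training",
     "snp_is_calibration"),
    ("indel_known_reference_variants", "indel_known_reference_variants_index",
     "indel_known_reference_variants_identifier", "indel_is_training",
     "indel_is_calibration"),
]


def check_parallel_inputs(inputs):
    """Return one error message per group whose arrays differ in length."""
    errors = []
    for group in PARALLEL_INPUT_GROUPS:
        lengths = {name: len(inputs[name]) for name in group}
        if len(set(lengths.values())) > 1:
            errors.append(f"length mismatch: {lengths}")
    return errors


def to_inputs_json(workflow_name, inputs):
    """Prefix every key with the workflow name and serialize to JSON."""
    prefixed = {f"{workflow_name}.{key}": value for key, value in inputs.items()}
    return json.dumps(prefixed, indent=2, sort_keys=True)


# Hypothetical example values; every path and name below is made up.
example_inputs = {
    "gcs_out_root_dir": "gs://example-bucket/joint_call_out",
    "prefix": "pf_joint_call",
    "ref_map_file": "gs://example-bucket/ref/ref_map.txt",
    "gvcfs": ["gs://example-bucket/s1.g.vcf.gz", "gs://example-bucket/s2.g.vcf.gz"],
    "gvcf_indices": ["gs://example-bucket/s1.g.vcf.gz.tbi",
                     "gs://example-bucket/s2.g.vcf.gz.tbi"],
    "snp_known_reference_variants": ["gs://example-bucket/known/snps_a.vcf.gz"],
    "snp_known_reference_variants_index": ["gs://example-bucket/known/snps_a.vcf.gz.tbi"],
    "snp_known_reference_variants_identifier": ["snps_a"],
    "snp_is_training": [True],
    "snp_is_calibration": [True],
    "indel_known_reference_variants": ["gs://example-bucket/known/indels_a.vcf.gz"],
    "indel_known_reference_variants_index": ["gs://example-bucket/known/indels_a.vcf.gz.tbi"],
    "indel_known_reference_variants_identifier": ["indels_a"],
    "indel_is_training": [True],
    "indel_is_calibration": [False],
    "SplitContigToIntervals.prefix": "pf_joint_call",
    "SplitContigToIntervals.ref_fasta": "gs://example-bucket/ref/ref.fasta",
    "SplitContigToIntervals.ref_fasta_fai": "gs://example-bucket/ref/ref.fasta.fai",
}
```

Calling check_parallel_inputs(example_inputs) returns an empty list when every group is consistent; the output of to_inputs_json can then be written to a file and passed to the workflow runner.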

Optional

annotation_bed_file_annotation_names (Array[String]?): Array of names/FILTER column entries to use for each given file in annotation_bed_files. Order should correspond to annotation_bed_files.
annotation_bed_file_indexes (Array[File]?): Array of bed indexes for annotation_bed_files. Order should correspond to annotation_bed_files.
annotation_bed_files (Array[File]?): Array of bed files to use to FILTER/annotate variants in the output file. Annotations will be placed in the FILTER column, effectively filtering variants that overlap these regions.
AnnotateVcfRegions.runtime_attr_override (RuntimeAttr?)
CreateSampleNameMap.runtime_attr_override (RuntimeAttr?)
ExtractIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
ExtractSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
FinalizeVETSTBI.name (String?)
FinalizeVETSTBI.runtime_attr_override (RuntimeAttr?)
FinalizeVETSVCF.name (String?)
FinalizeVETSVCF.runtime_attr_override (RuntimeAttr?)
GatherRescoredVcfs.runtime_attr_override (RuntimeAttr?)
GatherSitesOnlyVCFs.runtime_attr_override (RuntimeAttr?)
GatherVcfs.runtime_attr_override (RuntimeAttr?)
GenomicsDbImport.runtime_attr_override (RuntimeAttr?)
GenotypeGVCFs.input_gvcf_index (File?)
GenotypeGVCFs.runtime_attr_override (RuntimeAttr?)
MakeChrIntervalList.runtime_attr_override (RuntimeAttr?)
MakeSitesOnlyVCF.runtime_attr_override (RuntimeAttr?)
ScoreIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
ScoreSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
SplitContigToIntervals.runtime_attr_override (RuntimeAttr?)
TrainIndelVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
TrainSnpVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
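
The annotation_bed_files description above says that variants overlapping the given regions receive the matching name in the FILTER column. The sketch below illustrates that idea only; it is not the AnnotateVcfRegions implementation, and the function names and region names are hypothetical. It assumes all regions and the variant are on the same contig, that BED intervals are 0-based half-open, and that the VCF position is 1-based.

```python
def overlapping_annotations(variant_pos, ref_allele_length, regions_by_name):
    """Return the annotation names whose regions overlap the variant.

    variant_pos is a 1-based VCF POS; regions_by_name maps an annotation
    name to a list of (start, end) BED pairs (0-based, half-open).
    """
    # Convert the variant's reference span to 0-based half-open coordinates.
    v_start = variant_pos - 1
    v_end = v_start + ref_allele_length
    return [
        name
        for name, regions in regions_by_name.items()
        if any(start < v_end and v_start < end for start, end in regions)
    ]


def filter_value(hits):
    """Join annotation names the way VCF joins multiple FILTER entries."""
    # Assumption: no other filters apply, so a variant with no hits is PASS.
    return ";".join(hits) if hits else "PASS"


# Hypothetical annotation names and regions.
example_regions = {
    "low_complexity": [(99, 200)],
    "telomere": [(0, 50)],
}
```

With these example regions, a single-base variant at POS 100 overlaps only low_complexity, while one at POS 51 falls just past the telomere region and overlaps nothing.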

Defaults

indel_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which INDEL variants will be filtered.
indel_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
indel_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreIndelVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
snp_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which SNP variants will be filtered.
snp_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
snp_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
CreateSampleNameMap.background_sample_gvcfs (Array[File], default=[])
GenomicsDbImport.batch_size (Int, default=100)
GenotypeGVCFs.batch_size (Int, default=100)
MakeChrIntervalList.filter (Array[String], default=['random', 'chrUn', 'decoy', 'alt', 'HLA', 'EBV'])
SplitContigToIntervals.size (Int, default=200000)
TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
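
The snp_max_unlabeled_variants and indel_max_unlabeled_variants entries above refer to reservoir sampling, which draws a fixed-size uniform random sample from a stream without knowing its length in advance. The sketch below shows the textbook single-pass version of that technique for illustration; it is not taken from the tool's source, and the function name is hypothetical. Note that a sample size of 0 yields no items, which matches the default of not extracting from unlabeled sites.

```python
import random


def reservoir_sample(items, k, rng):
    """Return up to k items chosen uniformly at random from an iterable."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1), replacing a
            # uniformly chosen existing entry. randint is inclusive on both ends.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: sample 10 of 1000 hypothetical variant indices, with a fixed seed.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Passing an explicit random.Random instance keeps the sketch reproducible; if the stream holds fewer than k items, all of them are returned.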

Outputs

joint_recalibrated_vcf (File)
joint_recalibrated_vcf_tbi (File)

Dot Diagram

SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS