SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS
SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VQSR
- author: Jonn Smith
- description: This workflow implements a modified version of the joint calling pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline uses VETS instead of VQSR.
Inputs
Required
gcs_out_root_dir
(String, required): GCS bucket into which to finalize outputs.
gvcf_indices
(Array[File], required): Array of GVCF index files for gvcfs. Order should correspond to that in gvcfs.
gvcfs
(Array[File], required): Array of GVCF files to use as inputs for joint calling.
indel_is_calibration
(Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
indel_is_training
(Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
indel_known_reference_variants
(Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as either calibration or training using indel_is_training and indel_is_calibration.
indel_known_reference_variants_identifier
(Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
indel_known_reference_variants_index
(Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
prefix
(String, required): Prefix to use for output files.
ref_map_file
(File, required): Reference map file indicating reference sequence and auxiliary file locations.
snp_is_calibration
(Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
snp_is_training
(Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
snp_known_reference_variants
(Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as either calibration or training using snp_is_training and snp_is_calibration.
snp_known_reference_variants_identifier
(Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
snp_known_reference_variants_index
(Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
SplitContigToIntervals.prefix
(String, required): Prefix to use for output files.
SplitContigToIntervals.ref_fasta
(File, required)
SplitContigToIntervals.ref_fasta_fai
(File, required)
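Several of the required inputs are parallel arrays whose order must match. The following is a minimal sketch of how an inputs file could be assembled and sanity-checked before submission. It assumes a Cromwell-style inputs JSON in which each key is prefixed with the workflow name; the workflow name used for the prefix, all paths, and all identifier values are hypothetical placeholders, and the nested SplitContigToIntervals inputs are omitted.

```python
import json

# Assumption: keys in the inputs JSON take the form "<workflow name>.<input name>".
# Adjust WORKFLOW to the name declared in the WDL file you are running.
WORKFLOW = "SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS"

# Inputs listed together here must have the same length and the same ordering.
PARALLEL_GROUPS = [
    ("gvcfs", "gvcf_indices"),
    ("snp_known_reference_variants", "snp_known_reference_variants_index",
     "snp_known_reference_variants_identifier", "snp_is_training",
     "snp_is_calibration"),
    ("indel_known_reference_variants", "indel_known_reference_variants_index",
     "indel_known_reference_variants_identifier", "indel_is_training",
     "indel_is_calibration"),
]


def check_parallel_arrays(inputs):
    """Return a list of problems; an empty list means all lengths agree."""
    problems = []
    for group in PARALLEL_GROUPS:
        lengths = {name: len(inputs[name]) for name in group}
        if len(set(lengths.values())) > 1:
            problems.append(f"length mismatch: {lengths}")
    return problems


def to_inputs_json(inputs, workflow=WORKFLOW):
    """Validate the inputs and serialize them with workflow-prefixed keys."""
    problems = check_parallel_arrays(inputs)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps({f"{workflow}.{k}": v for k, v in inputs.items()}, indent=2)


# Hypothetical example values; none of these paths or names come from the workflow.
example_inputs = {
    "gcs_out_root_dir": "gs://example-bucket/results",
    "prefix": "example_cohort",
    "ref_map_file": "gs://example-bucket/ref/ref_map.tsv",
    "gvcfs": ["gs://example-bucket/s1.g.vcf.gz", "gs://example-bucket/s2.g.vcf.gz"],
    "gvcf_indices": ["gs://example-bucket/s1.g.vcf.gz.tbi",
                     "gs://example-bucket/s2.g.vcf.gz.tbi"],
    "snp_known_reference_variants": ["gs://example-bucket/known_snps.vcf.gz"],
    "snp_known_reference_variants_index": ["gs://example-bucket/known_snps.vcf.gz.tbi"],
    "snp_known_reference_variants_identifier": ["known_snps"],
    "snp_is_training": [True],
    "snp_is_calibration": [True],
    "indel_known_reference_variants": ["gs://example-bucket/known_indels.vcf.gz"],
    "indel_known_reference_variants_index": ["gs://example-bucket/known_indels.vcf.gz.tbi"],
    "indel_known_reference_variants_identifier": ["known_indels"],
    "indel_is_training": [True],
    "indel_is_calibration": [True],
}

if __name__ == "__main__":
    print(to_inputs_json(example_inputs))
```

Checking lengths up front catches the most common mistake with these inputs, a missing index or flag, before any cloud resources are used. It cannot detect arrays that have the right length but the wrong order.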
Optional
annotation_bed_file_annotation_names
(Array[String]?): Array of names/FILTER column entries to use for each given file in annotation_bed_files. Order should correspond to annotation_bed_files.
annotation_bed_file_indexes
(Array[File]?): Array of bed indexes for annotation_bed_files. Order should correspond to annotation_bed_files.
annotation_bed_files
(Array[File]?): Array of bed files to use to FILTER/annotate variants in the output file. Annotations will be placed in the FILTER column, effectively filtering variants that overlap these regions.
AnnotateVcfRegions.runtime_attr_override
(RuntimeAttr?)
CreateSampleNameMap.runtime_attr_override
(RuntimeAttr?)
ExtractIndelVariantAnnotations.runtime_attr_override
(RuntimeAttr?)
ExtractSnpVariantAnnotations.runtime_attr_override
(RuntimeAttr?)
FinalizeVETSTBI.name
(String?)
FinalizeVETSTBI.runtime_attr_override
(RuntimeAttr?)
FinalizeVETSVCF.name
(String?)
FinalizeVETSVCF.runtime_attr_override
(RuntimeAttr?)
GatherRescoredVcfs.runtime_attr_override
(RuntimeAttr?)
GatherSitesOnlyVCFs.runtime_attr_override
(RuntimeAttr?)
GatherVcfs.runtime_attr_override
(RuntimeAttr?)
GenomicsDbImport.runtime_attr_override
(RuntimeAttr?)
GenotypeGVCFs.input_gvcf_index
(File?)
GenotypeGVCFs.runtime_attr_override
(RuntimeAttr?)
MakeChrIntervalList.runtime_attr_override
(RuntimeAttr?)
MakeSitesOnlyVCF.runtime_attr_override
(RuntimeAttr?)
ScoreIndelVariantAnnotations.runtime_attr_override
(RuntimeAttr?)
ScoreSnpVariantAnnotations.runtime_attr_override
(RuntimeAttr?)
SplitContigToIntervals.runtime_attr_override
(RuntimeAttr?)
TrainIndelVariantAnnotationsModel.runtime_attr_override
(RuntimeAttr?)
TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5
(File?)
TrainSnpVariantAnnotationsModel.runtime_attr_override
(RuntimeAttr?)
TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5
(File?)
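The annotation_bed_files description says that region names end up in the FILTER column of overlapping variants. A rough sketch of that idea follows; it is not the workflow's AnnotateVcfRegions implementation. It assumes standard BED coordinates (0-based, half-open) and 1-based VCF positions, considers only the variant's start position, and uses hypothetical contig and label names.

```python
def overlapping_labels(chrom, pos, bed_regions):
    """Return labels of BED regions that contain a 1-based VCF position.

    bed_regions is a list of (chrom, start, end, label) tuples with
    0-based, half-open BED coordinates. Simplification: only the variant's
    start position is tested, not the full span of its REF allele.
    """
    labels = []
    for region_chrom, start, end, label in bed_regions:
        # 1-based pos lies in 0-based [start, end) when start < pos <= end.
        if region_chrom == chrom and start < pos <= end and label not in labels:
            labels.append(label)
    return labels


def annotate_filter(current_filter, labels):
    """Merge region labels into a VCF FILTER value (semicolon-separated)."""
    if not labels:
        return current_filter
    if current_filter in (".", "PASS"):
        # An unfiltered or passing record becomes filtered by the labels alone.
        return ";".join(labels)
    return current_filter + ";" + ";".join(labels)


# Hypothetical regions and labels, for illustration only.
regions = [
    ("contig_1", 100, 200, "EXAMPLE_REGION_A"),
    ("contig_1", 150, 300, "EXAMPLE_REGION_B"),
]

if __name__ == "__main__":
    for pos in (100, 101, 175, 301):
        labels = overlapping_labels("contig_1", pos, regions)
        print(pos, annotate_filter("PASS", labels))
```

Because any value other than PASS in the FILTER column marks a record as filtered, attaching a region label has the side effect the description mentions: variants inside the listed regions no longer pass.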
Defaults
indel_calibration_sensitivity
(Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which INDEL variants will be filtered.
indel_max_unlabeled_variants
(Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
indel_recalibration_annotation_values
(Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
snp_calibration_sensitivity
(Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which SNP variants will be filtered.
snp_max_unlabeled_variants
(Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
snp_recalibration_annotation_values
(Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
CreateSampleNameMap.background_sample_gvcfs
(Array[File], default=[])
GenomicsDbImport.batch_size
(Int, default=100)
GenotypeGVCFs.batch_size
(Int, default=100)
MakeChrIntervalList.filter
(Array[String], default=['random', 'chrUn', 'decoy', 'alt', 'HLA', 'EBV'])
SplitContigToIntervals.size
(Int, default=200000)
TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold
(Float, default=0.95)
TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold
(Float, default=0.95)
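The indel_max_unlabeled_variants and snp_max_unlabeled_variants descriptions mention reservoir sampling. For readers unfamiliar with the term, here is a generic sketch of the classic single-pass version (often called Algorithm R), which keeps a uniform random sample of at most k items from a stream of unknown length. It illustrates the general technique only and is not taken from the GATK implementation; the function name and the seed parameter are illustrative.

```python
import random


def reservoir_sample(items, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Uses one pass and O(k) memory, so the total number of items does not
    need to be known in advance.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by drawing a
            # slot index from 0..i (inclusive) and replacing only if it
            # falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for a stream of unlabeled variant records.
    sample = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(sample)
```

With k set to 0 the function returns an empty list, which mirrors the documented default: no unlabeled variants are sampled unless the maximum is nonzero.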
Outputs
joint_recalibrated_vcf
(File)
joint_recalibrated_vcf_tbi
(File)