SRJointCallGVCFsWithGenomicsDB_Pf_Niare_VETS
- author: Jonn Smith
- description: This workflow implements a modified version of the joint calling pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline uses VETS (GATK's Variant Extract-Train-Score tools) instead of VQSR (Variant Quality Score Recalibration).
Inputs
Required
- gcs_out_root_dir (String, required): GCS bucket into which to finalize outputs.
- gvcf_indices (Array[File], required): Array of GVCF index files for gvcfs. Order should correspond to that in gvcfs.
- gvcfs (Array[File], required): Array of GVCF files to use as inputs for joint calling.
- indel_is_calibration (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
- indel_is_training (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
- indel_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as a calibration set, a training set, or both using indel_is_calibration and indel_is_training.
- indel_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
- indel_known_reference_variants_index (Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
- prefix (String, required): Prefix to use for output files.
- ref_map_file (File, required): Reference map file indicating the locations of the reference sequence and auxiliary files.
- snp_is_calibration (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
- snp_is_training (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
- snp_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as a calibration set, a training set, or both using snp_is_calibration and snp_is_training.
- snp_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
- snp_known_reference_variants_index (Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
- SplitContigToIntervals.prefix (String, required): Prefix to use for output files.
- SplitContigToIntervals.ref_fasta (File, required)
- SplitContigToIntervals.ref_fasta_fai (File, required)
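Each group of five INDEL inputs (and likewise the five SNP inputs) forms a set of parallel arrays: position i in every array refers to the same reference VCF, so one out-of-order entry silently mislabels a resource. The Python sketch below is a hypothetical helper (build_resource_args is not part of this workflow) that checks the arrays line up and renders each entry in the --resource:NAME,training=...,calibration=... tag form used in GATK's VETS tool documentation. It assumes the arrays are plain lists of paths, names, and booleans; how the workflow itself assembles its GATK command lines is not shown on this page.

```python
import os
from typing import List, Sequence


def build_resource_args(
    vcfs: Sequence[str],
    indexes: Sequence[str],
    identifiers: Sequence[str],
    is_training: Sequence[bool],
    is_calibration: Sequence[bool],
) -> List[str]:
    """Validate one set of parallel VETS inputs and render GATK-style resource tags.

    Hypothetical helper for illustration; element i of every array must describe
    the same reference VCF.
    """
    lengths = {len(vcfs), len(indexes), len(identifiers), len(is_training), len(is_calibration)}
    if len(lengths) != 1:
        raise ValueError(f"parallel arrays have mismatched lengths: {sorted(lengths)}")

    args: List[str] = []
    for vcf, index, name, train, calib in zip(vcfs, indexes, identifiers, is_training, is_calibration):
        # Heuristic ordering check: an index is usually named <vcf>.tbi or <vcf>.csi.
        # Relax this if your index files follow a different naming scheme.
        if not os.path.basename(index).startswith(os.path.basename(vcf)):
            raise ValueError(f"index {index!r} does not look like it belongs to {vcf!r}")
        tags = f"training={str(train).lower()},calibration={str(calib).lower()}"
        args.extend([f"--resource:{name},{tags}", vcf])
    return args


# Example with hypothetical paths: one INDEL resource used for both training and calibration.
print(build_resource_args(
    ["gs://my-bucket/mills.vcf.gz"], ["gs://my-bucket/mills.vcf.gz.tbi"], ["mills"], [True], [True]
))
# ['--resource:mills,training=true,calibration=true', 'gs://my-bucket/mills.vcf.gz']
```

The ordering check is deliberately strict: a swapped index or identifier raises an error instead of producing a mislabeled resource.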
Optional
- annotation_bed_file_annotation_names (Array[String]?): Array of names (FILTER column entries) to use for each file in annotation_bed_files. Order should correspond to annotation_bed_files.
- annotation_bed_file_indexes (Array[File]?): Array of BED index files for annotation_bed_files. Order should correspond to annotation_bed_files.
- annotation_bed_files (Array[File]?): Array of BED files used to FILTER/annotate variants in the output file. Annotations are placed in the FILTER column, effectively filtering variants that overlap these regions.
- AnnotateVcfRegions.runtime_attr_override (RuntimeAttr?)
- CreateSampleNameMap.runtime_attr_override (RuntimeAttr?)
- ExtractIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- ExtractSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- FinalizeVETSTBI.name (String?)
- FinalizeVETSTBI.runtime_attr_override (RuntimeAttr?)
- FinalizeVETSVCF.name (String?)
- FinalizeVETSVCF.runtime_attr_override (RuntimeAttr?)
- GatherRescoredVcfs.runtime_attr_override (RuntimeAttr?)
- GatherSitesOnlyVCFs.runtime_attr_override (RuntimeAttr?)
- GatherVcfs.runtime_attr_override (RuntimeAttr?)
- GenomicsDbImport.runtime_attr_override (RuntimeAttr?)
- GenotypeGVCFs.input_gvcf_index (File?)
- GenotypeGVCFs.runtime_attr_override (RuntimeAttr?)
- MakeChrIntervalList.runtime_attr_override (RuntimeAttr?)
- MakeSitesOnlyVCF.runtime_attr_override (RuntimeAttr?)
- ScoreIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- ScoreSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- SplitContigToIntervals.runtime_attr_override (RuntimeAttr?)
- TrainIndelVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
- TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
- TrainSnpVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
- TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
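The annotation_bed_files inputs work by overlap: a variant that falls in a listed region gets that file's name from annotation_bed_file_annotation_names added to its FILTER column. The minimal Python sketch below illustrates that rule, assuming standard coordinate conventions (BED intervals are 0-based and half-open, VCF POS is 1-based) and treating a variant's footprint as the span of its REF allele. The helper names are hypothetical, and the AnnotateVcfRegions task may define overlap differently. The FILTER update follows the VCF convention that PASS or "." is replaced by the first filter code and multiple codes are separated by semicolons.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers illustrating region-based FILTER annotation.
Interval = Tuple[int, int]  # BED style: 0-based start, exclusive end


def load_bed(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Parse BED lines into per-contig interval lists (only the first three columns are used)."""
    regions: Dict[str, List[Interval]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and UCSC header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def overlaps(regions: Dict[str, List[Interval]], chrom: str, pos: int, ref: str) -> bool:
    """True if the REF allele at 1-based VCF position pos overlaps any region on chrom."""
    start = pos - 1         # convert VCF POS to a 0-based start
    end = start + len(ref)  # exclusive end of the REF allele
    return any(s < end and e > start for s, e in regions.get(chrom, []))


def add_filter(current: str, label: str) -> str:
    """Add label to a VCF FILTER value, replacing PASS or '.' and avoiding duplicates."""
    if current in ("PASS", "."):
        return label
    codes = current.split(";")
    return current if label in codes else ";".join(codes + [label])


# Example with hypothetical data: one region and one label.
bed = load_bed(["chr1\t100\t200\n"])
print(add_filter("PASS", "MY_REGION") if overlaps(bed, "chr1", 150, "A") else "PASS")
# prints MY_REGION
```

The linear scan in overlaps keeps the sketch short; a real implementation over large BED files would use a sorted or indexed interval lookup instead.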
Defaults
- indel_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - calibration-sensitivity threshold for INDELs; INDEL variants whose scores fall below the score corresponding to this sensitivity will be filtered.
- indel_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
- indel_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreIndelVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
- snp_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - calibration-sensitivity threshold for SNPs; SNP variants whose scores fall below the score corresponding to this sensitivity will be filtered.
- snp_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
- snp_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
- CreateSampleNameMap.background_sample_gvcfs (Array[File], default=[])
- GenomicsDbImport.batch_size (Int, default=100)
- GenotypeGVCFs.batch_size (Int, default=100)
- MakeChrIntervalList.filter (Array[String], default=['random', 'chrUn', 'decoy', 'alt', 'HLA', 'EBV'])
- SplitContigToIntervals.size (Int, default=200000)
- TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
- TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
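The indel_max_unlabeled_variants and snp_max_unlabeled_variants parameters cap how many unlabeled sites are kept by reservoir sampling. The minimal Python sketch below shows the classic single-pass algorithm (Algorithm R), which keeps a uniform random sample of at most k items from a stream of unknown length. It illustrates the technique only; GATK's own implementation may differ in details such as the random source. With k = 0, the default, nothing is sampled, which matches the note that unlabeled sites are used only when the value is nonzero.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of at most k items from stream (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    if k == 0:
        return reservoir  # mirrors the default of 0: no unlabeled sites are sampled
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: uniform over the i + 1 items seen so far
            if j < k:
                reservoir[j] = item  # keep the new item with probability k / (i + 1)
    return reservoir


# Example: sample at most 3 of 10 hypothetical site IDs with a fixed seed for reproducibility.
print(reservoir_sample((f"site{i}" for i in range(10)), 3, random.Random(0)))
```

Because each site is seen once and only k items are held in memory, this approach suits streaming over a whole-genome VCF where the number of unlabeled sites is not known in advance.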
Outputs
- joint_recalibrated_vcf (File)
- joint_recalibrated_vcf_tbi (File)
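Because joint_recalibrated_vcf_tbi is a tabix index, joint_recalibrated_vcf is a bgzip-compressed VCF, and Python's gzip module can read it since BGZF is a series of standard gzip members. As a quick sanity check on a finished run, the minimal sketch below counts how many records carry each FILTER code (PASS, or any filter added by VETS scoring or by annotation_bed_files). It uses only the standard library; the function name is hypothetical, and the file must first be copied from gcs_out_root_dir to a local path.

```python
import gzip
from collections import Counter


def tally_filters(vcf_path: str) -> Counter:
    """Count FILTER codes in a (b)gzipped or plain VCF; 'A;B' counts once for each code."""
    counts: Counter = Counter()
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            for code in fields[6].split(";"):  # FILTER is the 7th VCF column
                counts[code] += 1
    return counts
```

For example, tally_filters("joint.vcf.gz").most_common() (with a hypothetical local file name) returns (code, count) pairs ordered from most to least frequent.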