SRWholeGenome_Pf_Niare_VETS
- author: Jonn Smith
- description: This workflow implements a modified version of the single-sample pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline uses VETS instead of VQSR.
Inputs
Required
- aligned_bais (Array[File], required): Array of aligned BAM indices to process. Order must correspond to aligned_bams.
- aligned_bams (Array[File], required): Array of aligned BAM files to process.
- gcs_out_root_dir (String, required): GCS bucket into which to finalize outputs.
- genotype_gvcfs_intervals (File, required): Intervals over which to batch joint genotyping.
- indel_is_calibration (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
- indel_is_training (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
- indel_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as either calibration or training using indel_is_training and indel_is_calibration.
- indel_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
- indel_known_reference_variants_index (Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
- participant_name (String, required): The unique identifier of the sample being processed.
- ref_map_file (File, required): Reference map file indicating reference sequence and auxiliary file locations.
- snp_is_calibration (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
- snp_is_training (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
- snp_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as either calibration or training using snp_is_training and snp_is_calibration.
- snp_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
- snp_known_reference_variants_index (Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
- vcf_calling_interval_list (File, required): Intervals over which to call variants.
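Several of the required inputs are parallel arrays whose order must line up (BAMs with their indices, each known-variants VCF with its index, identifier, and training/calibration flags). The sketch below assembles an inputs JSON and checks those lengths before submission. It assumes the common Cromwell-style convention of prefixing each input with the workflow name, and it assumes the boolean flag arrays need exactly one entry per VCF; every path and name in `example_values` is a hypothetical placeholder, not a real resource.

```python
import json

WORKFLOW = "SRWholeGenome_Pf_Niare_VETS"


def check_parallel_arrays(values):
    """Return a list of problems with array inputs that must line up by position."""
    problems = []
    if len(values["aligned_bais"]) != len(values["aligned_bams"]):
        problems.append("aligned_bais must have one entry per aligned_bams entry")
    for kind in ("snp", "indel"):
        base = f"{kind}_known_reference_variants"
        companions = (
            f"{base}_index",
            f"{base}_identifier",
            f"{kind}_is_training",      # assumption: one flag per VCF
            f"{kind}_is_calibration",   # assumption: one flag per VCF
        )
        for name in companions:
            if len(values[name]) != len(values[base]):
                problems.append(f"{name} must have one entry per {base} entry")
    return problems


def build_inputs(values):
    """Prefix each input name with the workflow name (assumed inputs-JSON convention)."""
    return {f"{WORKFLOW}.{name}": value for name, value in values.items()}


# Hypothetical placeholder values for illustration only.
example_values = {
    "aligned_bams": ["gs://example-bucket/sample1.bam"],
    "aligned_bais": ["gs://example-bucket/sample1.bam.bai"],
    "gcs_out_root_dir": "gs://example-bucket/results",
    "genotype_gvcfs_intervals": "gs://example-bucket/genotype.interval_list",
    "participant_name": "sample1",
    "ref_map_file": "gs://example-bucket/ref_map.tsv",
    "vcf_calling_interval_list": "gs://example-bucket/calling.interval_list",
}
for kind in ("snp", "indel"):
    example_values.update({
        f"{kind}_known_reference_variants": [f"gs://example-bucket/{kind}_truth.vcf.gz"],
        f"{kind}_known_reference_variants_index": [f"gs://example-bucket/{kind}_truth.vcf.gz.tbi"],
        f"{kind}_known_reference_variants_identifier": [f"{kind}_truth"],
        f"{kind}_is_training": [True],
        f"{kind}_is_calibration": [True],
    })

if __name__ == "__main__":
    issues = check_parallel_arrays(example_values)
    if issues:
        raise SystemExit("\n".join(issues))
    print(json.dumps(build_inputs(example_values), indent=2))
```

Running the check locally catches mismatched array lengths before the workflow is launched, which is cheaper than discovering them partway through a run.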
Optional
- bed_to_compute_coverage (File?): BED file to use as regions over which to measure coverage.
- ExtractIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- ExtractSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- FinalizeHCBaiOut.name (String?)
- FinalizeHCBaiOut.runtime_attr_override (RuntimeAttr?)
- FinalizeHCBamOut.name (String?)
- FinalizeHCBamOut.runtime_attr_override (RuntimeAttr?)
- FinalizeHCGTbi.name (String?)
- FinalizeHCGTbi.runtime_attr_override (RuntimeAttr?)
- FinalizeHCGVcf.name (String?)
- FinalizeHCGVcf.runtime_attr_override (RuntimeAttr?)
- FinalizeHCRescoredTbi.name (String?)
- FinalizeHCRescoredTbi.runtime_attr_override (RuntimeAttr?)
- FinalizeHCRescoredVcf.name (String?)
- FinalizeHCRescoredVcf.runtime_attr_override (RuntimeAttr?)
- FinalizeRawHCTbi.name (String?)
- FinalizeRawHCTbi.runtime_attr_override (RuntimeAttr?)
- FinalizeRawHCVcf.name (String?)
- FinalizeRawHCVcf.runtime_attr_override (RuntimeAttr?)
- MergeAllReads.runtime_attr_override (RuntimeAttr?)
- RemoveFilteredVariants.runtime_attr_override (RuntimeAttr?)
- RenameRawHcGvcf.runtime_attr_override (RuntimeAttr?)
- RenameRawHcVcf.runtime_attr_override (RuntimeAttr?)
- ScoreIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- ScoreSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
- TrainIndelVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
- TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
- TrainSnpVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
- TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
- CallVariantsWithHaplotypeCaller.CallVariantsWithHC.runtime_attr_override (RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.dbsnp_vcf (String?)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.runtime_attr_override (RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.IndexBamout.runtime_attr_override (RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.MergeGVCFs.runtime_attr_override (RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.MergeVariantCalledBamOuts.runtime_attr_override (RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.SmallVariantsScatterPrep.runtime_attr_override (RuntimeAttr?)
Defaults
- contigs_names_to_ignore (Array[String], default=["RANDOM_PLACEHOLDER_VALUE"]): Array of names of contigs to ignore for the purposes of reporting variants.
- indel_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which INDEL variants will be filtered.
- indel_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
- indel_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
- snp_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which SNP variants will be filtered.
- snp_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
- snp_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
- CallVariantsWithHaplotypeCaller.call_vars_on_mitochondria (Boolean, default=false)
- RenameRawHcVcf.is_gvcf (Boolean, default=false)
- TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
- TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity (Float, default=0.001)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity_stdev (Float, default=0.01)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.indel_heterozygosity (Float, default=0.000125)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.keep_combined_raw_annotations (Boolean, default=false)
- CallVariantsWithHaplotypeCaller.MergeGVCFs.is_gvcf (Boolean, default=false)
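The max_unlabeled_variants parameters refer to reservoir sampling, which draws a uniform random sample of at most k items from a stream without knowing its length in advance. The following is a minimal sketch of the textbook technique (Algorithm R) for illustration only; it is not the GATK implementation, and the function name `reservoir_sample` is made up for this example.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Uniformly sample up to k items from an iterable in a single pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by picking a
            # slot in [0, i] and replacing only if it falls in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 hypothetical variant positions out of 1000.
    print(reservoir_sample(range(1000), 5, random.Random(42)))
```

With k set to 0 the sketch returns an empty sample, which mirrors the documented default of 0: no unlabeled sites are sampled unless the limit is raised.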
Outputs
- successfully_processed (Boolean)
- hc_g_vcf (File?)
- hc_g_tbi (File?)
- hc_bamout (File?)
- hc_baiout (File?)
- hc_raw_vcf (File?)
- hc_raw_tbi (File?)
- hc_rescored_vcf (File?)
- hc_rescored_tbi (File?)
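Every file output is optional (File?), so downstream code should not assume any of them is present. The sketch below shows one way to separate present from absent outputs. It assumes the outputs arrive as a dictionary keyed by workflow-prefixed names, as in Cromwell-style outputs JSON; the helper name `collect_outputs` and the example path are hypothetical.

```python
WORKFLOW = "SRWholeGenome_Pf_Niare_VETS"

OPTIONAL_FILE_OUTPUTS = (
    "hc_g_vcf", "hc_g_tbi",
    "hc_bamout", "hc_baiout",
    "hc_raw_vcf", "hc_raw_tbi",
    "hc_rescored_vcf", "hc_rescored_tbi",
)


def collect_outputs(outputs):
    """Split workflow outputs into (success flag, present files, missing names)."""
    succeeded = bool(outputs.get(f"{WORKFLOW}.successfully_processed", False))
    present = {}
    missing = []
    for name in OPTIONAL_FILE_OUTPUTS:
        value = outputs.get(f"{WORKFLOW}.{name}")
        if value is None:
            missing.append(name)   # optional output was not produced
        else:
            present[name] = value
    return succeeded, present, missing


if __name__ == "__main__":
    # Hypothetical outputs dictionary for illustration.
    example = {
        f"{WORKFLOW}.successfully_processed": True,
        f"{WORKFLOW}.hc_rescored_vcf": "gs://example-bucket/results/sample1.rescored.vcf.gz",
    }
    print(collect_outputs(example))
```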