SRWholeGenome_Pf_Niare_VETS

author: Jonn Smith
description: This workflow implements a modified version of the single-sample pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline filters variants with VETS (Variant Extract-Train-Score) instead of VQSR (Variant Quality Score Recalibration).

Inputs

Required

aligned_bais (Array[File], required): Array of aligned BAM index files to process. Order must correspond to that in aligned_bams.
aligned_bams (Array[File], required): Array of aligned BAM files to process.
gcs_out_root_dir (String, required): GCS Bucket into which to finalize outputs.
genotype_gvcfs_intervals (File, required): Intervals over which to batch Joint Genotyping.
indel_is_calibration (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
indel_is_training (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
indel_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as a training set, a calibration set, or both, using indel_is_training and indel_is_calibration.
indel_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
indel_known_reference_variants_index (Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
participant_name (String, required): The unique identifier of the sample being processed.
ref_map_file (File, required): Reference map file indicating the locations of the reference sequence and auxiliary files.
snp_is_calibration (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
snp_is_training (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
snp_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as a training set, a calibration set, or both, using snp_is_training and snp_is_calibration.
snp_known_reference_variants_identifier (Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
snp_known_reference_variants_index (Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
vcf_calling_interval_list (File, required): Intervals over which to call variants.
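Several required inputs are parallel arrays whose order must line up: aligned_bams with aligned_bais, and each *_known_reference_variants array with its _identifier, _index, _is_training, and _is_calibration arrays. The sketch below is a hypothetical pre-submission check (check_parallel_inputs is not part of the workflow) that assumes a plain dict keyed by bare input names and that BAM indices are named either "x.bam.bai" or "x.bai". It only compares lengths and file names; it cannot confirm that the contents actually match.

```python
from pathlib import PurePosixPath


def check_parallel_inputs(inputs: dict) -> list[str]:
    """Return a list of problems found in the order-dependent inputs.

    Hypothetical helper. `inputs` is keyed by bare input names such as
    "aligned_bams", not by fully qualified "Workflow.input" names.
    """
    problems = []

    bams = inputs["aligned_bams"]
    bais = inputs["aligned_bais"]
    if len(bams) != len(bais):
        problems.append(f"aligned_bams has {len(bams)} entries, aligned_bais has {len(bais)}")
    for bam, bai in zip(bams, bais):
        # Heuristic: accept both "x.bam.bai" and "x.bai" index naming.
        bam_name = PurePosixPath(bam).name
        bai_name = PurePosixPath(bai).name
        expected = {bam_name + ".bai", bam_name.removesuffix(".bam") + ".bai"}
        if bai_name not in expected:
            problems.append(f"{bai} does not look like the index of {bam}")

    # Each known-variants array has four companions that must be the same length.
    for kind in ("snp", "indel"):
        base = f"{kind}_known_reference_variants"
        names = [base, base + "_identifier", base + "_index",
                 f"{kind}_is_training", f"{kind}_is_calibration"]
        lengths = {name: len(inputs[name]) for name in names}
        if len(set(lengths.values())) != 1:
            problems.append(f"{kind} arrays differ in length: {lengths}")

    return problems
```

An empty list means no ordering or length problems were detected; any other result lists each mismatch as a human-readable string.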

Optional

bed_to_compute_coverage (File?): BED file defining the regions over which to measure coverage.
ExtractIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
ExtractSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
FinalizeHCBaiOut.name (String?)
FinalizeHCBaiOut.runtime_attr_override (RuntimeAttr?)
FinalizeHCBamOut.name (String?)
FinalizeHCBamOut.runtime_attr_override (RuntimeAttr?)
FinalizeHCGTbi.name (String?)
FinalizeHCGTbi.runtime_attr_override (RuntimeAttr?)
FinalizeHCGVcf.name (String?)
FinalizeHCGVcf.runtime_attr_override (RuntimeAttr?)
FinalizeHCRescoredTbi.name (String?)
FinalizeHCRescoredTbi.runtime_attr_override (RuntimeAttr?)
FinalizeHCRescoredVcf.name (String?)
FinalizeHCRescoredVcf.runtime_attr_override (RuntimeAttr?)
FinalizeRawHCTbi.name (String?)
FinalizeRawHCTbi.runtime_attr_override (RuntimeAttr?)
FinalizeRawHCVcf.name (String?)
FinalizeRawHCVcf.runtime_attr_override (RuntimeAttr?)
MergeAllReads.runtime_attr_override (RuntimeAttr?)
RemoveFilteredVariants.runtime_attr_override (RuntimeAttr?)
RenameRawHcGvcf.runtime_attr_override (RuntimeAttr?)
RenameRawHcVcf.runtime_attr_override (RuntimeAttr?)
ScoreIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
ScoreSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
TrainIndelVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
TrainSnpVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
CallVariantsWithHaplotypeCaller.CallVariantsWithHC.runtime_attr_override (RuntimeAttr?)
CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.dbsnp_vcf (String?)
CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.runtime_attr_override (RuntimeAttr?)
CallVariantsWithHaplotypeCaller.IndexBamout.runtime_attr_override (RuntimeAttr?)
CallVariantsWithHaplotypeCaller.MergeGVCFs.runtime_attr_override (RuntimeAttr?)
CallVariantsWithHaplotypeCaller.MergeVariantCalledBamOuts.runtime_attr_override (RuntimeAttr?)
CallVariantsWithHaplotypeCaller.SmallVariantsScatterPrep.runtime_attr_override (RuntimeAttr?)
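The optional runtime_attr_override inputs above are addressed by call path (for example CallVariantsWithHaplotypeCaller.CallVariantsWithHC.runtime_attr_override). The sketch below builds a Cromwell-style inputs JSON, where each key is prefixed with the workflow name. It assumes the RuntimeAttr struct has fields such as cpu_cores, mem_gb, and disk_gb (not documented here; check the struct definition), uses hypothetical sample and bucket values, and omits the other required inputs for brevity. Whether nested call inputs are accepted also depends on the workflow engine's configuration.

```python
import json

WORKFLOW = "SRWholeGenome_Pf_Niare_VETS"


def qualify(inputs: dict) -> dict:
    """Prefix each input name with the workflow name, as Cromwell inputs JSON expects."""
    return {f"{WORKFLOW}.{name}": value for name, value in inputs.items()}


# ASSUMPTION: RuntimeAttr field names (cpu_cores, mem_gb, disk_gb) follow the
# LRMA struct; verify against the struct definition before use.
hc_override = {"cpu_cores": 8, "mem_gb": 32, "disk_gb": 500}

inputs = qualify({
    "participant_name": "sample_001",             # hypothetical sample name
    "gcs_out_root_dir": "gs://my-bucket/results",  # hypothetical bucket
    "CallVariantsWithHaplotypeCaller.CallVariantsWithHC.runtime_attr_override": hc_override,
})

print(json.dumps(inputs, indent=2))
```

Keeping overrides in a separate dict makes it easy to reuse the same resource profile across several calls.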

Defaults

contigs_names_to_ignore (Array[String], default=["RANDOM_PLACEHOLDER_VALUE"]): Array of names of contigs to ignore for the purposes of reporting variants.
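indel_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - calibration sensitivity threshold for INDELs. INDEL variants whose scores correspond to a calibration sensitivity at or above this threshold (i.e., the lowest-scoring variants) will be filtered.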
indel_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
indel_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreIndelVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
snp_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - calibration sensitivity threshold for SNPs. SNP variants whose scores correspond to a calibration sensitivity at or above this threshold (i.e., the lowest-scoring variants) will be filtered.
snp_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
snp_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
CallVariantsWithHaplotypeCaller.call_vars_on_mitochondria (Boolean, default=false)
RenameRawHcVcf.is_gvcf (Boolean, default=false)
TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity (Float, default=0.001)
CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity_stdev (Float, default=0.01)
CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.indel_heterozygosity (Float, default=0.000125)
CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.keep_combined_raw_annotations (Boolean, default=false)
CallVariantsWithHaplotypeCaller.MergeGVCFs.is_gvcf (Boolean, default=false)
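The snp_calibration_sensitivity and indel_calibration_sensitivity defaults (0.99) are expressed as calibration sensitivities rather than raw scores. The sketch below illustrates the idea under simple assumptions: higher scores mean more confident calls, a variant's calibration sensitivity is the fraction of calibration-set variants scoring at or above it, and variants at or above the threshold are filtered. It is not GATK's ScoreVariantAnnotations implementation; the helper names are hypothetical, and boundary handling and interpolation in the real tool may differ.

```python
def calibration_sensitivity(score: float, calibration_scores: list[float]) -> float:
    """Fraction of calibration-set variants scoring at or above `score`."""
    n_at_or_above = sum(1 for s in calibration_scores if s >= score)
    return n_at_or_above / len(calibration_scores)


def hard_filter(variant_scores: dict[str, float],
                calibration_scores: list[float],
                threshold: float) -> dict[str, bool]:
    """Map variant ID -> True if it passes, False if it would be filtered.

    Low-scoring variants have high calibration sensitivity (almost every
    calibration variant scores above them), so they are the ones that
    reach the threshold and get filtered.
    """
    return {
        vid: calibration_sensitivity(score, calibration_scores) < threshold
        for vid, score in variant_scores.items()
    }
```

With a threshold of 0.99, roughly 99% of the calibration variants score high enough to pass, and only variants scoring below nearly the entire calibration set are removed.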
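snp_max_unlabeled_variants and indel_max_unlabeled_variants cap how many unlabeled sites are sampled with reservoir sampling. The sketch below is the textbook Algorithm R, shown only to illustrate how a fixed-size uniform sample can be drawn from a stream of unknown length in one pass; GATK's own sampling code, seeding, and allele handling may differ. The k <= 0 branch mirrors the default of 0, under which no unlabeled sites are sampled.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Uniformly sample up to k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    if k <= 0:
        return reservoir  # k = 0: nothing is sampled
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because each site is visited once and only k items are held in memory, this approach suits whole-genome VCFs where the number of unlabeled sites is not known in advance.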

Outputs

successfully_processed (Boolean)
hc_g_vcf (File?)
hc_g_tbi (File?)
hc_bamout (File?)
hc_baiout (File?)
hc_raw_vcf (File?)
hc_raw_tbi (File?)
hc_rescored_vcf (File?)
hc_rescored_tbi (File?)
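Every output except successfully_processed is optional (File?), so downstream code should not assume all of them exist. The sketch below summarizes a run's outputs, assuming a JSON document shaped like Cromwell's outputs response ({"outputs": {"<workflow>.<output>": value}}); the helper name is hypothetical and other engines may use a different layout.

```python
import json

WORKFLOW = "SRWholeGenome_Pf_Niare_VETS"
OPTIONAL_OUTPUTS = ["hc_g_vcf", "hc_g_tbi", "hc_bamout", "hc_baiout",
                    "hc_raw_vcf", "hc_raw_tbi", "hc_rescored_vcf", "hc_rescored_tbi"]


def summarize_outputs(outputs_json: str) -> dict:
    """Return {"ok": bool, "missing": [...], "files": {...}} for one run.

    ASSUMPTION: Cromwell-style outputs JSON keyed by "<workflow>.<output>".
    """
    outputs = json.loads(outputs_json)["outputs"]

    def get(name: str):
        # Missing keys and JSON null both come back as None.
        return outputs.get(f"{WORKFLOW}.{name}")

    files = {name: get(name) for name in OPTIONAL_OUTPUTS if get(name) is not None}
    missing = [name for name in OPTIONAL_OUTPUTS if name not in files]
    return {"ok": bool(get("successfully_processed")), "missing": missing, "files": files}
```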

Dot Diagram

[Figure: dot diagram of the SRWholeGenome_Pf_Niare_VETS workflow]