SRWholeGenome_Pf_Niare_VETS

author
Jonn Smith
description
This workflow implements a modified version of the single-sample pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0), following LRMA conventions. The modification is that this pipeline uses VETS in place of VQSR.

Inputs

Required

  • aligned_bais (Array[File], required): Array of aligned bam indices to process. Order must correspond to aligned_bams.
  • aligned_bams (Array[File], required): Array of aligned bam files to process.
  • gcs_out_root_dir (String, required): GCS Bucket into which to finalize outputs.
  • genotype_gvcfs_intervals (File, required): Intervals over which to batch Joint Genotyping.
  • indel_is_calibration (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
  • indel_is_training (Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
  • indel_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as either calibration or training using indel_is_training and indel_is_calibration.
  • indel_known_reference_variants_identifier (Array[File], required): Array of names to assign to the VCF files in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
  • indel_known_reference_variants_index (Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
  • participant_name (String, required): The unique identifier of this sample being processed.
  • ref_map_file (File, required): Reference map file indicating reference sequence and auxiliary file locations.
  • snp_is_calibration (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
  • snp_is_training (Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
  • snp_known_reference_variants (Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as either calibration or training using snp_is_training and snp_is_calibration.
  • snp_known_reference_variants_identifier (Array[File], required): Array of names to assign to the VCF files in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
  • snp_known_reference_variants_index (Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
  • vcf_calling_interval_list (File, required): Intervals over which to call variants.
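
Several of the required inputs above are parallel arrays whose order must correspond. The following is a minimal sketch, not part of the workflow itself, of how an inputs JSON might be assembled and checked before submission. The gs:// paths, sample name, and helper function names are hypothetical, only a subset of the required inputs is shown, and the assumption that the is_training and is_calibration arrays must match the length of the variant file arrays is inferred from the descriptions rather than stated outright. Keys are prefixed with the workflow name, as is conventional for WDL inputs files.

```python
import json

WORKFLOW = "SRWholeGenome_Pf_Niare_VETS"


def check_parallel(inputs, base, companions):
    """Raise ValueError if any companion array differs in length from `base`."""
    n = len(inputs[base])
    for name in companions:
        if len(inputs[name]) != n:
            raise ValueError(
                f"{name} has {len(inputs[name])} entries but {base} has {n}"
            )


def build_inputs_json(inputs):
    """Validate parallel arrays, then return a workflow-prefixed inputs JSON string."""
    check_parallel(inputs, "aligned_bams", ["aligned_bais"])
    for kind in ("snp", "indel"):
        base = f"{kind}_known_reference_variants"
        check_parallel(
            inputs,
            base,
            [
                f"{base}_index",
                f"{base}_identifier",
                f"{kind}_is_training",      # assumed to be one flag per VCF
                f"{kind}_is_calibration",   # assumed to be one flag per VCF
            ],
        )
    return json.dumps({f"{WORKFLOW}.{k}": v for k, v in inputs.items()}, indent=2)


# Hypothetical, abbreviated example: every path and name below is a placeholder.
example = {
    "participant_name": "sample_01",
    "aligned_bams": ["gs://my-bucket/sample_01.bam"],
    "aligned_bais": ["gs://my-bucket/sample_01.bam.bai"],
    "snp_known_reference_variants": ["gs://my-bucket/known_snps.vcf.gz"],
    "snp_known_reference_variants_index": ["gs://my-bucket/known_snps.vcf.gz.tbi"],
    "snp_known_reference_variants_identifier": ["known_snps"],
    "snp_is_training": [True],
    "snp_is_calibration": [True],
    "indel_known_reference_variants": ["gs://my-bucket/known_indels.vcf.gz"],
    "indel_known_reference_variants_index": ["gs://my-bucket/known_indels.vcf.gz.tbi"],
    "indel_known_reference_variants_identifier": ["known_indels"],
    "indel_is_training": [True],
    "indel_is_calibration": [True],
}

if __name__ == "__main__":
    print(build_inputs_json(example))
```

A real inputs file would also need the remaining required inputs (gcs_out_root_dir, genotype_gvcfs_intervals, ref_map_file, vcf_calling_interval_list).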

Optional

  • bed_to_compute_coverage (File?): Bed file to use as regions over which to measure coverage.
  • ExtractIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • ExtractSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • FinalizeHCBaiOut.name (String?)
  • FinalizeHCBaiOut.runtime_attr_override (RuntimeAttr?)
  • FinalizeHCBamOut.name (String?)
  • FinalizeHCBamOut.runtime_attr_override (RuntimeAttr?)
  • FinalizeHCGTbi.name (String?)
  • FinalizeHCGTbi.runtime_attr_override (RuntimeAttr?)
  • FinalizeHCGVcf.name (String?)
  • FinalizeHCGVcf.runtime_attr_override (RuntimeAttr?)
  • FinalizeHCRescoredTbi.name (String?)
  • FinalizeHCRescoredTbi.runtime_attr_override (RuntimeAttr?)
  • FinalizeHCRescoredVcf.name (String?)
  • FinalizeHCRescoredVcf.runtime_attr_override (RuntimeAttr?)
  • FinalizeRawHCTbi.name (String?)
  • FinalizeRawHCTbi.runtime_attr_override (RuntimeAttr?)
  • FinalizeRawHCVcf.name (String?)
  • FinalizeRawHCVcf.runtime_attr_override (RuntimeAttr?)
  • MergeAllReads.runtime_attr_override (RuntimeAttr?)
  • RemoveFilteredVariants.runtime_attr_override (RuntimeAttr?)
  • RenameRawHcGvcf.runtime_attr_override (RuntimeAttr?)
  • RenameRawHcVcf.runtime_attr_override (RuntimeAttr?)
  • ScoreIndelVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • ScoreSnpVariantAnnotations.runtime_attr_override (RuntimeAttr?)
  • TrainIndelVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
  • TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
  • TrainSnpVariantAnnotationsModel.runtime_attr_override (RuntimeAttr?)
  • TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5 (File?)
  • CallVariantsWithHaplotypeCaller.CallVariantsWithHC.runtime_attr_override (RuntimeAttr?)
  • CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.dbsnp_vcf (String?)
  • CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.runtime_attr_override (RuntimeAttr?)
  • CallVariantsWithHaplotypeCaller.IndexBamout.runtime_attr_override (RuntimeAttr?)
  • CallVariantsWithHaplotypeCaller.MergeGVCFs.runtime_attr_override (RuntimeAttr?)
  • CallVariantsWithHaplotypeCaller.MergeVariantCalledBamOuts.runtime_attr_override (RuntimeAttr?)
  • CallVariantsWithHaplotypeCaller.SmallVariantsScatterPrep.runtime_attr_override (RuntimeAttr?)

Defaults

  • contigs_names_to_ignore (Array[String], default=["RANDOM_PLACEHOLDER_VALUE"]): Array of names of contigs to ignore for the purposes of reporting variants.
  • indel_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which INDEL variants will be filtered.
  • indel_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
  • indel_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
  • snp_calibration_sensitivity (Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which SNP variants will be filtered.
  • snp_max_unlabeled_variants (Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
  • snp_recalibration_annotation_values (Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
  • CallVariantsWithHaplotypeCaller.call_vars_on_mitochondria (Boolean, default=false)
  • RenameRawHcVcf.is_gvcf (Boolean, default=false)
  • TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
  • TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold (Float, default=0.95)
  • CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity (Float, default=0.001)
  • CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity_stdev (Float, default=0.01)
  • CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.indel_heterozygosity (Float, default=0.000125)
  • CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.keep_combined_raw_annotations (Boolean, default=false)
  • CallVariantsWithHaplotypeCaller.MergeGVCFs.is_gvcf (Boolean, default=false)

Outputs

  • successfully_processed (Boolean)
  • hc_g_vcf (File?)
  • hc_g_tbi (File?)
  • hc_bamout (File?)
  • hc_baiout (File?)
  • hc_raw_vcf (File?)
  • hc_raw_tbi (File?)
  • hc_rescored_vcf (File?)
  • hc_rescored_tbi (File?)

Dot Diagram

[Figure: dot diagram of the SRWholeGenome_Pf_Niare_VETS workflow]