SRWholeGenome_Pf_Niare_VETS
- author: Jonn Smith
- description: This workflow implements a modified version of the single-sample pipeline from Niare et al. (https://doi.org/10.1186/s12936-023-04632-0) using LRMA conventions. The modification is that this pipeline uses VETS instead of VQSR.
Inputs
Required
- aligned_bais(Array[File], required): Array of aligned bam indices to process. Order must correspond to aligned_bams.
- aligned_bams(Array[File], required): Array of aligned bam files to process.
- gcs_out_root_dir(String, required): GCS Bucket into which to finalize outputs.
- genotype_gvcfs_intervals(File, required): Intervals over which to batch Joint Genotyping.
- indel_is_calibration(Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
- indel_is_training(Array[Boolean], required): Array of booleans indicating which files in indel_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
- indel_known_reference_variants(Array[File], required): Array of VCF files to use as input reference variants for INDELs. Each can be designated as either calibration or training using indel_is_training and indel_is_calibration.
- indel_known_reference_variants_identifier(Array[File], required): Array of names to give to the VCF files given in indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
- indel_known_reference_variants_index(Array[File], required): Array of VCF index files for indel_known_reference_variants. Order should correspond to that in indel_known_reference_variants.
- participant_name(String, required): The unique identifier of the sample being processed.
- ref_map_file(File, required): Reference map file indicating reference sequence and auxiliary file locations.
- snp_is_calibration(Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as calibration sets. True -> calibration set. False -> NOT a calibration set.
- snp_is_training(Array[Boolean], required): Array of booleans indicating which files in snp_known_reference_variants should be used as training sets. True -> training set. False -> NOT a training set.
- snp_known_reference_variants(Array[File], required): Array of VCF files to use as input reference variants for SNPs. Each can be designated as either calibration or training using snp_is_training and snp_is_calibration.
- snp_known_reference_variants_identifier(Array[File], required): Array of names to give to the VCF files given in snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
- snp_known_reference_variants_index(Array[File], required): Array of VCF index files for snp_known_reference_variants. Order should correspond to that in snp_known_reference_variants.
- vcf_calling_interval_list(File, required): Intervals over which to call variants.
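Several of the required inputs are parallel arrays whose order must correspond. The following is a minimal sketch, not part of the workflow itself, of how an inputs JSON might be assembled and checked before submission. It assumes the workflow name matches the page title and that input keys are prefixed with that name (the usual Cromwell convention). All gs:// paths, the sample name, and the helper functions check_parallel and build_inputs are hypothetical.

```python
import json

# Assumption: the WDL workflow name is the same as the page title.
WORKFLOW = "SRWholeGenome_Pf_Niare_VETS"


def check_parallel(inputs, anchor, others):
    """Raise ValueError if any array named in `others` differs in length from `anchor`."""
    expected = len(inputs[anchor])
    for name in others:
        if len(inputs[name]) != expected:
            raise ValueError(
                f"{name} has {len(inputs[name])} entries; "
                f"expected {expected} to match {anchor}"
            )


def build_inputs(inputs):
    """Validate the parallel arrays, then prefix every key with the workflow name."""
    check_parallel(inputs, "aligned_bams", ["aligned_bais"])
    for kind in ("snp", "indel"):
        base = f"{kind}_known_reference_variants"
        check_parallel(
            inputs,
            base,
            [f"{base}_index", f"{base}_identifier",
             f"{kind}_is_training", f"{kind}_is_calibration"],
        )
    return {f"{WORKFLOW}.{key}": value for key, value in inputs.items()}


# Hypothetical values for illustration only.
example = {
    "aligned_bams": ["gs://example-bucket/sample1.bam"],
    "aligned_bais": ["gs://example-bucket/sample1.bam.bai"],
    "gcs_out_root_dir": "gs://example-bucket/results",
    "genotype_gvcfs_intervals": "gs://example-bucket/genotype.interval_list",
    "vcf_calling_interval_list": "gs://example-bucket/calling.interval_list",
    "participant_name": "sample1",
    "ref_map_file": "gs://example-bucket/ref_map.tsv",
    "snp_known_reference_variants": ["gs://example-bucket/known_snps.vcf.gz"],
    "snp_known_reference_variants_index": ["gs://example-bucket/known_snps.vcf.gz.tbi"],
    "snp_known_reference_variants_identifier": ["known_snps"],
    "snp_is_training": [True],
    "snp_is_calibration": [True],
    "indel_known_reference_variants": ["gs://example-bucket/known_indels.vcf.gz"],
    "indel_known_reference_variants_index": ["gs://example-bucket/known_indels.vcf.gz.tbi"],
    "indel_known_reference_variants_identifier": ["known_indels"],
    "indel_is_training": [True],
    "indel_is_calibration": [False],
}

inputs_json = json.dumps(build_inputs(example), indent=2)
```

The length check only catches arrays of different sizes; it cannot detect entries that are the right length but listed in the wrong order, so the ordering still has to be verified by hand.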
Optional
- bed_to_compute_coverage(File?): Bed file to use as regions over which to measure coverage.
- ExtractIndelVariantAnnotations.runtime_attr_override(RuntimeAttr?)
- ExtractSnpVariantAnnotations.runtime_attr_override(RuntimeAttr?)
- FinalizeHCBaiOut.name(String?)
- FinalizeHCBaiOut.runtime_attr_override(RuntimeAttr?)
- FinalizeHCBamOut.name(String?)
- FinalizeHCBamOut.runtime_attr_override(RuntimeAttr?)
- FinalizeHCGTbi.name(String?)
- FinalizeHCGTbi.runtime_attr_override(RuntimeAttr?)
- FinalizeHCGVcf.name(String?)
- FinalizeHCGVcf.runtime_attr_override(RuntimeAttr?)
- FinalizeHCRescoredTbi.name(String?)
- FinalizeHCRescoredTbi.runtime_attr_override(RuntimeAttr?)
- FinalizeHCRescoredVcf.name(String?)
- FinalizeHCRescoredVcf.runtime_attr_override(RuntimeAttr?)
- FinalizeRawHCTbi.name(String?)
- FinalizeRawHCTbi.runtime_attr_override(RuntimeAttr?)
- FinalizeRawHCVcf.name(String?)
- FinalizeRawHCVcf.runtime_attr_override(RuntimeAttr?)
- MergeAllReads.runtime_attr_override(RuntimeAttr?)
- RemoveFilteredVariants.runtime_attr_override(RuntimeAttr?)
- RenameRawHcGvcf.runtime_attr_override(RuntimeAttr?)
- RenameRawHcVcf.runtime_attr_override(RuntimeAttr?)
- ScoreIndelVariantAnnotations.runtime_attr_override(RuntimeAttr?)
- ScoreSnpVariantAnnotations.runtime_attr_override(RuntimeAttr?)
- TrainIndelVariantAnnotationsModel.runtime_attr_override(RuntimeAttr?)
- TrainIndelVariantAnnotationsModel.unlabeled_annotation_hdf5(File?)
- TrainSnpVariantAnnotationsModel.runtime_attr_override(RuntimeAttr?)
- TrainSnpVariantAnnotationsModel.unlabeled_annotation_hdf5(File?)
- CallVariantsWithHaplotypeCaller.CallVariantsWithHC.runtime_attr_override(RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.dbsnp_vcf(String?)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.runtime_attr_override(RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.IndexBamout.runtime_attr_override(RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.MergeGVCFs.runtime_attr_override(RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.MergeVariantCalledBamOuts.runtime_attr_override(RuntimeAttr?)
- CallVariantsWithHaplotypeCaller.SmallVariantsScatterPrep.runtime_attr_override(RuntimeAttr?)
Defaults
- contigs_names_to_ignore(Array[String], default=["RANDOM_PLACEHOLDER_VALUE"]): Array of names of contigs to ignore for the purposes of reporting variants.
- indel_calibration_sensitivity(Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which INDEL variants will be filtered.
- indel_max_unlabeled_variants(Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled INDEL variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
- indel_recalibration_annotation_values(Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the INDEL variant scoring model and over which to score INDEL variants.
- snp_calibration_sensitivity(Float, default=0.99): VETS (ScoreVariantAnnotations) parameter - score below which SNP variants will be filtered.
- snp_max_unlabeled_variants(Int, default=0): VETS (ExtractVariantAnnotations) parameter - maximum number of unlabeled SNP variants/alleles to randomly sample with reservoir sampling. If nonzero, annotations will also be extracted from unlabeled sites.
- snp_recalibration_annotation_values(Array[String], default=["BaseQRankSum", "ExcessHet", "FS", "HAPCOMP", "HAPDOM", "HEC", "MQ", "MQRankSum", "QD", "ReadPosRankSum", "SOR", "DP"]): VETS (ScoreSnpVariantAnnotations/ScoreVariantAnnotations) parameter - Array of annotation names to use to create the SNP variant scoring model and over which to score SNP variants.
- CallVariantsWithHaplotypeCaller.call_vars_on_mitochondria(Boolean, default=false)
- RenameRawHcVcf.is_gvcf(Boolean, default=false)
- TrainIndelVariantAnnotationsModel.calibration_sensitivity_threshold(Float, default=0.95)
- TrainSnpVariantAnnotationsModel.calibration_sensitivity_threshold(Float, default=0.95)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity(Float, default=0.001)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.heterozygosity_stdev(Float, default=0.01)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.indel_heterozygosity(Float, default=0.000125)
- CallVariantsWithHaplotypeCaller.CollapseGVCFtoVCF.keep_combined_raw_annotations(Boolean, default=false)
- CallVariantsWithHaplotypeCaller.MergeGVCFs.is_gvcf(Boolean, default=false)
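The calibration sensitivity defaults above are fractions rather than raw scores. One way to read them, sketched below, is that a sensitivity of 0.99 corresponds to the score cutoff that retains 99% of the calibration-set variants, and variants scoring below that cutoff are filtered. This is a conceptual illustration only, not GATK's implementation: the function names, the quantile convention, and the tie handling are all assumptions.

```python
import math


def score_threshold(calibration_scores, sensitivity):
    """Return the lowest score that still keeps `sensitivity` of the calibration set.

    Assumption: higher scores mean more confident variants, and the cutoff is
    chosen so that at least `sensitivity` of calibration scores are >= cutoff.
    """
    if not calibration_scores:
        raise ValueError("calibration_scores must not be empty")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    ranked = sorted(calibration_scores, reverse=True)
    # Number of calibration variants that must be retained; the rounding
    # guards against floating-point noise such as 7.000000000000001.
    keep = math.ceil(round(sensitivity * len(ranked), 9))
    return ranked[keep - 1]


def passes(score, threshold):
    """A variant passes if its score is at or above the threshold."""
    return score >= threshold


# Hypothetical calibration-set scores, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cutoff = score_threshold(scores, 0.9)
kept = [s for s in scores if passes(s, cutoff)]
```

Under this reading, raising the sensitivity lowers the cutoff and keeps more variants, while lowering it filters more aggressively.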
Outputs
- successfully_processed(Boolean)
- hc_g_vcf(File?)
- hc_g_tbi(File?)
- hc_bamout(File?)
- hc_baiout(File?)
- hc_raw_vcf(File?)
- hc_raw_tbi(File?)
- hc_rescored_vcf(File?)
- hc_rescored_tbi(File?)