VariantUtils
MergePerChrCalls
- description
- Merge per-chromosome calls into a single VCF
Inputs
Required
prefix(String, required): Prefix for output VCF
ref_dict(File, required): Reference dictionary
vcfs(Array[File], required): List of per-chromosome VCFs to merge
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
vcf(File)
tbi(File)
MergeAndSortVCFs
- description
- Fast merging & sorting VCFs when the default sorting is expected to be slow
Inputs
Required
prefix(String, required)
ref_fasta_fai(File, required)
vcfs(Array[File], required)
Optional
header_definitions_file(File?): a union of definition header lines for input VCFs (related to https://github.com/samtools/bcftools/issues/1629)
runtime_attr_override(RuntimeAttr?)
Outputs
vcf(File)
tbi(File)
CollectDefinitions
- description
- Collect (union) various definitions in VCF files, addressing a bcftools bug: https://github.com/samtools/bcftools/issues/1629
Inputs
Required
vcfs(Array[File], required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
union_definitions(File)
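As a rough illustration of what "collecting the union of definitions" can mean, the sketch below merges header definition lines from several VCF headers while dropping exact duplicates. It is not the task's actual implementation: the function name `union_definitions` and the choice of which header prefixes count as "definitions" are assumptions made for this example.

```python
from typing import Iterable, List

# Assumption: "definition" lines are the structured header lines below.
# The real task may collect a different set of header lines.
DEFINITION_PREFIXES = ("##INFO=", "##FORMAT=", "##FILTER=", "##ALT=")


def union_definitions(headers: Iterable[Iterable[str]]) -> List[str]:
    """Return definition header lines seen in any header, in first-seen order."""
    seen = set()
    union: List[str] = []
    for header in headers:
        for line in header:
            line = line.rstrip("\n")
            # Keep only definition lines, and only the first copy of each.
            if line.startswith(DEFINITION_PREFIXES) and line not in seen:
                seen.add(line)
                union.append(line)
    return union


header_a = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
]
header_b = [
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FILTER=<ID=LowQual,Description="Low quality">',
]
merged = union_definitions([header_a, header_b])
```

Note that this sketch deduplicates on the exact line text only; two definitions with the same ID but different descriptions would both be kept.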
GetVCFSampleName
- description
- Currently mostly used for extracting sample name in fingerprinting genotyped VCF
Inputs
Required
fingerprint_vcf(File, required): Assumed to be genotyped, and hold only one sample (other samples will be ignored).
Optional
runtime_attr_override(RuntimeAttr?): Override default runtime attributes
Outputs
sample_name(String)
SubsetVCF
- description
- Subset a VCF file to a given locus
Inputs
Required
locus(String, required): Locus to subset to
vcf_gz(File, required): VCF file to be subset
vcf_tbi(File, required): Tabix index for the VCF file
Optional
runtime_attr_override(RuntimeAttr?): Override default runtime attributes
Defaults
prefix(String, default="subset"): Prefix for the output file
Outputs
subset_vcf(File)
subset_tbi(File)
ZipAndIndexVCF
- description
- Gzip a plain-text VCF and index it.
Inputs
Required
vcf(File, required): VCF file to be zipped and indexed
Optional
runtime_attr_override(RuntimeAttr?): Override default runtime attributes
Outputs
vcfgz(File)
tbi(File)
IndexVCF
- description
- Indexing vcf.gz. Note: do NOT use remote index as that's buggy.
Inputs
Required
vcf(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
tbi(File)
FixSnifflesVCF
Inputs
Required
sample_name(String, required): Sniffles infers sample name from the BAM file name, so we fix it here
vcf(File, required)
Optional
ref_fasta_fai(File?): provide only when the contig section of the input vcf is suspected to be corrupted
runtime_attr_override(RuntimeAttr?)
Outputs
sortedVCF(File)
tbi(File)
HardFilterVcf
Inputs
Required
prefix(String, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
excess_het_threshold(Float, default=54.69)
Outputs
variant_filtered_vcf(File)
variant_filtered_vcf_index(File)
MakeSitesOnlyVcf
Inputs
Required
prefix(String, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
sites_only_vcf(File)
sites_only_vcf_index(File)
AnnotateVcfWithBedRegions
Inputs
Required
bed_file_annotation_names(Array[String], required)
bed_file_indexes(Array[File], required)
bed_files(Array[File], required)
prefix(String, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
annotated_vcf(File)
annotated_vcf_index(File)
IndelsVariantRecalibrator
Inputs
Required
is_known(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position contains known variants. Must be the same length as known_reference_variants.
is_training(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position contains training data. Must be the same length as known_reference_variants.
is_truth(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position contains truth data. Must be the same length as known_reference_variants.
known_reference_variants(Array[File], required): Array of known reference VCF files. For humans, dbSNP is one example.
known_reference_variants_identifier(Array[String], required): Array of identifiers / names for the known_reference_variant file at the same array position. Must be the same length as known_reference_variants.
known_reference_variants_index(Array[File], required): Array of index files for known reference VCF files.
prefix(String, required)
prior(Array[Float], required): Array of numeric values indicating the priors for the known_reference_variant file at the same array position. Must be the same length as known_reference_variants.
recalibration_annotation_values(Array[String], required)
recalibration_tranche_values(Array[String], required)
use_allele_specific_annotations(Boolean, required)
vcf_indices(Array[File], required): Tribble indexes for the sites-only VCFs.
vcfs(Array[File], required): Sites-only VCFs. Can be pre-filtered using hard filters.
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
max_gaussians(Int, default=4)
Outputs
recalibration(File)
recalibration_index(File)
tranches(File)
model_report(File)
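The parallel-array inputs above (known_reference_variants, known_reference_variants_identifier, is_known, is_training, is_truth, prior) must all line up position by position. The sketch below shows one way such arrays could be validated and zipped into GATK VariantRecalibrator `--resource` arguments. The helper name `build_resource_args` is hypothetical, and whether the task assembles its command line this way is an assumption, not something this page confirms.

```python
from typing import List


def build_resource_args(
    files: List[str],
    identifiers: List[str],
    is_known: List[bool],
    is_training: List[bool],
    is_truth: List[bool],
    priors: List[float],
) -> List[str]:
    """Check that all arrays match `files` in length, then build one
    `--resource:...` argument pair per reference file."""
    arrays = {
        "identifiers": identifiers,
        "is_known": is_known,
        "is_training": is_training,
        "is_truth": is_truth,
        "priors": priors,
    }
    for name, arr in arrays.items():
        if len(arr) != len(files):
            raise ValueError(
                f"{name} has length {len(arr)}, expected {len(files)}"
            )

    args: List[str] = []
    for f, ident, known, training, truth, prior in zip(
        files, identifiers, is_known, is_training, is_truth, priors
    ):
        # GATK expects lowercase true/false in the resource tags.
        args.append(
            f"--resource:{ident},"
            f"known={str(known).lower()},"
            f"training={str(training).lower()},"
            f"truth={str(truth).lower()},"
            f"prior={prior}"
        )
        args.append(f)
    return args


example_args = build_resource_args(
    ["dbsnp.vcf.gz"], ["dbsnp"], [True], [False], [False], [2.0]
)
```

Failing early on a length mismatch is the point of the check: a silently truncated `zip` would otherwise drop reference files without any error.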
SNPsVariantRecalibratorCreateModel
Inputs
Required
is_known(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position contains known variants. Must be the same length as known_reference_variants.
is_training(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position contains training data. Must be the same length as known_reference_variants.
is_truth(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position contains truth data. Must be the same length as known_reference_variants.
known_reference_variants(Array[File], required): Array of known reference VCF files. For humans, dbSNP is one example.
known_reference_variants_identifier(Array[String], required): Array of identifiers / names for the known_reference_variant file at the same array position. Must be the same length as known_reference_variants.
known_reference_variants_index(Array[File], required): Array of index files for known reference VCF files.
prefix(String, required)
prior(Array[Float], required): Array of numeric values indicating the priors for the known_reference_variant file at the same array position. Must be the same length as known_reference_variants.
recalibration_annotation_values(Array[String], required)
recalibration_tranche_values(Array[String], required)
use_allele_specific_annotations(Boolean, required)
vcf_indices(Array[File], required): Tribble indexes for the sites-only VCFs.
vcfs(Array[File], required): Sites-only VCFs. Can be pre-filtered using hard filters.
Optional
downsampleFactor(Int?)
runtime_attr_override(RuntimeAttr?)
Defaults
max_gaussians(Int, default=6)
Outputs
recalibration(File)
recalibration_index(File)
tranches(File)
model_report(File)
ApplyVqsr
Inputs
Required
indel_filter_level(Float, required)
indels_recalibration(File, required)
indels_recalibration_index(File, required)
indels_tranches(File, required)
prefix(String, required)
snp_filter_level(Float, required)
snps_recalibration(File, required)
snps_recalibration_index(File, required)
snps_tranches(File, required)
use_allele_specific_annotations(Boolean, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
recalibrated_vcf(File)
recalibrated_vcf_index(File)
SelectVariants
Inputs
Required
prefix(String, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
vcf_out(File)
vcf_out_index(File)
RenameSingleSampleVcf
Inputs
Required
new_sample_name(String, required)
prefix(String, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
is_gvcf(Boolean, default=false)
Outputs
new_sample_name_vcf(File)
new_sample_name_vcf_index(File)
GatherVcfs
Inputs
Required
input_vcf_indices(Array[File], required)
input_vcfs(Array[File], required); localization_optional: true
prefix(String, required)
Optional
runtime_attr_override(RuntimeAttr?)
Outputs
output_vcf(File)
output_vcf_index(File)
ExtractFingerprint
Inputs
Required
bai(File, required)
bam(File, required)
haplotype_database_file(File, required)
ref_dict(File, required)
ref_fasta(File, required)
ref_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
prefix(String, default="fingerprint")
Outputs
output_vcf(File)
fingerprint_string(File)
ExtractFingerprintAndBarcode
Inputs
Required
haplotype_database_file(File, required)
ref_dict(File, required)
ref_fasta(File, required)
ref_fasta_fai(File, required)
vcf(File, required)
vcf_index(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
prefix(String, default="fingerprint")
Outputs
output_vcf(File)
barcode(String)
barcode_file(File)
ExtractVariantAnnotations
Inputs
Required
is_calibration(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position should be used for 'calibration' data. Must be the same length as known_reference_variants.
is_training(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position should be used for 'training' data. Must be the same length as known_reference_variants.
known_reference_variants(Array[File], required): Array of known reference VCF files. For humans, dbSNP is one example.
known_reference_variants_identifier(Array[String], required): Array of identifiers / names for the known_reference_variant file at the same array position. Must be the same length as known_reference_variants.
known_reference_variants_index(Array[File], required): Array of index files for known reference VCF files.
mode(String, required): SNP or INDEL
prefix(String, required): Prefix of the output files.
recalibration_annotation_values(Array[String], required)
vcf(File, required): VCF file from which to extract annotations.
vcf_index(File, required): Index for the given VCF file.
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
max_unlabeled_variants(Int, default=0): How many sites should be used for unlabeled training data. Setting this to a value > 0 enables a positive-negative training model.
Outputs
annotation_hdf5(File)
sites_only_vcf(File)
sites_only_vcf_index(File)
unlabeled_annotation_hdf5(File?)
TrainVariantAnnotationsModel
Inputs
Required
annotation_hdf5(File, required): Labeled-annotations HDF5 file.
mode(String, required): SNP or INDEL
prefix(String, required): Prefix of the output files.
Optional
runtime_attr_override(RuntimeAttr?)
unlabeled_annotation_hdf5(File?): Unlabeled-annotations HDF5 file (optional)
Defaults
calibration_sensitivity_threshold(Float, default=0.95): Calibration-set sensitivity threshold. (optional)
Outputs
training_scores(File)
positive_model_scorer_pickle(File)
unlabeled_positive_model_scores(File?)
calibration_set_scores(File?)
negative_model_scorer_pickle(File?)
ScoreVariantAnnotations
Inputs
Required
is_calibration(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position should be used for 'calibration' data. Must be the same length as known_reference_variants.
is_training(Array[Boolean], required): Array of boolean values indicating if the known_reference_variant file at the same array position should be used for 'training' data. Must be the same length as known_reference_variants.
known_reference_variants(Array[File], required): Array of known reference VCF files. For humans, dbSNP is one example.
known_reference_variants_identifier(Array[String], required): Array of identifiers / names for the known_reference_variant file at the same array position. Must be the same length as known_reference_variants.
known_reference_variants_index(Array[File], required): Array of index files for known reference VCF files.
mode(String, required): SNP or INDEL
model_files(Array[File], required)
model_prefix(String, required)
prefix(String, required): Prefix of the output files.
recalibration_annotation_values(Array[String], required)
sites_only_extracted_vcf(File, required)
sites_only_extracted_vcf_index(File, required)
vcf(File, required): VCF file from which to extract annotations.
vcf_index(File, required): Index for the given VCF file.
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
calibration_sensitivity_threshold(Float, default=0.99)
Outputs
scored_vcf(File)
scored_vcf_index(File)
annotations_hdf5(File?)
scores_hdf5(File?)
CompressAndIndex
- description
- Convert a BCF file to a vcf.bgz file and index it.
Inputs
Required
joint_bcf(File, required)
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
num_cpus(Int, default=8)
prefix(String, default="out")
Outputs
joint_gvcf(File)
joint_gvcf_tbi(File)
ConcatVariants
- description
- Concatenate VCFs/BCFs into a single .vcf.bgz file and index it.
Inputs
Required
variant_files(Array[File], required)
Optional
runtime_attr_override(RuntimeAttr?)
Defaults
is_gvcf(Boolean, default=false)
num_cpus(Int, default=4)
prefix(String, default="out")
Outputs
combined_vcf(File)
combined_vcf_tbi(File)
CopyDP_MINToDP
- description
- Copy the DP_MIN field to the DP field in a VCF file (DP_MIN is generated by Clair3). This enables joint calling of cohorts that combine GVCFs from Clair3 (and other callers using DP_MIN) with GATK-called GVCF files, in combination with a new config option that tells GLNexus to use the DP field for its analysis.
Inputs
Required
gvcf(File, required)
output_prefix(String, required)
Outputs
gvcf_out(File)
gvcf_out_tbi(File)
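To make the field copy concrete, here is a minimal sketch of the per-record transformation on a single tab-separated VCF data line. It assumes DP_MIN is a per-sample FORMAT key (the description above does not say whether it lives in FORMAT or INFO), and the function name `copy_format_field` is hypothetical. It does not touch the VCF header, so a real implementation would also need to make sure a DP definition is present there.

```python
def copy_format_field(record: str, src: str = "DP_MIN", dst: str = "DP") -> str:
    """Copy the per-sample value of `src` into `dst` for one VCF data line.

    Assumption: both keys are FORMAT (per-sample) fields.
    Records without sample columns or without `src` are returned unchanged.
    """
    cols = record.rstrip("\n").split("\t")
    if len(cols) < 10:
        return "\t".join(cols)

    keys = cols[8].split(":")
    if src not in keys:
        return "\t".join(cols)

    src_i = keys.index(src)
    if dst not in keys:
        keys.append(dst)  # add DP to the FORMAT column if it is missing
    dst_i = keys.index(dst)

    for c in range(9, len(cols)):
        vals = cols[c].split(":")
        # VCF allows trailing fields to be dropped; pad them with ".".
        vals += ["."] * (len(keys) - len(vals))
        vals[dst_i] = vals[src_i]
        cols[c] = ":".join(vals)

    cols[8] = ":".join(keys)
    return "\t".join(cols)


line = "chr1\t100\t.\tA\t<*>\t.\t.\tEND=200\tGT:GQ:DP_MIN\t0/0:30:12"
fixed = copy_format_field(line)
```

In the example, `fixed` carries the FORMAT column `GT:GQ:DP_MIN:DP` and the sample column `0/0:30:12:12`.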