AnnotateVcf
Adds annotations, such as the inferred function and allele frequencies of variants, to a VCF.
Annotation methods include:
- Functional annotation - The GATK tool SVAnnotate is used to annotate SVs with inferred functional consequence on protein-coding regions, regulatory regions such as UTRs and promoters, and other non-coding elements.
- Allele Frequency (AF) annotation - Annotate SVs with their allele frequencies across all samples, within each sex, and within specific subpopulations.
- Allele Frequency annotation with external callset - Annotate SVs with the allele frequencies of their overlapping SVs in another callset, e.g. the gnomAD-SV reference callset.
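As a rough illustration of what a stratified allele frequency means, the sketch below computes AF as the number of non-reference alleles (AC) divided by the number of called alleles (AN), for all samples and for each group. It is not the workflow's implementation; the function name, the `"ALL"` label, and the genotype representation are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical genotype representation: one allele index per haplotype,
# e.g. (0, 1) for a heterozygous call; None marks a missing allele.
Genotype = Tuple[Optional[int], ...]


def allele_frequencies(
    genotypes: Dict[str, Genotype],
    sample_groups: Dict[str, str],
) -> Dict[str, float]:
    """Return AF = AC / AN for all samples ("ALL") and for each group."""
    ac = defaultdict(int)  # non-reference allele count per group
    an = defaultdict(int)  # called allele count per group
    for sample, gt in genotypes.items():
        groups = ["ALL"]
        if sample in sample_groups:
            groups.append(sample_groups[sample])
        for allele in gt:
            if allele is None:
                continue  # missing alleles count toward neither AC nor AN
            for group in groups:
                an[group] += 1
                if allele > 0:
                    ac[group] += 1
    return {group: ac[group] / an[group] for group in an}
```

The same function covers sex-specific or population-specific frequencies, depending on whether `sample_groups` maps samples to sexes or to populations.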
The following diagram illustrates the recommended invocation order:
Inputs
vcf
Any SV VCF. Running on the genotype-filtered VCF is recommended.
prefix
Prefix for the output VCF, such as the cohort name. May be alphanumeric with underscores.
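A prefix can be checked before launching the workflow. The snippet below is a minimal sketch that assumes "alphanumeric with underscores" means ASCII letters, digits, and underscores only; the helper name is hypothetical.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
PREFIX_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the prefix is non-empty and contains only allowed characters."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None
```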
protein_coding_gtf
Coding transcript definitions, see here.
Optional noncoding_bed
Non-coding reference intervals, see here.
Optional promoter_window
Promoter window size. See here.
Optional max_breakend_as_cnv_length
Maximum size for treating BND records as CNVs. See here.
Optional svannotate_additional_args
Additional arguments for GATK-SVAnnotate.
Optional sample_pop_assignments
Two-column file with sample ID and population assignment. A population value of "." causes the sample to be ignored. If provided, annotates population-specific allele frequencies.
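The file layout described above can be read with a few lines of Python. This is a sketch for sanity-checking a file before submission, not the workflow's own parser; it assumes whitespace-delimited columns and no header line, and the function name is hypothetical.

```python
import io
from typing import Dict, TextIO


def read_pop_assignments(handle: TextIO) -> Dict[str, str]:
    """Map sample ID to population, skipping samples whose population is "."."""
    pops: Dict[str, str] = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split()  # assumption: whitespace-delimited, no header
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(fields)}"
            )
        sample, pop = fields
        if pop == ".":
            continue  # "." means the sample is ignored
        pops[sample] = pop
    return pops


# Usage with an in-memory file; s2 is dropped because its population is ".".
example = io.StringIO("s1\tAFR\ns2\t.\ns3\tEUR\n")
print(read_pop_assignments(example))
```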
Optional sample_keep_list
If provided, subset samples to this list in the output VCF.
Optional ped_file
Family structures and sex assignments determined in EvidenceQC. See PED file format. If provided, sex-specific allele frequencies will be annotated.
Optional par_bed
Pseudo-autosomal region (PAR) bed file. If provided, variants overlapping PARs will be annotated with the PAR field.
sv_per_shard
Shard size for parallel processing. Decreasing this may help if the workflow is running too slowly.
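To see why a smaller shard size increases parallelism, consider the arithmetic below. It is a simplified sketch that assumes records are split into consecutive shards of at most `sv_per_shard` records each; the workflow's actual sharding logic may differ, and the function name is hypothetical.

```python
import math


def shard_count(n_records: int, sv_per_shard: int) -> int:
    """Number of shards needed when each shard holds at most sv_per_shard records."""
    if sv_per_shard <= 0:
        raise ValueError("sv_per_shard must be positive")
    return math.ceil(n_records / sv_per_shard)
```

Under this assumption, halving `sv_per_shard` roughly doubles the number of shards that can run in parallel, at the cost of more per-task overhead.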
Optional external_af_ref_bed
Reference SV set (see here). If provided, annotates variants with allele frequencies from the reference population.
Optional external_af_ref_prefix
External AF annotation prefix. Required if providing external_af_ref_bed.
Optional external_af_population
Population names in the external SV reference set, e.g. "ALL", "AFR", "AMR", "EAS", "EUR". Required if providing external_af_ref_bed and must match the populations in the bed file.
Optional use_hail
Default: false. Use Hail for VCF concatenation. This should only be used for projects with over 50k samples. If enabled, gcs_project must also be provided. Does not work on Terra.
Optional gcs_project
Google Cloud project ID. Required only if enabling use_hail.
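Several optional inputs depend on one another: external_af_ref_prefix and external_af_population are required with external_af_ref_bed, and gcs_project is required with use_hail. The sketch below checks these dependencies on a plain dictionary of inputs. The function name is hypothetical, and the keys are the bare input names used on this page; in a real WDL inputs JSON they would typically carry a workflow-name prefix.

```python
from typing import Any, Dict, List


def check_conditional_inputs(inputs: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if the inputs are consistent."""
    problems: List[str] = []
    # External AF annotation needs both a prefix and the population names.
    if inputs.get("external_af_ref_bed"):
        for name in ("external_af_ref_prefix", "external_af_population"):
            if not inputs.get(name):
                problems.append(
                    f"{name} is required when external_af_ref_bed is provided"
                )
    # Hail-based concatenation needs a Google Cloud project ID.
    if inputs.get("use_hail", False) and not inputs.get("gcs_project"):
        problems.append("gcs_project is required when use_hail is true")
    return problems
```

Running this check before submission catches a missing dependent input early instead of partway through a run.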
Outputs
annotated_vcf
Output VCF.