gnomad_qc.v5.annotations.generate_variant_qc_annotations
Script to generate annotations for variant QC on gnomAD v5.
usage: gnomad_qc.v5.annotations.generate_variant_qc_annotations.py
[-h] [--environment {rwb,batch}] [--app-name APP_NAME]
[--driver-cores DRIVER_CORES] [--driver-memory DRIVER_MEMORY]
[--worker-cores WORKER_CORES] [--worker-memory WORKER_MEMORY]
[--overwrite] [--test] [--test-n-partitions [TEST_N_PARTITIONS]]
[--generate-trio-stats] [--generate-sibling-stats] [--create-info-ht]
[--lowqual-indel-phred-het-prior LOWQUAL_INDEL_PHRED_HET_PRIOR]
[--export-info-vcf] [--create-variant-qc-annotation-ht]
[--impute-features] [--n-partitions N_PARTITIONS]
[--export-true-positive-vcfs] [--transmitted-singletons]
[--sibling-singletons]
Named Arguments
- --environment
Possible choices: rwb, batch
Environment in which the script will run.
Default: “rwb”
- --app-name
Job name for batch/QoB backend.
- --driver-cores
Number of cores for the driver node. Applies to Batch environment only. Hail default is 1 if unspecified.
- --driver-memory
Memory for driver node. Applies to Batch environment only. Hail default is ‘standard’ if unspecified.
- --worker-cores
Number of cores for worker nodes. Applies to Batch environment only. Hail default is 1 if unspecified.
- --worker-memory
Memory for worker nodes. Applies to Batch environment only. Hail default is ‘standard’ if unspecified.
- --overwrite
Overwrite output files.
Default: False
- --test
Write to test path.
Default: False
- --test-n-partitions
Use only n partitions of the VDS as input for testing purposes (default: 2).
- --generate-trio-stats
Calculates trio stats.
Default: False
- --generate-sibling-stats
Calculates sibling stats.
Default: False
- --create-info-ht
Create the info HT containing annotations needed for variant QC.
Default: False
- --lowqual-indel-phred-het-prior
Phred-scaled prior for a het genotype at a site with a low-quality indel. We use 1/10k bases (phred=40) to be more consistent with the filtering used by Broad’s Data Sciences Platform for VQSR.
Default: 40
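For reference, a phred score Q corresponds to a probability p = 10^(-Q/10), so the default prior of 40 is 10^-4, the 1-in-10,000-bases rate cited above. The sketch below shows the conversion in plain Python; the helper names are hypothetical and not part of this module.

```python
import math


def phred_to_prob(phred: float) -> float:
    """Convert a phred-scaled value to a probability: p = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def prob_to_phred(prob: float) -> float:
    """Convert a probability to the phred scale: Q = -10 * log10(p)."""
    return -10 * math.log10(prob)


# The default --lowqual-indel-phred-het-prior of 40 is a het prior
# of 1 in 10,000 bases.
het_prior = phred_to_prob(40)
print(f"{het_prior:.0e}")    # 1e-04
print(round(1 / het_prior))  # 10000
```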
- --export-info-vcf
Export the info HT as a VCF.
Default: False
Variant QC annotation HT parameters.
- --create-variant-qc-annotation-ht
Creates an annotated HT with features for variant QC.
Default: False
- --impute-features
If set, imputation is performed for variant QC features.
Default: False
- --n-partitions
Desired number of partitions for variant QC annotation HT.
Default: 5000
Export true positive VCFs
Arguments used to define true positive variant set.
- --export-true-positive-vcfs
Exports true positive variants (--transmitted-singletons and/or --sibling-singletons) to VCF files.
Default: False
- --transmitted-singletons
Include transmitted singletons in the exports of true positive variants to VCF files.
Default: False
- --sibling-singletons
Include sibling singletons in the exports of true positive variants to VCF files.
Default: False
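The options above map directly onto Python's argparse. The sketch below rebuilds a small subset of them for illustration only; it is not the module's actual parser, `build_parser` is a hypothetical name, and treating `--test-n-partitions` as a flag with an optional value that falls back to 2 when given bare is an assumption read from the usage line and the "(default: 2)" note.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the documented CLI, for illustration only."""
    parser = argparse.ArgumentParser(
        description="Generate annotations for variant QC on gnomAD v5."
    )
    parser.add_argument(
        "--environment",
        choices=["rwb", "batch"],
        default="rwb",
        help="Environment in which the script will run.",
    )
    parser.add_argument("--overwrite", action="store_true")
    parser.add_argument("--test", action="store_true")
    # Assumption: a bare --test-n-partitions means 2; absent means None.
    parser.add_argument("--test-n-partitions", nargs="?", const=2, type=int)
    parser.add_argument("--lowqual-indel-phred-het-prior", type=int, default=40)
    parser.add_argument("--n-partitions", type=int, default=5000)

    # Argument group mirroring the "Export true positive VCFs" section.
    tp = parser.add_argument_group(
        "Export true positive VCFs",
        "Arguments used to define true positive variant set.",
    )
    tp.add_argument("--export-true-positive-vcfs", action="store_true")
    tp.add_argument("--transmitted-singletons", action="store_true")
    tp.add_argument("--sibling-singletons", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--test-n-partitions", "--export-true-positive-vcfs", "--transmitted-singletons"]
)
print(args.environment, args.test_n_partitions, args.n_partitions)  # rwb 2 5000
```

Because `nargs="?"` does not consume a following token that starts with `--`, the bare `--test-n-partitions` above picks up the `const` value.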
Module Functions
- generate_ac_info_ht – Compute AC and AC_raw annotations for each allele count filter group.
- create_info_ht – Import a VCF of AoU annotated sites, reformat annotations, and add AS_lowqual.
- run_generate_trio_stats – Generate trio transmission stats from a dense trio MatrixTable and pedigree info.
- run_generate_sib_stats – Generate sibling stats from a MatrixTable and relatedness info.
- create_variant_qc_annotation_ht – Create a Table with all necessary annotations for variant QC.
- get_tp_ht_for_vcf_export – Get Tables with raw and adj true positive variants to export as a VCF for use in VQSR.
- Generate all variant annotations needed for variant QC.
- Get script argument parser.
- gnomad_qc.v5.annotations.generate_variant_qc_annotations.generate_ac_info_ht(vds)
Compute AC and AC_raw annotations for each allele count filter group.
Function also adds AS_pab_max and allele_info annotations.
- Parameters:
  - vds (VariantDataset) – VariantDataset to use for computing AC and AC_raw annotations.
- Return type:
Table
- Returns:
Table with AC and AC_raw annotations split by high quality, release, and unrelated.
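To illustrate the difference between AC and AC_raw, the sketch below counts alternate alleles in plain Python: AC_raw sums over all genotypes in a group, while AC sums only over genotypes that pass a high-quality (adj) filter. The `Genotype` record, its fields, and the three group names are hypothetical stand-ins; the real function operates on a Hail VariantDataset, and whether every group gets both a raw and an adj count there is an assumption.

```python
from typing import Dict, List, NamedTuple


class Genotype(NamedTuple):
    """Hypothetical per-sample genotype record at one allele."""

    n_alt: int  # number of alternate alleles carried (0, 1, or 2)
    adj: bool  # whether the genotype passes high-quality (adj) filters
    release: bool  # whether the sample is in the release set
    unrelated: bool  # whether the sample is in the unrelated set


def compute_acs(genotypes: List[Genotype]) -> Dict[str, int]:
    """Compute AC (adj genotypes only) and AC_raw (all genotypes) per group."""
    groups = {
        "all": lambda g: True,
        "release": lambda g: g.release,
        "unrelated": lambda g: g.unrelated,
    }
    result = {}
    for name, in_group in groups.items():
        members = [g for g in genotypes if in_group(g)]
        result[f"AC_raw_{name}"] = sum(g.n_alt for g in members)
        result[f"AC_{name}"] = sum(g.n_alt for g in members if g.adj)
    return result


gts = [
    Genotype(n_alt=1, adj=True, release=True, unrelated=True),
    Genotype(n_alt=2, adj=False, release=True, unrelated=False),
    Genotype(n_alt=1, adj=True, release=False, unrelated=True),
    Genotype(n_alt=0, adj=True, release=True, unrelated=True),
]
print(compute_acs(gts))
```

Here the homozygous-alternate genotype that fails adj contributes 2 to AC_raw_all and AC_raw_release but nothing to the corresponding AC counts.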
- gnomad_qc.v5.annotations.generate_variant_qc_annotations.create_info_ht(vcf_path, header_path, lowqual_indel_phred_het_prior=40, vds=None, test=False)
Import a VCF of AoU annotated sites, reformat annotations, and add AS_lowqual.
- Parameters:
  - vcf_path (str) – Path to the annotated sites-only VCF.
  - header_path (str) – Path to the header file for the VCF.
  - lowqual_indel_phred_het_prior (int) – Phred-scaled prior for a het genotype at a site with a low-quality indel. Default is 40. We use 1/10k bases (phred=40) to be more consistent with the filtering used by Broad’s Data Sciences Platform for VQSR.
  - vds (VariantDataset) – VariantDataset to use for computing AC and AC_raw annotations.
  - test (bool) – Whether to run a test using only the first two partitions of the loaded VCF.
- Return type:
Table
- Returns:
Hail Table with reformatted annotations.
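AS_lowqual flags alleles whose quality is too low to trust. A minimal sketch, assuming a GATK-style rule in which an allele is low quality when its approximate allele-specific QUAL falls below a phred threshold plus a phred-scaled het prior, with a separate prior for indels. Only the indel prior default of 40 comes from this page; the other threshold values, the `as_lowqual` helper, and the length-based SNV test are hypothetical.

```python
def is_snv(ref: str, alt: str) -> bool:
    """Simplified SNV test: single-base ref and alt. Anything else counts as an indel here."""
    return len(ref) == 1 and len(alt) == 1


def as_lowqual(
    ref: str,
    alt: str,
    as_qual_approx: float,
    snv_phred_threshold: int = 30,  # hypothetical value
    snv_phred_het_prior: int = 30,  # hypothetical value
    indel_phred_threshold: int = 30,  # hypothetical value
    indel_phred_het_prior: int = 40,  # matches --lowqual-indel-phred-het-prior
) -> bool:
    """Return True if the allele's approximate QUAL is below its cutoff."""
    if is_snv(ref, alt):
        cutoff = snv_phred_threshold + snv_phred_het_prior
    else:
        cutoff = indel_phred_threshold + indel_phred_het_prior
    return as_qual_approx < cutoff


print(as_lowqual("A", "G", 65.0))  # False: SNV cutoff is 60
print(as_lowqual("A", "AT", 65.0))  # True: indel cutoff is 70
```

Under these assumed values, raising the indel het prior makes the indel cutoff stricter without touching SNVs.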
- gnomad_qc.v5.annotations.generate_variant_qc_annotations.run_generate_trio_stats(mt, fam_ped)
Generate trio transmission stats from a dense trio MatrixTable and pedigree info.
- Parameters:
  - mt (MatrixTable) – Dense trio MatrixTable.
  - fam_ped (Pedigree) – Pedigree containing trio info.
- Return type:
Table
- Returns:
Table containing trio stats.
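Transmission stats compare how often an alternate allele carried by one parent reaches the child. The simplified sketch below counts transmitted and untransmitted alleles for trios where exactly one parent is heterozygous and the other is homozygous reference, encoding genotypes as alternate-allele counts. The encoding and the `count_transmissions` helper are hypothetical; the real function works on a dense trio MatrixTable and may compute additional statistics.

```python
from typing import Iterable, Tuple


def count_transmissions(trios: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Count transmitted/untransmitted alleles from (father, mother, child) alt counts."""
    transmitted = untransmitted = 0
    for father, mother, child in trios:
        # Only consider the informative case: one het parent, one hom-ref parent.
        if sorted((father, mother)) != [0, 1]:
            continue
        if child == 1:
            transmitted += 1
        elif child == 0:
            untransmitted += 1
        # child == 2 is a Mendelian error for this configuration; skip it.
    return transmitted, untransmitted


trios = [(1, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 2), (0, 1, 2)]
print(count_transmissions(trios))  # (1, 2)
```

For a true variant, roughly half of such het-parent alleles should be transmitted, which is why transmitted singletons are useful as a true positive signal.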
- gnomad_qc.v5.annotations.generate_variant_qc_annotations.run_generate_sib_stats(mt, relatedness_ht)
Generate sibling stats from a MatrixTable and relatedness info.
- Parameters:
  - mt (MatrixTable) – Input MatrixTable.
  - relatedness_ht (Table) – Table containing relatedness info.
- Return type:
Table
- Returns:
Table containing sibling stats.
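A sibling singleton is commonly defined as a variant whose only carriers in the callset are a single sibling pair, each heterozygous (so the allele count is 2). The sketch below checks that condition for one variant in plain Python; the input shapes and the `is_sibling_singleton` helper are hypothetical, and the exact definition used by this module is an assumption.

```python
from typing import Dict, List, Tuple


def is_sibling_singleton(
    n_alt_by_sample: Dict[str, int], sib_pairs: List[Tuple[str, str]]
) -> bool:
    """Return True if the variant's only carriers are one het sibling pair."""
    carriers = {sample for sample, n_alt in n_alt_by_sample.items() if n_alt > 0}
    total_ac = sum(n_alt_by_sample.values())
    # AC == 2 spread over exactly two carriers means both are heterozygous.
    if total_ac != 2 or len(carriers) != 2:
        return False
    return any({a, b} == carriers for a, b in sib_pairs)


pairs = [("s1", "s2")]
print(is_sibling_singleton({"s1": 1, "s2": 1, "s3": 0}, pairs))  # True
print(is_sibling_singleton({"s1": 1, "s3": 1, "s2": 0}, pairs))  # False
```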
- gnomad_qc.v5.annotations.generate_variant_qc_annotations.create_variant_qc_annotation_ht(info_ht, trio_stats_ht, sib_stats_ht, impute_features=True, n_partitions=5000)
Create a Table with all necessary annotations for variant QC.
Annotations that are included:
- Features for RF:
  - variant_type
  - allele_type
  - n_alt_alleles
  - has_star
  - AS_QD
  - AS_pab_max
  - AS_MQRankSum
  - AS_SOR
  - AS_ReadPosRankSum
- Training sites (bool):
  - transmitted_singleton
  - sibling_singleton
  - fail_hard_filters: (ht.AS_QD < 0.5) | (ht.AS_FS > 60) | (ht.AS_MQ < 30)
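The fail_hard_filters rule is a disjunction of three thresholds. The sketch below evaluates it on a plain dictionary of site annotations; in the module it is a Hail expression over the Table, and this sketch does not model how missing annotation values are handled.

```python
def fail_hard_filters(site: dict) -> bool:
    """Mirror (AS_QD < 0.5) | (AS_FS > 60) | (AS_MQ < 30) on plain values."""
    return site["AS_QD"] < 0.5 or site["AS_FS"] > 60 or site["AS_MQ"] < 30


print(fail_hard_filters({"AS_QD": 12.3, "AS_FS": 1.2, "AS_MQ": 60.0}))  # False
print(fail_hard_filters({"AS_QD": 0.3, "AS_FS": 1.2, "AS_MQ": 60.0}))  # True
```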
- Parameters:
  - info_ht (Table) – Info Table with split multi-allelics.
  - trio_stats_ht (Table) – Table with trio statistics.
  - sib_stats_ht (Table) – Table with sibling statistics.
  - impute_features (bool) – Whether to impute features using feature medians (this is done by variant type).
  - n_partitions (int) – Number of partitions to use for final annotated Table.
- Return type:
Table
- Returns:
Hail Table with all annotations needed for variant QC.
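When impute_features is set, missing feature values are filled with feature medians computed per variant type. A plain-Python sketch of that idea follows; the row layout, the `variant_type` labels, and the `impute_by_variant_type` helper are hypothetical, and leaving a value missing when no median is available is a choice made for this sketch only.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, List


def impute_by_variant_type(
    rows: List[Dict[str, Any]], features: List[str]
) -> List[Dict[str, Any]]:
    """Fill missing (None) feature values with the median for the row's variant_type."""
    # Collect observed (non-missing) values per (variant_type, feature) pair.
    observed: Dict[tuple, List[float]] = defaultdict(list)
    for row in rows:
        for feature in features:
            if row[feature] is not None:
                observed[(row["variant_type"], feature)].append(row[feature])
    medians = {key: median(values) for key, values in observed.items()}

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for feature in features:
            if new_row[feature] is None:
                # Stays None if no value was observed for this variant type.
                new_row[feature] = medians.get((row["variant_type"], feature))
        imputed.append(new_row)
    return imputed


rows = [
    {"variant_type": "snv", "AS_SOR": 1.0},
    {"variant_type": "snv", "AS_SOR": 3.0},
    {"variant_type": "snv", "AS_SOR": None},
    {"variant_type": "indel", "AS_SOR": 0.5},
    {"variant_type": "indel", "AS_SOR": None},
]
out = impute_by_variant_type(rows, ["AS_SOR"])
print([r["AS_SOR"] for r in out])  # [1.0, 3.0, 2.0, 0.5, 0.5]
```

Grouping by variant type keeps, for example, indel feature distributions from pulling SNV imputations toward indel-typical values.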
- gnomad_qc.v5.annotations.generate_variant_qc_annotations.get_tp_ht_for_vcf_export(ht, transmitted_singletons=False, sibling_singletons=False)
Get Tables with raw and adj true positive variants to export as a VCF for use in VQSR.
- Parameters:
  - ht (Table) – Input Table with transmitted singleton and sibling singleton information.
  - transmitted_singletons (bool) – Whether to include transmitted singletons in the true positive variants.
  - sibling_singletons (bool) – Whether to include sibling singletons in the true positive variants.
- Return type:
Dict[str, Table]
- Returns:
Dictionary of ‘raw’ and ‘adj’ true positive variant sites Tables.
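The selection logic can be sketched in plain Python: a site is a true positive if it is a transmitted singleton (when transmitted_singletons is set) or a sibling singleton (when sibling_singletons is set), evaluated separately for raw and adj. The per-site field names, returning site IDs instead of Tables, and raising when neither option is selected are all assumptions of this sketch.

```python
from typing import Dict, List


def get_tp_sites(
    sites: List[dict],
    transmitted_singletons: bool = False,
    sibling_singletons: bool = False,
) -> Dict[str, List[str]]:
    """Return 'raw' and 'adj' lists of true positive site IDs (hypothetical fields)."""
    if not (transmitted_singletons or sibling_singletons):
        # Sketch-only choice: with neither set, no true positive set is defined.
        raise ValueError("Select transmitted and/or sibling singletons.")
    result = {}
    for group in ("raw", "adj"):
        result[group] = [
            site["locus"]
            for site in sites
            if (transmitted_singletons and site[f"transmitted_singleton_{group}"])
            or (sibling_singletons and site[f"sibling_singleton_{group}"])
        ]
    return result


sites = [
    {"locus": "chr1:100", "transmitted_singleton_raw": True, "transmitted_singleton_adj": True,
     "sibling_singleton_raw": False, "sibling_singleton_adj": False},
    {"locus": "chr1:200", "transmitted_singleton_raw": True, "transmitted_singleton_adj": False,
     "sibling_singleton_raw": False, "sibling_singleton_adj": False},
    {"locus": "chr2:300", "transmitted_singleton_raw": False, "transmitted_singleton_adj": False,
     "sibling_singleton_raw": True, "sibling_singleton_adj": True},
]
print(get_tp_sites(sites, transmitted_singletons=True))
# {'raw': ['chr1:100', 'chr1:200'], 'adj': ['chr1:100']}
```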