gnomad_qc.v4.resources.annotations

Script containing annotation-related resources.

Module Functions

gnomad_qc.v4.resources.annotations.get_info([...])

Get the gnomAD v4 info VersionedTableResource.

gnomad_qc.v4.resources.annotations.get_vep([...])

Get the gnomAD v4 VEP annotation VersionedTableResource.

gnomad_qc.v4.resources.annotations.validate_vep_path([...])

Get the gnomAD v4 VEP annotation VersionedTableResource for validation counts.

gnomad_qc.v4.resources.annotations.get_trio_stats([...])

Get the gnomAD v4 trio stats VersionedTableResource.

gnomad_qc.v4.resources.annotations.get_sib_stats([test])

Get the gnomAD v4 sibling stats VersionedTableResource.

gnomad_qc.v4.resources.annotations.get_variant_qc_annotations([test])

Return the VersionedTableResource to the RF-ready annotated Table.

gnomad_qc.v4.resources.annotations.info_vcf_path([...])

Path to sites VCF (input information for running VQSR).

gnomad_qc.v4.resources.annotations.get_true_positive_vcf_path([...])

Provide the path to the transmitted singleton VCF used as input to VQSR.

gnomad_qc.v4.resources.annotations.get_downsampling([...])

Get the downsampling annotation table.

gnomad_qc.v4.resources.annotations.get_freq([...])

Get the frequency annotation table for a specified release.

gnomad_qc.v4.resources.annotations.get_all_sites_an_and_qual_hists([...])

Get the all sites AN and qual hists TableResource.

gnomad_qc.v4.resources.annotations.get_combined_frequency([...])

Get the combined v4 genome and exome frequency annotation VersionedTableResource.

gnomad_qc.v4.resources.annotations.get_freq_comparison(method)

Get VersionedTableResource for a frequency comparison between v4 genomes and exomes.

gnomad_qc.v4.resources.annotations.get_insilico_predictors([...])

Get the path to the in silico predictors TableResource for a specified release.

gnomad_qc.v4.resources.annotations.get_vrs([...])

Get the gnomAD v4 VersionedTableResource containing VRS annotations.

gnomad_qc.v4.resources.annotations.hgdp_tgp_updated_callstats(subset)

Get the HGDP + 1KG/TGP subset updated call stats TableResource.

gnomad_qc.v4.resources.annotations.get_split_vds([...])

Get the gnomAD v4 split VDS.

gnomad_qc.v4.resources.annotations.get_info(split=True, test=False)

Get the gnomAD v4 info VersionedTableResource.

Parameters:
  • split (bool) – Whether to return the split or multi-allelic version of the resource.

  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.

Return type:

VersionedTableResource

Returns:

gnomAD v4 info VersionedTableResource.
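A minimal usage sketch, assuming the gnomad_qc package and Hail are installed and the caller has read access to the underlying storage. The wrapper name `load_info_table` is hypothetical and not part of the module; `.ht()` is the Table reader exposed by the gnomad resource classes.

```python
def load_info_table(split: bool = True, test: bool = False):
    """Read the gnomAD v4 info Table (hypothetical convenience wrapper)."""
    # Imported lazily so this sketch can be defined without gnomad_qc installed.
    from gnomad_qc.v4.resources.annotations import get_info

    # get_info returns a VersionedTableResource; .ht() reads the default version.
    resource = get_info(split=split, test=test)
    return resource.ht()
```

Calling `load_info_table(split=False)` would read the multi-allelic version of the resource, and `test=True` would point at the tmp path for the test VDS.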

gnomad_qc.v4.resources.annotations.get_vep(test=False, data_type='exomes')

Get the gnomAD v4 VEP annotation VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v4 Table.

  • data_type (str) – Data type of annotation resource, e.g. “exomes” or “genomes”. Default is “exomes”.

Return type:

VersionedTableResource

Returns:

gnomAD v4 VEP VersionedTableResource.

gnomad_qc.v4.resources.annotations.validate_vep_path(test=False, data_type='exomes')

Get the gnomAD v4 VEP annotation VersionedTableResource for validation counts.

Parameters:
  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.

  • data_type (str) – Data type of annotation resource, e.g. “exomes” or “genomes”. Default is “exomes”.

Return type:

VersionedTableResource

Returns:

gnomAD v4 VEP VersionedTableResource containing validity check.

gnomad_qc.v4.resources.annotations.get_trio_stats(test=False, releasable_only=False)

Get the gnomAD v4 trio stats VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a temporary path for testing.

  • releasable_only (bool) – Whether to use only releasable data.

Return type:

VersionedTableResource

Returns:

gnomAD v4 trio stats VersionedTableResource.

gnomad_qc.v4.resources.annotations.get_sib_stats(test=False)

Get the gnomAD v4 sibling stats VersionedTableResource.

Parameters:

test (bool) – Whether to use a tmp path for testing.

Return type:

VersionedTableResource

Returns:

gnomAD v4 sibling stats VersionedTableResource.

gnomad_qc.v4.resources.annotations.get_variant_qc_annotations(test=False)

Return the VersionedTableResource to the RF-ready annotated Table.

Annotations that are included in the Table:

Features for RF:
  • variant_type

  • allele_type

  • n_alt_alleles

  • has_star

  • AS_QD

  • AS_pab_max

  • AS_MQRankSum

  • AS_SOR

  • AS_ReadPosRankSum

Training sites (bool):
  • transmitted_singleton

  • sibling_singleton

  • fail_hard_filters - (ht.QD < 2) | (ht.FS > 60) | (ht.MQ < 30)

Parameters:

test (bool) – Whether to use a tmp path for testing.

Return type:

VersionedTableResource

Returns:

Table with variant QC annotations.
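The `fail_hard_filters` expression listed above can be illustrated on plain Python numbers. This is only a sketch of the thresholds: the real annotation is a Hail expression on Table fields, and Hail's handling of missing values is not modeled here. The function name and its arguments are assumptions made for illustration.

```python
def fail_hard_filters(qd: float, fs: float, mq: float) -> bool:
    """Return True if a site fails any of the three hard filters.

    Mirrors (ht.QD < 2) | (ht.FS > 60) | (ht.MQ < 30) for non-missing values.
    """
    return (qd < 2) or (fs > 60) or (mq < 30)


# A site with low QD fails; a site within all thresholds passes.
low_qd_site = fail_hard_filters(qd=1.5, fs=10.0, mq=50.0)
clean_site = fail_hard_filters(qd=10.0, fs=10.0, mq=50.0)
```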

gnomad_qc.v4.resources.annotations.info_vcf_path(info_method='AS', version='4.0', split=False, test=False)

Path to sites VCF (input information for running VQSR).

Parameters:
  • info_method (str) – Method for generating info VCF. Must be one of “AS”, “quasi”, or “set_long_AS_missing”. Default is “AS”.

  • version (str) – Version of annotation path to return.

  • split (bool) – Whether to return the split or multi-allelic version of the resource.

  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.

Return type:

str

Returns:

String for the path to the info VCF.
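Because `info_method` accepts only three values, a caller may want to check it before requesting a path. The sketch below is one way to do that, assuming gnomad_qc is installed when the function is actually called; `checked_info_vcf_path` and `VALID_INFO_METHODS` are hypothetical names, and the library may perform its own validation.

```python
# Allowed values as documented for the info_method parameter.
VALID_INFO_METHODS = ("AS", "quasi", "set_long_AS_missing")


def checked_info_vcf_path(
    info_method: str = "AS",
    version: str = "4.0",
    split: bool = False,
    test: bool = False,
) -> str:
    """Validate info_method, then delegate to info_vcf_path (hypothetical wrapper)."""
    if info_method not in VALID_INFO_METHODS:
        raise ValueError(
            f"info_method must be one of {VALID_INFO_METHODS}, got {info_method!r}"
        )
    # Imported lazily so the validation can be exercised without gnomad_qc.
    from gnomad_qc.v4.resources.annotations import info_vcf_path

    return info_vcf_path(
        info_method=info_method, version=version, split=split, test=test
    )
```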

gnomad_qc.v4.resources.annotations.get_true_positive_vcf_path(version='4.0', test=False, adj=False, true_positive_type='transmitted_singleton')

Provide the path to the transmitted singleton VCF used as input to VQSR.

Parameters:
  • version (str) – Version of true positive VCF path to return.

  • test (bool) – Whether to use a tmp path for testing.

  • adj (bool) – Whether to use adj genotypes.

  • true_positive_type (str) – Type of true positive VCF path to return. Should be one of “transmitted_singleton”, “sibling_singleton”, or “transmitted_singleton.sibling_singleton”. Default is “transmitted_singleton”.

Return type:

str

Returns:

String for the path to the true positive VCF.

gnomad_qc.v4.resources.annotations.get_downsampling(test=False, subset=None)

Get the downsampling annotation table.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • subset (Optional[str]) – Optional subset to return downsampling Table for. Downsampling for entire dataset will be returned if not specified.

Return type:

VersionedTableResource

Returns:

Hail Table containing subset or overall dataset downsampling annotations.

gnomad_qc.v4.resources.annotations.get_freq(version=None, data_type='exomes', test=False, hom_alt_adjusted=False, chrom=None, intermediate_subset=None, finalized=True)

Get the frequency annotation table for a specified release.

Parameters:
  • version (str) – Version of annotation path to return.

  • data_type (str) – Data type of annotation resource, e.g. “exomes” or “genomes”.

  • test (bool) – Whether to use a tmp path for tests.

  • hom_alt_adjusted (bool) – Whether to return the hom alt adjusted frequency table.

  • chrom (Optional[str]) – Chromosome to return frequency table for. Entire Table will be returned if not specified.

  • intermediate_subset (Optional[str]) – Optional intermediate subset to return temp frequency Table for. Entire Table will be returned if not specified.

  • finalized (bool) – Whether to return the finalized frequency table. Default is True.

Return type:

TableResource

Returns:

Hail Table containing subset or overall cohort frequency annotations.

gnomad_qc.v4.resources.annotations.get_all_sites_an_and_qual_hists(data_type='exomes', test=False)

Get the all sites AN and qual hists TableResource.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • test (bool) – Whether to use a tmp path for testing.

Return type:

VersionedTableResource

Returns:

Hail Table containing all sites AN and qual hists annotations.

gnomad_qc.v4.resources.annotations.get_combined_frequency(test=False, filtered=True)

Get the combined v4 genome and exome frequency annotation VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for testing.

  • filtered (bool) – Whether to return the resource for the filtered combined frequency. Default is True.

Return type:

VersionedTableResource

Returns:

Hail Table containing combined frequency annotations.

gnomad_qc.v4.resources.annotations.get_freq_comparison(method, test=False, filtered=True)

Get VersionedTableResource for a frequency comparison between v4 genomes and exomes.

Table contains results from one of the following comparison methods:
  • ‘contingency_table_test’: Hail’s contingency table test, a chi-squared or Fisher’s exact test of independence depending on min allele count.

  • ‘cmh_test’: Cochran–Mantel–Haenszel test, a stratified test of independence for 2x2xK contingency tables.
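
The switch between the chi-squared and Fisher's exact test can be sketched in pure Python. This is an illustration of the idea and not Hail's implementation: the function names, the `min_cell_count` argument name, and the rule "use chi-squared only when every cell reaches the minimum" are assumptions made for this sketch.

```python
import math


def chi_squared_p(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared p-value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-squared with 1 df.
    return math.erfc(math.sqrt(stat / 2))


def fisher_exact_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def contingency_p(a: int, b: int, c: int, d: int, min_cell_count: int):
    """Pick chi-squared when all cells are large enough, else Fisher's exact."""
    if min(a, b, c, d) >= min_cell_count:
        return chi_squared_p(a, b, c, d), "chi_squared"
    return fisher_exact_p(a, b, c, d), "fisher_exact"
```

For example, `contingency_p(1, 9, 9, 1, 5)` falls back to Fisher's exact test because two cells are below the minimum of 5.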

Parameters:
  • method (str) – Method used to compare frequencies between v4 genomes and exomes. Can be one of contingency_table_test or cmh_test.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • filtered (bool) – Whether to return the filtered frequency comparison Table. Default is True.

Return type:

VersionedTableResource

Returns:

VersionedTableResource containing results from the specified comparison method.
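For readers unfamiliar with the Cochran–Mantel–Haenszel statistic named above, here is a pure-Python sketch for K strata of 2x2 tables, without continuity correction. It is not the code used to build the Table. How the strata are defined in the gnomAD comparison is not stated here, so the `(a, b, c, d)` layout is an assumption for illustration.

```python
import math
from typing import Sequence, Tuple

Table2x2 = Tuple[int, int, int, int]  # (a, b, c, d) for one stratum


def cmh_test(strata: Sequence[Table2x2]) -> Tuple[float, float]:
    """Return (chi-squared statistic, p-value) for 2x2xK tables."""
    diff_sum = 0.0  # sum over strata of observed minus expected top-left count
    var_sum = 0.0  # sum over strata of the hypergeometric variance
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        return 0.0, 1.0
    stat = diff_sum ** 2 / var_sum
    # Survival function of chi-squared with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))
```

Strata with no association, such as three copies of `(10, 10, 10, 10)`, give a statistic of 0 and a p-value of 1.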

gnomad_qc.v4.resources.annotations.get_insilico_predictors(predictor='cadd')

Get the path to the in silico predictors TableResource for a specified release.

Parameters:

predictor (str) – One of the in silico predictors available in gnomAD v4, including cadd, revel, primate_ai, splice_ai, and pangolin.

Return type:

VersionedTableResource

Returns:

in silico predictor VersionedTableResource for gnomAD v4.

gnomad_qc.v4.resources.annotations.get_vrs(original_annotations=False, test=False, data_type='exomes')

Get the gnomAD v4 VersionedTableResource containing VRS annotations.

Parameters:
  • original_annotations (bool) – Whether to obtain the original input Table with all its annotations in addition to the added VRS annotations. If set to False, obtain a Table with only the VRS annotations.

  • test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v4 Table.

  • data_type (str) – Data type of annotation resource, e.g. “exomes” or “genomes”. Default is “exomes”.

Return type:

VersionedTableResource

Returns:

gnomAD v4 VRS VersionedTableResource.

gnomad_qc.v4.resources.annotations.hgdp_tgp_updated_callstats(subset, test=False)

Get the HGDP + 1KG/TGP subset updated call stats TableResource.

Parameters:
  • subset (str) – The subset of the HGDP + 1KG/TGP release to return, must be “added”, “subtracted”, “pop_diff”, “join”, “v3_release_an”, “v3_pop_diff_an”, or “pre_validity_check”.

  • test (bool) – Whether to return the annotation resource for testing purposes.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the specified subset.

gnomad_qc.v4.resources.annotations.get_split_vds(version=None, data_type='exomes', test=False)

Get the gnomAD v4 split VDS.

This is a temporary resource that will be removed once the split VDS is no longer needed. Given the uncertainties around frequency calculation runtimes, it cannot be stored in gnomad-tmp, but it must be deleted once frequency work is complete.

Parameters:
  • version (str) – Version of annotation path to return.

  • data_type (str) – Data type of annotation resource, e.g. “exomes” or “genomes”. Default is “exomes”.

  • test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v4 Table.

Return type:

VariantDatasetResource

Returns:

gnomAD v4 VariantDatasetResource.