gnomad_qc.v5.resources.annotations

Script containing annotation related resources.

Module Functions

gnomad_qc.v5.resources.annotations.get_trio_stats([...])

Get the gnomAD v5 (AoU genomes only) trio stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_sib_stats([...])

Get the gnomAD v5 (AoU genomes only) sibling stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_aou_downsampling([...])

Get the downsampling annotation table.

gnomad_qc.v5.resources.annotations.group_membership([...])

Get the group membership Table for coverage, AN, quality histograms, and frequency calculations.

gnomad_qc.v5.resources.annotations.qual_hists([...])

Get the quality histograms annotation table.

gnomad_qc.v5.resources.annotations.coverage_and_an_path([...])

Fetch filepath for all sites coverage or allele number Table.

gnomad_qc.v5.resources.annotations.get_freq([...])

Get the frequency annotation Table for v5.

gnomad_qc.v5.resources.annotations.get_info_ht([...])

Get the gnomAD v5 (AoU genomes only) info VersionedTableResource.

gnomad_qc.v5.resources.annotations.info_vcf_path([...])

Path to sites VCF (input information for running VQSR).

gnomad_qc.v5.resources.annotations.get_aou_vcf_header([...])

Get path to AoU annotation sites-only VCF header.

gnomad_qc.v5.resources.annotations.get_aou_annotated_sites_only_vcf([...])

Get path to AoU sites-only VCF with annotations needed for variant QC.

gnomad_qc.v5.resources.annotations.get_variant_qc_annotations([...])

Return the VersionedTableResource to the variant QC annotation Table.

gnomad_qc.v5.resources.annotations.get_true_positive_vcf_path([...])

Provide the path to the true positive VCF used as input to VQSR.

gnomad_qc.v5.resources.annotations.get_vep([...])

Get the gnomAD v5 VEP annotation VersionedTableResource.

gnomad_qc.v5.resources.annotations.validate_vep_path([...])

Get the gnomAD v5 VEP annotation VersionedTableResource for validation counts.

gnomad_qc.v5.resources.annotations.get_trio_stats(test=False, environment='batch')

Get the gnomAD v5 (AoU genomes only) trio stats VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a temporary path for testing.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

AoU trio stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_sib_stats(test=False, environment='batch')

Get the gnomAD v5 (AoU genomes only) sibling stats VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

AoU sibling stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_aou_downsampling(test=False, environment='batch')

Get the downsampling annotation table.

v5 downsampling applies only to the AoU dataset.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the Hail Table containing downsampling annotations.

gnomad_qc.v5.resources.annotations.group_membership(test=False, data_set='aou', environment='batch')

Get the group membership Table for coverage, AN, quality histograms, and frequency calculations.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • data_set (str) – Data set of annotation resource. Default is “aou”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the Hail Table containing group membership annotations.

gnomad_qc.v5.resources.annotations.qual_hists(test=False, environment='batch')

Get the quality histograms annotation table.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • environment (str) – Environment to use for quality histograms. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the Hail Table containing quality histogram annotations.

gnomad_qc.v5.resources.annotations.coverage_and_an_path(test=False, data_set='aou', environment='batch')

Fetch filepath for all sites coverage or allele number Table.

Note

If data_set is “gnomad”, the returned Table contains coverage and AN only for consent drop samples.

Parameters:
  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • data_set (str) – Dataset identifier. Must be one of “aou” or “gnomad”. Default is “aou”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the coverage and allele number Hail Table.
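
The constraints on data_set and environment above can be checked before a resource function is called. The following is a minimal sketch only: check_choice, DATA_SETS, and ENVIRONMENTS are hypothetical names that are not part of gnomad_qc, and the module may validate its arguments differently.

```python
from typing import Iterable


def check_choice(name: str, value: str, allowed: Iterable[str]) -> str:
    """Return value unchanged if it is allowed, otherwise raise ValueError.

    Hypothetical helper for illustration; not part of gnomad_qc.
    """
    allowed = tuple(allowed)
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return value


# Allowed values as documented for coverage_and_an_path.
DATA_SETS = ("aou", "gnomad")
ENVIRONMENTS = ("rwb", "batch", "dataproc")

data_set = check_choice("data_set", "aou", DATA_SETS)
environment = check_choice("environment", "batch", ENVIRONMENTS)
```

The comparison is case-sensitive, so a value such as "gnomAD" is rejected by this sketch.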

gnomad_qc.v5.resources.annotations.get_freq(version='5.0', data_type='genomes', test=False, data_set='aou', environment='batch')

Get the frequency annotation Table for v5.

Parameters:
  • version (str) – Version of annotation path to return.

  • data_type (str) – Data type of annotation resource (“genomes” or “exomes”).

  • test (bool) – Whether to use a tmp path for testing.

  • data_set (str) – Data set of annotation resource. Default is “aou”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

TableResource

Returns:

TableResource for the Hail Table containing frequency annotations.

gnomad_qc.v5.resources.annotations.get_info_ht(test=False, environment='batch')

Get the gnomAD v5 (AoU genomes only) info VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

Info VersionedTableResource.

gnomad_qc.v5.resources.annotations.info_vcf_path(version='5.0', test=False, environment='batch')

Path to sites VCF (input information for running VQSR).

Parameters:
  • version (str) – Version of annotation path to return.

  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Must be one of “rwb” or “batch”. Default is “batch”.

Return type:

str

Returns:

String for the path to the info VCF.

gnomad_qc.v5.resources.annotations.get_aou_vcf_header(environment='batch')

Get path to AoU annotation sites-only VCF header.

This header is needed to import the sites-only VCF correctly: the original header declares the QUALapprox annotation as an int, but its values are actually floats.

Parameters:

environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

str

Returns:

Path to the VCF header file.
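
To illustrate the kind of correction a replacement header carries, the sketch below rewrites the declared type of QUALapprox from Integer to Float in a list of VCF header lines. It is an assumption-laden illustration: fix_qualapprox_type is a hypothetical helper, the header lines and their Description text are invented for the example, and nothing here is taken from how the actual header file was produced.

```python
import re
from typing import List


def fix_qualapprox_type(header_lines: List[str]) -> List[str]:
    """Change the declared type of the QUALapprox INFO field to Float.

    Hypothetical helper for illustration; all other lines are returned unchanged.
    """
    pattern = re.compile(r"^(##INFO=<ID=QUALapprox,.*?Type=)Integer\b")
    return [pattern.sub(r"\1Float", line) for line in header_lines]


# Illustrative header lines (descriptions are placeholders, not the real ones).
header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=QUALapprox,Number=1,Type=Integer,Description="placeholder">',
    '##INFO=<ID=AS_QD,Number=A,Type=Float,Description="placeholder">',
]

fixed = fix_qualapprox_type(header)
print(fixed[1])
```

When importing with Hail, a separate header of this kind can be supplied through the header_file argument of import_vcf.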

gnomad_qc.v5.resources.annotations.get_aou_annotated_sites_only_vcf(environment='batch')

Get path to AoU sites-only VCF with annotations needed for variant QC.

Parameters:

environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

str

Returns:

Path to the annotated sites-only VCF.

gnomad_qc.v5.resources.annotations.get_variant_qc_annotations(test=False, environment='batch')

Return the VersionedTableResource to the variant QC annotation Table.

Annotations that are included in the Table:

Features for RF:
  • variant_type

  • allele_type

  • n_alt_alleles

  • has_star

  • AS_QD

  • AS_pab_max

  • AS_MQRankSum

  • AS_SOR

  • AS_ReadPosRankSum

Training sites (bool):
  • transmitted_singleton

  • sibling_singleton

  • fail_hard_filters - (ht.AS_QD < 0.5) | (ht.AS_FS > 60) | (ht.AS_MQ < 30)

Parameters:
  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the Table with variant QC annotations.
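
The fail_hard_filters expression listed above is a logical OR of three threshold checks. The plain-Python sketch below mirrors that logic for a single site so the thresholds are easy to read; SiteAnnotations and the example values are hypothetical, and the sketch does not model how Hail handles missing values.

```python
from dataclasses import dataclass


@dataclass
class SiteAnnotations:
    """Hypothetical container for the three annotations used by the filter."""

    AS_QD: float
    AS_FS: float
    AS_MQ: float


def fail_hard_filters(site: SiteAnnotations) -> bool:
    """Mirror (AS_QD < 0.5) | (AS_FS > 60) | (AS_MQ < 30) for one site."""
    return (site.AS_QD < 0.5) or (site.AS_FS > 60) or (site.AS_MQ < 30)


# Made-up example values.
passing = SiteAnnotations(AS_QD=12.0, AS_FS=3.2, AS_MQ=59.8)
low_qd = SiteAnnotations(AS_QD=0.3, AS_FS=3.2, AS_MQ=59.8)

print(fail_hard_filters(passing))  # False: no threshold is violated
print(fail_hard_filters(low_qd))   # True: AS_QD is below 0.5
```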

gnomad_qc.v5.resources.annotations.get_true_positive_vcf_path(version='5.0', test=False, adj=False, true_positive_type='transmitted_singleton', environment='batch')

Provide the path to the true positive VCF used as input to VQSR.

Parameters:
  • version (str) – Version of true positive VCF path to return. Default is CURRENT_ANNOTATION_VERSION (“5.0” in the signature above).

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • adj (bool) – Whether to use adj genotypes. Default is False.

  • true_positive_type (str) – Type of true positive VCF path to return. Should be one of “transmitted_singleton”, “sibling_singleton”, or “transmitted_singleton.sibling_singleton”. Default is “transmitted_singleton”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

str

Returns:

String for the path to the true positive VCF.
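
The three accepted true_positive_type values can be validated up front. The sketch below also splits the value on "." to list the training-site names it contains; this rests on the assumption that the dot-joined value denotes both training sets together, which the documentation above does not state explicitly. TRUE_POSITIVE_TYPES and training_site_names are hypothetical names, not part of gnomad_qc.

```python
from typing import List

# Accepted values as documented for get_true_positive_vcf_path.
TRUE_POSITIVE_TYPES = (
    "transmitted_singleton",
    "sibling_singleton",
    "transmitted_singleton.sibling_singleton",
)


def training_site_names(true_positive_type: str) -> List[str]:
    """Validate true_positive_type and return its dot-separated components.

    Hypothetical helper; assumes the dot-joined value combines both sets.
    """
    if true_positive_type not in TRUE_POSITIVE_TYPES:
        raise ValueError(
            f"true_positive_type must be one of {TRUE_POSITIVE_TYPES}, "
            f"got {true_positive_type!r}"
        )
    return true_positive_type.split(".")


print(training_site_names("transmitted_singleton.sibling_singleton"))
```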

gnomad_qc.v5.resources.annotations.get_vep(test=False, vep_version='105', environment='batch')

Get the gnomAD v5 VEP annotation VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v5 Table.

  • vep_version (str) – VEP version to use (e.g., “105”, “115”). Default is “105”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

gnomAD v5 VEP VersionedTableResource.

gnomad_qc.v5.resources.annotations.validate_vep_path(test=False, vep_version='105', environment='batch')

Get the gnomAD v5 VEP annotation VersionedTableResource for validation counts.

Parameters:
  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v5 VDS.

  • vep_version (str) – VEP version to use (e.g., “105”, “115”). Default is “105”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

gnomAD v5 VEP VersionedTableResource containing validity check.