gnomad_qc.v5.resources.annotations

Script containing annotation-related resources.

Module Functions

gnomad_qc.v5.resources.annotations.get_trio_stats([...])

Get gnomAD v5 (AoU genomes only) trio stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_sib_stats([...])

Get the gnomAD v5 (AoU genomes only) sibling stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_aou_downsampling([...])

Get the downsampling annotation table.

gnomad_qc.v5.resources.annotations.group_membership([...])

Get the group membership Table for coverage, AN, quality histograms, and frequency calculations.

gnomad_qc.v5.resources.annotations.qual_hists([...])

Get the quality histograms annotation table.

gnomad_qc.v5.resources.annotations.coverage_and_an_path([...])

Fetch filepath for all sites coverage or allele number Table.

gnomad_qc.v5.resources.annotations.get_freq([...])

Get the frequency annotation Table for v5.

gnomad_qc.v5.resources.annotations.get_info_ht([...])

Get the gnomAD v5 (AoU genomes only) info VersionedTableResource.

gnomad_qc.v5.resources.annotations.info_vcf_path([...])

Path to sites VCF (input information for running VQSR).

gnomad_qc.v5.resources.annotations.get_aou_vcf_header([...])

Get path to AoU annotation sites-only VCF header.

gnomad_qc.v5.resources.annotations.get_aou_annotated_sites_only_vcf([...])

Get path to AoU sites-only VCF with annotations needed for variant QC.

gnomad_qc.v5.resources.annotations.get_vep([...])

Get the gnomAD v5 VEP annotation VersionedTableResource.

gnomad_qc.v5.resources.annotations.validate_vep_path([...])

Get the gnomAD v5 VEP annotation VersionedTableResource for validation counts.

gnomad_qc.v5.resources.annotations.get_trio_stats(test=False, environment='rwb')[source]

Get gnomAD v5 (AoU genomes only) trio stats VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a temporary path for testing.

  • environment (str) – Environment to use. Default is “rwb”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

AoU trio stats VersionedTableResource.
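
The environment rule stated above ("Must be one of “rwb” or “batch”") can be illustrated with a minimal sketch. The helper `check_environment` is hypothetical and not part of gnomad_qc; how the real function reacts to an unsupported value is not stated in this page, so raising `ValueError` is an assumption.

```python
from typing import Iterable


# Hypothetical helper, NOT part of gnomad_qc: it mirrors the documented rule
# that `environment` must be one of a fixed set of strings.
def check_environment(
    environment: str, allowed: Iterable[str] = ("rwb", "batch")
) -> str:
    allowed = tuple(allowed)
    if environment not in allowed:
        # Assumption: an unsupported value is rejected with ValueError.
        raise ValueError(
            f"environment must be one of {allowed}, got {environment!r}"
        )
    return environment


# Keyword arguments one might pass to get_trio_stats, per the signature above.
trio_stats_kwargs = {"test": True, "environment": check_environment("rwb")}
```

With gnomad_qc installed, these keyword arguments would be passed as `get_trio_stats(**trio_stats_kwargs)`. Functions that also accept “dataproc” would need a wider `allowed` tuple.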

gnomad_qc.v5.resources.annotations.get_sib_stats(test=False, environment='rwb')[source]

Get the gnomAD v5 (AoU genomes only) sibling stats VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Default is “rwb”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

AoU sibling stats VersionedTableResource.

gnomad_qc.v5.resources.annotations.get_aou_downsampling(test=False, environment='rwb')[source]

Get the downsampling annotation table.

v5 downsamplings apply only to the AoU dataset.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • environment (str) – Environment to use. Default is “rwb”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for the Hail Table containing downsampling annotations.

gnomad_qc.v5.resources.annotations.group_membership(test=False, data_set='aou', environment='rwb')[source]

Get the group membership Table for coverage, AN, quality histograms, and frequency calculations.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • data_set (str) – Data set of annotation resource. Default is “aou”.

  • environment (str) – Environment to use. Default is “rwb”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

Hail Table containing group membership annotations.

gnomad_qc.v5.resources.annotations.qual_hists(test=False, environment='rwb')[source]

Get the quality histograms annotation table.

Parameters:
  • test (bool) – Whether to use a tmp path for tests. Default is False.

  • environment (str) – Environment to use for quality histograms. Default is “rwb”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

Hail Table containing quality histogram annotations.

gnomad_qc.v5.resources.annotations.coverage_and_an_path(test=False, data_set='aou', environment='rwb')[source]

Fetch filepath for all sites coverage or allele number Table.

Note

If data_set is “gnomad”, the returned table only contains coverage and AN for consent drop samples.

Parameters:
  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • data_set (str) – Dataset identifier. Must be one of “aou” or “gnomad”. Default is “aou”.

  • environment (str) – Environment to use. Default is “rwb”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

Coverage and allele number Hail Table.

gnomad_qc.v5.resources.annotations.get_freq(version='5.0', data_type='genomes', test=False, data_set='aou', environment='rwb')[source]

Get the frequency annotation Table for v5.

Parameters:
  • version (str) – Version of annotation path to return. Default is “5.0”.

  • data_type (str) – Data type of annotation resource (“genomes” or “exomes”). Default is “genomes”.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • data_set (str) – Data set of annotation resource. Default is “aou”.

  • environment (str) – Environment to use. Default is “rwb”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

TableResource

Returns:

Hail Table containing frequency annotations.
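
As a rough sketch of how the parameters above might combine into a resource, the following uses a stand-in class and an invented path layout. `FakeTableResource`, `example_freq_resource`, the `gs://example-bucket` root, and the file naming are all assumptions made for illustration; the real paths used by gnomad_qc are not shown in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTableResource:
    """Stand-in for a table resource; it only records a path."""

    path: str


def example_freq_resource(
    version: str = "5.0",
    data_type: str = "genomes",
    test: bool = False,
    data_set: str = "aou",
) -> FakeTableResource:
    # Hypothetical layout: `test=True` switches to a temporary root, and the
    # remaining arguments select a subdirectory and file name.
    root = "gs://example-bucket/tmp" if test else "gs://example-bucket"
    return FakeTableResource(
        path=f"{root}/v{version}/annotations/{data_type}/{data_set}.frequencies.ht"
    )


default_resource = example_freq_resource()
test_resource = example_freq_resource(test=True)
```

The point of the sketch is only that `test` changes where the table lives without changing which table is meant, so pipeline code can switch between test and full runs with a single flag.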

gnomad_qc.v5.resources.annotations.get_info_ht(test=False, environment='batch')[source]

Get the gnomAD v5 (AoU genomes only) info VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

VersionedTableResource

Returns:

Info VersionedTableResource.

gnomad_qc.v5.resources.annotations.info_vcf_path(version='5.0', test=False, environment='batch')[source]

Path to sites VCF (input information for running VQSR).

Parameters:
  • version (str) – Version of annotation path to return.

  • test (bool) – Whether to use a tmp path for testing.

  • environment (str) – Environment to use. Must be one of “rwb” or “batch”. Default is “batch”.

Return type:

str

Returns:

String for the path to the info VCF.

gnomad_qc.v5.resources.annotations.get_aou_vcf_header(environment='batch')[source]

Get path to AoU annotation sites-only VCF header.

This is needed for proper import of the sites-only VCF as the QUALapprox annotation is stated in the previous header as an int but is actually a float.

Parameters:

environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

str

Returns:

Path to the VCF header file.
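
The type mismatch described above can be sketched as a one-line header rewrite. The header line below is invented for illustration (the real AoU header text is not shown in this page), and `declare_as_float` is a hypothetical helper, not a gnomad_qc function.

```python
import re

# Hypothetical INFO line that declares QUALapprox as an integer.
old_header_line = '##INFO=<ID=QUALapprox,Number=1,Type=Integer,Description="example">'


def declare_as_float(header_line: str, info_id: str) -> str:
    """Set Type=Float on the INFO line for `info_id`; leave other lines unchanged."""
    pattern = rf"^(##INFO=<ID={re.escape(info_id)},.*?Type=)[A-Za-z]+"
    return re.sub(pattern, r"\g<1>Float", header_line)


new_header_line = declare_as_float(old_header_line, "QUALapprox")
```

A corrected header of this kind is what a separate header file would carry. Hail's `hl.import_vcf` accepts a `header_file` argument, so the path returned by `get_aou_vcf_header` could plausibly be supplied there, though this page does not show the actual import call.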

gnomad_qc.v5.resources.annotations.get_aou_annotated_sites_only_vcf(environment='batch')[source]

Get path to AoU sites-only VCF with annotations needed for variant QC.

Parameters:

environment (str) – Environment to use. Default is “batch”. Must be one of “rwb” or “batch”.

Return type:

str

Returns:

Path to the annotated sites-only VCF.

gnomad_qc.v5.resources.annotations.get_vep(test=False, vep_version='105', environment='batch')[source]

Get the gnomAD v5 VEP annotation VersionedTableResource.

Parameters:
  • test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v5 Table.

  • vep_version (str) – VEP version to use (e.g., “105”, “115”). Default is “105”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

gnomAD v5 VEP VersionedTableResource.

gnomad_qc.v5.resources.annotations.validate_vep_path(test=False, vep_version='105', environment='batch')[source]

Get the gnomAD v5 VEP annotation VersionedTableResource for validation counts.

Parameters:
  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v5 VDS.

  • vep_version (str) – VEP version to use (e.g., “105”, “115”). Default is “105”.

  • environment (str) – Environment to use. Default is “batch”. Must be one of “rwb”, “batch”, or “dataproc”.

Return type:

VersionedTableResource

Returns:

gnomAD v5 VEP VersionedTableResource containing validity check.