gnomad.resources.grch38.gnomad

gnomad.resources.grch38.gnomad.SUBSETS

Order to sort subgroupings during VCF export by version.

gnomad.resources.grch38.gnomad.GROUPS

Group names used to generate labels for high quality genotypes and all raw genotypes.

gnomad.resources.grch38.gnomad.SEXES

Sample sexes used in VCF export.

gnomad.resources.grch38.gnomad.GEN_ANC_GROUPS

Genetic ancestry groups in gnomAD by version.

gnomad.resources.grch38.gnomad.COHORTS_WITH_GEN_ANC_STORED_AS_SUBGRP

Subsets in gnomAD v3.1 that are broken down by their known genetic ancestry subgroups instead of groups in the frequency struct.

gnomad.resources.grch38.gnomad.TGP_GEN_ANC_GROUPS

1000 Genomes Project (1KG/TGP) genetic ancestry subgroups.

gnomad.resources.grch38.gnomad.HGDP_GEN_ANC_GROUPS

Human Genome Diversity Project (HGDP) genetic ancestry subgroups.

gnomad.resources.grch38.gnomad.TGP_GEN_ANC_GROUP_NAMES

1000 Genomes Project (1KG/TGP) genetic ancestry group label map.

gnomad.resources.grch38.gnomad.GEN_ANC_GROUPS_TO_REMOVE_FOR_GRPMAX

Genetic ancestry groups that are removed before genetic ancestry group max calculations.

gnomad.resources.grch38.gnomad.DOWNSAMPLINGS

List of the downsampling numbers to use for frequency calculations by version.

gnomad.resources.grch38.gnomad.public_release(...)

Retrieve publicly released versioned table resource.

gnomad.resources.grch38.gnomad.coverage(...)

Retrieve gnomAD's coverage table by data_type.

gnomad.resources.grch38.gnomad.all_sites_an(...)

Retrieve gnomAD's all sites allele number table by data_type.

gnomad.resources.grch38.gnomad.coverage_tsv_path(...)

Retrieve the path to gnomAD's coverage TSV by data_type.

gnomad.resources.grch38.gnomad.release_vcf_path(...)

Retrieve the path to a publicly released VCF.

gnomad.resources.grch38.gnomad.add_grpMaxFAF95_v4(ht)

Add a grpMaxFAF95 struct with 'grpmax' and 'grpmax_gen_anc'.

gnomad.resources.grch38.gnomad.gnomad_gks(...)

Perform gnomAD GKS annotations on a range of variants at once.

gnomad.resources.grch38.gnomad.pext([pext_type])

Retrieve pext table by type.

gnomad.resources.grch38.gnomad.constraint()

Retrieve gene constraint Table.

gnomad.resources.grch38.gnomad.browser_variant()

Retrieve browser variant table.

gnomad.resources.grch38.gnomad.browser_gene()

Retrieve browser gene table.

gnomad.resources.grch38.gnomad.SUBSETS = {'v3': ['non_v2', 'non_topmed', 'non_cancer', 'controls_and_biobanks', 'non_neuro', 'tgp', 'hgdp'], 'v4': ['non_ukb']}

Order to sort subgroupings during VCF export by version.

Ensures that INFO labels in VCF are in desired order (e.g., tgp_raw_AC_esn_XX).
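As a minimal sketch of how an ordering list like this can be applied, the snippet below sorts an unordered collection of subset names into the SUBSETS order for one version. The values are copied from the constant above. The helper sort_subsets is hypothetical and is not part of the gnomad package.

```python
# Values copied from the SUBSETS constant documented above.
SUBSETS = {
    "v3": [
        "non_v2",
        "non_topmed",
        "non_cancer",
        "controls_and_biobanks",
        "non_neuro",
        "tgp",
        "hgdp",
    ],
    "v4": ["non_ukb"],
}


def sort_subsets(names, version="v3"):
    """Hypothetical helper: order subset names as they appear in SUBSETS[version]."""
    rank = {name: i for i, name in enumerate(SUBSETS[version])}
    # Unknown names raise KeyError rather than being silently misplaced.
    return sorted(names, key=lambda name: rank[name])


ordered = sort_subsets({"hgdp", "non_cancer", "tgp", "non_v2"})
print(ordered)
```

Looking names up in a precomputed rank dictionary keeps the sort independent of the input order and makes an unexpected subset name fail loudly.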

gnomad.resources.grch38.gnomad.GROUPS = ['adj', 'raw']

Group names used to generate labels for high quality genotypes and all raw genotypes.

Used in VCF export.

gnomad.resources.grch38.gnomad.SEXES = ['XX', 'XY']

Sample sexes used in VCF export.

Used to stratify frequency annotations (AC, AN, AF) for each sex.
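To illustrate how GROUPS and SEXES can combine into stratifications, the sketch below enumerates every group-by-sex pair. The values are copied from the constants above. The dictionary layout is an assumption made for illustration and is not the label format the package writes to the VCF.

```python
from itertools import product

# Values copied from the GROUPS and SEXES constants documented above.
GROUPS = ["adj", "raw"]
SEXES = ["XX", "XY"]

# One stratification per (group, sex) pair, in list order.
strata = [{"group": group, "sex": sex} for group, sex in product(GROUPS, SEXES)]

for stratum in strata:
    print(stratum)
```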

gnomad.resources.grch38.gnomad.GEN_ANC_GROUPS = {'v3': {'genomes': ['afr', 'ami', 'amr', 'asj', 'eas', 'fin', 'nfe', 'oth', 'sas', 'mid']}, 'v4': {'exomes': ['afr', 'amr', 'asj', 'eas', 'fin', 'mid', 'nfe', 'remaining', 'sas'], 'genomes': ['afr', 'ami', 'amr', 'asj', 'eas', 'fin', 'mid', 'nfe', 'remaining', 'sas']}}

Genetic ancestry groups in gnomAD by version.
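The group lists differ slightly between versions and data types. The sketch below copies the values from the constant above and uses set differences to show which labels changed. It is only an illustration of reading the constant, not code from the package.

```python
# Values copied from the GEN_ANC_GROUPS constant documented above.
GEN_ANC_GROUPS = {
    "v3": {
        "genomes": ["afr", "ami", "amr", "asj", "eas", "fin", "nfe", "oth", "sas", "mid"],
    },
    "v4": {
        "exomes": ["afr", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"],
        "genomes": ["afr", "ami", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"],
    },
}

v3_genomes = set(GEN_ANC_GROUPS["v3"]["genomes"])
v4_genomes = set(GEN_ANC_GROUPS["v4"]["genomes"])
v4_exomes = set(GEN_ANC_GROUPS["v4"]["exomes"])

only_v3 = v3_genomes - v4_genomes          # labels present only in v3 genomes
only_v4 = v4_genomes - v3_genomes          # labels present only in v4 genomes
genomes_not_exomes = v4_genomes - v4_exomes  # v4 groups listed for genomes only

print(only_v3, only_v4, genomes_not_exomes)
```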

gnomad.resources.grch38.gnomad.COHORTS_WITH_GEN_ANC_STORED_AS_SUBGRP = ['tgp', 'hgdp']

Subsets in gnomAD v3.1 that are broken down by their known genetic ancestry subgroups instead of groups in the frequency struct.

gnomad.resources.grch38.gnomad.TGP_GEN_ANC_GROUPS = ['esn', 'pur', 'pjl', 'clm', 'jpt', 'chb', 'stu', 'itu', 'tsi', 'mxl', 'ceu', 'msl', 'yri', 'beb', 'fin', 'khv', 'cdx', 'lwk', 'acb', 'asw', 'ibs', 'gbr', 'pel', 'gih', 'chs', 'gwd']

1000 Genomes Project (1KG/TGP) genetic ancestry subgroups.

gnomad.resources.grch38.gnomad.HGDP_GEN_ANC_GROUPS = ['japanese', 'papuanhighlands', 'papuansepik', 'adygei', 'orcadian', 'biaka', 'yakut', 'han', 'northernhan', 'uygur', 'miao', 'mongolian', 'balochi', 'bedouin', 'russian', 'daur', 'pima', 'hezhen', 'sindhi', 'yi', 'oroqen', 'san', 'tuscan', 'tu', 'palestinian', 'tujia', 'druze', 'pathan', 'basque', 'makrani', 'bergamoitalian', 'naxi', 'karitiana', 'sardinian', 'mbuti', 'mozabite', 'yoruba', 'lahu', 'dai', 'cambodian', 'bougainville', 'french', 'brahui', 'hazara', 'bantusouthafrica', 'surui', 'mandenka', 'kalash', 'xibo', 'colombian', 'bantukenya', 'she', 'burusho', 'maya']

Human Genome Diversity Project (HGDP) genetic ancestry subgroups.

gnomad.resources.grch38.gnomad.TGP_GEN_ANC_GROUP_NAMES = {'acb': 'African Caribbean', 'asw': 'African-American', 'beb': 'Bengali', 'cdx': 'Chinese Dai', 'ceu': 'Utah Residents (European Ancestry)', 'chb': 'Han Chinese', 'chs': 'Southern Han Chinese', 'clm': 'Colombian', 'esn': 'Esan', 'fin': 'Finnish', 'gbr': 'British', 'gih': 'Gujarati', 'gwd': 'Gambian', 'ibs': 'Iberian', 'itu': 'Indian Telugu', 'jpt': 'Japanese', 'khv': 'Kinh', 'lwk': 'Luhya', 'msl': 'Mende', 'mxl': 'Mexican-American', 'pel': 'Peruvian', 'pjl': 'Punjabi', 'pur': 'Puerto Rican', 'stu': 'Sri Lankan Tamil', 'tsi': 'Toscani', 'yri': 'Yoruba'}

1000 Genomes Project (1KG/TGP) genetic ancestry group label map.

gnomad.resources.grch38.gnomad.GEN_ANC_GROUPS_TO_REMOVE_FOR_GRPMAX = {'v3': {'ami', 'asj', 'fin', 'mid', 'oth', 'remaining'}, 'v4': {'ami', 'asj', 'fin', 'oth', 'remaining'}}

Genetic ancestry groups that are removed before genetic ancestry group max calculations.
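A minimal sketch of the filtering step this constant describes: starting from the groups for a version and data type, drop the ones listed for removal and keep the rest as grpmax candidates. The values are copied from the constants above. The helper grpmax_candidates is hypothetical and is not part of the gnomad package.

```python
# Values copied from the constants documented above.
GEN_ANC_GROUPS = {
    "v3": {
        "genomes": ["afr", "ami", "amr", "asj", "eas", "fin", "nfe", "oth", "sas", "mid"],
    },
    "v4": {
        "exomes": ["afr", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"],
        "genomes": ["afr", "ami", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"],
    },
}
GEN_ANC_GROUPS_TO_REMOVE_FOR_GRPMAX = {
    "v3": {"ami", "asj", "fin", "mid", "oth", "remaining"},
    "v4": {"ami", "asj", "fin", "oth", "remaining"},
}


def grpmax_candidates(version, data_type):
    """Hypothetical helper: groups left after removing the excluded ones."""
    excluded = GEN_ANC_GROUPS_TO_REMOVE_FOR_GRPMAX[version]
    return [g for g in GEN_ANC_GROUPS[version][data_type] if g not in excluded]


v4_exome_candidates = grpmax_candidates("v4", "exomes")
v3_genome_candidates = grpmax_candidates("v3", "genomes")
print(v4_exome_candidates)
print(v3_genome_candidates)
```

Note that 'mid' is excluded in v3 but retained in v4, so the two versions yield different candidate lists.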

gnomad.resources.grch38.gnomad.DOWNSAMPLINGS = {'v3': [10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 15000, 20000, 25000, 30000, 40000, 50000, 60000, 70000, 75000, 80000, 85000, 90000, 95000, 100000, 110000, 120000], 'v4': [10, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000]}

List of the downsampling numbers to use for frequency calculations by version.
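One plausible way to use this list is to keep only the downsampling sizes that a given sample set can support. The sketch below assumes a downsampling is usable only when it does not exceed the number of available samples. That rule and the helper usable_downsamplings are assumptions for illustration, not behavior confirmed by this reference.

```python
# Values copied from the DOWNSAMPLINGS constant documented above.
DOWNSAMPLINGS = {
    "v3": [
        10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 15000, 20000,
        25000, 30000, 40000, 50000, 60000, 70000, 75000, 80000, 85000,
        90000, 95000, 100000, 110000, 120000,
    ],
    "v4": [10, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000],
}


def usable_downsamplings(version, n_samples):
    """Hypothetical helper: downsampling sizes no larger than n_samples."""
    return [n for n in DOWNSAMPLINGS[version] if n <= n_samples]


v4_sizes = usable_downsamplings("v4", 75000)
v3_sizes = usable_downsamplings("v3", 75000)
print(v4_sizes)
print(v3_sizes[-1])
```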

gnomad.resources.grch38.gnomad.public_release(data_type)[source]

Retrieve publicly released versioned table resource.

Parameters:

data_type (str) – One of “exomes”, “genomes” or “joint”.

Return type:

VersionedTableResource

Returns:

Release Table

gnomad.resources.grch38.gnomad.coverage(data_type)[source]

Retrieve gnomAD’s coverage table by data_type.

Parameters:

data_type (str) – One of “exomes” or “genomes”.

Return type:

VersionedTableResource

Returns:

Coverage Table

gnomad.resources.grch38.gnomad.all_sites_an(data_type)[source]

Retrieve gnomAD’s all sites allele number table by data_type.

Parameters:

data_type (str) – One of “exomes” or “genomes”.

Return type:

VersionedTableResource

Returns:

All sites allele number VersionedTableResource

gnomad.resources.grch38.gnomad.coverage_tsv_path(data_type, version=None)[source]

Retrieve the path to gnomAD’s coverage TSV by data_type.

Parameters:
  • data_type (str) – One of “exomes” or “genomes”.

  • version (Optional[str]) –

Return type:

str

Returns:

Path to the coverage TSV

gnomad.resources.grch38.gnomad.release_vcf_path(data_type, version, contig)[source]

Retrieve the path to a publicly released VCF. Provide a specific contig, e.g. “chr20”, to retrieve the contig-specific VCF.

Parameters:
  • data_type (str) – One of “exomes” or “genomes”.

  • version (str) – One of the release versions of gnomAD on GRCh38.

  • contig (str) – Single contig, “chr1” to “chrY”.

Return type:

str

Returns:

Path to VCF
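Because the contig is passed as a plain string, it can help to check it before requesting a path. The sketch below assumes that “chr1” to “chrY” means the autosomes chr1 through chr22 plus chrX and chrY. Both that reading and the helper check_contig are assumptions for illustration; the helper is not part of the gnomad package.

```python
# Assumed contig set: chr1..chr22, then chrX and chrY.
VALID_CONTIGS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def check_contig(contig):
    """Hypothetical helper: return contig unchanged, or raise if it is not recognized."""
    if contig not in VALID_CONTIGS:
        raise ValueError(
            f"Unexpected contig {contig!r}; expected one of chr1..chr22, chrX, chrY"
        )
    return contig


print(check_contig("chr20"))
```

A value such as "20" (without the "chr" prefix) raises ValueError here, which surfaces a GRCh37-style contig name early.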

gnomad.resources.grch38.gnomad.add_grpMaxFAF95_v4(ht)[source]

Add a grpMaxFAF95 struct with ‘grpmax’ and ‘grpmax_gen_anc’.

Also includes a jointGrpMaxFAF95 annotation using the v4 fafmax and joint_fafmax structures.

Parameters:

ht (Table) – Input hail table.

Return type:

Table

Returns:

Annotated hail table.

gnomad.resources.grch38.gnomad.gnomad_gks(locus_interval, version, data_type='genomes', by_gen_anc_group=False, by_sex=False, vrs_only=False, custom_ht=None, skip_checkpoint=False, skip_coverage=False, custom_coverage_ht=None)[source]

Perform gnomAD GKS annotations on a range of variants at once.

Parameters:
  • locus_interval (IntervalExpression) – Hail IntervalExpression of locus<reference_genome>, e.g. hl.locus_interval('chr1', 6424776, 6461367, reference_genome='GRCh38')

  • version (str) – String of version of gnomAD release to use.

  • data_type (str) – String of either “exomes” or “genomes” for the type of reads that are desired.

  • by_gen_anc_group (bool) – Boolean to pass to obtain frequency information for each genetic ancestry group.

  • by_sex (bool) – Boolean to pass to return frequency information for each cohort split by chromosomal sex.

  • vrs_only (bool) – Boolean to pass for only VRS info to be returned (will not include allele frequency information).

  • custom_ht (Table) – Table to use instead of what public_release() method would return for the version.

  • skip_checkpoint (bool) – Bool to pass to skip checkpointing selected fields (checkpointing may be desirable for large datasets by reducing data copies across the cluster).

  • skip_coverage (bool) – Bool to pass to skip adding coverage statistics.

  • custom_coverage_ht (Table) – Custom table to use for coverage statistics instead of the release coverage table.

Return type:

list

Returns:

List of dictionaries containing VRS information (and frequency information split by genetic ancestry group and sex, if requested) for each variant in the specified interval.

gnomad.resources.grch38.gnomad.pext(pext_type='base_level')[source]

Retrieve pext table by type.

Parameters:

pext_type (str) – One of “base_level” or “annotation_level”. Default is “base_level”.

Return type:

GnomadPublicTableResource

Returns:

Pext Table.

gnomad.resources.grch38.gnomad.constraint()[source]

Retrieve gene constraint Table.

Return type:

VersionedTableResource

Returns:

Gene constraint Table.

gnomad.resources.grch38.gnomad.browser_variant()[source]

Retrieve browser variant table.

Return type:

VersionedTableResource

Returns:

Browser variant Table.

gnomad.resources.grch38.gnomad.browser_gene()[source]

Retrieve browser gene table.

Return type:

GnomadPublicTableResource

Returns:

Browser gene Table.