gnomad.resources.grch38.gnomad

gnomad.resources.grch38.gnomad.SUBSETS

Order to sort subgroupings during VCF export by version.

gnomad.resources.grch38.gnomad.GROUPS

Group names used to generate labels for high quality genotypes and all raw genotypes.

gnomad.resources.grch38.gnomad.SEXES

Sample sexes used in VCF export.

gnomad.resources.grch38.gnomad.POPS

Global ancestry groups in gnomAD by version.

gnomad.resources.grch38.gnomad.COHORTS_WITH_POP_STORED_AS_SUBPOP

Subsets in gnomAD v3.1 that are broken down by their known subpops instead of global pops in the frequency struct.

gnomad.resources.grch38.gnomad.TGP_POPS

1000 Genomes Project (1KG/TGP) subpops.

gnomad.resources.grch38.gnomad.HGDP_POPS

Human Genome Diversity Project (HGDP) subpops.

gnomad.resources.grch38.gnomad.TGP_POP_NAMES

1000 Genomes Project (1KG/TGP) pop label map.

gnomad.resources.grch38.gnomad.POPS_TO_REMOVE_FOR_POPMAX

Populations that are removed before popmax calculations.

gnomad.resources.grch38.gnomad.DOWNSAMPLINGS

List of the downsampling numbers to use for frequency calculations by version.

gnomad.resources.grch38.gnomad.public_release(...)

Retrieve publicly released versioned table resource.

gnomad.resources.grch38.gnomad.coverage(...)

Retrieve gnomAD's coverage table by data_type.

gnomad.resources.grch38.gnomad.all_sites_an(...)

Retrieve gnomAD's all sites allele number table by data_type.

gnomad.resources.grch38.gnomad.coverage_tsv_path(...)

Retrieve the path to gnomAD's coverage TSV file by data_type.

gnomad.resources.grch38.gnomad.release_vcf_path(...)

Publicly released VCF.

gnomad.resources.grch38.gnomad.add_grpMaxFAF95_v4(ht)

Add a grpMaxFAF95 struct with 'popmax' and 'popmax_population'.

gnomad.resources.grch38.gnomad.gnomad_gks(...)

Perform gnomAD GKS annotations on a range of variants at once.

gnomad.resources.grch38.gnomad.SUBSETS = {'v3': ['non_v2', 'non_topmed', 'non_cancer', 'controls_and_biobanks', 'non_neuro', 'tgp', 'hgdp'], 'v4': ['non_ukb']}

Order to sort subgroupings during VCF export by version.

Ensures that INFO labels in VCF are in desired order (e.g., tgp_raw_AC_esn_XX).
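As a minimal sketch of what "order" means here, the snippet below sorts an arbitrary list of subset names by their position in SUBSETS for a given version. The constant is copied from the value above so the example runs without the gnomad package; sort_subsets is a hypothetical helper, not part of the library.

```python
# Copied from the documented value so the sketch runs standalone.
SUBSETS = {
    "v3": [
        "non_v2",
        "non_topmed",
        "non_cancer",
        "controls_and_biobanks",
        "non_neuro",
        "tgp",
        "hgdp",
    ],
    "v4": ["non_ukb"],
}


def sort_subsets(names, version="v3"):
    """Hypothetical helper: order subset names as they appear in SUBSETS[version]."""
    rank = {name: i for i, name in enumerate(SUBSETS[version])}
    # A name that is not a subset of this version raises KeyError.
    return sorted(names, key=lambda name: rank[name])


ordered = sort_subsets(["hgdp", "non_cancer", "non_v2"])
```

Keeping the order in one list per version means any export step that iterates over subsets can produce labels in the same sequence.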

gnomad.resources.grch38.gnomad.GROUPS = ['adj', 'raw']

Group names used to generate labels for high quality genotypes and all raw genotypes.

Used in VCF export.

gnomad.resources.grch38.gnomad.SEXES = ['XX', 'XY']

Sample sexes used in VCF export.

Used to stratify frequency annotations (AC, AN, AF) for each sex.
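The stratification described above can be pictured as a cross product of the frequency annotations and the sexes. The sketch below is illustrative only: the "<metric>_<sex>" naming is an assumption made for the example, and the actual label layout used during export is not specified in this section.

```python
from itertools import product

SEXES = ["XX", "XY"]          # copied from the documented value
METRICS = ["AC", "AN", "AF"]  # the annotations named in the description

# Assumed "<metric>_<sex>" naming, for illustration only.
labels = [f"{metric}_{sex}" for metric, sex in product(METRICS, SEXES)]
```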

gnomad.resources.grch38.gnomad.POPS = {'v3': {'genomes': ['afr', 'ami', 'amr', 'asj', 'eas', 'fin', 'nfe', 'oth', 'sas', 'mid']}, 'v4': {'exomes': ['afr', 'amr', 'asj', 'eas', 'fin', 'mid', 'nfe', 'remaining', 'sas'], 'genomes': ['afr', 'ami', 'amr', 'asj', 'eas', 'fin', 'mid', 'nfe', 'remaining', 'sas']}}

Global ancestry groups in gnomAD by version.
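Because POPS is keyed first by version and then by data type, a lookup needs both keys. The following sketch copies the documented value so it runs standalone and compares the genome groups across versions; pops_for is a hypothetical helper introduced for the example.

```python
# Copied from the documented value so the sketch runs standalone.
POPS = {
    "v3": {
        "genomes": ["afr", "ami", "amr", "asj", "eas", "fin", "nfe", "oth", "sas", "mid"],
    },
    "v4": {
        "exomes": ["afr", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"],
        "genomes": ["afr", "ami", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"],
    },
}


def pops_for(version, data_type):
    """Hypothetical helper: look up ancestry groups, with a clearer error on a miss."""
    try:
        return POPS[version][data_type]
    except KeyError:
        raise ValueError(f"No ancestry groups listed for {version} {data_type}") from None


v3_genomes = set(pops_for("v3", "genomes"))
v4_genomes = set(pops_for("v4", "genomes"))
only_in_v3 = sorted(v3_genomes - v4_genomes)  # labels present only in v3
only_in_v4 = sorted(v4_genomes - v3_genomes)  # labels present only in v4
```

Note that the constant lists no exomes entry under v3, so pops_for("v3", "exomes") raises ValueError in this sketch.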

gnomad.resources.grch38.gnomad.COHORTS_WITH_POP_STORED_AS_SUBPOP = ['tgp', 'hgdp']

Subsets in gnomAD v3.1 that are broken down by their known subpops instead of global pops in the frequency struct.

gnomad.resources.grch38.gnomad.TGP_POPS = ['esn', 'pur', 'pjl', 'clm', 'jpt', 'chb', 'stu', 'itu', 'tsi', 'mxl', 'ceu', 'msl', 'yri', 'beb', 'fin', 'khv', 'cdx', 'lwk', 'acb', 'asw', 'ibs', 'gbr', 'pel', 'gih', 'chs', 'gwd']

1000 Genomes Project (1KG/TGP) subpops.

gnomad.resources.grch38.gnomad.HGDP_POPS = ['japanese', 'papuanhighlands', 'papuansepik', 'adygei', 'orcadian', 'biaka', 'yakut', 'han', 'northernhan', 'uygur', 'miao', 'mongolian', 'balochi', 'bedouin', 'russian', 'daur', 'pima', 'hezhen', 'sindhi', 'yi', 'oroqen', 'san', 'tuscan', 'tu', 'palestinian', 'tujia', 'druze', 'pathan', 'basque', 'makrani', 'bergamoitalian', 'naxi', 'karitiana', 'sardinian', 'mbuti', 'mozabite', 'yoruba', 'lahu', 'dai', 'cambodian', 'bougainville', 'french', 'brahui', 'hazara', 'bantusouthafrica', 'surui', 'mandenka', 'kalash', 'xibo', 'colombian', 'bantukenya', 'she', 'burusho', 'maya']

Human Genome Diversity Project (HGDP) subpops.

gnomad.resources.grch38.gnomad.TGP_POP_NAMES = {'acb': 'African Caribbean', 'asw': 'African-American', 'beb': 'Bengali', 'cdx': 'Chinese Dai', 'ceu': 'Utah Residents (European Ancestry)', 'chb': 'Han Chinese', 'chs': 'Southern Han Chinese', 'clm': 'Colombian', 'esn': 'Esan', 'fin': 'Finnish', 'gbr': 'British', 'gih': 'Gujarati', 'gwd': 'Gambian', 'ibs': 'Iberian', 'itu': 'Indian Telugu', 'jpt': 'Japanese', 'khv': 'Kinh', 'lwk': 'Luhya', 'msl': 'Mende', 'mxl': 'Mexican-American', 'pel': 'Peruvian', 'pjl': 'Punjabi', 'pur': 'Puerto Rican', 'stu': 'Sri Lankan Tamil', 'tsi': 'Toscani', 'yri': 'Yoruba'}

1000 Genomes Project (1KG/TGP) pop label map.

gnomad.resources.grch38.gnomad.POPS_TO_REMOVE_FOR_POPMAX = {'v3': {'ami', 'asj', 'fin', 'mid', 'oth', 'remaining'}, 'v4': {'ami', 'asj', 'fin', 'oth', 'remaining'}}

Populations that are removed before popmax calculations.
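A minimal sketch of how this set might be applied: filter a list of ancestry groups down to those that remain eligible for popmax. Both constants are copied from the documented values (the v4 exomes entry of POPS is used as input); the filtering expression itself is an illustration, not the library's implementation.

```python
# Copied from the documented values so the sketch runs standalone.
POPS_V4_EXOMES = ["afr", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas"]
POPS_TO_REMOVE_FOR_POPMAX = {
    "v3": {"ami", "asj", "fin", "mid", "oth", "remaining"},
    "v4": {"ami", "asj", "fin", "oth", "remaining"},
}

# Keep the original order; drop every group excluded for this version.
popmax_pops = [
    pop for pop in POPS_V4_EXOMES if pop not in POPS_TO_REMOVE_FOR_POPMAX["v4"]
]
```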

gnomad.resources.grch38.gnomad.DOWNSAMPLINGS = {'v3': [10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 15000, 20000, 25000, 30000, 40000, 50000, 60000, 70000, 75000, 80000, 85000, 90000, 95000, 100000, 110000, 120000], 'v4': [10, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000]}

List of the downsampling numbers to use for frequency calculations by version.
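One plausible use of this list is to keep only the downsampling sizes that fit within a group's sample count. The sketch below assumes that rule; the helper name and the sample count of 7500 are made up for illustration, and the constant is copied from the documented value.

```python
# Copied from the documented value so the sketch runs standalone.
DOWNSAMPLINGS = {
    "v3": [
        10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 15000, 20000,
        25000, 30000, 40000, 50000, 60000, 70000, 75000, 80000, 85000,
        90000, 95000, 100000, 110000, 120000,
    ],
    "v4": [10, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000],
}


def usable_downsamplings(n_samples, version):
    """Hypothetical helper: downsampling sizes no larger than n_samples."""
    return [n for n in DOWNSAMPLINGS[version] if n <= n_samples]


usable = usable_downsamplings(7500, "v4")
```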

gnomad.resources.grch38.gnomad.public_release(data_type)

Retrieve publicly released versioned table resource.

Parameters:

data_type (str) – One of “exomes” or “genomes”

Return type:

VersionedTableResource

Returns:

Release Table
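A usage sketch, assuming the gnomad package and Hail are installed and that the environment can read the public gnomAD buckets. load_release_table is a hypothetical wrapper that validates data_type before touching the library; calling .ht() on the returned resource is expected to read the default version's Table.

```python
VALID_DATA_TYPES = ("exomes", "genomes")


def load_release_table(data_type):
    """Hypothetical wrapper: validate data_type, then read the release Table."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(
            f"data_type must be one of {VALID_DATA_TYPES}, got {data_type!r}"
        )
    # Imported lazily so the validation above works without gnomad/Hail installed.
    from gnomad.resources.grch38.gnomad import public_release

    # The versioned resource delegates .ht() to its default version.
    return public_release(data_type).ht()


# Example call (requires Hail and network access):
# ht = load_release_table("genomes")
```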

gnomad.resources.grch38.gnomad.coverage(data_type)

Retrieve gnomAD’s coverage table by data_type.

Parameters:

data_type (str) – One of “exomes” or “genomes”

Return type:

VersionedTableResource

Returns:

Coverage Table

gnomad.resources.grch38.gnomad.all_sites_an(data_type)

Retrieve gnomAD’s all sites allele number table by data_type.

Parameters:

data_type (str) – One of “exomes” or “genomes”

Return type:

VersionedTableResource

Returns:

All sites allele number VersionedTableResource

gnomad.resources.grch38.gnomad.coverage_tsv_path(data_type, version=None)

Retrieve the path to gnomAD’s coverage TSV file by data_type.

Parameters:
  • data_type (str) – One of “exomes” or “genomes”

  • version (Optional[str]) –

Return type:

str

Returns:

Path to coverage TSV

gnomad.resources.grch38.gnomad.release_vcf_path(data_type, version, contig)

Publicly released VCF. Provide a specific contig, e.g. “chr20”, to retrieve the contig-specific VCF.

Parameters:
  • data_type (str) – One of “exomes” or “genomes”

  • version (str) – One of the release versions of gnomAD on GRCh38

  • contig (str) – Single contig “chr1” to “chrY”

Return type:

str

Returns:

Path to VCF
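Since the function takes a single contig, collecting the paths for a whole release means calling it once per contig. The sketch below builds the contig list from the documented range ("chr1" to "chrY") and wraps the calls in a hypothetical helper; the version string is left to the caller because valid values are not listed in this section.

```python
# Autosomes chr1..chr22 followed by the sex chromosomes.
CONTIGS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def release_vcf_paths(data_type, version):
    """Hypothetical helper: map each contig to its release VCF path."""
    # Imported lazily so CONTIGS can be used without gnomad/Hail installed.
    from gnomad.resources.grch38.gnomad import release_vcf_path

    return {
        contig: release_vcf_path(data_type, version, contig) for contig in CONTIGS
    }


# Example call (requires the gnomad package; the version is up to the caller):
# paths = release_vcf_paths("genomes", version)
```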

gnomad.resources.grch38.gnomad.add_grpMaxFAF95_v4(ht)

Add a grpMaxFAF95 struct with ‘popmax’ and ‘popmax_population’.

Also includes a jointGrpMaxFAF95 annotation using the v4 fafmax and joint_fafmax structures.

Parameters:

ht (Table) – Input hail table.

Return type:

Table

Returns:

Annotated hail table.

gnomad.resources.grch38.gnomad.gnomad_gks(locus_interval, version, data_type='genomes', by_ancestry_group=False, by_sex=False, vrs_only=False, custom_ht=None, skip_checkpoint=False, skip_coverage=False, custom_coverage_ht=None)

Perform gnomAD GKS annotations on a range of variants at once.

Parameters:
  • locus_interval (IntervalExpression) – Hail IntervalExpression of locus<reference_genome>, e.g. hl.locus_interval('chr1', 6424776, 6461367, reference_genome="GRCh38")

  • version (str) – String of version of gnomAD release to use.

  • data_type (str) – String of either “exomes” or “genomes” for the type of reads that are desired.

  • by_ancestry_group (bool) – Boolean to pass to obtain frequency information for each cohort.

  • by_sex (bool) – Boolean to pass to return frequency information for each cohort split by chromosomal sex.

  • vrs_only (bool) – Boolean to pass for only VRS info to be returned (will not include allele frequency information).

  • custom_ht (Table) – Table to use instead of what public_release() method would return for the version.

  • skip_checkpoint (bool) – Bool to pass to skip checkpointing selected fields (checkpointing may be desirable for large datasets because it reduces data copies across the cluster).

  • skip_coverage (bool) – Bool to pass to skip adding coverage statistics.

  • custom_coverage_ht (Table) – Custom table to use for coverage statistics instead of the release coverage table.

Return type:

list

Returns:

List of dictionaries containing VRS information (and frequency information split by ancestry group and sex, if requested) for each variant in the specified interval.
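A usage sketch for the call above, assuming Hail and the gnomad package are installed and the release tables are reachable. gks_for_interval is a hypothetical wrapper: it checks its arguments, builds the interval with hl.locus_interval as in the parameter description, and requests per-ancestry-group frequencies. The version string is left to the caller.

```python
def gks_for_interval(contig, start, end, version, data_type="genomes"):
    """Hypothetical wrapper around gnomad_gks for one GRCh38 interval."""
    if data_type not in ("exomes", "genomes"):
        raise ValueError(f"data_type must be 'exomes' or 'genomes', got {data_type!r}")
    if start > end:
        raise ValueError(f"start ({start}) must not exceed end ({end})")

    # Imported lazily so the checks above work without Hail/gnomad installed.
    import hail as hl
    from gnomad.resources.grch38.gnomad import gnomad_gks

    interval = hl.locus_interval(contig, start, end, reference_genome="GRCh38")
    return gnomad_gks(
        interval,
        version,
        data_type=data_type,
        by_ancestry_group=True,  # include per-group frequency information
        by_sex=False,
    )


# Example call (requires Hail; coordinates taken from the parameter description):
# records = gks_for_interval("chr1", 6424776, 6461367, version)
```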