gnomad.resources.grch38.gnomad
- gnomad.resources.grch38.gnomad.SUBSETS = {'v3': ['non_v2', 'non_topmed', 'non_cancer', 'controls_and_biobanks', 'non_neuro', 'tgp', 'hgdp'], 'v4': ['non_ukb']}
Order to sort subgroupings during VCF export by version.
Ensures that INFO labels in VCF are in desired order (e.g., tgp_raw_AC_esn_XX).
- gnomad.resources.grch38.gnomad.GROUPS = ['adj', 'raw']
Group names used to generate labels for high quality genotypes and all raw genotypes.
Used in VCF export.
- gnomad.resources.grch38.gnomad.SEXES = ['XX', 'XY']
Sample sexes used in VCF export.
Used to stratify frequency annotations (AC, AN, AF) for each sex.
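The example label above (tgp_raw_AC_esn_XX) hints at how the subset, group, and sex constants combine into a VCF INFO field name. The following is a minimal sketch, assuming the part order seen in that single example (subset, group, metric, population, sex). `make_info_label` is a hypothetical helper that is not part of the gnomad package, and the real export code may assemble labels differently.

```python
# Local copies of the documented constants, so the sketch runs without gnomad.
SUBSETS_V3 = [
    "non_v2", "non_topmed", "non_cancer",
    "controls_and_biobanks", "non_neuro", "tgp", "hgdp",
]
GROUPS = ["adj", "raw"]
SEXES = ["XX", "XY"]


def make_info_label(subset, group, metric, pop, sex):
    """Join label parts with underscores (hypothetical helper, order assumed)."""
    if subset not in SUBSETS_V3:
        raise ValueError(f"Unknown subset: {subset}")
    if group not in GROUPS:
        raise ValueError(f"Unknown group: {group}")
    if sex not in SEXES:
        raise ValueError(f"Unknown sex: {sex}")
    return "_".join([subset, group, metric, pop, sex])


label = make_info_label("tgp", "raw", "AC", "esn", "XX")
print(label)
```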
- gnomad.resources.grch38.gnomad.POPS = {'v3': {'genomes': ['afr', 'ami', 'amr', 'asj', 'eas', 'fin', 'nfe', 'oth', 'sas', 'mid']}, 'v4': {'exomes': ['afr', 'amr', 'asj', 'eas', 'fin', 'mid', 'nfe', 'remaining', 'sas'], 'genomes': ['afr', 'ami', 'amr', 'asj', 'eas', 'fin', 'mid', 'nfe', 'remaining', 'sas']}}
Global ancestry groups in gnomAD by version.
- gnomad.resources.grch38.gnomad.COHORTS_WITH_POP_STORED_AS_SUBPOP = ['tgp', 'hgdp']
Subsets in gnomAD v3.1 that are broken down by their known subpops instead of global pops in the frequency struct.
- gnomad.resources.grch38.gnomad.TGP_POPS = ['esn', 'pur', 'pjl', 'clm', 'jpt', 'chb', 'stu', 'itu', 'tsi', 'mxl', 'ceu', 'msl', 'yri', 'beb', 'fin', 'khv', 'cdx', 'lwk', 'acb', 'asw', 'ibs', 'gbr', 'pel', 'gih', 'chs', 'gwd']
1000 Genomes Project (1KG/TGP) subpops.
- gnomad.resources.grch38.gnomad.HGDP_POPS = ['japanese', 'papuanhighlands', 'papuansepik', 'adygei', 'orcadian', 'biaka', 'yakut', 'han', 'northernhan', 'uygur', 'miao', 'mongolian', 'balochi', 'bedouin', 'russian', 'daur', 'pima', 'hezhen', 'sindhi', 'yi', 'oroqen', 'san', 'tuscan', 'tu', 'palestinian', 'tujia', 'druze', 'pathan', 'basque', 'makrani', 'bergamoitalian', 'naxi', 'karitiana', 'sardinian', 'mbuti', 'mozabite', 'yoruba', 'lahu', 'dai', 'cambodian', 'bougainville', 'french', 'brahui', 'hazara', 'bantusouthafrica', 'surui', 'mandenka', 'kalash', 'xibo', 'colombian', 'bantukenya', 'she', 'burusho', 'maya']
Human Genome Diversity Project (HGDP) subpops.
- gnomad.resources.grch38.gnomad.TGP_POP_NAMES = {'acb': 'African Caribbean', 'asw': 'African-American', 'beb': 'Bengali', 'cdx': 'Chinese Dai', 'ceu': 'Utah Residents (European Ancestry)', 'chb': 'Han Chinese', 'chs': 'Southern Han Chinese', 'clm': 'Colombian', 'esn': 'Esan', 'fin': 'Finnish', 'gbr': 'British', 'gih': 'Gujarati', 'gwd': 'Gambian', 'ibs': 'Iberian', 'itu': 'Indian Telugu', 'jpt': 'Japanese', 'khv': 'Kinh', 'lwk': 'Luhya', 'msl': 'Mende', 'mxl': 'Mexican-American', 'pel': 'Peruvian', 'pjl': 'Punjabi', 'pur': 'Puerto Rican', 'stu': 'Sri Lankan Tamil', 'tsi': 'Toscani', 'yri': 'Yoruba'}
1000 Genomes Project (1KG/TGP) pop label map.
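As an illustration of how the label map relates to the subpop list, the sketch below looks up a display name for every code in TGP_POPS and reports any code without a label. The two constants are copied locally from the values documented above so that the example runs without the gnomad package installed.

```python
# Local copies of the documented values (not imported from gnomad).
TGP_POPS = [
    "esn", "pur", "pjl", "clm", "jpt", "chb", "stu", "itu", "tsi", "mxl",
    "ceu", "msl", "yri", "beb", "fin", "khv", "cdx", "lwk", "acb", "asw",
    "ibs", "gbr", "pel", "gih", "chs", "gwd",
]
TGP_POP_NAMES = {
    "acb": "African Caribbean", "asw": "African-American", "beb": "Bengali",
    "cdx": "Chinese Dai", "ceu": "Utah Residents (European Ancestry)",
    "chb": "Han Chinese", "chs": "Southern Han Chinese", "clm": "Colombian",
    "esn": "Esan", "fin": "Finnish", "gbr": "British", "gih": "Gujarati",
    "gwd": "Gambian", "ibs": "Iberian", "itu": "Indian Telugu",
    "jpt": "Japanese", "khv": "Kinh", "lwk": "Luhya", "msl": "Mende",
    "mxl": "Mexican-American", "pel": "Peruvian", "pjl": "Punjabi",
    "pur": "Puerto Rican", "stu": "Sri Lankan Tamil", "tsi": "Toscani",
    "yri": "Yoruba",
}

# Codes that have no display label (expected to be empty).
missing = [pop for pop in TGP_POPS if pop not in TGP_POP_NAMES]

# Display labels in the same order as TGP_POPS.
labels = [TGP_POP_NAMES[pop] for pop in TGP_POPS if pop in TGP_POP_NAMES]
```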
- gnomad.resources.grch38.gnomad.POPS_TO_REMOVE_FOR_POPMAX = {'v3': {'ami', 'asj', 'fin', 'mid', 'oth', 'remaining'}, 'v4': {'ami', 'asj', 'fin', 'oth', 'remaining'}}
Populations that are removed before popmax calculations.
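To show what removing populations before a popmax calculation means in practice, here is a minimal pure-Python sketch. The allele frequencies are made-up illustrative values, `popmax` is a hypothetical helper, and the actual gnomAD computation runs in Hail and may apply additional filters that are not modeled here.

```python
# Local copies of the documented v4 values (not imported from gnomad).
POPS_V4_GENOMES = [
    "afr", "ami", "amr", "asj", "eas", "fin", "mid", "nfe", "remaining", "sas",
]
POPS_TO_REMOVE_V4 = {"ami", "asj", "fin", "oth", "remaining"}


def popmax(af_by_pop, pops, pops_to_remove):
    """Return (population, AF) with the highest AF among eligible populations."""
    eligible = [p for p in pops if p not in pops_to_remove and p in af_by_pop]
    if not eligible:
        return None
    best = max(eligible, key=lambda p: af_by_pop[p])
    return best, af_by_pop[best]


# Illustrative frequencies only. "fin" has the highest AF but is excluded.
example_af = {"afr": 0.012, "fin": 0.05, "nfe": 0.02, "sas": 0.001}
result = popmax(example_af, POPS_V4_GENOMES, POPS_TO_REMOVE_V4)
print(result)
```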
- gnomad.resources.grch38.gnomad.DOWNSAMPLINGS = {'v3': [10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 15000, 20000, 25000, 30000, 40000, 50000, 60000, 70000, 75000, 80000, 85000, 90000, 95000, 100000, 110000, 120000], 'v4': [10, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000]}
List of the downsampling numbers to use for frequency calculations by version.
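One plausible way to use such a list is to keep only the downsampling sizes that do not exceed the number of available samples. The sketch below assumes that interpretation. `usable_downsamplings` is a hypothetical helper and the sample count is an arbitrary illustrative number.

```python
# Local copy of the documented v4 values (not imported from gnomad).
DOWNSAMPLINGS_V4 = [
    10, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000,
]


def usable_downsamplings(downsamplings, n_samples):
    """Keep downsampling sizes that are no larger than the sample count."""
    return [n for n in downsamplings if n <= n_samples]


# Hypothetical cohort of 3000 samples.
usable = usable_downsamplings(DOWNSAMPLINGS_V4, 3000)
print(usable)
```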
- gnomad.resources.grch38.gnomad.public_release(data_type)[source]
Retrieve publicly released versioned table resource.
- Parameters:
data_type (str) – One of “exomes”, “genomes” or “joint”.
- Return type:
- Returns:
Release Table
- gnomad.resources.grch38.gnomad.coverage(data_type)[source]
Retrieve gnomAD’s coverage table by data_type.
- Parameters:
data_type (str) – One of “exomes” or “genomes”.
- Return type:
- Returns:
Coverage Table
- gnomad.resources.grch38.gnomad.all_sites_an(data_type)[source]
Retrieve gnomAD’s all sites allele number table by data_type.
- Parameters:
data_type (str) – One of “exomes” or “genomes”.
- Return type:
- Returns:
All sites allele number VersionedTableResource
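The three retrieval functions above accept different sets of data_type values: public_release also accepts “joint”, while coverage and all_sites_an list only “exomes” and “genomes”. A caller might check the value up front, as in this sketch. `check_data_type` and `ALLOWED_DATA_TYPES` are hypothetical names introduced for illustration and are not part of the gnomad package.

```python
# Allowed values as listed in the parameter descriptions above.
ALLOWED_DATA_TYPES = {
    "public_release": {"exomes", "genomes", "joint"},
    "coverage": {"exomes", "genomes"},
    "all_sites_an": {"exomes", "genomes"},
}


def check_data_type(func_name, data_type):
    """Raise ValueError if data_type is not documented for func_name."""
    allowed = ALLOWED_DATA_TYPES[func_name]
    if data_type not in allowed:
        raise ValueError(
            f"{func_name} expects one of {sorted(allowed)}, got {data_type!r}"
        )
    return data_type


joint_ok = check_data_type("public_release", "joint")

try:
    check_data_type("coverage", "joint")
    coverage_joint_ok = True
except ValueError:
    coverage_joint_ok = False
```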
- gnomad.resources.grch38.gnomad.coverage_tsv_path(data_type, version=None)[source]
Retrieve the path to gnomAD’s coverage TSV file by data_type.
- Parameters:
data_type (str) – One of “exomes” or “genomes”.
version (Optional[str]) –
- Return type:
str
- Returns:
Path to the coverage TSV
- gnomad.resources.grch38.gnomad.release_vcf_path(data_type, version, contig)[source]
Publicly released VCF. Provide a specific contig, e.g. “chr20”, to retrieve the contig-specific VCF.
- Parameters:
data_type (str) – One of “exomes” or “genomes”.
version (str) – One of the release versions of gnomAD on GRCh38.
contig (str) – Single contig, “chr1” to “chrY”.
- Return type:
str
- Returns:
Path to VCF
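The contig argument must be a single “chr”-prefixed contig name. The sketch below builds the set of names a caller might accept, assuming that “chr1” to “chrY” means the 22 autosomes plus chrX and chrY. `check_contig` is a hypothetical helper and not part of the gnomad package.

```python
# Assumed contig range: chr1..chr22, then chrX and chrY.
VALID_CONTIGS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def check_contig(contig):
    """Raise ValueError unless contig is one of the assumed valid names."""
    if contig not in VALID_CONTIGS:
        raise ValueError(f"Expected a contig such as 'chr20', got {contig!r}")
    return contig


checked = check_contig("chr20")

try:
    check_contig("20")  # Missing the "chr" prefix.
    bare_number_ok = True
except ValueError:
    bare_number_ok = False
```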
- gnomad.resources.grch38.gnomad.add_grpMaxFAF95_v4(ht)[source]
Add a grpMaxFAF95 struct with ‘popmax’ and ‘popmax_population’.
Also includes a jointGrpMaxFAF95 annotation using the v4 fafmax and joint_fafmax structures.
- gnomad.resources.grch38.gnomad.gnomad_gks(locus_interval, version, data_type='genomes', by_ancestry_group=False, by_sex=False, vrs_only=False, custom_ht=None, skip_checkpoint=False, skip_coverage=False, custom_coverage_ht=None)[source]
Perform gnomAD GKS annotations on a range of variants at once.
- Parameters:
locus_interval (IntervalExpression) – Hail IntervalExpression of locus<reference_genome>, e.g. hl.locus_interval('chr1', 6424776, 6461367, reference_genome="GRCh38").
version (str) – Version of the gnomAD release to use.
data_type (str) – Either “exomes” or “genomes”, for the type of reads desired.
by_ancestry_group (bool) – Whether to return frequency information for each cohort.
by_sex (bool) – Whether to return frequency information for each cohort split by chromosomal sex.
vrs_only (bool) – Whether to return only VRS information (allele frequency information is not included).
custom_ht (Table) – Table to use instead of what public_release() would return for the version.
skip_checkpoint (bool) – Whether to skip checkpointing the selected fields (checkpointing may be desirable for large datasets because it reduces data copies across the cluster).
skip_coverage (bool) – Whether to skip adding coverage statistics.
custom_coverage_ht (Table) – Custom table to use for coverage statistics instead of the release coverage table.
- Return type:
list
- Returns:
List of dictionaries containing VRS information (and frequency information split by ancestry group and sex, if requested) for the variants in the specified interval.
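A call to gnomad_gks might look like the sketch below, which reuses the interval from the locus_interval description and the keyword arguments from the signature above. `fetch_gks_annotations` is a hypothetical wrapper. The version string is left to the caller because valid values are not listed here, and the imports are deferred so that defining the function does not require Hail or the gnomad package. Calling it does require both, plus access to the gnomAD release data.

```python
def fetch_gks_annotations(version, data_type="genomes"):
    """Return GKS annotations for an example chr1 interval (hypothetical wrapper)."""
    # Deferred imports: only needed when the function is actually called.
    import hail as hl
    from gnomad.resources.grch38.gnomad import gnomad_gks

    # Interval taken from the locus_interval parameter description.
    interval = hl.locus_interval(
        "chr1", 6424776, 6461367, reference_genome="GRCh38"
    )

    # Request per-ancestry-group frequencies, not split by sex.
    return gnomad_gks(
        interval,
        version,
        data_type=data_type,
        by_ancestry_group=True,
        by_sex=False,
    )
```

Each element of the returned list is a dictionary, so results can be inspected with ordinary list and dict operations once the call completes.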