gnomad_qc.v4.resources.annotations
Script containing annotation related resources.
Module Functions
get_info | Get the gnomAD v4 info VersionedTableResource.
get_vep | Get the gnomAD v4 VEP annotation VersionedTableResource.
validate_vep_path | Get the gnomAD v4 VEP annotation VersionedTableResource for validation counts.
get_trio_stats | Get the gnomAD v4 trio stats VersionedTableResource.
get_sib_stats | Get the gnomAD v4 sibling stats VersionedTableResource.
get_variant_qc_annotations | Return the VersionedTableResource to the RF-ready annotated Table.
info_vcf_path | Path to sites VCF (input information for running VQSR).
get_true_positive_vcf_path | Provide the path to the transmitted singleton VCF used as input to VQSR.
get_downsampling | Get the downsampling annotation table.
get_freq | Get the frequency annotation table for a specified release.
get_all_sites_an_and_qual_hists | Get the all sites AN and qual hists TableResource.
get_combined_frequency | Get the combined v4 genome and exome frequency annotation VersionedTableResource.
get_freq_comparison | Get VersionedTableResource for a frequency comparison between v4 genomes and exomes.
get_insilico_predictors | Get the path to the in silico predictors TableResource for a specified release.
get_vrs | Get the gnomAD v4 VersionedTableResource containing VRS annotations.
hgdp_tgp_updated_callstats | Get the HGDP + 1KG/TGP subset updated call stats TableResource.
get_split_vds | Get the gnomAD v4 split VDS.
- gnomad_qc.v4.resources.annotations.get_info(split=True, test=False)
Get the gnomAD v4 info VersionedTableResource.
- Parameters:
split (bool) – Whether to return the split or multi-allelic version of the resource.
test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.
- Return type:
VersionedTableResource
- Returns:
gnomAD v4 info VersionedTableResource.
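To make the "versioned resource" idea concrete, here is a minimal, self-contained sketch of the pattern: a container that maps version strings to table paths and exposes the default version's path. The classes `ToyTableResource` and `ToyVersionedResource`, the function `toy_get_info`, and the `gs://example-...` paths are all hypothetical stand-ins for illustration. They are not the gnomAD classes, and they do not reflect the real bucket layout.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ToyTableResource:
    """Hypothetical stand-in for a single table resource (just a path)."""

    path: str


@dataclass(frozen=True)
class ToyVersionedResource:
    """Hypothetical stand-in for a versioned resource with a default version."""

    default_version: str
    versions: Dict[str, ToyTableResource]

    @property
    def path(self) -> str:
        # Resolve the path of the default version.
        return self.versions[self.default_version].path


def toy_get_info(split: bool = True, test: bool = False) -> ToyVersionedResource:
    """Mimic the shape of get_info(split, test) with made-up paths."""
    # Assumption: test resources live under a separate temporary root.
    root = "gs://example-tmp/test" if test else "gs://example-bucket"
    # Assumption: the split version is marked by a ".split" suffix.
    suffix = ".split" if split else ""
    versions = {
        version: ToyTableResource(f"{root}/annotations/{version}/info{suffix}.ht")
        for version in ("4.0",)
    }
    return ToyVersionedResource(default_version="4.0", versions=versions)


print(toy_get_info().path)
print(toy_get_info(split=False, test=True).path)
```

The point of the pattern is that callers ask for a resource by intent (split or not, test or not) and never assemble paths by hand.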
- gnomad_qc.v4.resources.annotations.get_vep(test=False, data_type='exomes')
Get the gnomAD v4 VEP annotation VersionedTableResource.
- Parameters:
test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v4 Table.
data_type (str) – Data type of annotation resource. e.g. “exomes” or “genomes”. Default is “exomes”.
- Return type:
VersionedTableResource
- Returns:
gnomAD v4 VEP VersionedTableResource.
- gnomad_qc.v4.resources.annotations.validate_vep_path(test=False, data_type='exomes')
Get the gnomAD v4 VEP annotation VersionedTableResource for validation counts.
- Parameters:
test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.
data_type (str) – Data type of annotation resource. e.g. “exomes” or “genomes”. Default is “exomes”.
- Return type:
VersionedTableResource
- Returns:
gnomAD v4 VEP VersionedTableResource containing validity check.
- gnomad_qc.v4.resources.annotations.get_trio_stats(test=False, releasable_only=False)
Get the gnomAD v4 trio stats VersionedTableResource.
- Parameters:
test (bool) – Whether to use a temporary path for testing.
releasable_only (bool) – Whether to use only releasable data.
- Return type:
VersionedTableResource
- Returns:
gnomAD v4 trio stats VersionedTableResource.
- gnomad_qc.v4.resources.annotations.get_sib_stats(test=False)
Get the gnomAD v4 sibling stats VersionedTableResource.
- Parameters:
test (bool) – Whether to use a tmp path for testing.
- Return type:
VersionedTableResource
- Returns:
gnomAD v4 sibling stats VersionedTableResource.
- gnomad_qc.v4.resources.annotations.get_variant_qc_annotations(test=False)
Return the VersionedTableResource to the RF-ready annotated Table.
Annotations that are included in the Table:
- Features for RF:
  - variant_type
  - allele_type
  - n_alt_alleles
  - has_star
  - AS_QD
  - AS_pab_max
  - AS_MQRankSum
  - AS_SOR
  - AS_ReadPosRankSum
- Training sites (bool):
  - transmitted_singleton
  - sibling_singleton
  - fail_hard_filters - (ht.QD < 2) | (ht.FS > 60) | (ht.MQ < 30)
- Parameters:
test (bool) – Whether to use a tmp path for testing.
- Return type:
VersionedTableResource
- Returns:
Table with variant QC annotations.
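The fail_hard_filters definition listed for get_variant_qc_annotations, (ht.QD < 2) | (ht.FS > 60) | (ht.MQ < 30), can be read as a simple per-site predicate. The sketch below restates it in plain Python over a dictionary rather than a Hail Table. The function name `fails_hard_filters` is hypothetical, and treating a missing value as "does not fail" is an assumption made here for simplicity; Hail's handling of missing values may differ.

```python
from typing import Mapping, Optional


def fails_hard_filters(site: Mapping[str, Optional[float]]) -> bool:
    """Return True if the site fails any hard filter: QD < 2, FS > 60, or MQ < 30."""
    qd = site.get("QD")
    fs = site.get("FS")
    mq = site.get("MQ")
    # Assumption: a missing annotation does not trigger the corresponding filter.
    return (
        (qd is not None and qd < 2)
        or (fs is not None and fs > 60)
        or (mq is not None and mq < 30)
    )


# A site passing all three thresholds.
print(fails_hard_filters({"QD": 12.0, "FS": 3.1, "MQ": 59.8}))
# A site failing on QD alone.
print(fails_hard_filters({"QD": 1.5, "FS": 3.1, "MQ": 59.8}))
```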
- gnomad_qc.v4.resources.annotations.info_vcf_path(info_method='AS', version='4.0', split=False, test=False)
Path to sites VCF (input information for running VQSR).
- Parameters:
info_method (str) – Method for generating info VCF. Must be one of “AS”, “quasi”, or “set_long_AS_missing”. Default is “AS”.
version (str) – Version of annotation path to return.
split (bool) – Whether to return the split or multi-allelic version of the resource.
test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.
- Return type:
str
- Returns:
String for the path to the info VCF.
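Because info_method accepts only three values, callers may want to check the argument before requesting a path. The following is a minimal sketch of such a check. The helper `check_info_method` and the constant `INFO_METHODS` are hypothetical names, and how the real function reacts to an invalid value is not stated here, so raising ValueError is an assumption.

```python
from typing import Tuple

# Allowed values as listed in the parameter description of info_vcf_path.
INFO_METHODS: Tuple[str, ...] = ("AS", "quasi", "set_long_AS_missing")


def check_info_method(info_method: str = "AS") -> str:
    """Return info_method unchanged if it is allowed, otherwise raise ValueError."""
    if info_method not in INFO_METHODS:
        allowed = ", ".join(INFO_METHODS)
        raise ValueError(f"info_method must be one of: {allowed}; got {info_method!r}")
    return info_method


print(check_info_method("quasi"))
```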
- gnomad_qc.v4.resources.annotations.get_true_positive_vcf_path(version='4.0', test=False, adj=False, true_positive_type='transmitted_singleton')
Provide the path to the transmitted singleton VCF used as input to VQSR.
- Parameters:
version (str) – Version of true positive VCF path to return.
test (bool) – Whether to use a tmp path for testing.
adj (bool) – Whether to use adj genotypes.
true_positive_type (str) – Type of true positive VCF path to return. Should be one of “transmitted_singleton”, “sibling_singleton”, or “transmitted_singleton.sibling_singleton”. Default is “transmitted_singleton”.
- Return type:
str
- Returns:
String for the path to the true positive VCF.
- gnomad_qc.v4.resources.annotations.get_downsampling(test=False, subset=None)
Get the downsampling annotation table.
- Parameters:
test (bool) – Whether to use a tmp path for tests. Default is False.
subset (Optional[str]) – Optional subset to return downsampling Table for. Downsampling for entire dataset will be returned if not specified.
- Return type:
VersionedTableResource
- Returns:
Hail Table containing subset or overall dataset downsampling annotations.
- gnomad_qc.v4.resources.annotations.get_freq(version=None, data_type='exomes', test=False, hom_alt_adjusted=False, chrom=None, intermediate_subset=None, finalized=True)
Get the frequency annotation table for a specified release.
- Parameters:
version (str) – Version of annotation path to return.
data_type (str) – Data type of annotation resource. e.g. “exomes” or “genomes”.
test (bool) – Whether to use a tmp path for tests.
hom_alt_adjusted – Whether to return the hom alt adjusted frequency table.
chrom (Optional[str]) – Chromosome to return frequency table for. Entire Table will be returned if not specified.
intermediate_subset (Optional[str]) – Optional intermediate subset to return temp frequency Table for. Entire Table will be returned if not specified.
finalized (bool) – Whether to return the finalized frequency table. Default is True.
- Return type:
TableResource
- Returns:
Hail Table containing subset or overall cohort frequency annotations.
- gnomad_qc.v4.resources.annotations.get_all_sites_an_and_qual_hists(data_type='exomes', test=False)
Get the all sites AN and qual hists TableResource.
- Parameters:
data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.
test (bool) – Whether to use a tmp path for testing.
- Return type:
VersionedTableResource
- Returns:
Hail Table containing all sites AN and qual hists annotations.
- gnomad_qc.v4.resources.annotations.get_combined_frequency(test=False, filtered=True)
Get the combined v4 genome and exome frequency annotation VersionedTableResource.
- Parameters:
test (bool) – Whether to use a tmp path for testing.
filtered (bool) – Whether to return the resource for the filtered combined frequency. Default is True.
- Return type:
VersionedTableResource
- Returns:
Hail Table containing combined frequency annotations.
- gnomad_qc.v4.resources.annotations.get_freq_comparison(method, test=False, filtered=True)
Get VersionedTableResource for a frequency comparison between v4 genomes and exomes.
Table contains results from one of the following comparison methods:
  - ‘contingency_table_test’: Hail’s contingency table test – chi-squared or Fisher’s exact test of independence depending on min allele count.
  - ‘cmh_test’: Cochran–Mantel–Haenszel test – stratified test of independence for 2x2xK contingency tables.
- Parameters:
method (str) – Method used to compare frequencies between v4 genomes and exomes. Can be one of contingency_table_test or cmh_test.
test (bool) – Whether to use a tmp path for testing. Default is False.
filtered (bool) – Whether to return the filtered frequency comparison Table. Default is True.
- Return type:
VersionedTableResource
- Returns:
VersionedTableResource containing results from the specified comparison method.
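The two comparison methods named for get_freq_comparison can be illustrated with a small, standard-library-only sketch. It is not the Hail implementation and not the gnomAD pipeline code: all function names are hypothetical, the rule "use chi-squared when every cell reaches a minimum count, otherwise Fisher's exact test" is an assumption about how the switch works, and the Cochran–Mantel–Haenszel statistic is computed without a continuity correction. Each 2x2 table is given as (a, b, c, d) for rows [a, b] and [c, d].

```python
import math
from typing import Iterable, Tuple

Table2x2 = Tuple[int, int, int, int]  # (a, b, c, d) for [[a, b], [c, d]]


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def chi_squared_p(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared p-value for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # A zero margin carries no evidence against independence.
    return chi2_sf_1df(n * (a * d - b * c) ** 2 / denom)


def fisher_exact_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value: sum of tables no more likely than observed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def contingency_table_p(a: int, b: int, c: int, d: int, min_cell_count: int) -> float:
    """Assumed switch: chi-squared if all cells >= min_cell_count, else Fisher's exact."""
    if min(a, b, c, d) >= min_cell_count:
        return chi_squared_p(a, b, c, d)
    return fisher_exact_p(a, b, c, d)


def cmh_test(strata: Iterable[Table2x2]) -> Tuple[float, float]:
    """Cochran-Mantel-Haenszel statistic and p-value over K 2x2 strata."""
    diff, var = 0.0, 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # The variance term is undefined for n < 2.
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    stat = diff * diff / var
    return stat, chi2_sf_1df(stat)


print(contingency_table_p(3, 1, 1, 3, min_cell_count=5))
print(cmh_test([(20, 5, 5, 20), (15, 5, 5, 15)]))
```

In a genome versus exome comparison, one natural reading is that each 2x2 table holds alternate and reference allele counts for the two data types, and that the strata of the stratified test are groups such as genetic ancestry groups; that mapping is an interpretation, not something this page specifies.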
- gnomad_qc.v4.resources.annotations.get_insilico_predictors(predictor='cadd')
Get the path to the in silico predictors TableResource for a specified release.
- Parameters:
predictor (str) – One of the in silico predictors available in gnomAD v4, including cadd, revel, primate_ai, splice_ai, and pangolin.
- Return type:
VersionedTableResource
- Returns:
in silico predictor VersionedTableResource for gnomAD v4.
- gnomad_qc.v4.resources.annotations.get_vrs(original_annotations=False, test=False, data_type='exomes')
Get the gnomAD v4 VersionedTableResource containing VRS annotations.
- Parameters:
original_annotations (bool) – Whether to obtain the original input Table with all its annotations in addition to the added on VRS annotations. If set to False, obtain a Table with only the VRS annotations.
test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v4 Table.
data_type (str) – Data type of annotation resource. e.g. “exomes” or “genomes”. Default is “exomes”.
- Return type:
VersionedTableResource
- Returns:
gnomAD v4 VRS VersionedTableResource.
- gnomad_qc.v4.resources.annotations.hgdp_tgp_updated_callstats(subset, test=False)
Get the HGDP + 1KG/TGP subset updated call stats TableResource.
- Parameters:
subset (str) – The subset of the HGDP + 1KG/TGP release to return, must be “added”, “subtracted”, “pop_diff”, “join”, “v3_release_an”, “v3_pop_diff_an”, or “pre_validity_check”.
test (bool) – Whether to return the annotation resource for testing purposes.
- Return type:
VersionedTableResource
- Returns:
VersionedTableResource for specified subset.
- gnomad_qc.v4.resources.annotations.get_split_vds(version=None, data_type='exomes', test=False)
Get the gnomAD v4 split VDS.
This is a temporary resource that will be removed once the split VDS is no longer needed. Given the uncertainties around frequency calculation runtimes, it cannot be stored in gnomad-tmp, but it must be deleted once frequency work is complete.
- Parameters:
version (str) – Version of annotation path to return.
data_type (str) – Data type of annotation resource. e.g. “exomes” or “genomes”. Default is “exomes”.
test (bool) – Whether to use a tmp path for analysis of the test Table instead of the full v4 Table.
- Return type:
VariantDatasetResource
- Returns:
gnomAD v4 VariantDatasetResource.