gnomad_qc.v4.create_release.create_combined_faf_release_ht
Create a joint gnomAD v4 exome and genome frequency and FAF.
Generate a Hail Table containing frequencies for exomes and genomes in gnomAD v4, a joint frequency, a joint FAF, and the following tests comparing the two frequencies:
Hail’s contingency table test – chi-squared or Fisher’s exact test of independence depending on min cell count.
Cochran–Mantel–Haenszel test – stratified test of independence for 2x2xK contingency tables.
usage: gnomad_qc.v4.create_release.create_combined_faf_release_ht.py
[-h] [--slack-channel SLACK_CHANNEL] [--overwrite] [--test-gene]
[--test-y-gene] [--create-combined-frequency-table]
[--skip-apply-release-filters]
[--stats-chr {chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY}]
[--stats-combine-all-chr] [--perform-contingency-table-test]
[--min-cell-count MIN_CELL_COUNT]
[--perform-cochran-mantel-haenszel-test]
[--finalize-combined-faf-release] [--n-partitions N_PARTITIONS]
Named Arguments
- --slack-channel
Slack channel to post results and notifications to.
- --overwrite
Overwrite output files.
Default: False
- --test-gene
Filter Tables to only the PCSK9 gene for testing.
Default: False
- --test-y-gene
Test on a subset of variants in ZFY on chrY.
Default: False
- --create-combined-frequency-table
Create a Table with frequency information for exomes, genomes, and the joint exome + genome frequencies. Included frequencies are adj, raw, and adj for all genetic ancestry groups found in both the exomes and genomes. The table also includes FAF computed on the joint frequencies.
Default: False
- --skip-apply-release-filters
Whether to skip applying the final release filters to the Table.
Default: False
- --stats-chr
Possible choices: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY
Chromosome to compute stats on.
- --stats-combine-all-chr
Whether to combine the stats from all chromosomes. The stats calculations must already have been performed for each chromosome using the --stats-chr argument.
Default: False
- --perform-contingency-table-test
Perform chi-squared or Fisher’s exact test of independence on the allele frequencies based on min_cell_count.
Default: False
- --min-cell-count
Minimum count in every cell to use the chi-squared test.
Default: 5
- --perform-cochran-mantel-haenszel-test
Perform the Cochran–Mantel–Haenszel test, a stratified test of independence for 2x2xK contingency tables, on the allele frequencies where K is the number of genetic ancestry groups with FAF computed.
Default: False
- --finalize-combined-faf-release
Finalize the combined FAF Table for release.
Default: False
Create finalized combined FAF release Table.
Arguments for finalizing the combined FAF release Table.
- --n-partitions
Number of partitions to repartition the finalized combined FAF release Table to.
Default: 10000
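As a rough illustration of how the flags above fit together, the sketch below rebuilds a subset of the command line with Python's argparse. It is not the script's actual parser: the name build_parser_sketch is hypothetical, the integer types for --min-cell-count and --n-partitions are assumed from their defaults, and only flags and defaults listed above are used.

```python
import argparse

# Chromosome choices as listed for --stats-chr above.
CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def build_parser_sketch() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--overwrite", action="store_true")
    parser.add_argument("--stats-chr", choices=CHROMS)
    parser.add_argument("--stats-combine-all-chr", action="store_true")
    parser.add_argument("--perform-contingency-table-test", action="store_true")
    # Integer types are an assumption based on the documented defaults.
    parser.add_argument("--min-cell-count", type=int, default=5)
    parser.add_argument(
        "--perform-cochran-mantel-haenszel-test", action="store_true"
    )
    parser.add_argument("--n-partitions", type=int, default=10000)
    return parser


# Example: request the contingency table test on chr22 only.
args = build_parser_sketch().parse_args(
    ["--perform-contingency-table-test", "--stats-chr", "chr22"]
)
print(args.stats_chr, args.min_cell_count)
```

Per the --stats-combine-all-chr description, a run of this kind would be repeated for each chromosome before a final run combines the per-chromosome stats.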
Module Functions
| CHR_LIST | List of chromosomes in the combined FAF release. |
| filter_gene_to_test | Filter to PCSK9 1:55039447-55064852 and/or ZFY Y:2935281-2982506 for testing. |
| extract_freq_info | Extract frequencies and FAF for adj, raw (only for frequencies), adj by pop, adj by sex, and adj by pop/sex. |
| add_all_sites_an_and_qual_hists | Add all sites AN and qual hists to the Table. |
| get_joint_freq_and_faf | Get joint genomes and exomes frequency and FAF information. |
| perform_contingency_table_test | Perform Hail's contingency_table_test on the allele counts between two frequency expressions. |
| perform_cmh_test | Perform the Cochran–Mantel–Haenszel test on the allele counts between two frequency expressions using genetic ancestry group as the stratification. |
| create_final_combined_faf_release | Create the final combined FAF release Table. |
| get_combine_faf_resources | Get PipelineResourceCollection for all resources needed in the combined FAF resource creation pipeline. |
| | Create combined FAF resource. |
| | Get script argument parser. |
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.CHR_LIST = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
List of chromosomes in the combined FAF release.
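The same list can be built programmatically. A minimal sketch follows; the variable name chr_list_sketch is illustrative, and the module may define the constant differently.

```python
# Autosomes chr1..chr22 followed by the two sex chromosomes.
chr_list_sketch = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
print(len(chr_list_sketch))
```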
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.filter_gene_to_test(ht, pcsk9, zfy)
Filter to PCSK9 1:55039447-55064852 and/or ZFY Y:2935281-2982506 for testing.
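The interval logic can be pictured with a plain-Python predicate. This is only a sketch: the function name in_test_interval is hypothetical, the intervals are treated as inclusive on both ends, and both "chr1" and "1" style contig names are accepted, none of which is confirmed by the docstring. The real function operates on a Hail Table rather than on individual loci.

```python
# Test intervals quoted in the docstring above: (contig, start, end).
PCSK9 = ("1", 55039447, 55064852)
ZFY = ("Y", 2935281, 2982506)


def in_test_interval(contig: str, pos: int, pcsk9: bool, zfy: bool) -> bool:
    """Return True if the locus falls in one of the requested test intervals."""
    # Accept either 'chr1' or '1' style contig names (an assumption).
    if contig.startswith("chr"):
        contig = contig[3:]
    intervals = []
    if pcsk9:
        intervals.append(PCSK9)
    if zfy:
        intervals.append(ZFY)
    return any(
        contig == c and start <= pos <= end for c, start, end in intervals
    )


print(in_test_interval("chr1", 55050000, pcsk9=True, zfy=False))
```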
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.extract_freq_info(ht, prefix, apply_release_filters=True)
Extract frequencies and FAF for adj, raw (only for frequencies), adj by pop, adj by sex, and adj by pop/sex.
- The following annotations are renamed and, where applicable, filtered:
freq: {prefix}_freq
faf: {prefix}_faf
grpmax: {prefix}_grpmax
fafmax: {prefix}_fafmax
qual_hists: {prefix}_qual_hists
raw_qual_hists: {prefix}_raw_qual_hists
age_hists: {prefix}_age_hists
- The following global annotations are filtered and renamed:
freq_meta: {prefix}_freq_meta
freq_index_dict: {prefix}_freq_index_dict
faf_meta: {prefix}_faf_meta
faf_index_dict: {prefix}_faf_index_dict
age_distribution: {prefix}_age_distribution
- If apply_release_filters is True, a {prefix}_filters annotation is added to the Table and the following variants are filtered:
chrM
AS_lowqual sites (these sites are dropped in the final_filters HT, so they have no information in filters; hl.is_defined(ht.filters) is used to identify them)
AC_raw == 0
- Parameters:
ht (Table) – Table with frequency and FAF information.
prefix (str) – Prefix to add to each of the filtered annotations.
apply_release_filters (bool) – Whether to apply the final release filters to the Table. Default is True.
- Return type:
Table
- Returns:
Table with filtered frequency and FAF information.
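The renaming and release-filter rules described for extract_freq_info can be sketched in plain Python on dictionary rows. This is an illustration only, not the Hail implementation: the helper names rename_with_prefix and passes_release_filters are hypothetical, and representing a missing filters annotation as None is an assumption.

```python
from typing import Optional

# Row annotations listed above as renamed to {prefix}_<name>.
ROW_FIELDS = [
    "freq", "faf", "grpmax", "fafmax",
    "qual_hists", "raw_qual_hists", "age_hists",
]


def rename_with_prefix(row: dict, prefix: str) -> dict:
    """Keep the listed annotations and rename each to '{prefix}_{name}'."""
    return {f"{prefix}_{k}": v for k, v in row.items() if k in ROW_FIELDS}


def passes_release_filters(
    contig: str, filters: Optional[set], ac_raw: int
) -> bool:
    """Drop chrM, sites with undefined filters (AS_lowqual), and AC_raw == 0."""
    return contig != "chrM" and filters is not None and ac_raw > 0


row = {"freq": [{"AC": 3, "AN": 100}], "faf": [0.01], "other": 1}
print(rename_with_prefix(row, "exomes"))
print(passes_release_filters("chr1", set(), ac_raw=3))
```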
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.add_all_sites_an_and_qual_hists(ht, exomes_all_sites_ht, genomes_all_sites_ht)
Add all sites AN and qual hists to the Table.
- Parameters:
- Return type:
Table
- Returns:
Table with all sites AN and qual hists added.
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_joint_freq_and_faf(genomes_ht, exomes_ht, genomes_all_sites_ht, exomes_all_sites_ht, faf_pops_to_exclude={'ami', 'asj', 'fin', 'oth', 'remaining'})
Get joint genomes and exomes frequency and FAF information.
- Parameters:
genomes_ht (Table) – Table with genomes frequency and FAF information.
exomes_ht (Table) – Table with exomes frequency and FAF information.
genomes_all_sites_ht (Table) – Table with all sites AN and qual hists for genomes.
exomes_all_sites_ht (Table) – Table with all sites AN and qual hists for exomes.
faf_pops_to_exclude (Set[str]) – Set of genetic ancestry groups to exclude from the FAF calculation.
- Return type:
Table
- Returns:
Table with joint genomes and exomes frequency and FAF information.
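One way to picture the joint frequency is as a per-group sum of exome and genome allele counts. The sketch below assumes exactly that (AC and AN are summed for groups present in both inputs, and AF is recomputed); the source does not spell out the merge rule, and the names joint_freq_sketch and faf_groups_sketch are hypothetical. The FAF calculation itself is not reproduced here; only the exclusion of groups via faf_pops_to_exclude is shown.

```python
# Default exclusion set from the function signature above.
FAF_POPS_TO_EXCLUDE = {"ami", "asj", "fin", "oth", "remaining"}


def joint_freq_sketch(exomes: dict, genomes: dict) -> dict:
    """Sum AC and AN per group found in both inputs and recompute AF."""
    joint = {}
    for group in exomes.keys() & genomes.keys():
        ac = exomes[group]["AC"] + genomes[group]["AC"]
        an = exomes[group]["AN"] + genomes[group]["AN"]
        joint[group] = {"AC": ac, "AN": an, "AF": ac / an if an > 0 else None}
    return joint


def faf_groups_sketch(groups, exclude=FAF_POPS_TO_EXCLUDE) -> list:
    """Return the groups that would remain eligible for a FAF calculation."""
    return sorted(g for g in groups if g not in exclude)


exomes = {"afr": {"AC": 2, "AN": 100}, "fin": {"AC": 1, "AN": 50}}
genomes = {
    "afr": {"AC": 3, "AN": 100},
    "fin": {"AC": 0, "AN": 50},
    "ami": {"AC": 1, "AN": 10},
}
joint = joint_freq_sketch(exomes, genomes)
print(joint["afr"], faf_groups_sketch(joint))
```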
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.perform_contingency_table_test(freq1_expr, freq2_expr, freq1_meta_expr, freq2_meta_expr, joint_meta_expr, min_cell_count=5)
Perform Hail’s contingency_table_test on the allele counts between two frequency expressions.
This is done on the 2x2 matrix of reference and alternate allele counts. The chi-squared test is used when every cell of the 2x2 matrix is at least min_cell_count. Otherwise, Fisher’s exact test is used.
freq1_expr and freq2_expr should be ArrayExpressions of structs with ‘AN’ and ‘AC’ annotations.
Note
The order of the output array expression will be the same as joint_meta_expr and any frequency group with missing or zero AC in both freq1_expr and freq2_expr (based on freq1_meta_expr and freq2_meta_expr) will be set to missing. Any frequency group in freq1_meta_expr or freq2_meta_expr that is not in joint_meta_expr will be excluded from tests.
- Parameters:
freq1_expr (ArrayExpression) – First ArrayExpression of frequencies to combine.
freq2_expr (ArrayExpression) – Second ArrayExpression of frequencies to combine.
freq1_meta_expr (ArrayExpression) – Frequency metadata for freq1_expr.
freq2_meta_expr (ArrayExpression) – Frequency metadata for freq2_expr.
joint_meta_expr (ArrayExpression) – Joint frequency metadata, only used for ordering the output array expression.
min_cell_count (int) – Minimum count in every cell to use the chi-squared test. Default is 5.
- Return type:
- Returns:
ArrayExpression for contingency table test results.
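The choice between the two tests can be illustrated for a single frequency group in plain Python. This is a sketch under stated assumptions, not Hail's implementation: the chi-squared branch is taken when every cell is at least min_cell_count (the behavior assumed here for Hail's test), the chi-squared statistic uses no continuity correction, Fisher's test is the two-sided version, and all function names are hypothetical.

```python
import math
from typing import Optional


def chi_squared_p(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared p-value (1 d.f., no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return float("nan")  # a row or column sums to zero
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-squared variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value from the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, row1)

    def prob(x: int) -> float:
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / total

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    p = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p)


def contingency_test_sketch(
    ac1: int, an1: int, ac2: int, an2: int, min_cell_count: int = 5
) -> Optional[dict]:
    """Test one group; return None when AC is zero in both inputs."""
    if ac1 == 0 and ac2 == 0:
        return None
    # 2x2 matrix of alternate and reference allele counts.
    cells = (ac1, an1 - ac1, ac2, an2 - ac2)
    if min(cells) >= min_cell_count:
        return {"test": "chi-squared", "p_value": chi_squared_p(*cells)}
    return {"test": "fisher", "p_value": fisher_exact_p(*cells)}


print(contingency_test_sketch(3, 3, 0, 3))
```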
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.perform_cmh_test(ht, freq1_expr, freq2_expr, freq1_meta_expr, freq2_meta_expr, pops)
Perform the Cochran–Mantel–Haenszel test on the allele counts between two frequency expressions using genetic ancestry group as the stratification.
This is done by creating a list of 2x2 matrices of freq1/freq2 reference and alternate allele counts for each genetic ancestry group in pops. The stats used in perform_contingency_table_test can only be used on 2x2 matrices, so we perform that per genetic ancestry group to get one statistic per genetic ancestry group. The CMH test allows for multiple 2x2 matrices for a specific stratification, giving a single statistic across all genetic ancestry groups.
freq1_expr and freq2_expr should be ArrayExpressions of structs with ‘AN’ and ‘AC’ annotations.
Note
Any genetic ancestry group with zero AC in both freq1_expr and freq2_expr will be excluded from the test.
- Parameters:
ht (Table) – Table with joint exomes and genomes frequency and FAF information.
freq1_expr (ArrayExpression) – First ArrayExpression of frequencies to combine.
freq2_expr (ArrayExpression) – Second ArrayExpression of frequencies to combine.
freq1_meta_expr (ArrayExpression) – Frequency metadata for freq1_expr.
freq2_meta_expr (ArrayExpression) – Frequency metadata for freq2_expr.
pops (List[str]) – List of genetic ancestry groups to include in the CMH test.
- Return type:
- Returns:
ArrayExpression for Cochran–Mantel–Haenszel test results.
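To show how the stratified test pools the per-group 2x2 matrices into one statistic, here is a plain-Python sketch of the textbook Cochran–Mantel–Haenszel chi-squared statistic. It is an illustration only: it omits the continuity correction, which may differ from what the pipeline uses, and the function name cmh_test_sketch and the (AC1, AN1, AC2, AN2) tuple layout are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def cmh_test_sketch(
    strata: List[Tuple[int, int, int, int]]
) -> Optional[dict]:
    """CMH test over per-group (AC1, AN1, AC2, AN2) counts.

    Groups with zero AC in both inputs are skipped, following the note above.
    """
    diff = 0.0  # sum over strata of observed minus expected count in cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    for ac1, an1, ac2, an2 in strata:
        if ac1 == 0 and ac2 == 0:
            continue
        a, b, c, d = ac1, an1 - ac1, ac2, an2 - ac2
        n = a + b + c + d
        if n < 2:
            continue
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return None  # no informative strata
    chi2 = diff ** 2 / var
    # One degree of freedom regardless of the number of strata.
    return {"chisq": chi2, "p_value": math.erfc(math.sqrt(chi2 / 2))}


print(cmh_test_sketch([(5, 10, 5, 10), (2, 20, 2, 20)]))
```

Unlike the per-group contingency table test, this yields a single statistic across all included genetic ancestry groups.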
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.create_final_combined_faf_release(ht, contingency_table_ht, cmh_ht)
Create the final combined FAF release Table.
- Parameters:
- Return type:
Table
- Returns:
Table with final combined FAF release information.
- gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_combine_faf_resources(overwrite=False, test=False, filtered=True, stats_chr=None, stats_combine_all_chr=False)
Get PipelineResourceCollection for all resources needed in the combined FAF resource creation pipeline.
- Parameters:
overwrite (bool) – Whether to overwrite existing resources. Default is False.
test (bool) – Whether to use test resources. Default is False.
filtered (bool) – Whether to get the resources for the filtered Tables. Default is True.
stats_chr (str) – Chromosome to get temp stats resource for. Default is None, which will return the resources for the stats on the full exome/genome.
stats_combine_all_chr (bool) – Whether to also get the stats resources for all chromosomes to be combined. Default is False.
- Return type:
PipelineResourceCollection
- Returns:
PipelineResourceCollection containing resources for all steps of the combined FAF resource creation pipeline.