gnomad_qc.v4.create_release.create_combined_faf_release_ht

Create a joint gnomAD v4 exome and genome frequency and FAF.

Generate a Hail Table containing frequencies for exomes and genomes in gnomAD v4, a joint frequency, a joint FAF, and the following tests comparing the two frequencies:

  • Hail’s contingency table test – chi-squared or Fisher’s exact test of independence depending on min cell count.

  • Cochran–Mantel–Haenszel test – stratified test of independence for 2x2xK contingency tables.

usage: gnomad_qc.v4.create_release.create_combined_faf_release_ht.py
       [-h] [--slack-channel SLACK_CHANNEL] [--overwrite] [--test-gene]
       [--test-y-gene] [--create-combined-frequency-table]
       [--skip-apply-release-filters]
       [--stats-chr {chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY}]
       [--stats-combine-all-chr] [--perform-contingency-table-test]
       [--min-cell-count MIN_CELL_COUNT]
       [--perform-cochran-mantel-haenszel-test]
       [--finalize-combined-faf-release] [--n-partitions N_PARTITIONS]

Named Arguments

--slack-channel

Slack channel to post results and notifications to.

--overwrite

Overwrite output files.

Default: False

--test-gene

Filter Tables to only the PCSK9 gene for testing.

Default: False

--test-y-gene

Test on a subset of variants in ZFY on chrY.

Default: False

--create-combined-frequency-table

Create a Table with frequency information for exomes, genomes, and the joint exome + genome frequencies. Included frequencies are adj, raw, and adj for all genetic ancestry groups found in both the exomes and genomes. The table also includes FAF computed on the joint frequencies.

Default: False

--skip-apply-release-filters

Whether to skip applying the final release filters to the Table.

Default: False

--stats-chr

Possible choices: chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY

Chromosome to compute stats on.

--stats-combine-all-chr

Whether to combine all chromosome stats. The stats calculations must have been performed for each chromosome using the --stats-chr argument.

Default: False

--perform-contingency-table-test

Perform chi-squared or Fisher’s exact test of independence on the allele frequencies based on min_cell_count.

Default: False

--min-cell-count

Minimum count in every cell to use the chi-squared test.

Default: 5

--perform-cochran-mantel-haenszel-test

Perform the Cochran–Mantel–Haenszel test, a stratified test of independence for 2x2xK contingency tables, on the allele frequencies where K is the number of genetic ancestry groups with FAF computed.

Default: False

--finalize-combined-faf-release

Finalize the combined FAF Table for release.

Default: False

Create finalized combined FAF release Table.

Arguments for finalizing the combined FAF release Table.

--n-partitions

Number of partitions to repartition the finalized combined FAF release Table to.

Default: 10000
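The flags documented above could be declared roughly as follows. This is a minimal sketch reconstructed only from the argument list on this page: `build_parser` is a hypothetical name, the `int` types are assumed from the documented defaults, and the module's own `get_script_argument_parser()` may be organized differently.

```python
import argparse

# Chromosome choices as listed for --stats-chr above.
CHR_LIST = [f"chr{c}" for c in list(range(1, 23)) + ["X", "Y"]]

# Boolean switches documented above; each defaults to False.
BOOLEAN_FLAGS = [
    "--overwrite",
    "--test-gene",
    "--test-y-gene",
    "--create-combined-frequency-table",
    "--skip-apply-release-filters",
    "--stats-combine-all-chr",
    "--perform-contingency-table-test",
    "--perform-cochran-mantel-haenszel-test",
    "--finalize-combined-faf-release",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the documented flags (not the module's parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--slack-channel")
    for flag in BOOLEAN_FLAGS:
        parser.add_argument(flag, action="store_true")
    parser.add_argument("--stats-chr", choices=CHR_LIST)
    # The int types below are assumptions based on the documented defaults.
    parser.add_argument("--min-cell-count", type=int, default=5)
    parser.add_argument("--n-partitions", type=int, default=10000)
    return parser


# Example: request the contingency table test on a single chromosome.
args = build_parser().parse_args(
    ["--stats-chr", "chr21", "--perform-contingency-table-test"]
)
```

As the --stats-combine-all-chr description states, per-chromosome runs with --stats-chr come first and a later run with --stats-combine-all-chr combines their results.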

Module Functions

gnomad_qc.v4.create_release.create_combined_faf_release_ht.CHR_LIST

List of chromosomes in the combined FAF release.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.filter_gene_to_test(ht, ...)

Filter to PCSK9 1:55039447-55064852 and/or ZFY Y:2935281-2982506 for testing.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.extract_freq_info(ht, ...)

Extract frequencies and FAF for adj, raw (only for frequencies), adj by pop, adj by sex, and adj by pop/sex.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.add_all_sites_an_and_qual_hists(ht, ...)

Add all sites AN and qual hists to the Table.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_joint_freq_and_faf(...)

Get joint genomes and exomes frequency and FAF information.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.perform_contingency_table_test(...)

Perform Hail's contingency_table_test on the allele counts between two frequency expressions.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.perform_cmh_test(ht, ...)

Perform the Cochran–Mantel–Haenszel test on the allele counts between two frequency expressions using genetic ancestry group as the stratification.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.create_final_combined_faf_release(ht, ...)

Create the final combined FAF release Table.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_combine_faf_resources([...])

Get PipelineResourceCollection for all resources needed in the combined FAF resource creation pipeline.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.main(args)

Create combined FAF resource.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_script_argument_parser()

Get script argument parser.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.CHR_LIST = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

List of chromosomes in the combined FAF release.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.filter_gene_to_test(ht, pcsk9, zfy)[source]

Filter to PCSK9 1:55039447-55064852 and/or ZFY Y:2935281-2982506 for testing.

Parameters:
  • ht (Table) – Table with frequency and FAF information.

  • pcsk9 (bool) – Whether to filter to PCSK9 1:55039447-55064852.

  • zfy (bool) – Whether to filter to ZFY Y:2935281-2982506.

Return type:

Table

Returns:

Table with frequency and FAF information of the filtered interval of a gene
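The interval logic can be illustrated without Hail. The following is a minimal sketch, not the module's implementation: `filter_to_test_intervals` is a hypothetical helper that works on plain `(contig, position)` tuples, the contig names are copied as written in the docstring, and treating both interval endpoints as inclusive is an assumption.

```python
from typing import List, Tuple

# Intervals as written in the docstring above: (contig, start, end).
PCSK9 = ("1", 55039447, 55064852)
ZFY = ("Y", 2935281, 2982506)


def filter_to_test_intervals(
    variants: List[Tuple[str, int]], pcsk9: bool, zfy: bool
) -> List[Tuple[str, int]]:
    """Keep variants that fall in the selected test intervals (hypothetical helper)."""
    intervals = ([PCSK9] if pcsk9 else []) + ([ZFY] if zfy else [])
    return [
        (contig, pos)
        for contig, pos in variants
        # Endpoints are treated as inclusive here; this is an assumption.
        if any(contig == c and start <= pos <= end for c, start, end in intervals)
    ]


variants = [("1", 55039447), ("1", 55064853), ("Y", 2940000), ("2", 100)]
pcsk9_only = filter_to_test_intervals(variants, pcsk9=True, zfy=False)
both_genes = filter_to_test_intervals(variants, pcsk9=True, zfy=True)
```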

gnomad_qc.v4.create_release.create_combined_faf_release_ht.extract_freq_info(ht, prefix, apply_release_filters=True)[source]

Extract frequencies and FAF for adj, raw (only for frequencies), adj by pop, adj by sex, and adj by pop/sex.

The following annotations are renamed and, where applicable, filtered:
  • freq: {prefix}_freq

  • faf: {prefix}_faf

  • grpmax: {prefix}_grpmax

  • fafmax: {prefix}_fafmax

  • qual_hists: {prefix}_qual_hists

  • raw_qual_hists: {prefix}_raw_qual_hists

  • age_hists: {prefix}_age_hists

The following global annotations are filtered and renamed:
  • freq_meta: {prefix}_freq_meta

  • freq_index_dict: {prefix}_freq_index_dict

  • faf_meta: {prefix}_faf_meta

  • faf_index_dict: {prefix}_faf_index_dict

  • age_distribution: {prefix}_age_distribution

If apply_release_filters is True, a {prefix}_filters annotation is added to the Table and the following variants are filtered:
  • chrM

  • AS_lowqual sites (these sites are dropped in the final_filters HT and therefore have no information in filters, so hl.is_defined(ht.filters) is used to filter them)

  • AC_raw == 0

Parameters:
  • ht (Table) – Table with frequency and FAF information.

  • prefix (str) – Prefix to add to each of the filtered annotations.

  • apply_release_filters (bool) – Whether to apply the final release filters to the Table. Default is True.

Return type:

Table

Returns:

Table with filtered frequency and FAF information.
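The renaming scheme and the three release filter conditions listed above can be sketched in plain Python. This is an illustration under stated assumptions rather than the module's Hail code: `prefixed_names` and `passes_release_filters` are hypothetical helpers, a variant is modelled as a dict, and a missing `filters` value (`None`) stands in for an AS_lowqual site.

```python
from typing import Dict, List

# Annotation names taken from the lists above.
ROW_FIELDS = [
    "freq", "faf", "grpmax", "fafmax",
    "qual_hists", "raw_qual_hists", "age_hists",
]
GLOBAL_FIELDS = [
    "freq_meta", "freq_index_dict", "faf_meta",
    "faf_index_dict", "age_distribution",
]


def prefixed_names(prefix: str, fields: List[str]) -> Dict[str, str]:
    """Map each annotation name to its {prefix}_ form (hypothetical helper)."""
    return {field: f"{prefix}_{field}" for field in fields}


def passes_release_filters(variant: dict) -> bool:
    """Return True if a variant would be kept (hypothetical dict layout)."""
    return (
        variant["contig"] != "chrM"         # drop chrM
        and variant["filters"] is not None  # drop sites with undefined filters
        and variant["AC_raw"] > 0           # drop AC_raw == 0
    )


exomes_rows = prefixed_names("exomes", ROW_FIELDS)
genomes_globals = prefixed_names("genomes", GLOBAL_FIELDS)
kept = passes_release_filters({"contig": "chr1", "filters": set(), "AC_raw": 3})
```

An empty `filters` set is still defined, so such a variant is kept; only a missing value triggers the second condition.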

gnomad_qc.v4.create_release.create_combined_faf_release_ht.add_all_sites_an_and_qual_hists(ht, exomes_all_sites_ht, genomes_all_sites_ht)[source]

Add all sites AN and qual hists to the Table.

Parameters:
  • ht (Table) – Table with frequency and FAF information.

  • exomes_all_sites_ht (Table) – Table with all sites AN and qual hists for exomes.

  • genomes_all_sites_ht (Table) – Table with all sites AN and qual hists for genomes.

Return type:

Table

Returns:

Table with all sites AN and qual hists added.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_joint_freq_and_faf(genomes_ht, exomes_ht, genomes_all_sites_ht, exomes_all_sites_ht, faf_pops_to_exclude={'ami', 'asj', 'fin', 'oth', 'remaining'})[source]

Get joint genomes and exomes frequency and FAF information.

Parameters:
  • genomes_ht (Table) – Table with genomes frequency and FAF information.

  • exomes_ht (Table) – Table with exomes frequency and FAF information.

  • genomes_all_sites_ht (Table) – Table with all sites AN and qual hists for genomes.

  • exomes_all_sites_ht (Table) – Table with all sites AN and qual hists for exomes.

  • faf_pops_to_exclude (Set[str]) – Set of genetic ancestry groups to exclude from the FAF calculation.

Return type:

Table

Returns:

Table with joint genomes and exomes frequency and FAF information.
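As a rough illustration of what a joint frequency means, the sketch below sums allele counts (AC) and allele numbers (AN) for the groups present in both data types and recomputes AF. It assumes that the joint frequency is a simple sum of counts, which this page does not spell out. `merge_freq` and `faf_groups` are hypothetical helpers, and the FAF calculation itself is not reproduced.

```python
from typing import Dict, List, Optional, Set

# Default exclusion set taken from the function signature above.
FAF_POPS_TO_EXCLUDE = {"ami", "asj", "fin", "oth", "remaining"}


def merge_freq(
    exomes: Dict[str, Dict[str, int]], genomes: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Sum AC and AN for groups found in both inputs (assumed merge rule)."""
    joint = {}
    for group in sorted(set(exomes) & set(genomes)):
        ac = exomes[group]["AC"] + genomes[group]["AC"]
        an = exomes[group]["AN"] + genomes[group]["AN"]
        # AF is undefined when no alleles were called.
        joint[group] = {"AC": ac, "AN": an, "AF": ac / an if an > 0 else None}
    return joint


def faf_groups(groups: List[str], exclude: Set[str] = FAF_POPS_TO_EXCLUDE) -> List[str]:
    """Groups that remain eligible for FAF after the exclusion set is applied."""
    return [group for group in groups if group not in exclude]


exomes = {"afr": {"AC": 3, "AN": 100}, "fin": {"AC": 1, "AN": 40}}
genomes = {"afr": {"AC": 2, "AN": 50}, "fin": {"AC": 0, "AN": 20}, "ami": {"AC": 0, "AN": 10}}
joint = merge_freq(exomes, genomes)
eligible = faf_groups(list(joint))
```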

gnomad_qc.v4.create_release.create_combined_faf_release_ht.perform_contingency_table_test(freq1_expr, freq2_expr, freq1_meta_expr, freq2_meta_expr, joint_meta_expr, min_cell_count=5)[source]

Perform Hail’s contingency_table_test on the allele counts between two frequency expressions.

This is done on the 2x2 matrix of reference and alternate allele counts. The chi-squared test is used when every cell of the 2x2 matrix is at least min_cell_count. Otherwise, Fisher’s exact test is used.

freq1_expr and freq2_expr should be ArrayExpressions of structs with ‘AN’ and ‘AC’ annotations.

Note

The order of the output array expression will be the same as joint_meta_expr and any frequency group with missing or zero AC in both freq1_expr and freq2_expr (based on freq1_meta_expr and freq2_meta_expr) will be set to missing. Any frequency group in freq1_meta_expr or freq2_meta_expr that is not in joint_meta_expr will be excluded from tests.

Parameters:
  • freq1_expr (ArrayExpression) – First ArrayExpression of frequencies to combine.

  • freq2_expr (ArrayExpression) – Second ArrayExpression of frequencies to combine.

  • freq1_meta_expr (ArrayExpression) – Frequency metadata for freq1_expr.

  • freq2_meta_expr (ArrayExpression) – Frequency metadata for freq2_expr.

  • joint_meta_expr (ArrayExpression) – Joint frequency metadata, only used for ordering the output array expression.

  • min_cell_count (int) – Minimum count in every cell to use the chi-squared test. Default is 5.

Return type:

ArrayExpression

Returns:

ArrayExpression for contingency table test results.
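The choice between the two tests can be sketched for a single frequency group in plain Python. This is a simplified stand-in for Hail's contingency_table_test, not a call to it: all function names are hypothetical, the threshold is assumed to be inclusive, the chi-squared statistic is computed without a continuity correction, and the Fisher p-value is the usual two-sided sum over tables no more probable than the observed one. Only the zero-AC case from the note above is handled; missing values are not modelled.

```python
import math
from typing import Dict, Optional, Tuple


def two_by_two(freq1: Dict[str, int], freq2: Dict[str, int]) -> Tuple[int, int, int, int]:
    """Return (alt1, ref1, alt2, ref2) from structs with 'AC' and 'AN'."""
    return (freq1["AC"], freq1["AN"] - freq1["AC"],
            freq2["AC"], freq2["AN"] - freq2["AC"])


def chi_squared_p(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared p-value for a 2x2 table (1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))


def fisher_exact_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value via the hypergeometric distribution."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(low, high + 1)) if p <= observed * (1 + 1e-7))


def contingency_test(
    freq1: Dict[str, int], freq2: Dict[str, int], min_cell_count: int = 5
) -> Optional[Tuple[str, float]]:
    """Pick the test from the smallest cell; None when AC is zero in both inputs."""
    if freq1["AC"] + freq2["AC"] == 0:
        return None
    cells = two_by_two(freq1, freq2)
    if min(cells) >= min_cell_count:
        return ("chi-squared", chi_squared_p(*cells))
    return ("fisher", fisher_exact_p(*cells))


common = contingency_test({"AC": 10, "AN": 100}, {"AC": 20, "AN": 100})
rare = contingency_test({"AC": 1, "AN": 10}, {"AC": 0, "AN": 10})
```

The exact-test helper uses whole binomial coefficients, so it is only practical for small counts such as those in the example.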

gnomad_qc.v4.create_release.create_combined_faf_release_ht.perform_cmh_test(ht, freq1_expr, freq2_expr, freq1_meta_expr, freq2_meta_expr, pops)[source]

Perform the Cochran–Mantel–Haenszel test on the allele counts between two frequency expressions using genetic ancestry group as the stratification.

This is done by creating a list of 2x2 matrices of freq1/freq2 reference and alternate allele counts for each genetic ancestry group in pops. The stats used in perform_contingency_table_test can only be used on 2x2 matrices, so we perform that per genetic ancestry group to get one statistic per genetic ancestry group. The CMH test allows for multiple 2x2 matrices for a specific stratification, giving a single statistic across all genetic ancestry groups.

freq1_expr and freq2_expr should be ArrayExpressions of structs with ‘AN’ and ‘AC’ annotations.

Note

Any genetic ancestry group with zero AC in both freq1_expr and freq2_expr will be excluded from the test.

Parameters:
  • ht (Table) – Table with joint exomes and genomes frequency and FAF information.

  • freq1_expr (ArrayExpression) – First ArrayExpression of frequencies to combine.

  • freq2_expr (ArrayExpression) – Second ArrayExpression of frequencies to combine.

  • freq1_meta_expr (ArrayExpression) – Frequency metadata for freq1_expr.

  • freq2_meta_expr (ArrayExpression) – Frequency metadata for freq2_expr.

  • pops (List[str]) – List of genetic ancestry groups to include in the CMH test.

Return type:

Table

Returns:

Table with Cochran–Mantel–Haenszel test results.
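To make the stratified statistic concrete, the sketch below computes the textbook Cochran–Mantel–Haenszel chi-squared statistic across per-group 2x2 tables. It is not the module's implementation: `cmh_test` is a hypothetical helper that takes plain dicts keyed by genetic ancestry group, no continuity correction is applied (an assumption), and groups with zero AC in both inputs are skipped as described in the note above.

```python
import math
from typing import Dict, List, Optional, Tuple

Freq = Dict[str, int]  # struct-like dict with 'AC' and 'AN'


def cmh_test(
    freq1_by_pop: Dict[str, Freq], freq2_by_pop: Dict[str, Freq], pops: List[str]
) -> Optional[Tuple[float, float]]:
    """Return (chi-squared statistic, p-value) pooled over the 2x2 tables in pops."""
    diff_sum = 0.0  # sum over strata of (observed - expected) for the alt1 cell
    var_sum = 0.0   # sum over strata of the hypergeometric variance
    for pop in pops:
        f1, f2 = freq1_by_pop[pop], freq2_by_pop[pop]
        a, b = f1["AC"], f1["AN"] - f1["AC"]
        c, d = f2["AC"], f2["AN"] - f2["AC"]
        n = a + b + c + d
        # Skip groups with zero AC in both inputs (and degenerate tables).
        if a + c == 0 or n < 2:
            continue
        diff_sum += a - (a + b) * (a + c) / n
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var_sum == 0:
        return None
    chisq = diff_sum ** 2 / var_sum
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chisq / 2)).
    return chisq, math.erfc(math.sqrt(chisq / 2))


exomes = {"afr": {"AC": 10, "AN": 100}, "nfe": {"AC": 5, "AN": 100}, "eas": {"AC": 0, "AN": 100}}
genomes = {"afr": {"AC": 20, "AN": 100}, "nfe": {"AC": 10, "AN": 100}, "eas": {"AC": 0, "AN": 100}}
result = cmh_test(exomes, genomes, ["afr", "nfe", "eas"])
```

Unlike the per-group contingency table test, this yields a single statistic for the whole set of groups, and the `eas` stratum contributes nothing because its AC is zero in both inputs.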

gnomad_qc.v4.create_release.create_combined_faf_release_ht.create_final_combined_faf_release(ht, contingency_table_ht, cmh_ht)[source]

Create the final combined FAF release Table.

Parameters:
  • ht (Table) – Table with joint exomes and genomes frequency and FAF information.

  • contingency_table_ht (Table) – Table with contingency table test results to include on the final Table.

  • cmh_ht (Table) – Table with Cochran–Mantel–Haenszel test results to include on the final Table.

Return type:

Table

Returns:

Table with final combined FAF release information.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.get_combine_faf_resources(overwrite=False, test=False, filtered=True, stats_chr=None, stats_combine_all_chr=False)[source]

Get PipelineResourceCollection for all resources needed in the combined FAF resource creation pipeline.

Parameters:
  • overwrite (bool) – Whether to overwrite existing resources. Default is False.

  • test (bool) – Whether to use test resources. Default is False.

  • filtered (bool) – Whether to get the resources for the filtered Tables. Default is True.

  • stats_chr (str) – Chromosome to get temp stats resource for. Default is None, which will return the resources for the stats on the full exome/genome.

  • stats_combine_all_chr (bool) – Whether to also get the stats resources for all chromosomes to be combined. Default is False.

Return type:

PipelineResourceCollection

Returns:

PipelineResourceCollection containing resources for all steps of the combined FAF resource creation pipeline.

gnomad_qc.v4.create_release.create_combined_faf_release_ht.main(args)[source]

Create combined FAF resource.