gnomad_qc.v5.annotations.compute_coverage

Script to compute coverage, allele number, and quality histograms on all gnomAD v5 genomes (AoU v8 + updated gnomAD v4).

usage: gnomad_qc.v5.annotations.compute_coverage.py [-h]
                                                    [--project-name {aou,gnomad}]
                                                    [--overwrite]
                                                    [--n-partitions N_PARTITIONS]
                                                    [--test-2-partitions | --test-chr22-chrx-chry]
                                                    [--write-group-membership-ht]
                                                    [--test]
                                                    [--write-aou-downsampling-ht]
                                                    [--compute-all-cov-release-stats-ht]
                                                    [--merge-gnomad-coverage]
                                                    [--export-coverage-release-files]
                                                    [--merge-gnomad-an]
                                                    [--export-an-release-files]
                                                    [--merge-qual-hists]

Named Arguments

--project-name

Possible choices: aou, gnomad

Project name. Determines environment where script will run.

Default: “aou”

--overwrite

Overwrite existing hail Tables.

Default: False

--n-partitions

Number of partitions to use for the output Table.

Default: 5000

--test-2-partitions

Whether to run a test using only the first 2 partitions of the VDS test dataset.

Default: False

--test-chr22-chrx-chry

Whether to run a test using only the chr22, chrX, and chrY chromosomes of the VDS test dataset.

Default: False

--write-aou-downsampling-ht

Write v5 downsampling HT.

Default: False

--compute-all-cov-release-stats-ht

Compute the all sites coverage, allele number, and quality histogram HT.

Default: False

--merge-qual-hists

Merge variant quality histograms from AoU v8 and gnomAD v4 genomes.

Default: False

Get gnomAD genomes group membership HT.

--write-group-membership-ht

Write group membership HT.

Default: False

--test

Write test group membership HT to test path.

Default: False

Compute coverage release stats HT.

--merge-gnomad-coverage

Subtract consent drop samples from v4 release HT to create gnomAD v5 genomes coverage HT.

Default: False

--export-coverage-release-files

Join and export AoU + gnomAD v4 coverage release HT and TSV file.

Default: False

Compute AN release stats HT.

--merge-gnomad-an

Subtract consent drop samples from v4 release HT to create gnomAD v5 genomes AN HT.

Default: False

--export-an-release-files

Exports joint AoU + gnomAD v4 AN release HT and TSV file.

Default: False
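The usage line above can be read as an ordinary argparse layout. The following is a minimal sketch of a subset of the documented flags, not the script's actual parser: `build_parser_sketch` is a hypothetical name, the `int` type for `--n-partitions` is an assumption, and the two test flags are modelled as a mutually exclusive group because the usage line separates them with `|`.

```python
import argparse


def build_parser_sketch() -> argparse.ArgumentParser:
    """Build a simplified parser mirroring part of the documented usage (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI.")
    parser.add_argument("--project-name", choices=["aou", "gnomad"], default="aou")
    parser.add_argument("--overwrite", action="store_true")
    # Assumption: the partition count is parsed as an integer.
    parser.add_argument("--n-partitions", type=int, default=5000)

    # The usage line shows these two test flags as alternatives ("|"),
    # so they are modelled here as a mutually exclusive group.
    test_group = parser.add_mutually_exclusive_group()
    test_group.add_argument("--test-2-partitions", action="store_true")
    test_group.add_argument("--test-chr22-chrx-chry", action="store_true")

    parser.add_argument("--compute-all-cov-release-stats-ht", action="store_true")
    return parser


args = build_parser_sketch().parse_args(
    ["--project-name", "gnomad", "--test-2-partitions", "--compute-all-cov-release-stats-ht"]
)
print(args.project_name, args.n_partitions, args.test_2_partitions)
```

In this sketch, passing both `--test-2-partitions` and `--test-chr22-chrx-chry` makes argparse exit with a usage error.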

Module Functions

gnomad_qc.v5.annotations.compute_coverage.get_downsampling_ht(ht)

Get Table with downsampling groups for all samples.

gnomad_qc.v5.annotations.compute_coverage.get_group_membership_ht(...)

Get genomes group membership HT for all sites allele number stratification.

gnomad_qc.v5.annotations.compute_coverage.validate_vds(vds)

Validate VDS before densify.

gnomad_qc.v5.annotations.compute_coverage.compute_all_release_stats_per_ref_site(...)

Compute coverage, allele number, and quality histograms per reference site.

gnomad_qc.v5.annotations.compute_coverage.merge_gnomad_coverage_hts(...)

Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes coverage HT.

gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_coverage_ht(...)

Join AoU and gnomAD coverage HTs for release.

gnomad_qc.v5.annotations.compute_coverage.merge_gnomad_an_hts(...)

Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes AN HT.

gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_an_ht(...)

Join AoU and gnomAD AN HTs for release.

gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_qual_hists_ht(...)

Join AoU and gnomAD qual hists HTs for release.

gnomad_qc.v5.annotations.compute_coverage.main(args)

Compute all sites coverage, allele number, and quality histograms for v5 genomes (AoU v8 + gnomAD v4).

gnomad_qc.v5.annotations.compute_coverage.get_script_argument_parser()

Get script argument parser.


gnomad_qc.v5.annotations.compute_coverage.get_downsampling_ht(ht)

Get Table with downsampling groups for all samples.

v5 downsampling is only applied to the AoU dataset. Desired groups:

  • 10,000

  • 100,000

  • Genetic ancestry group sizes for AFR, AMR, NFE

Note that the only desired genetic ancestry group sizes are AFR, AMR, and NFE, but the code will also generate downsamplings for all other groups.

Parameters:

ht (Table) – Input Table.

Return type:

Table

Returns:

Table with downsampling groups.
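As an illustration of how the set of downsampling sizes described above could be assembled, here is a minimal pure-Python sketch. It is not the function's implementation: `downsampling_sizes` and the example counts are hypothetical, and dropping fixed sizes that exceed the total sample count is an assumption.

```python
from typing import Dict, List, Sequence


def downsampling_sizes(
    group_counts: Dict[str, int],
    fixed_sizes: Sequence[int] = (10_000, 100_000),
) -> List[int]:
    """Return sorted downsampling sizes: fixed sizes plus every group size."""
    total = sum(group_counts.values())
    # Assumption: a fixed size larger than the whole dataset is skipped.
    sizes = {size for size in fixed_sizes if size <= total}
    # Every genetic ancestry group size is added, not only AFR, AMR, and NFE,
    # mirroring the note that downsamplings are generated for all groups.
    sizes.update(group_counts.values())
    return sorted(sizes)


# Hypothetical per-group sample counts, for illustration only.
example_counts = {"afr": 50_000, "amr": 40_000, "nfe": 120_000, "eas": 5_000}
print(downsampling_sizes(example_counts))
```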

gnomad_qc.v5.annotations.compute_coverage.get_group_membership_ht(meta_ht, project, ds_ht=None)

Get genomes group membership HT for all sites allele number stratification.

Parameters:
  • meta_ht (Table) – Meta HT.

  • project (str) – Project name. Must be “aou” or “gnomad”. If “gnomad”, function will filter meta HT to only consent drop samples.

  • ds_ht (Optional[Table]) – Optional downsampling HT. Only used for AoU.

Return type:

Table

Returns:

Group membership HT.
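One way to picture a group membership annotation is as a per-sample list of booleans, one per stratification group. The sketch below is a conceptual illustration only, assuming groups are defined by key/value pairs on sample metadata; `group_membership`, the group definitions, and the sample fields are all hypothetical and do not describe the schema of the HT this function writes.

```python
from typing import Dict, List


def group_membership(sample: Dict[str, str], groups: List[Dict[str, str]]) -> List[bool]:
    """Return True for each group whose every key/value pair matches the sample."""
    return [
        all(sample.get(key) == value for key, value in group.items())
        for group in groups
    ]


# Hypothetical stratification groups; the empty dict matches every sample.
groups = [
    {},
    {"gen_anc": "afr"},
    {"gen_anc": "nfe"},
    {"sex": "XX"},
    {"gen_anc": "afr", "sex": "XX"},
]
sample = {"gen_anc": "afr", "sex": "XY"}
print(group_membership(sample, groups))
```

Summing allele counts over the samples flagged True in each position would then give one allele number per group.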

gnomad_qc.v5.annotations.compute_coverage.validate_vds(vds)

Validate VDS before densify.

Code is taken from https://github.com/hail-is/hail/blob/858f3ab30c2bcc46d6e57fdbfe408284b4b3de53/hail/python/hail/vds/variant_dataset.py#L271 at suggestion from Chris Vittal.

Parameters:

vds (VariantDataset) – Input VDS.

Return type:

None

Returns:

None; raises ValueError if VDS is not valid.

gnomad_qc.v5.annotations.compute_coverage.compute_all_release_stats_per_ref_site(vds, ref_ht, sex_karyotype_field, project, coverage_over_x_bins=[1, 5, 10, 15, 20, 25, 30, 50, 100], interval_ht=None, group_membership_ht=None)

Compute coverage, allele number, and quality histograms per reference site.

Note

Running this function prior to calculating frequencies removes the need for an additional densify for frequency calculations.

Parameters:
  • vds (VariantDataset) – Input VDS.

  • ref_ht (Table) – Reference HT.

  • sex_karyotype_field (str) – Field name for sex karyotype.

  • project (str) – Project name.

  • coverage_over_x_bins (List[int]) – List of boundaries for computing samples over X depth.

  • interval_ht (Optional[Table]) – Interval HT.

  • group_membership_ht (Optional[Table]) – Group membership HT.

Return type:

Table

Returns:

HT with coverage, allele number, and quality histograms per reference site.
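To make the `coverage_over_x_bins` parameter concrete, the sketch below computes, for a single site, the fraction of samples whose depth reaches each boundary. It is a plain-Python illustration rather than the Hail aggregation the function runs; `fraction_over_x` is a hypothetical name, and reading "over X" as "depth of at least X" is an assumption.

```python
from typing import Dict, Sequence

DEFAULT_BINS = [1, 5, 10, 15, 20, 25, 30, 50, 100]


def fraction_over_x(
    depths: Sequence[int], bins: Sequence[int] = DEFAULT_BINS
) -> Dict[int, float]:
    """Return, per boundary X, the fraction of samples with depth >= X."""
    n_samples = len(depths)
    if n_samples == 0:
        raise ValueError("At least one sample depth is required.")
    # Assumption: a sample counts toward "over X" when its depth is >= X.
    return {x: sum(dp >= x for dp in depths) / n_samples for x in bins}


# Hypothetical depths for five samples at one reference site.
site_depths = [0, 4, 12, 31, 60]
print(fraction_over_x(site_depths))
```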

gnomad_qc.v5.annotations.compute_coverage.merge_gnomad_coverage_hts(gnomad_ht, gnomad_release_ht, coverage_over_x_bins=[1, 5, 10, 15, 20, 25, 30, 50, 100], v4_count=76215, consent_drop_count=866)

Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes coverage HT.

Parameters:
  • gnomad_ht (Table) – gnomAD coverage HT (contains coverage for consent drop samples only).

  • gnomad_release_ht (Table) – gnomAD v4 genomes coverage release HT.

  • coverage_over_x_bins (List[int]) – List of boundaries for computing samples over X. Default is [1, 5, 10, 15, 20, 25, 30, 50, 100].

  • v4_count (int) – Number of release gnomAD v4 genome samples. Default is 76215.

  • consent_drop_count (int) – Number of consent drop gnomAD v4 genome samples. Default is 866.

Return type:

Table

Returns:

gnomAD v5 genomes coverage HT.
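The arithmetic behind removing a subset from a per-site average can be sketched as follows. This is an illustration of count-weighted subtraction, assuming the statistic is a mean or a fraction over all samples; `subtract_subset_average` is a hypothetical helper, not part of the module, and the example values are made up.

```python
def subtract_subset_average(
    release_value: float,
    drop_value: float,
    v4_count: int = 76215,
    consent_drop_count: int = 866,
) -> float:
    """Remove a subset's contribution from an average taken over all samples."""
    remaining = v4_count - consent_drop_count
    if remaining <= 0:
        raise ValueError("consent_drop_count must be smaller than v4_count.")
    # Convert both averages back to sums, subtract, then re-average.
    return (release_value * v4_count - drop_value * consent_drop_count) / remaining


# Hypothetical values: release mean depth 32.0, consent drop mean depth 28.0.
print(subtract_subset_average(32.0, 28.0))
```

The same formula applies to a fraction-over-X value, since it is also an average over samples. A median cannot be recovered this way, because it is not a count-weighted average.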

gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_coverage_ht(aou_ht, gnomad_ht, coverage_over_x_bins=[1, 5, 10, 15, 20, 25, 30, 50, 100], gnomad_v5_count=75349)

Join AoU and gnomAD coverage HTs for release.

Parameters:
  • aou_ht (Table) – AoU coverage HT.

  • gnomad_ht (Table) – gnomAD v5 genomes coverage HT.

  • coverage_over_x_bins (List[int]) – List of boundaries for computing samples over X. Default is [1, 5, 10, 15, 20, 25, 30, 50, 100].

  • gnomad_v5_count (int) – Number of release gnomAD v5 genome samples. Default is 75349 (76215 - 866).

Return type:

Table

Returns:

Joined HT.
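Combining per-site averages from two cohorts requires weighting each by its sample count, which is presumably why `gnomad_v5_count` is a parameter. The sketch below shows that weighting in plain Python; `pooled_average` and `aou_count` are hypothetical (the AoU sample count is not stated here), and this is not necessarily how the function performs the join.

```python
def pooled_average(
    aou_value: float,
    aou_count: int,
    gnomad_value: float,
    gnomad_v5_count: int = 75349,
) -> float:
    """Return the sample-count-weighted average of two cohort averages."""
    total = aou_count + gnomad_v5_count
    if total <= 0:
        raise ValueError("The combined sample count must be positive.")
    return (aou_value * aou_count + gnomad_value * gnomad_v5_count) / total


# Hypothetical counts and mean depths, chosen small for readability.
print(pooled_average(20.0, 3, 40.0, gnomad_v5_count=1))
```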

gnomad_qc.v5.annotations.compute_coverage.merge_gnomad_an_hts(gnomad_ht, gnomad_release_ht)

Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes AN HT.

Parameters:
  • gnomad_ht (Table) – gnomAD AN HT (contains AN for consent drop samples only).

  • gnomad_release_ht (Table) – gnomAD v4 genomes release AN HT.

Return type:

Table

Returns:

gnomAD v5 genomes AN HT.
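Because allele number is a count, removing the consent drop samples amounts to an element-wise subtraction per stratification group. The following sketch shows that for one site; `subtract_an` is a hypothetical helper, and the second group's values are made up for illustration.

```python
from typing import List


def subtract_an(release_an: List[int], drop_an: List[int]) -> List[int]:
    """Subtract the consent drop AN from the release AN, group by group."""
    if len(release_an) != len(drop_an):
        raise ValueError("Both AN arrays must cover the same groups.")
    result = [release - drop for release, drop in zip(release_an, drop_an)]
    if any(value < 0 for value in result):
        raise ValueError("A subset cannot contribute more alleles than the whole.")
    return result


# First entry: a diploid site called in every sample, so AN is 2 x samples
# (2 * 76215 = 152430 for the release, 2 * 866 = 1732 for the consent drops).
# Second entry: hypothetical values for one stratification group.
print(subtract_an([152430, 40000], [1732, 500]))
```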

gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_an_ht(aou_ht, gnomad_ht)

Join AoU and gnomAD AN HTs for release.

Parameters:
  • aou_ht (Table) – AoU AN HT.

  • gnomad_ht (Table) – gnomAD v5 genomes AN HT.

Return type:

Table

Returns:

Joined HT.

gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_qual_hists_ht(aou_ht, gnomad_ht)

Join AoU and gnomAD qual hists HTs for release.

Note

We did not compute qual hists for the gnomAD v4 genomes release (https://github.com/broadinstitute/gnomad_qc/blob/e65bdbb5768113c0129199a875d845da245690e2/gnomad_qc/v4/annotations/generate_freq_genomes.py#L1139). This means we will also not recompute hists on the gnomAD v4 genomes for v5, which in turn means we will not subtract values for the samples dropped for consent reasons.

Parameters:
  • aou_ht (Table) – AoU qual hists HT.

  • gnomad_ht (Table) – gnomAD qual hists HT.

Return type:

Table

Returns:

Joined HT.

gnomad_qc.v5.annotations.compute_coverage.main(args)

Compute all sites coverage, allele number, and quality histograms for v5 genomes (AoU v8 + gnomAD v4).