gnomad_qc.v5.annotations.compute_coverage
Script to compute coverage, allele number, and quality histograms on all gnomAD v5 genomes (AoU v8 + updated gnomAD v4).
usage: gnomad_qc.v5.annotations.compute_coverage.py [-h]
[--project-name {aou,gnomad}]
[--overwrite]
[--n-partitions N_PARTITIONS]
[--test-2-partitions | --test-chr22-chrx-chry]
[--write-group-membership-ht]
[--test]
[--write-aou-downsampling-ht]
[--compute-all-cov-release-stats-ht]
[--merge-gnomad-coverage]
[--export-coverage-release-files]
[--merge-gnomad-an]
[--export-an-release-files]
[--merge-qual-hists]
Named Arguments
- --project-name
Possible choices: aou, gnomad
Project name. Determines environment where script will run.
Default: “aou”
- --overwrite
Overwrite existing hail Tables.
Default: False
- --n-partitions
Number of partitions to use for the output Table.
Default: 5000
- --test-2-partitions
Whether to run a test using only the first 2 partitions of the VDS test dataset.
Default: False
- --test-chr22-chrx-chry
Whether to run a test using only the chr22, chrX, and chrY chromosomes of the VDS test dataset.
Default: False
- --write-aou-downsampling-ht
Write v5 downsampling HT.
Default: False
- --compute-all-cov-release-stats-ht
Compute the all sites coverage, allele number, and quality histogram HT.
Default: False
- --merge-qual-hists
Merge variant quality histograms from AoU v8 and gnomAD v4 genomes.
Default: False
Get gnomAD genomes group membership HT.
- --write-group-membership-ht
Write group membership HT.
Default: False
- --test
Write test group membership HT to test path.
Default: False
Compute coverage release stats HT.
- --merge-gnomad-coverage
Subtract consent drop samples from v4 release HT to create gnomAD v5 genomes coverage HT.
Default: False
- --export-coverage-release-files
Join and export AoU + gnomAD v4 coverage release HT and TSV file.
Default: False
Compute AN release stats HT.
- --merge-gnomad-an
Subtract consent drop samples from v4 release HT to create gnomAD v5 genomes AN HT.
Default: False
- --export-an-release-files
Exports joint AoU + gnomAD v4 AN release HT and TSV file.
Default: False
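The usage line above shows `--test-2-partitions` and `--test-chr22-chrx-chry` as alternatives. A minimal sketch of how such a parser could be declared with `argparse` follows; `build_parser` is a hypothetical stand-in that covers only a few of the documented flags, not the script's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the script's parser; covers only a few flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--project-name", choices=["aou", "gnomad"], default="aou")
    parser.add_argument("--overwrite", action="store_true")
    parser.add_argument("--n-partitions", type=int, default=5000)
    # The usage line lists these two test flags as alternatives ("|"),
    # so they are placed in a mutually exclusive group here.
    test_group = parser.add_mutually_exclusive_group()
    test_group.add_argument("--test-2-partitions", action="store_true")
    test_group.add_argument("--test-chr22-chrx-chry", action="store_true")
    parser.add_argument("--compute-all-cov-release-stats-ht", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--project-name", "gnomad", "--test-2-partitions", "--compute-all-cov-release-stats-ht"]
)
```

Passing both test flags at once makes `argparse` exit with a usage error, which matches the either/or form shown in the usage line.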
Module Functions
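- get_downsampling_ht(ht) – Get Table with downsampling groups for all samples.
- get_group_membership_ht(meta_ht, project, ds_ht=None) – Get genomes group membership HT for all sites allele number stratification.
- validate_vds(vds) – Validate VDS before densify.
- compute_all_release_stats_per_ref_site(vds, ref_ht, sex_karyotype_field, project, ...) – Compute coverage, allele number, and quality histograms per reference site.
- merge_gnomad_coverage_hts(gnomad_ht, gnomad_release_ht, ...) – Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes coverage HT.
- join_aou_and_gnomad_coverage_ht(aou_ht, gnomad_ht, ...) – Join AoU and gnomAD coverage HTs for release.
- merge_gnomad_an_hts(gnomad_ht, gnomad_release_ht) – Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes AN HT.
- join_aou_and_gnomad_an_ht(aou_ht, gnomad_ht) – Join AoU and gnomAD AN HTs for release.
- join_aou_and_gnomad_qual_hists_ht(aou_ht, gnomad_ht) – Join AoU and gnomAD qual hists HTs for release.
- Compute all sites coverage, allele number, and quality histograms for v5 genomes (AoU v8 + gnomAD v4).
- Get script argument parser.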
- gnomad_qc.v5.annotations.compute_coverage.get_downsampling_ht(ht)[source]
Get Table with downsampling groups for all samples.
v5 downsampling is only applied to the AoU dataset. Desired groups:
- 10,000
- 100,000
- Genetic ancestry group sizes for AFR, AMR, NFE
Note that the only desired genetic ancestry group sizes are AFR, AMR, and NFE, but the code will also generate downsamplings for all other groups.
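As an illustration of how the list of downsampling sizes might be assembled from the groups described above, here is a minimal pure-Python sketch. The helper `downsampling_sizes` and the sample counts are hypothetical and are not taken from the module; the real function operates on a Hail Table.

```python
from typing import Dict, List

FIXED_DOWNSAMPLINGS = (10_000, 100_000)  # the two fixed sizes listed above


def downsampling_sizes(group_counts: Dict[str, int]) -> List[int]:
    """Return the sorted downsampling sizes that fit within the dataset."""
    total = sum(group_counts.values())
    sizes = {n for n in FIXED_DOWNSAMPLINGS if n <= total}
    # One downsampling per genetic ancestry group size. As noted above, this
    # yields sizes for every group, not only AFR, AMR, and NFE.
    sizes.update(group_counts.values())
    return sorted(sizes)


# Made-up counts for illustration only; these are not real AoU numbers.
example_counts = {"afr": 50_000, "amr": 40_000, "nfe": 120_000, "eas": 5_000}
sizes = downsampling_sizes(example_counts)
```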
- gnomad_qc.v5.annotations.compute_coverage.get_group_membership_ht(meta_ht, project, ds_ht=None)[source]
Get genomes group membership HT for all sites allele number stratification.
- gnomad_qc.v5.annotations.compute_coverage.validate_vds(vds)[source]
Validate VDS before densify.
Code is taken from https://github.com/hail-is/hail/blob/858f3ab30c2bcc46d6e57fdbfe408284b4b3de53/hail/python/hail/vds/variant_dataset.py#L271 at suggestion from Chris Vittal.
- Parameters:
vds (VariantDataset) – Input VDS.
- Return type:
None
- Returns:
None; raises ValueError if VDS is not valid.
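The "return None or raise ValueError" contract can be illustrated with a small pure-Python sketch. It does not reproduce the linked Hail code. The two checks shown (each reference block has an END, and END is not before the block start) are assumptions chosen for illustration, and `validate_ref_blocks` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (start_position, END) for each reference block; END may be missing (None).
RefBlock = Tuple[int, Optional[int]]


def validate_ref_blocks(blocks: List[RefBlock]) -> None:
    """Raise ValueError if any block lacks END or ends before it starts."""
    missing_end = [b for b in blocks if b[1] is None]
    end_before_start = [b for b in blocks if b[1] is not None and b[1] < b[0]]
    if missing_end or end_before_start:
        raise ValueError(
            f"invalid reference blocks: missing END={missing_end}, "
            f"END before start={end_before_start}"
        )


# Valid input: returns None without raising.
result = validate_ref_blocks([(100, 150), (151, 151)])
```

Failing fast here is cheaper than discovering malformed reference blocks in the middle of a densify.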
- gnomad_qc.v5.annotations.compute_coverage.compute_all_release_stats_per_ref_site(vds, ref_ht, sex_karyotype_field, project, coverage_over_x_bins=[1, 5, 10, 15, 20, 25, 30, 50, 100], interval_ht=None, group_membership_ht=None)[source]
Compute coverage, allele number, and quality histograms per reference site.
Note
Running this function prior to calculating frequencies removes the need for an additional densify for frequency calculations.
- Parameters:
vds (VariantDataset) – Input VDS.
ref_ht (Table) – Reference HT.
sex_karyotype_field (str) – Field name for sex karyotype.
project (str) – Project name.
coverage_over_x_bins (List[int]) – List of boundaries for computing samples over X depth.
interval_ht (Optional[Table]) – Interval HT.
group_membership_ht (Optional[Table]) – Group membership HT.
- Return type:
- Returns:
HT with allele number and quality histograms per reference site.
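To make the role of `coverage_over_x_bins` concrete, the sketch below computes per-site coverage summaries from a plain list of sample depths. It assumes "over X" means the fraction of samples with depth of at least X; the helper `coverage_stats` and the depth values are hypothetical, and the real function works on a VDS with Hail aggregators.

```python
from statistics import mean, median
from typing import Dict, Sequence

COVERAGE_OVER_X_BINS = (1, 5, 10, 15, 20, 25, 30, 50, 100)  # documented default


def coverage_stats(
    depths: Sequence[int], bins: Sequence[int] = COVERAGE_OVER_X_BINS
) -> Dict[str, float]:
    """Summarize depth at one site: mean, median, and fraction of samples >= X."""
    n = len(depths)
    stats: Dict[str, float] = {"mean": mean(depths), "median": median(depths)}
    for x in bins:
        # Assumption: "over X" counts samples whose depth is at least X.
        stats[f"over_{x}"] = sum(dp >= x for dp in depths) / n
    return stats


# Made-up depths for five samples at a single reference site.
site_depths = [0, 4, 12, 31, 33]
stats = coverage_stats(site_depths)
```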
- gnomad_qc.v5.annotations.compute_coverage.merge_gnomad_coverage_hts(gnomad_ht, gnomad_release_ht, coverage_over_x_bins=[1, 5, 10, 15, 20, 25, 30, 50, 100], v4_count=76215, consent_drop_count=866)[source]
Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes coverage HT.
- Parameters:
gnomad_ht (Table) – gnomAD coverage HT (contains coverage for consent drop samples only).
gnomad_release_ht (Table) – gnomAD v4 genomes coverage release HT.
coverage_over_x_bins (List[int]) – List of boundaries for computing samples over X. Default is [1, 5, 10, 15, 20, 25, 30, 50, 100].
v4_count (int) – Number of release gnomAD v4 genome samples. Default is 76215.
consent_drop_count (int) – Number of consent drop gnomAD v4 genome samples. Default is 866.
- Return type:
- Returns:
gnomAD v5 genomes coverage HT.
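The subtraction can be worked through as arithmetic. The sketch below assumes the per-site `mean` and `over_X` values are stored as averages or fractions across samples, so the dropped samples' contribution is removed by converting to totals, subtracting, and renormalizing. The helper name `subtract_fraction` and the example fractions are hypothetical.

```python
V4_COUNT = 76215          # documented default for v4_count
CONSENT_DROP_COUNT = 866  # documented default for consent_drop_count


def subtract_fraction(
    release_value: float,
    drop_value: float,
    v4_count: int = V4_COUNT,
    drop_count: int = CONSENT_DROP_COUNT,
) -> float:
    """Remove the dropped samples' share from a per-site mean or fraction."""
    remaining = v4_count - drop_count
    return (release_value * v4_count - drop_value * drop_count) / remaining


# Made-up site: 90% of v4 samples and 100% of dropped samples are over 10x.
v5_over_10 = subtract_fraction(0.9, 1.0)
```

This only works for quantities that are sums divided by a sample count. A median cannot be recovered from two medians in this way.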
- gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_coverage_ht(aou_ht, gnomad_ht, coverage_over_x_bins=[1, 5, 10, 15, 20, 25, 30, 50, 100], gnomad_v5_count=75349)[source]
Join AoU and gnomAD coverage HTs for release.
- Parameters:
aou_ht (Table) – AoU coverage HT.
gnomad_ht (Table) – gnomAD v5 genomes coverage HT.
coverage_over_x_bins (List[int]) – List of boundaries for computing samples over X. Default is [1, 5, 10, 15, 20, 25, 30, 50, 100].
gnomad_v5_count (int) – Number of release gnomAD v5 genome samples. Default is 75349 (76215 - 866).
- Return type:
- Returns:
Joined HT.
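One plausible way to combine per-site fractions from the two datasets is a sample-count-weighted average, sketched below. The AoU sample count used here is a made-up placeholder, `combine_fraction` is a hypothetical helper, and the weighting scheme is an assumption, not a description of the module's code.

```python
GNOMAD_V5_COUNT = 75349  # documented default for gnomad_v5_count (76215 - 866)


def combine_fraction(
    aou_value: float,
    aou_count: int,
    gnomad_value: float,
    gnomad_count: int = GNOMAD_V5_COUNT,
) -> float:
    """Sample-count-weighted average of a per-site mean or fraction."""
    total = aou_count + gnomad_count
    return (aou_value * aou_count + gnomad_value * gnomad_count) / total


# Placeholder AoU count (100,000) and made-up per-site fractions.
joint_over_10 = combine_fraction(0.8, 100_000, 0.6)
```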
- gnomad_qc.v5.annotations.compute_coverage.merge_gnomad_an_hts(gnomad_ht, gnomad_release_ht)[source]
Subtract consent drop samples from gnomAD v4 genomes release HT to create gnomAD v5 genomes AN HT.
- gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_an_ht(aou_ht, gnomad_ht)[source]
Join AoU and gnomAD AN HTs for release.
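Because allele number is a count, the merge and join steps reduce to subtraction and addition. The sketch below shows both on plain dictionaries keyed by a group label. The group labels, the numbers, and the helper names are hypothetical, and the sketch assumes both inputs use the same labels.

```python
from typing import Dict


def subtract_an(release_an: Dict[str, int], drop_an: Dict[str, int]) -> Dict[str, int]:
    """gnomAD v5 AN per group: v4 release AN minus consent-drop AN."""
    return {group: an - drop_an.get(group, 0) for group, an in release_an.items()}


def join_an(aou_an: Dict[str, int], gnomad_an: Dict[str, int]) -> Dict[str, int]:
    """Joint AN per group: sum over every group present in either input."""
    groups = set(aou_an) | set(gnomad_an)
    return {g: aou_an.get(g, 0) + gnomad_an.get(g, 0) for g in groups}


# Made-up per-site AN values for two illustrative groups.
gnomad_v4 = {"adj": 152_430, "XX": 80_000}
consent_drop = {"adj": 1_732, "XX": 900}
aou = {"adj": 200_000, "XX": 110_000}

gnomad_v5 = subtract_an(gnomad_v4, consent_drop)
joint = join_an(aou, gnomad_v5)
```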
- gnomad_qc.v5.annotations.compute_coverage.join_aou_and_gnomad_qual_hists_ht(aou_ht, gnomad_ht)[source]
Join AoU and gnomAD qual hists HTs for release.
Note
We did not compute qual hists for the gnomAD v4 genomes release (https://github.com/broadinstitute/gnomad_qc/blob/e65bdbb5768113c0129199a875d845da245690e2/gnomad_qc/v4/annotations/generate_freq_genomes.py#L1139). This means we will also not recompute hists on the gnomAD v4 genomes for v5, which in turn means we will not subtract the values of the samples dropped for consent reasons.
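A join of two histograms with identical binning can be sketched as a bin-wise sum. The `Hist` dataclass below mirrors the field names of the struct returned by Hail's `hl.agg.hist` (`bin_edges`, `bin_freq`, `n_smaller`, `n_larger`). The helper `merge_hists` and the example values are hypothetical, and whether the module merges histograms this way is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hist:
    bin_edges: List[float]
    bin_freq: List[int]
    n_smaller: int
    n_larger: int


def merge_hists(a: Hist, b: Hist) -> Hist:
    """Sum two histograms bin by bin; both must share the same bin edges."""
    if a.bin_edges != b.bin_edges:
        raise ValueError("histograms must share bin edges")
    return Hist(
        bin_edges=list(a.bin_edges),
        bin_freq=[x + y for x, y in zip(a.bin_freq, b.bin_freq)],
        n_smaller=a.n_smaller + b.n_smaller,
        n_larger=a.n_larger + b.n_larger,
    )


# Made-up two-bin histograms for illustration.
aou_hist = Hist([0, 10, 20], [1, 2], n_smaller=0, n_larger=3)
gnomad_hist = Hist([0, 10, 20], [4, 5], n_smaller=1, n_larger=0)
merged = merge_hists(aou_hist, gnomad_hist)
```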