gnomad_qc.v4.annotations.compute_coverage

Script to compute coverage statistics on gnomAD v4 exomes.

usage: gnomad_qc.v4.annotations.compute_coverage.py [-h] [--overwrite]
                                                    [--test-2-partitions]
                                                    [--test-chr22-chrx-chry]
                                                    [--n-partitions N_PARTITIONS]
                                                    [--data-type {exomes,genomes}]
                                                    [--compute-coverage-ht]
                                                    [--compute-all-sites-an-and-qual-hist-ht]
                                                    [--stratify-by-ukb-and-platform]
                                                    [--calling-interval-name {ukb,broad,intersection,union}]
                                                    [--calling-interval-padding {0,50,150}]
                                                    [--export-coverage-release-files]
                                                    [--export-all-sites-an-release-files]

Named Arguments

--overwrite

Overwrite existing Hail Tables.

Default: False

--test-2-partitions

Whether to run a test using only the first 2 partitions of the VDS test dataset.

Default: False

--test-chr22-chrx-chry

Whether to run a test using only chr22, chrX, and chrY of the VDS test dataset.

Default: False

--n-partitions

Number of partitions to use for the output Table.

Default: 5000

--data-type

Possible choices: exomes, genomes

Data type for which to compute the coverage Table or the all sites AN and quality histogram Table. One of ‘exomes’ or ‘genomes’.

Default: “exomes”

--compute-coverage-ht

Compute the coverage HT.

Default: False

--compute-all-sites-an-and-qual-hist-ht

Compute the all sites allele number and quality histogram HT.

Default: False

--stratify-by-ukb-and-platform

Whether to compute coverage stratified by UKB/non-UKB and platform. Only applicable if --data-type is exomes.

Default: False

--calling-interval-name

Possible choices: ukb, broad, intersection, union

Name of calling intervals to use for interval coverage. One of: ‘ukb’, ‘broad’, ‘intersection’, or ‘union’. Only applicable if --data-type is exomes.

Default: “union”

--calling-interval-padding

Possible choices: 0, 50, 150

Number of base pairs of padding to use on the calling intervals. One of 0, 50, or 150 bp. Only applicable if --data-type is exomes.

Default: 150

--export-coverage-release-files

Export coverage release HT and TSV file.

Default: False

--export-all-sites-an-release-files

Export all sites AN release HT and TSV file.

Default: False
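The options above can be mirrored in a small argparse sketch, which may help when checking how the flags and defaults combine. This is a reconstruction from the documented names, choices, and defaults only; it is not the parser returned by get_script_argument_parser(), and the `int` types for --n-partitions and --calling-interval-padding, as well as the function name `build_parser_sketch`, are assumptions.

```python
import argparse


def build_parser_sketch() -> argparse.ArgumentParser:
    """Rebuild the documented flags; a sketch, not the module's own parser."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented compute_coverage options."
    )
    # Boolean switches, each documented with "Default: False".
    for flag in (
        "--overwrite",
        "--test-2-partitions",
        "--test-chr22-chrx-chry",
        "--compute-coverage-ht",
        "--compute-all-sites-an-and-qual-hist-ht",
        "--stratify-by-ukb-and-platform",
        "--export-coverage-release-files",
        "--export-all-sites-an-release-files",
    ):
        parser.add_argument(flag, action="store_true")
    # Valued options; the int types are assumed from the documented defaults.
    parser.add_argument("--n-partitions", type=int, default=5000)
    parser.add_argument(
        "--data-type", choices=["exomes", "genomes"], default="exomes"
    )
    parser.add_argument(
        "--calling-interval-name",
        choices=["ukb", "broad", "intersection", "union"],
        default="union",
    )
    parser.add_argument(
        "--calling-interval-padding", type=int, choices=[0, 50, 150], default=150
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser_sketch().parse_args(
    ["--compute-coverage-ht", "--calling-interval-padding", "50"]
)
print(args.data_type, args.calling_interval_name, args.calling_interval_padding)
```

Flags that are not passed keep their documented defaults, so only --compute-coverage-ht and the padding differ from the defaults in this invocation.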

Module Functions

gnomad_qc.v4.annotations.compute_coverage.get_exomes_group_membership_ht(...)

Get exomes group membership HT for all sites allele number stratification.

gnomad_qc.v4.annotations.compute_coverage.get_genomes_group_membership_ht(meta_ht)

Get genomes group membership HT for all sites allele number stratification.

gnomad_qc.v4.annotations.compute_coverage.adjust_interval_padding(ht, ...)

Adjust interval padding in HT.

gnomad_qc.v4.annotations.compute_coverage.compute_an_and_qual_hists_per_ref_site(...)

Compute allele number and quality histograms per reference site.

gnomad_qc.v4.annotations.compute_coverage.add_exomes_interval_annotations(ht)

Add interval annotations to exomes HT.

gnomad_qc.v4.annotations.compute_coverage.get_coverage_resources(...)

Get PipelineResourceCollection for all resources needed in the coverage pipeline.

gnomad_qc.v4.annotations.compute_coverage.main(args)

Compute coverage statistics, including mean, median_approx, and coverage over certain DPs.

gnomad_qc.v4.annotations.compute_coverage.get_script_argument_parser()

Get script argument parser.

gnomad_qc.v4.annotations.compute_coverage.get_exomes_group_membership_ht(meta_ht, ds_ht, non_ukb_ds_ht)[source]

Get exomes group membership HT for all sites allele number stratification.

Parameters:
  • meta_ht (Table) – Metadata HT.

  • ds_ht (Table) – Full frequency downsampling HT.

  • non_ukb_ds_ht (Table) – Non-UKB frequency downsampling HT.

Return type:

Table

Returns:

Group membership HT.

gnomad_qc.v4.annotations.compute_coverage.get_genomes_group_membership_ht(meta_ht)[source]

Get genomes group membership HT for all sites allele number stratification.

Parameters:

meta_ht (Table) – Metadata HT.

Return type:

Table

Returns:

Group membership HT.

gnomad_qc.v4.annotations.compute_coverage.adjust_interval_padding(ht, padding)[source]

Adjust interval padding in HT.

Warning

This function can lead to overlapping intervals, so it is not recommended for most applications. For example, it can be used to filter a variant list to all variants within the returned interval list, but would not work for getting an aggregate statistic for each interval if the desired output is independent statistics.

Parameters:
  • ht (Table) – HT to adjust.

  • padding (int) – Padding to use.

Return type:

Table

Returns:

HT with adjusted interval padding.
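The overlap problem described in the warning can be illustrated without Hail. The sketch below is a simplified stand-in, not the function's implementation: intervals are plain `(contig, start, end)` tuples treated as closed ranges, padding is simply added to both ends, and the helper names `pad_intervals` and `overlapping_pairs` are hypothetical.

```python
from typing import List, Tuple

# (contig, start, end): an illustrative stand-in for a Hail interval.
Interval = Tuple[str, int, int]


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides, clamping start at 1."""
    return [(c, max(1, s - padding), e + padding) for c, s, e in intervals]


def overlapping_pairs(intervals: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return neighbouring pairs (after sorting) that share at least one position."""
    ordered = sorted(intervals)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if a[0] == b[0] and b[1] <= a[2]
    ]


raw = [("chr1", 1000, 1200), ("chr1", 1400, 1600), ("chr2", 1000, 1200)]
padded = pad_intervals(raw, 150)

print(overlapping_pairs(raw))
print(overlapping_pairs(padded))
```

With 150 bp of padding the two chr1 intervals, originally 200 bp apart, now share positions 1250 to 1350. A variant in that range falls inside both intervals, which is harmless for filtering but would count it twice in any per-interval aggregate.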

gnomad_qc.v4.annotations.compute_coverage.compute_an_and_qual_hists_per_ref_site(vds, ref_ht, interval_ht=None, group_membership_ht=None)[source]

Compute allele number and quality histograms per reference site.

Parameters:
  • vds (VariantDataset) – Input VDS.

  • ref_ht (Table) – Reference HT.

  • interval_ht (Optional[Table]) – Interval HT.

  • group_membership_ht (Optional[Table]) – Group membership HT.

Return type:

Table

Returns:

HT with allele number and quality histograms per reference site.
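As a rough illustration of what a stratified allele number means at a single site, the sketch below counts called alleles per group using plain Python lists. It assumes that AN is the number of alleles in non-missing genotype calls and that group membership is one boolean per sample; the real function operates on a VariantDataset and Hail Tables, and the name `allele_number_by_group` and the example groups are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def allele_number_by_group(
    genotypes: List[Optional[Tuple[int, ...]]],
    group_membership: Dict[str, List[bool]],
) -> Dict[str, int]:
    """Count called alleles per group at one site; None marks a missing call."""
    an = {}
    for group, members in group_membership.items():
        # Each called genotype contributes as many alleles as its ploidy.
        an[group] = sum(
            len(gt)
            for gt, is_member in zip(genotypes, members)
            if is_member and gt is not None
        )
    return an


# Four samples at one site; the third has no call.
genotypes = [(0, 0), (0, 1), None, (0, 0)]
groups = {
    "all": [True, True, True, True],
    "group_a": [True, False, True, False],
}
print(allele_number_by_group(genotypes, groups))
```

The missing call lowers AN for every group that contains that sample, which is why AN is computed per site rather than taken as twice the group size.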

gnomad_qc.v4.annotations.compute_coverage.add_exomes_interval_annotations(ht, padding=150)[source]

Add interval annotations to exomes HT.

Parameters:
  • ht (Table) – Input HT.

  • padding (int) – Number of base pairs of padding to use for the calling intervals.

Return type:

Table

Returns:

HT with exome interval annotations.

gnomad_qc.v4.annotations.compute_coverage.get_coverage_resources(test, overwrite, data_type, calling_interval_name=None, calling_interval_padding=None)[source]

Get PipelineResourceCollection for all resources needed in the coverage pipeline.

Parameters:
  • test (bool) – Whether to gather all resources for testing.

  • overwrite (bool) – Whether to overwrite resources if they exist.

  • data_type (str) – One of ‘exomes’ or ‘genomes’.

  • calling_interval_name (Optional[str]) – Name of calling intervals to use.

  • calling_interval_padding (Optional[int]) – Padding to use for calling intervals.

Return type:

PipelineResourceCollection

Returns:

PipelineResourceCollection containing resources for all steps of the coverage pipeline.

gnomad_qc.v4.annotations.compute_coverage.main(args)[source]

Compute coverage statistics, including mean, median_approx, and coverage over certain DPs.
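For orientation, the kinds of per-site statistics named here can be sketched on a plain list of sample depths. This is only an illustration under stated assumptions: the DP thresholds (1, 10, 20, 30) are made up for the example, "over" is taken to mean depth greater than or equal to the threshold, and the median is exact, whereas the pipeline reports an approximate median (median_approx). The name `summarize_site_depth` is hypothetical.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize_site_depth(
    depths: Sequence[int], thresholds: Sequence[int] = (1, 10, 20, 30)
) -> Dict[str, float]:
    """Summarize per-sample depth (DP) at one site."""
    summary = {"mean": mean(depths), "median": median(depths)}
    for t in thresholds:
        # Fraction of samples whose depth reaches the threshold.
        summary[f"over_{t}"] = sum(d >= t for d in depths) / len(depths)
    return summary


site = summarize_site_depth([0, 8, 12, 25, 35])
print(site)
```

For the five depths above, the mean is 16, the median is 12, and the fractions at thresholds 1, 10, 20, and 30 are 0.8, 0.6, 0.4, and 0.2.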