gnomad_qc.v4.annotations.compute_coverage
Script to compute coverage statistics on gnomAD v4 exomes.
usage: gnomad_qc.v4.annotations.compute_coverage.py [-h] [--overwrite]
[--test-2-partitions]
[--test-chr22-chrx-chry]
[--n-partitions N_PARTITIONS]
[--data-type {exomes,genomes}]
[--compute-coverage-ht]
[--compute-all-sites-an-and-qual-hist-ht]
[--stratify-by-ukb-and-platform]
[--calling-interval-name {ukb,broad,intersection,union}]
[--calling-interval-padding {0,50,150}]
[--export-coverage-release-files]
[--export-all-sites-an-release-files]
Named Arguments
- --overwrite
Overwrite existing Hail Tables.
Default: False
- --test-2-partitions
Whether to run a test using only the first 2 partitions of the VDS test dataset.
Default: False
- --test-chr22-chrx-chry
Whether to run a test using only the chr22, chrX, and chrY chromosomes of the VDS test dataset.
Default: False
- --n-partitions
Number of partitions to use for the output Table.
Default: 5000
- --data-type
Possible choices: exomes, genomes
Data type for which to compute the coverage Table or the all sites AN and quality histogram Table. One of ‘exomes’ or ‘genomes’.
Default: “exomes”
- --compute-coverage-ht
Compute the coverage HT.
Default: False
- --compute-all-sites-an-and-qual-hist-ht
Compute the all sites allele number and quality histogram HT.
Default: False
- --stratify-by-ukb-and-platform
Whether to compute coverage stratified by UKB/non-UKB and platform. Only applicable if --data-type is exomes.
Default: False
- --calling-interval-name
Possible choices: ukb, broad, intersection, union
Name of the calling intervals to use for interval coverage. One of: ‘ukb’, ‘broad’, ‘intersection’, or ‘union’. Only applicable if --data-type is exomes.
Default: “union”
- --calling-interval-padding
Possible choices: 0, 50, 150
Number of base pairs of padding to apply to the calling intervals. One of 0, 50, or 150 bp. Only applicable if --data-type is exomes.
Default: 150
- --export-coverage-release-files
Export coverage release HT and TSV file.
Default: False
- --export-all-sites-an-release-files
Export all sites AN release HT and TSV file.
Default: False
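The flags, choices, and defaults listed above can be mirrored in a small argparse parser. The following is only a sketch reconstructed from this listing, not the script's actual parser: the function name `build_parser` is hypothetical, and the real script may validate arguments in ways that are not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser rebuilt from the documented flags (not the script's own)."""
    parser = argparse.ArgumentParser(
        description="Compute coverage statistics on gnomAD v4 exomes."
    )
    # Boolean switches; each is documented with a default of False.
    for flag in (
        "--overwrite",
        "--test-2-partitions",
        "--test-chr22-chrx-chry",
        "--compute-coverage-ht",
        "--compute-all-sites-an-and-qual-hist-ht",
        "--stratify-by-ukb-and-platform",
        "--export-coverage-release-files",
        "--export-all-sites-an-release-files",
    ):
        parser.add_argument(flag, action="store_true")
    # Options that take a value, with the documented choices and defaults.
    parser.add_argument("--n-partitions", type=int, default=5000)
    parser.add_argument(
        "--data-type", choices=["exomes", "genomes"], default="exomes"
    )
    parser.add_argument(
        "--calling-interval-name",
        choices=["ukb", "broad", "intersection", "union"],
        default="union",
    )
    parser.add_argument(
        "--calling-interval-padding", type=int, choices=[0, 50, 150], default=150
    )
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["--compute-coverage-ht", "--data-type", "exomes", "--calling-interval-padding", "50"]
)
print(args.compute_coverage_ht, args.data_type, args.calling_interval_padding)
# prints: True exomes 50
```

Options that are not passed keep their documented defaults, so `args.calling_interval_name` is "union" and `args.n_partitions` is 5000 in this example.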
Module Functions
- get_exomes_group_membership_ht: Get exomes group membership HT for all sites allele number stratification.
- get_genomes_group_membership_ht: Get genomes group membership HT for all sites allele number stratification.
- adjust_interval_padding: Adjust interval padding in HT.
- compute_an_and_qual_hists_per_ref_site: Compute allele number and quality histograms per reference site.
- add_exomes_interval_annotations: Add interval annotations to exomes HT.
- get_coverage_resources: Get PipelineResourceCollection for all resources needed in the coverage pipeline.
- Compute coverage statistics, including mean, median_approx, and coverage over certain DPs.
- Get script argument parser.
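To make the coverage statistics named above concrete, here is a minimal pure-Python sketch for a single site. It is an illustration under stated assumptions, not the pipeline's Hail implementation: the helper `coverage_summary`, the output keys, and the DP thresholds are hypothetical, and it uses an exact median where the pipeline reports an approximate one (median_approx).

```python
from statistics import mean, median
from typing import Dict, Sequence


def coverage_summary(
    depths: Sequence[int], thresholds: Sequence[int] = (1, 10, 20, 30)
) -> Dict[str, float]:
    """Summarise per-sample depth (DP) at one site.

    Thresholds and key names are illustrative assumptions.
    """
    n = len(depths)
    summary: Dict[str, float] = {"mean": mean(depths), "median": median(depths)}
    for t in thresholds:
        # Fraction of samples whose depth is at least t.
        summary[f"over_{t}"] = sum(d >= t for d in depths) / n
    return summary


site_depths = [0, 12, 25, 31, 40]  # toy DP values for five samples
stats = coverage_summary(site_depths)
print(stats)
```

For the toy depths, the mean is 21.6, the median is 25, and 3 of the 5 samples (0.6) have a depth of at least 20.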
- gnomad_qc.v4.annotations.compute_coverage.get_exomes_group_membership_ht(meta_ht, ds_ht, non_ukb_ds_ht)
Get exomes group membership HT for all sites allele number stratification.
- gnomad_qc.v4.annotations.compute_coverage.get_genomes_group_membership_ht(meta_ht)
Get genomes group membership HT for all sites allele number stratification.
- gnomad_qc.v4.annotations.compute_coverage.adjust_interval_padding(ht, padding)
Adjust interval padding in HT.
Warning
This function can lead to overlapping intervals, so it is not recommended for most applications. For example, it can be used to filter a variant list to all variants within the returned interval list, but would not work for getting an aggregate statistic for each interval if the desired output is independent statistics.
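The overlap described in this warning is easy to reproduce with plain tuples. The sketch below does not use Hail and is not the function's implementation: `pad_intervals` and `overlapping_pairs` are hypothetical helpers, and the coordinates are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive, on a single contig


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides (start clamped at 1)."""
    return [(max(1, start - padding), end + padding) for start, end in intervals]


def overlapping_pairs(intervals: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return neighbouring intervals (after sorting) that share at least one base."""
    ordered = sorted(intervals)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]


intervals = [(1000, 1100), (1250, 1400)]  # toy intervals separated by a short gap
padded = pad_intervals(intervals, 150)

print(padded)                        # [(850, 1250), (1100, 1550)]
print(overlapping_pairs(intervals))  # []
print(overlapping_pairs(padded))     # [((850, 1250), (1100, 1550))]
```

After padding, positions 1100 to 1250 fall in both intervals. A variant there is still correctly kept by a filter, but it would be counted twice if a statistic were aggregated per interval.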
- gnomad_qc.v4.annotations.compute_coverage.compute_an_and_qual_hists_per_ref_site(vds, ref_ht, interval_ht=None, group_membership_ht=None)
Compute allele number and quality histograms per reference site.
- Parameters:
vds (VariantDataset) – Input VDS.
ref_ht (Table) – Reference HT.
interval_ht (Optional[Table]) – Interval HT.
group_membership_ht (Optional[Table]) – Group membership HT.
- Return type:
Table
- Returns:
HT with allele number and quality histograms per reference site.
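As a rough illustration of what a per-site allele number and a quality histogram look like, the toy sketch below works on one site with plain Python data. Everything in it is an assumption made for illustration: the helpers `allele_number` and `gq_histogram`, the group labels, the use of GQ as the quality metric, and the bin layout are hypothetical. The actual function operates on a Hail VariantDataset, and its logic is not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# One entry per sample at a single site: (genotype, GQ). A genotype of None is a no-call.
Call = Tuple[Optional[Tuple[int, ...]], int]


def allele_number(calls: Sequence[Call], groups: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Sum the ploidy of called genotypes within each group a sample belongs to."""
    an: Counter = Counter()
    for (gt, _gq), sample_groups in zip(calls, groups):
        if gt is not None:
            for group in sample_groups:
                an[group] += len(gt)  # 2 for a diploid call
    return dict(an)


def gq_histogram(calls: Sequence[Call], bin_width: int = 10, n_bins: int = 10) -> List[int]:
    """Count called genotypes per GQ bin; values past the last bin are clamped into it."""
    hist = [0] * n_bins
    for gt, gq in calls:
        if gt is not None:
            hist[min(gq // bin_width, n_bins - 1)] += 1
    return hist


calls = [((0, 0), 45), ((0, 1), 99), (None, 0), ((0, 0), 20)]
groups = [["all", "ukb"], ["all"], ["all", "ukb"], ["all", "ukb"]]  # hypothetical strata

an = allele_number(calls, groups)
hist = gq_histogram(calls)
print(an)    # {'all': 6, 'ukb': 4}
print(hist)  # [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```

The third sample has no call, so it contributes nothing to either group, which is why the "all" allele number is 6 rather than 8.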
- gnomad_qc.v4.annotations.compute_coverage.add_exomes_interval_annotations(ht, padding=150)
Add interval annotations to exomes HT.
- gnomad_qc.v4.annotations.compute_coverage.get_coverage_resources(test, overwrite, data_type, calling_interval_name=None, calling_interval_padding=None)
Get PipelineResourceCollection for all resources needed in the coverage pipeline.
- Parameters:
test (bool) – Whether to gather all resources for testing.
overwrite (bool) – Whether to overwrite resources if they exist.
data_type (str) – One of ‘exomes’ or ‘genomes’.
calling_interval_name (Optional[str]) – Name of calling intervals to use.
calling_interval_padding (Optional[int]) – Padding to use for calling intervals.
- Return type:
PipelineResourceCollection
- Returns:
PipelineResourceCollection containing resources for all steps of the coverage pipeline.