gnomad_qc.v4.variant_qc.evaluation
Script to create Tables with aggregate variant statistics by variant QC score bins needed for evaluation plots.
usage: gnomad_qc.v4.variant_qc.evaluation.py [-h]
[--slack-channel SLACK_CHANNEL]
[--overwrite] [--test]
[--model-id MODEL_ID]
[--n-bins N_BINS]
[--create-bin-ht]
[--score-bin-validity-check]
[--create-aggregated-bin-ht]
[--extract-truth-samples]
[--truth-sample-mt-n-partitions TRUTH_SAMPLE_MT_N_PARTITIONS]
[--merge-with-truth-data]
[--bin-truth-sample-concordance]
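The usage line corresponds to a fairly standard `argparse` interface. The sketch here is a hypothetical reconstruction built only from the flags and defaults documented on this page. It is not the module's own argument parser: the function name `build_parser` and the model ID `rf_0001` are made up for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the documented CLI (not the module's own parser)."""
    parser = argparse.ArgumentParser(
        prog="gnomad_qc.v4.variant_qc.evaluation.py",
        description=(
            "Create Tables with aggregate variant statistics by variant QC "
            "score bins needed for evaluation plots."
        ),
    )
    parser.add_argument(
        "--slack-channel", help="Slack channel to post results and notifications to."
    )
    parser.add_argument(
        "--overwrite", action="store_true", help="Overwrite all data from this subset."
    )
    parser.add_argument(
        "--test",
        action="store_true",
        help="Whether the model being evaluated is a test model.",
    )
    parser.add_argument("--model-id", help="Model ID.")
    parser.add_argument(
        "--n-bins", type=int, default=100, help="Number of bins for the binned file."
    )
    # Step flags: each one switches on a single pipeline step (all default to False).
    for step_flag in (
        "--create-bin-ht",
        "--score-bin-validity-check",
        "--create-aggregated-bin-ht",
        "--extract-truth-samples",
        "--merge-with-truth-data",
        "--bin-truth-sample-concordance",
    ):
        parser.add_argument(step_flag, action="store_true")
    parser.add_argument(
        "--truth-sample-mt-n-partitions",
        type=int,
        default=5000,
        help="Desired number of partitions for the truth sample MatrixTable.",
    )
    return parser


if __name__ == "__main__":
    # "rf_0001" is a made-up model ID used only for illustration.
    args = build_parser().parse_args(
        ["--model-id", "rf_0001", "--create-bin-ht", "--create-aggregated-bin-ht"]
    )
    print(args.model_id, args.n_bins, args.create_bin_ht, args.extract_truth_samples)
```

Because each step is an independent boolean flag, a single invocation can run several steps in sequence (here, binning followed by aggregation).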
Named Arguments
- --slack-channel
Slack channel to post results and notifications to.
- --overwrite
Overwrite all data from this subset.
Default: False
- --test
Whether the model being evaluated is a test model.
Default: False
- --model-id
Model ID of the variant QC model results to evaluate.
- --n-bins
Number of bins for the binned file.
Default: 100
- --create-bin-ht
When set, creates a file annotated with bins based on the rank of the VQSR/RF score.
Default: False
- --score-bin-validity-check
When set, runs ranking validity checks.
Default: False
- --create-aggregated-bin-ht
When set, creates a file with aggregate counts of variants based on bins.
Default: False
- --extract-truth-samples
Extract truth samples from the callset MatrixTable.
Default: False
- --truth-sample-mt-n-partitions
Desired number of partitions for the truth sample MatrixTable.
Default: 5000
- --merge-with-truth-data
Computes a table for each truth sample comparing the truth sample's genotypes in the callset against its truth data.
Default: False
- --bin-truth-sample-concordance
Merges concordance results (callset vs. truth) for a given truth sample with bins from specified model.
Default: False
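Together, `--merge-with-truth-data` and `--bin-truth-sample-concordance` let you ask how well each score bin agrees with a truth sample. A minimal pure-Python sketch of that idea follows, assuming each site has already been reduced to a hypothetical `(bin, called_in_callset, present_in_truth)` record and that bin 1 holds the highest-scoring variants. The real pipeline works on Hail Tables, and its concordance fields are not reproduced here; `cumulative_precision_recall` is an invented name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical per-site record: (bin, called_in_callset, present_in_truth).
# Sites found only in the truth data have no callset score, so their bin is None.
Site = Tuple[Optional[int], bool, bool]


def cumulative_precision_recall(sites: Iterable[Site]) -> Dict[int, Tuple[float, float]]:
    """Return {k: (precision, recall)} over all called variants in bins 1..k.

    Assumes bin 1 holds the highest-scoring variants.
    """
    site_list = list(sites)
    n_truth = sum(1 for _, _, in_truth in site_list if in_truth)
    per_bin: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # [true pos, false pos]
    for bin_id, called, in_truth in site_list:
        if called and bin_id is not None:
            per_bin[bin_id][0 if in_truth else 1] += 1

    result: Dict[int, Tuple[float, float]] = {}
    tp = fp = 0
    for bin_id in sorted(per_bin):
        tp += per_bin[bin_id][0]
        fp += per_bin[bin_id][1]
        recall = tp / n_truth if n_truth else 0.0
        result[bin_id] = (tp / (tp + fp), recall)
    return result


if __name__ == "__main__":
    example = [
        (1, True, True),
        (1, True, True),
        (2, True, False),
        (2, True, True),
        (None, False, True),  # truth variant missed by the callset
    ]
    print(cumulative_precision_recall(example))
    # {1: (1.0, 0.5), 2: (0.75, 0.75)}
```

Accumulating from the best bin downward mirrors a score threshold: keeping bins 1..k trades precision for recall as k grows.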
Module Functions
- create_bin_ht: Create a table with bin annotations added for a variant QC run.
- score_bin_validity_check: Check that the bin scoring looks correct by printing some aggregate numbers.
- create_aggregated_bin_ht: Aggregate variants into bins using previously annotated bin information.
- get_evaluation_resources: Get PipelineResourceCollection for all resources needed in the variant QC evaluation pipeline.
- Script to create Tables with aggregate variant statistics by variant QC score bins needed for evaluation plots.
- Get script argument parser.
- gnomad_qc.v4.variant_qc.evaluation.create_bin_ht(ht, info_ht, rf_annotations_ht, n_bins, model_type=False)
Create a table with bin annotations added for a variant QC run.
- Parameters:
  - ht (Table) – Table with variant QC result annotations.
  - info_ht (Table) – Table with info annotations.
  - rf_annotations_ht (Table) – Table with RF annotations.
  - n_bins (int) – Number of bins to bin the data into.
  - model_type (bool) – Type of variant QC model used to annotate the data. Must be one of ‘vqsr’, ‘rf’, or ‘if’.
- Return type:
Table
- Returns:
Table with bin annotations.
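`create_bin_ht` assigns bins from the rank of each variant's VQSR/RF score. Stripped of Hail, the core idea can be sketched as equal-count binning over a sorted score list. This is a hedged illustration, not the function's implementation. The name `rank_bins`, the `higher_is_better` flag, and the convention that bin 1 holds the best scores are all assumptions. The `bin_id` description under `create_aggregated_bin_ht` ("rank, bi-allelic, etc.") suggests that the real function also computes separate rankings for subsets such as bi-allelic variants, which this sketch omits.

```python
from typing import Dict, List


def rank_bins(
    scores: Dict[str, float], n_bins: int, higher_is_better: bool = True
) -> Dict[str, int]:
    """Assign each variant to one of n_bins near-equal-size bins by score rank.

    Bin 1 holds the best-ranked variants. Ties keep input order (sorted() is stable).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered: List[str] = sorted(scores, key=scores.__getitem__, reverse=higher_is_better)
    n = len(ordered)
    # A 0-based rank r maps to bin r * n_bins // n + 1, which lies in 1..n_bins.
    return {variant: rank * n_bins // n + 1 for rank, variant in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical variant keys and scores.
    print(rank_bins({"v1": 0.9, "v2": 0.1, "v3": 0.5, "v4": 0.7}, n_bins=2))
```

Because bins are defined by rank rather than by fixed score thresholds, each bin holds roughly the same number of variants (about 1% each with the default `--n-bins` of 100). This is presumably what makes bins comparable across models whose scores are on different scales.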
- gnomad_qc.v4.variant_qc.evaluation.score_bin_validity_check(ht)
Check that the bin scoring looks correct by printing some aggregate numbers.
- Parameters:
ht (Table) – Table with bin annotations.
- Return type:
None
- Returns:
None
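This page does not say which aggregate numbers `score_bin_validity_check` prints. One plausible check is sketched here, assuming that bin 1 holds the highest scores: print each bin's variant count and score range, then confirm that the ranges are ordered, so that no variant in a worse bin outscores one in a better bin. The names `summarize_bins` and `bins_look_valid` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (bin, n_variants, min_score, max_score)
BinSummary = Tuple[int, int, float, float]


def summarize_bins(scores: Dict[str, float], bins: Dict[str, int]) -> List[BinSummary]:
    """Collect the variant count and score range of each bin, in bin order."""
    by_bin: Dict[int, List[float]] = defaultdict(list)
    for variant, bin_id in bins.items():
        by_bin[bin_id].append(scores[variant])
    return [(b, len(s), min(s), max(s)) for b, s in sorted(by_bin.items())]


def bins_look_valid(summary: List[BinSummary]) -> bool:
    """Print each bin and check that score ranges are ordered across bins.

    Assumes bin 1 holds the highest scores, so every score in bin k must be
    >= every score in bin k + 1 (equal scores at a boundary are allowed).
    """
    for b, n, lo, hi in summary:
        print(f"bin {b}: n={n} score range [{lo}, {hi}]")
    return all(prev[2] >= nxt[3] for prev, nxt in zip(summary, summary[1:]))


if __name__ == "__main__":
    scores = {"v1": 0.9, "v2": 0.1, "v3": 0.5, "v4": 0.7}
    good_bins = {"v1": 1, "v4": 1, "v3": 2, "v2": 2}
    print(bins_look_valid(summarize_bins(scores, good_bins)))
```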
- gnomad_qc.v4.variant_qc.evaluation.create_aggregated_bin_ht(ht, trio_stats_ht)
Aggregate variants into bins using previously annotated bin information.
Variants are grouped by:
  - ‘bin_id’ (rank, bi-allelic, etc.)
  - ‘contig’
  - ‘snv’
  - ‘bi_allelic’
  - ‘singleton’
For each bin, the function aggregates the statistics needed for evaluation plots.
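The grouping above can be illustrated with plain Python. This is a minimal sketch, not the function's Hail implementation. It assumes each row has been flattened into a hypothetical `BinnedVariant` record, and it treats `bin_id` as the name of a ranking and `bin` as the bin number within that ranking (an assumption). The `n_de_novo` field is an invented stand-in for whatever per-variant counts `trio_stats_ht` actually contributes.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class BinnedVariant(NamedTuple):
    """Hypothetical flattened row: one variant, one ranking, plus its flags."""

    bin_id: str  # name of the ranking, e.g. "rank" (assumed)
    bin: int  # bin number within that ranking (assumed)
    contig: str
    snv: bool
    bi_allelic: bool
    singleton: bool
    n_de_novo: int  # stand-in for a trio-derived count (hypothetical field)


GroupKey = Tuple[str, int, str, bool, bool, bool]


def aggregate_bins(variants: Iterable[BinnedVariant]) -> Dict[GroupKey, Dict[str, int]]:
    """Group variants by ranking, bin, contig and flags, summing simple statistics."""
    agg: Dict[GroupKey, Dict[str, int]] = defaultdict(lambda: {"n": 0, "n_de_novo": 0})
    for v in variants:
        key = (v.bin_id, v.bin, v.contig, v.snv, v.bi_allelic, v.singleton)
        agg[key]["n"] += 1
        agg[key]["n_de_novo"] += v.n_de_novo
    return dict(agg)
```

Keeping `contig`, `snv`, `bi_allelic` and `singleton` in the key allows evaluation plots to be split by those categories without re-reading the full variant table.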
- gnomad_qc.v4.variant_qc.evaluation.get_evaluation_resources(test, overwrite, model_id=None)
Get PipelineResourceCollection for all resources needed in the variant QC evaluation pipeline.
- Parameters:
  - test (bool) – Whether to gather all resources for testing.
  - overwrite (bool) – Whether to overwrite resources if they exist.
  - model_id (str) – Model ID of RF model results to use.
- Return type:
PipelineResourceCollection
- Returns:
PipelineResourceCollection containing resources for all steps of the variant QC evaluation pipeline.
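`get_evaluation_resources` gathers the resources for every pipeline step and takes an `overwrite` flag. The sketch below is not gnomad's `PipelineResourceCollection`, whose API is not reproduced here. It is a hypothetical, stdlib-only illustration of the kind of check an overwrite flag usually drives: refuse to run a step whose inputs are missing, or whose outputs already exist unless overwriting is allowed. The classes `StepResources` and `EvaluationResources` and the method `check_step` are invented.

```python
import os
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StepResources:
    """Hypothetical step: named input and output paths."""

    inputs: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)


@dataclass
class EvaluationResources:
    """Hypothetical stand-in for a pipeline resource collection."""

    overwrite: bool = False
    steps: Dict[str, StepResources] = field(default_factory=dict)

    def add_step(self, name: str, step: StepResources) -> None:
        self.steps[name] = step

    def check_step(self, name: str) -> None:
        """Fail early if inputs are missing, or if outputs exist and overwrite is off."""
        step = self.steps[name]
        missing = [p for p in step.inputs.values() if not os.path.exists(p)]
        if missing:
            raise FileNotFoundError(f"{name}: missing inputs {missing}")
        existing = [p for p in step.outputs.values() if os.path.exists(p)]
        if existing and not self.overwrite:
            raise FileExistsError(f"{name}: outputs already exist {existing}")
```

Running these checks before any computation starts means a long pipeline fails immediately on a bad path, rather than after hours of work.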