gnomad_qc.v4.variant_qc.evaluation

Script to create Tables with aggregate variant statistics by variant QC score bins needed for evaluation plots.

usage: gnomad_qc.v4.variant_qc.evaluation.py [-h]
                                             [--slack-channel SLACK_CHANNEL]
                                             [--overwrite] [--test]
                                             [--model-id MODEL_ID]
                                             [--n-bins N_BINS]
                                             [--create-bin-ht]
                                             [--score-bin-validity-check]
                                             [--create-aggregated-bin-ht]
                                             [--extract-truth-samples]
                                             [--truth-sample-mt-n-partitions TRUTH_SAMPLE_MT_N_PARTITIONS]
                                             [--merge-with-truth-data]
                                             [--bin-truth-sample-concordance]

Named Arguments

--slack-channel

Slack channel to post results and notifications to.

--overwrite

Overwrite all data from this subset.

Default: False

--test

Whether the model being evaluated is a test model.

Default: False

--model-id

Model ID.

--n-bins

Number of bins for the binned file.

Default: 100

--create-bin-ht

When set, creates a file annotated with bins based on the rank of the VQSR/RF score.

Default: False

--score-bin-validity-check

When set, runs ranking validity checks.

Default: False

--create-aggregated-bin-ht

When set, creates a file with aggregate counts of variants based on bins.

Default: False

--extract-truth-samples

Extract truth samples from callset MatrixTable.

Default: False

--truth-sample-mt-n-partitions

Desired number of partitions for the truth sample MatrixTable.

Default: 5000

--merge-with-truth-data

Computes a table for each truth sample comparing the truth sample in the callset vs the truth.

Default: False

--bin-truth-sample-concordance

Merges concordance results (callset vs. truth) for a given truth sample with bins from specified model.

Default: False
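
The flags above are plain argparse options: the step flags are booleans that default to False, and --n-bins and --truth-sample-mt-n-partitions take integers. As a minimal sketch of how a subset of them might parse, assuming standard argparse behaviour (build_parser is a hypothetical stand-in, not the module's get_script_argument_parser, and the model ID is a made-up placeholder):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--overwrite", action="store_true")
    parser.add_argument("--test", action="store_true")
    parser.add_argument("--model-id", help="Model ID.")
    parser.add_argument("--n-bins", type=int, default=100)
    parser.add_argument("--create-bin-ht", action="store_true")
    parser.add_argument("--create-aggregated-bin-ht", action="store_true")
    parser.add_argument("--truth-sample-mt-n-partitions", type=int, default=5000)
    return parser


# "example_model" is a placeholder, not a real model ID.
args = build_parser().parse_args(
    ["--model-id", "example_model", "--create-bin-ht", "--n-bins", "50"]
)
print(args.n_bins, args.create_bin_ht, args.overwrite)
```

Flags that are not passed keep their documented defaults, so only the requested steps are switched on.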

Module Functions

gnomad_qc.v4.variant_qc.evaluation.create_bin_ht(ht, ...)

Create a table with bin annotations added for a variant QC run.

gnomad_qc.v4.variant_qc.evaluation.score_bin_validity_check(ht)

Check that the bin scoring looks correct by printing some aggregate numbers.

gnomad_qc.v4.variant_qc.evaluation.create_aggregated_bin_ht(ht, ...)

Aggregate variants into bins using previously annotated bin information.

gnomad_qc.v4.variant_qc.evaluation.get_evaluation_resources(...)

Get PipelineResourceCollection for all resources needed in the variant QC evaluation pipeline.

gnomad_qc.v4.variant_qc.evaluation.main(args)

Script to create Tables with aggregate variant statistics by variant QC score bins needed for evaluation plots.

gnomad_qc.v4.variant_qc.evaluation.get_script_argument_parser()

Get script argument parser.

gnomad_qc.v4.variant_qc.evaluation.create_bin_ht(ht, info_ht, rf_annotations_ht, n_bins, model_type=False)[source]

Create a table with bin annotations added for a variant QC run.

Parameters:
  • ht (Table) – Table with variant QC result annotations.

  • info_ht (Table) – Table with info annotations.

  • rf_annotations_ht (Table) – Table with RF annotations.

  • n_bins (int) – Number of bins to bin the data into.

  • model_type (bool) – Type of variant QC model used to annotate the data. Must be one of ‘vqsr’, ‘rf’, or ‘if’.

Return type:

Table

Returns:

Table with bin annotations.
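
To illustrate what binning by score rank means, here is a minimal pure-Python sketch. It is not the Hail implementation: the function name assign_rank_bins is hypothetical, and it assumes that higher scores are better and that bins are numbered from 1. The real function operates on Hail Tables and may order scores differently.

```python
def assign_rank_bins(scores, n_bins):
    """Return a 1-based bin for each score, with the top-ranked score in bin 1.

    Assumption: a higher score ranks first. Bins are as equal in size as
    the number of scores allows.
    """
    total = len(scores)
    # Indices of the scores, sorted from best (highest) to worst.
    order = sorted(range(total), key=lambda i: scores[i], reverse=True)
    bins = [0] * total
    for rank, idx in enumerate(order):
        # Map the 0-based rank onto n_bins buckets, then shift to 1-based.
        bins[idx] = rank * n_bins // total + 1
    return bins


example_bins = assign_rank_bins([0.9, 0.1, 0.5, 0.7], n_bins=2)
print(example_bins)
```

With four scores and two bins, the two highest-scoring variants land in bin 1 and the other two in bin 2.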

gnomad_qc.v4.variant_qc.evaluation.score_bin_validity_check(ht)[source]

Check that the bin scoring looks correct by printing some aggregate numbers.

Parameters:

ht (Table) – Table with bin annotations.

Return type:

None

Returns:

None
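
The documentation does not say which aggregate numbers are printed. As a rough sketch of the kind of sanity check that could be applied to bin annotations (the function summarize_bins and the specific checks are assumptions for illustration, not the module's logic), one might confirm that every bin falls in the expected range and that bin sizes are roughly even:

```python
from collections import Counter


def summarize_bins(bins, n_bins):
    """Hypothetical check: report out-of-range bins and the bin size spread."""
    counts = Counter(bins)
    out_of_range = sorted(b for b in counts if not 1 <= b <= n_bins)
    # Size of every expected bin, counting empty bins as 0.
    sizes = [counts.get(b, 0) for b in range(1, n_bins + 1)]
    return {
        "out_of_range": out_of_range,
        "min_size": min(sizes),
        "max_size": max(sizes),
    }


summary = summarize_bins([1, 1, 2, 2, 3], n_bins=3)
print(summary)
```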

gnomad_qc.v4.variant_qc.evaluation.create_aggregated_bin_ht(ht, trio_stats_ht)[source]

Aggregate variants into bins using previously annotated bin information.

Variants are grouped by:

  • ‘bin_id’ (rank, bi-allelic, etc.)

  • ‘contig’

  • ‘snv’

  • ‘bi_allelic’

  • ‘singleton’

For each bin, aggregates statistics needed for evaluation plots.

Parameters:
  • ht (Table) – Table with bin annotations.

  • trio_stats_ht (Table) – Table with trio statistics.

Return type:

Table

Returns:

Table of aggregate statistics by bin.
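
The grouping described above can be sketched in plain Python as a group-by over the listed keys. This is only an illustration: the real function works on Hail Tables and computes more statistics than a count, the rows below are made-up data, and including a per-row ‘bin’ value in the key is an assumption.

```python
from collections import defaultdict

# Made-up rows for illustration; field names follow the grouping keys above.
variants = [
    {"bin_id": "bin", "contig": "chr1", "snv": True, "bi_allelic": True, "singleton": False, "bin": 1},
    {"bin_id": "bin", "contig": "chr1", "snv": True, "bi_allelic": True, "singleton": False, "bin": 1},
    {"bin_id": "bin", "contig": "chr1", "snv": False, "bi_allelic": True, "singleton": True, "bin": 2},
]


def aggregate_by_bin(rows):
    """Count variants per (bin_id, contig, snv, bi_allelic, singleton, bin)."""
    counts = defaultdict(int)
    for row in rows:
        key = (
            row["bin_id"],
            row["contig"],
            row["snv"],
            row["bi_allelic"],
            row["singleton"],
            row["bin"],  # assumed: the bin number is part of the key
        )
        counts[key] += 1
    return dict(counts)


aggregated = aggregate_by_bin(variants)
print(aggregated)
```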

gnomad_qc.v4.variant_qc.evaluation.get_evaluation_resources(test, overwrite, model_id=None)[source]

Get PipelineResourceCollection for all resources needed in the variant QC evaluation pipeline.

Parameters:
  • test (bool) – Whether to gather all resources for testing.

  • overwrite (bool) – Whether to overwrite resources if they exist.

  • model_id (str) – Model ID of RF model results to use.

Return type:

PipelineResourceCollection

Returns:

PipelineResourceCollection containing resources for all steps of the variant QC evaluation pipeline.

gnomad_qc.v4.variant_qc.evaluation.main(args)[source]

Script to create Tables with aggregate variant statistics by variant QC score bins needed for evaluation plots.