gnomad.variant_qc.evaluation

gnomad.variant_qc.evaluation.compute_ranked_bin(ht, ...)

Return a table with a bin for each row based on the ranking of score_expr.

gnomad.variant_qc.evaluation.compute_grouped_binned_ht(bin_ht)

Group a Table that has been annotated with bins (compute_ranked_bin or create_binned_ht).

gnomad.variant_qc.evaluation.compute_binned_truth_sample_concordance(ht, ...)

Determine the concordance (TP, FP, FN) between a truth sample within the callset and the sample's truth data, grouped by bins computed using compute_ranked_bin.

gnomad.variant_qc.evaluation.create_truth_sample_ht(mt, ...)

Compute a table comparing a truth sample in the callset vs. the truth data.

gnomad.variant_qc.evaluation.add_rank(ht, ...)

Add rank based on the score_expr.

gnomad.variant_qc.evaluation.compute_ranked_bin(ht, score_expr, bin_expr={'bin': True}, compute_snv_indel_separately=True, n_bins=100, desc=True)[source]

Return a table with a bin for each row based on the ranking of score_expr.

The bin is computed by dividing the score_expr into n_bins bins containing approximately equal numbers of elements. This is done by ranking the rows by score_expr (and a random number in cases where multiple variants have the same score) and then assigning the variant to a bin based on its ranking.

If compute_snv_indel_separately is True, all items in bin_expr will be stratified by SNVs / indels for the ranking and bin calculation. Because SNV and indel rows are mutually exclusive, they are re-combined into a single annotation. For example, given the following four variants and scores with n_bins of 2:

Variant | Type  | Score | bin (compute_snv_indel_separately=False) | bin (compute_snv_indel_separately=True)
--------|-------|-------|------------------------------------------|----------------------------------------
Var1    | SNV   | 0.1   | 1                                        | 1
Var2    | SNV   | 0.2   | 1                                        | 2
Var3    | Indel | 0.3   | 2                                        | 1
Var4    | Indel | 0.4   | 2                                        | 2

Note

The bin_expr defines which data the bin(s) should be computed on. E.g., to get biallelic-specific and singleton-specific binning, the following could be used:

bin_expr={
    'biallelic_bin': ~ht.was_split,
    'singleton_bin': ht.singleton
}
Parameters:
  • ht (Table) – Input Table

  • score_expr (NumericExpression) – Expression containing the score

  • bin_expr (Dict[str, BooleanExpression]) – Specific row grouping(s) to perform ranking and binning on (see note)

  • compute_snv_indel_separately (bool) – Should all bin_expr items be stratified by SNVs / indels

  • n_bins (int) – Number of bins to bin the data into

  • desc (bool) – Whether to bin the score in descending order

Return type:

Table

Returns:

Table with the requested bin annotations
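The ranking-to-bin logic described above can be sketched in plain Python. This is a simplified illustration of the algorithm, not the Hail implementation; `ranked_bins` is a hypothetical helper, and tied scores (broken randomly in the real function) are assumed distinct here:

```python
def ranked_bins(scores, n_bins, desc=True):
    """Assign each score to one of n_bins bins of ~equal size by rank.

    Sketch of the logic only: rank the scores, then map each rank to a
    1-based bin so that bins contain approximately equal numbers of rows.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=desc)
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n + 1  # bins numbered 1..n_bins
    return bins


# Reproducing the four-variant example with n_bins=2 (ascending scores):
combined = ranked_bins([0.1, 0.2, 0.3, 0.4], n_bins=2, desc=False)
# → [1, 1, 2, 2]  (the compute_snv_indel_separately=False column)

# SNVs and indels binned separately, then re-combined by position:
snv_bins = ranked_bins([0.1, 0.2], n_bins=2, desc=False)    # Var1, Var2 → [1, 2]
indel_bins = ranked_bins([0.3, 0.4], n_bins=2, desc=False)  # Var3, Var4 → [1, 2]
```

Interleaving the separate SNV and indel bins gives [1, 2, 1, 2] for Var1–Var4, matching the compute_snv_indel_separately=True column of the table above.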

gnomad.variant_qc.evaluation.compute_grouped_binned_ht(bin_ht, checkpoint_path=None)[source]

Group a Table that has been annotated with bins (compute_ranked_bin or create_binned_ht).

The table will be grouped by bin_id (bin, biallelic, etc.), contig, snv, bi_allelic and singleton.

Note

If performing an aggregation following this grouping (such as score_bin_agg), the aggregation function will need to use ht._parent to access the original Table underlying the GroupedTable.

Parameters:
  • bin_ht (Table) – Input Table with a bin_id annotation

  • checkpoint_path (Optional[str]) – If provided, an intermediate checkpoint table is created with all required annotations before shuffling.

Return type:

GroupedTable

Returns:

Table grouped by bin(s)
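The grouping keys listed above can be illustrated with a plain-Python sketch. The dicts below are hypothetical stand-ins for Table rows; this only shows which fields form the group key, not the Hail implementation:

```python
from collections import defaultdict


def group_by_bin(rows):
    """Group row dicts by the same keys compute_grouped_binned_ht uses:
    bin_id, contig, snv, bi_allelic, and singleton."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["bin_id"], row["contig"], row["snv"],
               row["bi_allelic"], row["singleton"])
        grouped[key].append(row)
    return grouped


# Two rows sharing every grouping key fall into one group:
rows = [
    {"bin_id": "bin", "contig": "chr1", "snv": True,
     "bi_allelic": True, "singleton": False, "score": 0.9},
    {"bin_id": "bin", "contig": "chr1", "snv": True,
     "bi_allelic": True, "singleton": False, "score": 0.7},
]
groups = group_by_bin(rows)
```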

gnomad.variant_qc.evaluation.compute_binned_truth_sample_concordance(ht, binned_score_ht, n_bins=100, add_bins={})[source]

Determine the concordance (TP, FP, FN) between a truth sample within the callset and the sample's truth data, grouped by bins computed using compute_ranked_bin.

Note

The input ht should contain three row fields:
  • score: value to use for binning

  • GT: a CallExpression containing the genotype of the evaluation data for the sample

  • truth_GT: a CallExpression containing the genotype of the truth sample

The input binned_score_ht should contain:
  • score: value used to bin the full callset

  • bin: the full callset bin

add_bins can be used to add additional global and truth sample binning to the final binned truth sample concordance HT. The keys in add_bins must be present in binned_score_ht, and the values in add_bins should be expressions on ht that define a subset of variants to bin in the truth sample. For example, to look at the global and truth sample binning on only bi-allelic variants, add_bins could be set to {'biallelic_bin': ht.biallelic}.

The table is grouped by global/truth sample bin and variant type and contains TP, FP and FN.

Parameters:
  • ht (Table) – Input HT

  • binned_score_ht (Table) – Table with the bin annotation for each variant

  • n_bins (int) – Number of bins to bin the data into

  • add_bins (Dict[str, BooleanExpression]) – Dictionary of additional global bin columns (key) and the expr to use for binning the truth sample (value)

Return type:

Table

Returns:

Binned truth sample concordance HT
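The TP/FP/FN classification of each variant can be sketched as follows. These are common genotype-level definitions assumed for illustration (non-ref in both → TP; non-ref only in the callset → FP; non-ref only in the truth → FN); the library's exact classification may differ in detail, and both helper names are hypothetical:

```python
def classify_concordance(gt, truth_gt):
    """Classify one variant from callset and truth genotypes.

    gt / truth_gt are counts of alternate alleles (0 = hom-ref,
    None = missing). Assumed definitions; see the note above.
    """
    in_callset = gt is not None and gt > 0
    in_truth = truth_gt is not None and truth_gt > 0
    if in_callset and in_truth:
        return "TP"
    if in_callset:
        return "FP"
    if in_truth:
        return "FN"
    return None  # non-ref in neither; not counted


def count_concordance(pairs):
    """Tally TP/FP/FN over (GT, truth_GT) pairs, e.g. within one bin."""
    counts = {"TP": 0, "FP": 0, "FN": 0}
    for gt, truth_gt in pairs:
        label = classify_concordance(gt, truth_gt)
        if label:
            counts[label] += 1
    return counts
```

In the real function these counts are aggregated per global/truth-sample bin and per variant type rather than over a flat list.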

gnomad.variant_qc.evaluation.create_truth_sample_ht(mt, truth_mt, high_confidence_intervals_ht)[source]

Compute a table comparing a truth sample in the callset vs. the truth data.

Parameters:
  • mt (MatrixTable) – MT of truth sample from callset to be compared to truth

  • truth_mt (MatrixTable) – MT of truth sample

  • high_confidence_intervals_ht (Table) – High confidence interval HT

Return type:

Table

Returns:

Table containing both the callset truth sample and the truth data
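Conceptually, this pairs the callset truth sample's genotypes with the truth genotypes at variants falling in high-confidence intervals. A hypothetical plain-Python sketch (variants as (contig, pos) tuples, intervals as half-open (contig, start, end) tuples; the real function operates on MatrixTables):

```python
def join_truth(callset, truth, intervals):
    """Outer-join callset and truth genotype dicts, keeping only
    variants inside the high-confidence intervals."""
    def in_intervals(variant):
        contig, pos = variant
        return any(c == contig and start <= pos < end
                   for c, start, end in intervals)

    keys = set(callset) | set(truth)
    return {
        v: {"GT": callset.get(v), "truth_GT": truth.get(v)}
        for v in keys if in_intervals(v)
    }
```

The outer join matters: a variant present in only one of the two inputs still appears in the result (with a missing genotype on the other side), which is what later allows FP and FN calls to be distinguished.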

gnomad.variant_qc.evaluation.add_rank(ht, score_expr, subrank_expr=None)[source]

Add rank based on the score_expr. Rank is added for SNVs and indels separately.

If one or more subrank_expr are provided, then subrank is added based on all sites for which the boolean expression is true.

In addition, variant counts (SNVs and indels separately) are added as a global annotation (rank_variant_counts).

Parameters:
  • ht (Table) – input Hail Table containing variants (with QC annotations) to be ranked

  • score_expr (NumericExpression) – the Table annotation by which ranking should be scored

  • subrank_expr (Optional[Dict[str, BooleanExpression]]) – Any subranking to be added in the form name_of_subrank: subrank_filtering_expr

Return type:

Table

Returns:

Table with rankings added
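The separate SNV/indel ranking can be sketched in plain Python. The dicts below are hypothetical stand-ins for Table rows, `add_ranks` is a made-up helper, and the descending default is an assumption for illustration; the real function annotates a Hail Table:

```python
def add_ranks(variants, desc=True):
    """Add a 0-based 'rank' to each row dict, ranking SNVs and indels
    separately by score (descending by default; an assumption here)."""
    for is_snv in (True, False):
        # Indices of rows in this stratum, ordered by score.
        idx = [i for i, v in enumerate(variants) if v["snv"] == is_snv]
        idx.sort(key=lambda i: variants[i]["score"], reverse=desc)
        for rank, i in enumerate(idx):
            variants[i]["rank"] = rank
    return variants


variants = [
    {"snv": True, "score": 0.2},   # SNV, second-best SNV score
    {"snv": True, "score": 0.9},   # SNV, best SNV score
    {"snv": False, "score": 0.5},  # indel, ranked only against indels
]
ranked = add_ranks(variants)
```

A subrank, as produced via subrank_expr, would apply the same procedure restricted to rows where the corresponding boolean expression is true.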