gnomad.variant_qc.training


gnomad.variant_qc.training.sample_training_examples(ht, tp_expr, fp_expr, fp_to_tp=1.0, test_expr=None)

Return a Table of all positive and negative training examples in ht with an annotation indicating those that should be used for training.

If fp_to_tp is greater than 0, it is used as the false positive (FP) to true positive (TP) ratio that determines how training variants are sampled.
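To make the ratio concrete, here is a minimal plain-Python sketch of one way a target FP:TP ratio can be turned into per-class keep probabilities. It assumes the more abundant class is thinned with independent Bernoulli draws so that the expected kept counts match the ratio; the helper name sampling_probabilities is hypothetical and this is not the library's implementation.

```python
from typing import Tuple


def sampling_probabilities(n_tp: int, n_fp: int, fp_to_tp: float) -> Tuple[float, float]:
    """Return (p_tp, p_fp), the keep-probabilities for TP and FP rows.

    Hypothetical helper: whichever class exceeds the target ratio is
    downsampled so that expected kept_fp / kept_tp == fp_to_tp.
    A ratio <= 0 (or an empty class) keeps every example.
    """
    if fp_to_tp <= 0 or n_tp == 0 or n_fp == 0:
        return 1.0, 1.0
    if n_fp / n_tp > fp_to_tp:
        # Too many FPs: keep every TP and only enough FPs.
        return 1.0, fp_to_tp * n_tp / n_fp
    # Too many TPs (or exactly on target): keep every FP and thin out TPs.
    return n_fp / (fp_to_tp * n_tp), 1.0
```

For example, with 1,000 TPs, 4,000 FPs and fp_to_tp=1.0, this sketch keeps every TP and each FP with probability 0.25, giving about 1,000 of each.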

The returned Table has the following annotations:
  • train: indicates whether the variant should be used for training. A row is assigned False if test_expr is True for it, if both tp_expr and fp_expr are True for it, or if it is pruned out to obtain the desired fp_to_tp ratio.

  • label: indicates whether a variant is a ‘TP’ or ‘FP’. Variants selected by test_expr also receive this label, even though they are excluded from training.
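The train and label rules above can be restated row by row. The following plain-Python sketch does not use Hail; it assumes each row is a dict with boolean keys tp, fp and test, and that pruning is a Bernoulli draw with caller-supplied keep probabilities p_tp and p_fp. The function name is hypothetical, and because the text above does not say how rows that are both TP and FP are labeled, this sketch leaves them unlabeled (None) as an explicit assumption.

```python
import random
from typing import Dict, List, Optional


def annotate_training_rows(
    rows: List[Dict[str, bool]], p_tp: float, p_fp: float, seed: int = 0
) -> List[Dict[str, object]]:
    """Hypothetical restatement of the train/label rules for plain dict rows.

    Rows that are neither TP nor FP are dropped, since only training
    examples are returned.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        tp, fp, test = row["tp"], row["fp"], row["test"]
        if not (tp or fp):
            continue  # not a training example at all
        conflicting = tp and fp
        # Assumption: rows that are both TP and FP get no label in this sketch.
        label: Optional[str] = None if conflicting else ("TP" if tp else "FP")
        if test or conflicting:
            train = False  # held out via test_expr, or ambiguous
        else:
            # Bernoulli pruning toward the desired FP:TP ratio.
            train = rng.random() < (p_tp if tp else p_fp)
        out.append({**row, "label": label, "train": train})
    return out
```

Passing p_tp=p_fp=1.0 keeps every eligible row, so only test and conflicting rows end up with train set to False.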

Note

  • This function does not support multi-allelic variants.

  • The function reports summary statistics on the provided TPs and FPs (transition (Ti), transversion (Tv), and indel counts).
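The Ti/Tv/indel breakdown in the note above can be illustrated with a small classifier for one biallelic (ref, alt) pair, which also reflects the restriction to non-multi-allelic variants. This is a hypothetical helper, not the statistics code in gnomAD; it assumes SNV alleles are single bases, treats any length difference as an indel, and reports equal-length multi-base substitutions as "other".

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_allele_pair(ref: str, alt: str) -> str:
    """Return 'ti', 'tv', 'indel' or 'other' for one biallelic variant."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt):
        return "indel"
    if len(ref) == 1 and ref != alt and {ref, alt} <= PURINES | PYRIMIDINES:
        # Transition: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
        same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
        return "ti" if same_class else "tv"
    return "other"


def summarize(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count variant classes, e.g. over the TP set or the FP set."""
    return Counter(classify_allele_pair(ref, alt) for ref, alt in variants)
```

Running summarize separately over the TP and FP examples gives the kind of per-set Ti, Tv and indel counts the note refers to.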

Parameters:
  • ht (Table) – Input Table.

  • tp_expr (BooleanExpression) – Expression for TP examples.

  • fp_expr (BooleanExpression) – Expression for FP examples.

  • fp_to_tp (float) – FP to TP ratio. If set to <= 0, all training examples are used.

  • test_expr (Optional[BooleanExpression]) – Optional expression selecting variants to exclude from the training set. These variants still receive the TP/FP label annotation.

Return type:

Table

Returns:

Subset of ht containing the TP and FP examples, where the rows with train set to True reflect the desired FP to TP ratio.