gnomad.variant_qc.training


gnomad.variant_qc.training.sample_training_examples(ht, tp_expr, fp_expr, fp_to_tp=1.0, test_expr=None)

Return a Table of all positive and negative training examples in ht with an annotation indicating those that should be used for training.

If fp_to_tp is greater than 0, it is used as the false positive (FP) to true positive (TP) ratio that determines how training variants are sampled.
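The ratio-based sampling can be illustrated with a small pure-Python sketch. It is not taken from the gnomAD source. It assumes that the over-represented class is downsampled with a per-row keep probability while the other class is kept whole. The function name sampling_probabilities is hypothetical.

```python
from typing import Tuple


def sampling_probabilities(n_tp: int, n_fp: int, fp_to_tp: float) -> Tuple[float, float]:
    """Return (p_keep_tp, p_keep_fp): per-row probabilities of keeping a TP or
    an FP example so that the expected FP:TP ratio among kept rows is fp_to_tp.

    Hypothetical sketch; the gnomAD implementation may differ.
    """
    # A ratio <= 0 means every training example is used.
    if fp_to_tp <= 0:
        return 1.0, 1.0
    # Assumption: with no TPs or no FPs the ratio cannot be reached, so keep everything.
    if n_tp == 0 or n_fp == 0:
        return 1.0, 1.0
    desired_fp = fp_to_tp * n_tp
    if desired_fp < n_fp:
        # Too many FPs: keep every TP and downsample FPs.
        return 1.0, desired_fp / n_fp
    # Too few FPs: keep every FP and downsample TPs to n_fp / fp_to_tp.
    return n_fp / desired_fp, 1.0
```

For example, with 100 TPs, 500 FPs and the default fp_to_tp=1.0, every TP is kept and each FP is kept with probability 0.2, so about 100 FPs are expected in the training set.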

The returned Table has the following annotations:

train: indicates whether the variant should be used for training. A row is set to False if test_expr is True, if both tp_expr and fp_expr are True, or if the row is pruned out to obtain the desired fp_to_tp ratio.
label: indicates whether a variant is a ‘TP’ or an ‘FP’. Variants selected by test_expr are labeled as well.
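The per-row rules for train and label can be sketched in plain Python. This is an illustration, not the gnomAD code. The function annotate_example and its keep-probability arguments are hypothetical, and the label given to a row flagged as both TP and FP is an assumption, since the text above only states that such rows are excluded from training.

```python
import random
from typing import Optional


def annotate_example(
    is_tp: bool,
    is_fp: bool,
    is_test: bool,
    p_keep_tp: float,
    p_keep_fp: float,
    rng: random.Random,
) -> Optional[dict]:
    """Return the {'train', 'label'} annotation for one row, or None if the
    row is neither a TP nor an FP example (such rows are not returned)."""
    if not (is_tp or is_fp):
        return None
    # Assumption: a row flagged as both TP and FP is labeled 'TP' in this sketch.
    label = "TP" if is_tp else "FP"
    # Random pruning toward the target FP:TP ratio.
    p_keep = p_keep_tp if is_tp else p_keep_fp
    kept = rng.random() < p_keep
    conflicting = is_tp and is_fp
    # Test rows and conflicting rows keep their label but never train.
    train = kept and not is_test and not conflicting
    return {"train": train, "label": label}
```

Passing a seeded random.Random makes the pruning reproducible between runs.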

Note

Parameters:

ht (Table) – Input Table.
tp_expr (BooleanExpression) – Expression for TP examples.
fp_expr (BooleanExpression) – Expression for FP examples.
fp_to_tp (float) – FP to TP ratio. If set to <= 0, all training examples are used.
test_expr (Optional[BooleanExpression]) – Optional expression that excludes a set of variants from the training set. Excluded variants still receive the TP/FP label annotation.

Return type:

Table

Returns:

Subset of ht containing the TP and FP examples, with the train annotation selecting rows at the desired FP to TP ratio.