gnomad.variant_qc.training
- gnomad.variant_qc.training.sample_training_examples(ht, tp_expr, fp_expr, fp_to_tp=1.0, test_expr=None)
Return a Table of all positive and negative training examples in ht with an annotation indicating those that should be used for training.
If fp_to_tp is greater than 0, this false positive (FP) to true positive (TP) ratio is used to determine how training variants are sampled.
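To make the role of an FP-to-TP ratio concrete, the sketch below turns example counts into per-class keep probabilities by downsampling whichever class is over-represented. This is an assumption about one reasonable scheme, not a description of this function's internals; `keep_probabilities`, `n_tp`, and `n_fp` are hypothetical names used only for illustration.

```python
from typing import Tuple


def keep_probabilities(
    n_tp: int, n_fp: int, fp_to_tp: float = 1.0
) -> Tuple[float, float]:
    """Return (prob_tp, prob_fp): the chance of keeping each TP / FP example.

    Hypothetical helper, not part of gnomad.
    """
    # A ratio <= 0 means "use every training example" (as documented for fp_to_tp).
    # Empty classes are also left untouched (assumption, avoids division by zero).
    if fp_to_tp <= 0 or n_tp == 0 or n_fp == 0:
        return 1.0, 1.0

    desired_fp = fp_to_tp * n_tp
    if desired_fp < n_fp:
        # More FPs than wanted: keep all TPs, downsample FPs.
        return 1.0, desired_fp / n_fp
    # Fewer FPs than wanted: keep all FPs, downsample TPs.
    return n_fp / desired_fp, 1.0


print(keep_probabilities(n_tp=1000, n_fp=4000, fp_to_tp=1.0))  # (1.0, 0.25)
print(keep_probabilities(n_tp=1000, n_fp=200, fp_to_tp=0.5))   # (0.4, 1.0)
```

In the first call, a 1:1 target with four times as many FPs as TPs keeps roughly a quarter of the FPs. In the second, the FPs are the scarce class, so TPs are thinned instead.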
- The returned Table has the following annotations:
train: indicates whether the variant should be used for training. A row is set to False if test_expr is True, if both tp_expr and fp_expr are True, or if it is pruned to obtain the desired fp_to_tp ratio.
label: indicates whether a variant is a ‘TP’ or an ‘FP’. Variants defined by test_expr are labeled as well.
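The rules for the two annotations can be restated per row in plain Python. The sketch below is only a reading of the description above, not the library's Hail implementation. `train_flag` and `label_for` are hypothetical helpers, and `pruned` stands in for the outcome of the ratio-based sampling. The text does not say what label a pruned or ambiguous (both TP and FP) row receives, so `label_for` returns None for ambiguous rows as an assumption.

```python
from typing import Optional


def train_flag(tp: bool, fp: bool, test: bool = False, pruned: bool = False) -> bool:
    """Mirror the three documented reasons for train=False (hypothetical helper)."""
    if test:        # held out by test_expr
        return False
    if tp and fp:   # ambiguous: matches both tp_expr and fp_expr
        return False
    if pruned:      # dropped to reach the desired fp_to_tp ratio
        return False
    return tp or fp


def label_for(tp: bool, fp: bool) -> Optional[str]:
    """Label an unambiguous training or held-out row (hypothetical helper)."""
    if tp and not fp:
        return "TP"
    if fp and not tp:
        return "FP"
    return None  # assumption: ambiguous rows get no label in this sketch


# A clean TP is used for training and labeled 'TP'.
print(train_flag(tp=True, fp=False), label_for(True, False))             # True TP
# A held-out FP is excluded from training but still labeled 'FP'.
print(train_flag(tp=False, fp=True, test=True), label_for(False, True))  # False FP
# A row matching both expressions is never used for training.
print(train_flag(tp=True, fp=True))                                      # False
```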
Note
This function does not support multi-allelic variants.
The function also reports summary statistics about the provided TPs/FPs: transitions (Ti), transversions (Tv), and indels.
- Parameters:
ht (Table) – Input Table.
tp_expr (BooleanExpression) – Expression for TP examples.
fp_expr (BooleanExpression) – Expression for FP examples.
fp_to_tp (float) – FP to TP ratio. If set to <= 0, all training examples are used.
test_expr (Optional[BooleanExpression]) – Optional expression to exclude a set of variants from the training set. Excluded variants still receive the TP/FP label annotation.
- Return type:
Table
- Returns:
Table subset with corresponding TP and FP examples with desired FP to TP ratio.