gnomad_qc.v5.data_ingestion.federated_validity_checks

Script to perform validity checks on federated data.

Module Functions

`gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_config`(...)	Validate JSON config inputs.
`gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_ht_fields`(ht, ...)	Check that necessary fields defined in the JSON config are present in the Hail Table.
`gnomad_qc.v5.data_ingestion.federated_validity_checks.check_missingness`(ht)	Check for and report the fraction of missing data in the Table.
`gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_federated_data`(ht, ...)	Perform validity checks on federated data.
`gnomad_qc.v5.data_ingestion.federated_validity_checks.create_logtest_ht`([...])	Create a test Hail Table with nested struct annotations to test log output.
`gnomad_qc.v5.data_ingestion.federated_validity_checks.main`(args)	Perform validity checks for federated data.

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_config(config, schema)[source]

Validate JSON config inputs.

Parameters:

config (Dict[str, Any]) – JSON configuration for parameter inputs.
schema (Dict[str, Any]) – JSON schema to use for validation.

Return type:

None

Returns:

None.
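
As a rough illustration of what validating a config against a schema involves, the sketch below checks only required keys and primitive types. It is a simplified stand-in written with the standard library, not the module's implementation; the helper name `find_config_errors` and the schema and config contents are hypothetical.

```python
from typing import Any, Dict, List

# Map JSON Schema type names to Python types (only the subset used in this sketch).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_config_errors(config: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config passed."""
    errors = []
    # Every key listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in config:
            errors.append(f"missing required key: {key}")
    # Every present key with a declared type must match that type.
    for key, rule in schema.get("properties", {}).items():
        if key not in config or "type" not in rule:
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        wrong_bool = rule["type"] == "number" and isinstance(value, bool)
        if wrong_bool or not isinstance(value, _JSON_TYPES[rule["type"]]):
            errors.append(f"{key}: expected {rule['type']}")
    return errors


# Hypothetical schema and configs, for illustration only.
schema = {
    "required": ["missingness_threshold", "struct_annotations_for_missingness"],
    "properties": {
        "missingness_threshold": {"type": "number"},
        "struct_annotations_for_missingness": {"type": "array"},
    },
}
good = {"missingness_threshold": 0.5, "struct_annotations_for_missingness": ["grpmax"]}
bad = {"missingness_threshold": "high"}

print(find_config_errors(good, schema))
print(find_config_errors(bad, schema))
```

A full JSON Schema validator covers far more (nested objects, enums, ranges); this sketch only shows the shape of the check.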

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_ht_fields(ht, config)[source]

Check that necessary fields defined in the JSON config are present in the Hail Table.

Parameters:

ht (Table) – Hail Table.
config (Dict[str, Any]) – JSON configuration for parameter inputs.

Return type:

None

Returns:

None.
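
The field check can be pictured as a lookup of each required field path in the table's row schema. The sketch below uses a nested dict as a stand-in for a Hail Table schema so that it runs without Hail; the dotted-path convention, the helper name `find_missing_fields`, and the example fields are assumptions for illustration.

```python
from typing import Any, Dict, List


def find_missing_fields(row_schema: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the dotted field paths in `required` that `row_schema` lacks."""
    missing = []
    for path in required:
        node: Any = row_schema
        # Walk down one struct level per dotted component.
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


# Nested dict standing in for a Table's row schema (leaf values are type names).
row_schema = {
    "freq": "array<struct>",
    "grpmax": {"AC": "int32", "AF": "float64"},
    "filters": "set<str>",
}
# Hypothetical list of fields a config might require.
required = ["freq", "grpmax.AC", "grpmax.faf95", "histograms"]

print(find_missing_fields(row_schema, required))  # ['grpmax.faf95', 'histograms']
```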

gnomad_qc.v5.data_ingestion.federated_validity_checks.check_missingness(ht, missingness_threshold=0.5, struct_annotations=['grpmax', 'fafmax', 'histograms'])[source]

Check for and report the fraction of missing data in the Table.

Parameters:

ht (Table) – Input Table.
missingness_threshold (float) – Upper cutoff for allowed amount of missingness. Default is 0.50.
struct_annotations (List[str]) – List of struct annotations to check for missingness. Default is [‘grpmax’, ‘fafmax’, ‘histograms’].

Return type:

None

Returns:

None
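
To make the threshold concrete, the following sketch computes the fraction of missing values per field over plain Python rows and flags fields above `missingness_threshold`. It does not use Hail and is not the module's implementation; the helper names are hypothetical, and treating the cutoff as strictly "greater than" is an assumption.

```python
from typing import Any, Dict, List


def missing_fractions(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of rows in which each field is None or absent."""
    if not rows:
        return {}
    fields = sorted({name for row in rows for name in row})
    n = len(rows)
    return {f: sum(row.get(f) is None for row in rows) / n for f in fields}


def over_threshold(fractions: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Fields whose missing fraction exceeds the threshold (strict '>' is an assumption)."""
    return [f for f, frac in fractions.items() if frac > threshold]


# Toy rows; None marks a missing value.
rows = [
    {"grpmax": 1, "fafmax": None, "histograms": None},
    {"grpmax": 2, "fafmax": None, "histograms": 5},
    {"grpmax": None, "fafmax": None, "histograms": 6},
    {"grpmax": 4, "fafmax": 0.1, "histograms": 7},
]

fractions = missing_fractions(rows)
print(fractions)                  # {'fafmax': 0.75, 'grpmax': 0.25, 'histograms': 0.25}
print(over_threshold(fractions))  # ['fafmax']
```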

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_federated_data(ht, freq_meta_expr, missingness_threshold=0.5, struct_annotations_for_missingness=['grpmax', 'fafmax', 'histograms'], freq_annotations_to_sum=['AC', 'AN', 'homozygote_count'], sort_order=['subset', 'gen_anc', 'sex', 'group'], nhomalt_metric='nhomalt', verbose=False, subsets=None, variant_filter_field='AS_VQSR', problematic_regions=['lcr', 'non_par', 'segdup'])[source]

Perform validity checks on federated data.

Parameters:

ht (Table) – Input Table.
freq_meta_expr (ArrayExpression) – Metadata expression describing the elements of the frequency array being checked. Most often this is freq_meta, which indexes into the ‘freq’ array (example: ht.freq_meta).
freq_annotations_to_sum (List[str]) – List of annotation fields, within the array indexed by freq_meta_expr, to sum. Default is [‘AC’, ‘AN’, ‘homozygote_count’].
sort_order (List[str]) – Order in which groupings are unfurled into flattened annotations. Default is [“subset”, “gen_anc”, “sex”, “group”].
nhomalt_metric (str) – Name of metric denoting homozygous alternate count. Default is “nhomalt”.
verbose (bool) – If True, show top values of annotations being checked, including checks that pass; if False, show only top values of annotations that fail checks. Default is False.
subsets (List[str]) – List of sample subsets. Default is None.
variant_filter_field (str) – String of variant filtration used in the filters annotation on ht (e.g. RF, VQSR, AS_VQSR). Default is “AS_VQSR”.
problematic_regions (List[str]) – List of regions considered problematic to run filter check in. Default is [“lcr”, “non_par”, “segdup”].
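missingness_threshold (float) – Upper cutoff for allowed amount of missingness. Default is 0.50.
struct_annotations_for_missingness (List[str]) – List of struct annotations to check for missingness. Default is [‘grpmax’, ‘fafmax’, ‘histograms’].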

Return type:

None

Returns:

None
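
Two of the parameters above are easier to follow with toy data: `sort_order` fixes the order in which grouping values are joined into a flattened annotation name, and `freq_annotations_to_sum` names the fields whose subgroup values are summed and compared with a group total. The sketch below uses plain lists and dicts instead of Hail expressions; the label format, the helper names, and the specific sex-within-group comparison are assumptions for illustration, not the module's actual behavior.

```python
from typing import Dict, List, Sequence

SORT_ORDER = ("subset", "gen_anc", "sex", "group")


def flat_label(metric: str, meta: Dict[str, str], sort_order: Sequence[str] = SORT_ORDER) -> str:
    """Join a metric name with the grouping values, taken in `sort_order` order."""
    parts = [meta[key] for key in sort_order if key in meta]
    return "_".join([metric] + parts)


def sex_sum_matches(
    freq_meta: List[Dict[str, str]],
    freq: List[Dict[str, int]],
    gen_anc: str,
    metric: str,
) -> bool:
    """Check that per-sex values of `metric` add up to the gen_anc-level total."""
    total = freq[freq_meta.index({"group": "adj", "gen_anc": gen_anc})][metric]
    by_sex = sum(
        f[metric]
        for m, f in zip(freq_meta, freq)
        if m.get("gen_anc") == gen_anc and "sex" in m
    )
    return total == by_sex


# Toy metadata and call stats; freq[i] is described by freq_meta[i].
freq_meta = [
    {"group": "adj"},
    {"group": "adj", "gen_anc": "afr"},
    {"group": "adj", "gen_anc": "afr", "sex": "XX"},
    {"group": "adj", "gen_anc": "afr", "sex": "XY"},
]
freq = [
    {"AC": 10, "AN": 200},
    {"AC": 10, "AN": 200},
    {"AC": 6, "AN": 120},
    {"AC": 4, "AN": 80},
]

print(flat_label("AC", freq_meta[2]))                  # AC_afr_XX_adj
print(sex_sum_matches(freq_meta, freq, "afr", "AC"))   # True
```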

gnomad_qc.v5.data_ingestion.federated_validity_checks.create_logtest_ht(exclude_xnonpar_y=False)[source]

Create a test Hail Table with nested struct annotations to test log output.

Parameters:

exclude_xnonpar_y (bool) – If True, exclude chrX non-pseudoautosomal region and chrY variants when making test data. Default is False.

Return type:

Table

Returns:

Table to use for testing log output.

gnomad_qc.v5.data_ingestion.federated_validity_checks.main(args)[source]

Perform validity checks for federated data.