gnomad_qc.v5.data_ingestion.federated_validity_checks

Script to perform validity checks on federated data for gnomAD v5.

Module Functions

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_config(...)

Validate JSON config inputs.

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_ht_fields(ht, ...)

Check that necessary fields defined in the JSON config are present in the Hail Table.

gnomad_qc.v5.data_ingestion.federated_validity_checks.check_missingness(ht)

Check for and report the fraction of missing data in the Table.

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_federated_data(ht, ...)

Perform validity checks on federated data.

gnomad_qc.v5.data_ingestion.federated_validity_checks.main(args)

Perform validity checks for federated data.

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_config(config, schema)

Validate JSON config inputs.

Parameters:
  • config (Dict[str, Any]) – JSON configuration for parameter inputs.

  • schema (Dict[str, Any]) – JSON schema to use for validation.

Return type:

None

Returns:

None.
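
The idea of checking a config against a schema can be illustrated with a minimal sketch. It handles only the "required" and "type" keywords of JSON Schema and is not the module's implementation, which may well rely on a dedicated validation library. The function name validate_config_sketch and the example key missingness_threshold are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict

# Maps a few JSON Schema type names to Python types (subset only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_config_sketch(config: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Raise ValueError if config violates the 'required' or 'type' rules in schema."""
    # Every key listed under "required" must be present.
    missing = [key for key in schema.get("required", []) if key not in config]
    if missing:
        raise ValueError(f"Missing required config keys: {missing}")

    # Each present key with a declared, recognized type must match that type.
    for key, rules in schema.get("properties", {}).items():
        type_name = rules.get("type")
        if key not in config or type_name not in _JSON_TYPES:
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if is_bool_as_number or not isinstance(value, _JSON_TYPES[type_name]):
            raise ValueError(f"Config key {key!r} should be of type {type_name}")


# Hypothetical schema and config, for illustration only.
example_schema = {
    "required": ["missingness_threshold"],
    "properties": {"missingness_threshold": {"type": "number"}},
}
validate_config_sketch({"missingness_threshold": 0.5}, example_schema)  # passes silently
```

Like the documented function, the sketch returns None on success and signals problems by raising an exception.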

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_ht_fields(ht, config)

Check that necessary fields defined in the JSON config are present in the Hail Table.

Parameters:
  • ht (Table) – Hail Table.

  • config (Dict[str, Any]) – JSON configuration for parameter inputs.

Return type:

None

Returns:

None.
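
A field-presence check of this kind can be sketched in plain Python. The sketch assumes the table's field names have already been collected into a list, and it uses a hypothetical config key, required_fields; neither the key nor the helper names come from the module itself.

```python
from typing import Any, Dict, Iterable, List


def find_missing_fields(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required field names that are absent, preserving their order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def validate_fields_sketch(available: Iterable[str], config: Dict[str, Any]) -> None:
    """Raise ValueError if any field named in the config is not available."""
    # "required_fields" is a hypothetical config key used only for this sketch.
    missing = find_missing_fields(available, config.get("required_fields", []))
    if missing:
        raise ValueError(f"Fields missing from the Table: {missing}")


# Stand-in for the field names of a Table.
table_fields = ["locus", "alleles", "freq", "grpmax"]
validate_fields_sketch(table_fields, {"required_fields": ["freq", "grpmax"]})  # passes
```

Reporting every missing field in one error, rather than stopping at the first, keeps the feedback loop short when a config names several absent fields.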

gnomad_qc.v5.data_ingestion.federated_validity_checks.check_missingness(ht, missingness_threshold=0.5, struct_annotations=['grpmax', 'fafmax', 'histograms'])

Check for and report the fraction of missing data in the Table.

Parameters:
  • ht (Table) – Input Table.

  • missingness_threshold (float) – Upper cutoff for allowed amount of missingness. Default is 0.50.

  • struct_annotations (List[str]) – List of struct annotations to check for missingness. Default is ['grpmax', 'fafmax', 'histograms'].

Return type:

None

Returns:

None
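
The missingness calculation can be illustrated on plain Python rows. This is a sketch, not the module's Hail code: rows are dictionaries, a value of None counts as missing, and a field is flagged when its missing fraction is strictly greater than the threshold. Whether the real check is strict or inclusive is an assumption here, and the helper names are hypothetical.

```python
from typing import Any, Dict, List


def fraction_missing(rows: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of rows where it is None or absent."""
    total = len(rows)
    if total == 0:
        # With no rows there is nothing to be missing.
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is None) / total
        for field in fields
    }


def fields_over_threshold(fractions: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the fields whose missing fraction exceeds the threshold."""
    return [field for field, frac in fractions.items() if frac > threshold]


# Made-up rows: grpmax is missing in 3 of 4 rows, fafmax in 1 of 4.
rows = [
    {"grpmax": None, "fafmax": 1},
    {"grpmax": None, "fafmax": None},
    {"grpmax": 1, "fafmax": 2},
    {"grpmax": None, "fafmax": 3},
]
fractions = fraction_missing(rows, ["grpmax", "fafmax"])
flagged = fields_over_threshold(fractions, threshold=0.5)
```

With these rows, only grpmax is flagged, since its missing fraction of 0.75 is above the default threshold of 0.5.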

gnomad_qc.v5.data_ingestion.federated_validity_checks.validate_federated_data(ht, freq_meta_expr, missingness_threshold=0.5, struct_annotations_for_missingness=['grpmax', 'fafmax', 'histograms'], freq_annotations_to_sum=['AC', 'AN', 'homozygote_count'], freq_sort_order=['gen_anc', 'sex', 'group'], nhomalt_metric='nhomalt', verbose=False)

Perform validity checks on federated data.

Parameters:
  • ht (Table) – Input Table.

  • freq_meta_expr (ArrayExpression) – Metadata expression describing the groupings of the elements in the indexed frequency array. This is most often freq_meta, which indexes into the 'freq' array (example: ht.freq_meta).

  • freq_annotations_to_sum (List[str]) – List of frequency annotation fields to sum. Default is ['AC', 'AN', 'homozygote_count'].

  • freq_sort_order (List[str]) – Order in which groupings are unfurled into flattened annotations. Default is ['gen_anc', 'sex', 'group'].

  • nhomalt_metric (str) – Name of metric denoting homozygous alternate count. Default is 'nhomalt'.

  • verbose (bool) – If True, show top values of annotations being checked, including checks that pass; if False, show only top values of annotations that fail checks. Default is False.

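  • missingness_threshold (float) – Upper cutoff for allowed amount of missingness. Default is 0.50.

  • struct_annotations_for_missingness (List[str]) – List of struct annotations to check for missingness. Default is ['grpmax', 'fafmax', 'histograms'].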

Return type:

None

Returns:

None
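
One plausible use of freq_annotations_to_sum is a consistency check in which the per-group values of a metric are summed and compared with the overall value. The sketch below shows that idea on a plain dictionary. It is an assumption about the kind of check involved, not the module's implementation, and the flattened key names (AC_group_a, AC_group_b) and the helper name are hypothetical.

```python
from typing import Dict, List


def group_sum_matches_total(
    flat_row: Dict[str, int], metric: str, groups: List[str]
) -> bool:
    """Return True if the per-group values of metric add up to the overall value."""
    group_total = sum(flat_row[f"{metric}_{group}"] for group in groups)
    return group_total == flat_row[metric]


# Hypothetical flattened row: overall AC plus AC for two made-up groups.
row = {"AC": 10, "AC_group_a": 6, "AC_group_b": 4}
ok = group_sum_matches_total(row, "AC", ["group_a", "group_b"])
```

A mismatch between the summed group values and the overall value would indicate that the frequency annotations are internally inconsistent.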

gnomad_qc.v5.data_ingestion.federated_validity_checks.main(args)

Perform validity checks for federated data.