gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht
Script to merge the output of all sample QC modules into a single Table.
usage: gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.py
[-h] [--overwrite] [--slack-channel SLACK_CHANNEL]
Named Arguments
- --overwrite
Overwrite all data from this subset.
Default: False
- --slack-channel
Slack channel to post results and notifications to.
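As a rough illustration of the usage string above, the two options could be declared with argparse as follows. This is a minimal sketch, not the script's actual parser: the helper name build_parser and the channel name in the example call are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.py",
        description="Merge the output of all sample QC modules into a single Table.",
    )
    # Boolean flag: False unless --overwrite is passed.
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Overwrite all data from this subset (default: False).",
    )
    # Optional string; None when the flag is omitted.
    parser.add_argument(
        "--slack-channel",
        help="Slack channel to post results and notifications to.",
    )
    return parser


# Example invocation with a made-up channel name.
args = build_parser().parse_args(["--overwrite", "--slack-channel", "#my-channel"])
print(args.overwrite, args.slack_channel)
```

Note that argparse exposes --slack-channel as args.slack_channel, with the hyphen converted to an underscore.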
Module Functions
- Load project-specific metadata Table and add GATK version.
- Load and reformat sex imputation Table for annotation on the combined meta Table.
- Load and reformat hard-filters Table for annotation on the combined meta Table.
- Parse relatedness Table to get every relationship (except UNRELATED) per sample.
- Return case statement to populate relatedness filters in sample_filters struct.
- Get relatedness relationship annotations for the combined meta Table.
- Get relatedness filtering Table for the combined meta Table.
- Annotate base_ht with contents of ann_ht and optionally check that sample counts match.
- Combine sample filters into a single Table to be added to the metadata Table.
- Combine sample contamination, chr20 mean DP, and QC MT callrate into a single Table.
- Combine all sample QC metadata Tables into a single Table.
- Merge the output of all sample QC modules into a single Table.
- Get script argument parser.
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.get_project_meta()[source]
Load project-specific metadata Table and add GATK version.
- Return type:
Table
- Returns:
GATK version annotated project metadata Table.
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.get_sex_imputation_ht()[source]
Load and reformat sex imputation Table for annotation on the combined meta Table.
- Return type:
Table
- Returns:
Reformatted sex imputation Table.
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.get_hard_filters_ht(ht)[source]
Load and reformat hard-filters Table for annotation on the combined meta Table.
Parse relatedness Table to get every relationship (except UNRELATED) per sample.
Returns a Table keyed by sample, with all of the sample's relationships stored in a dictionary in which the key is the relationship and the value is the set of all samples that have that relationship to the given sample.
- Parameters:
ht (Table) – Table with inferred relationship information. Keyed by sample pair (i, j).
filter_exprs (Dict[str, BooleanExpression]) – Optional dictionary of filter expressions to apply to ht before creating the ‘relationships’ annotations. Keyed by the postfix to add to ‘relationships’ as the annotation label, with boolean expressions as the values. By default, no additional filtering is applied, and a single ‘relationships’ annotation is created.
- Return type:
Table
- Returns:
Table keyed by sample (s) with all relationships annotated as a dict.
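The pair-to-sample reshaping described above can be sketched in plain Python, without Hail. This is only an illustration of the data shape under stated assumptions: the function name relationships_per_sample, the tuple input format, and the use of the constant names as string labels are all hypothetical, and the optional filter_exprs behaviour is not modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumption: relationship labels are plain strings named after the constants.
UNRELATED = "UNRELATED"


def relationships_per_sample(
    pairs: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Turn (i, j, relationship) pairs into a per-sample relationship dict."""
    result: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for i, j, relationship in pairs:
        if relationship == UNRELATED:
            continue  # unrelated pairs are not recorded
        # Each pair contributes an entry for both of its samples.
        result[i][relationship].add(j)
        result[j][relationship].add(i)
    # Convert nested defaultdicts to plain dicts for a stable return type.
    return {sample: dict(rels) for sample, rels in result.items()}


pairs = [
    ("s1", "s2", "PARENT_CHILD"),
    ("s1", "s3", "SIBLINGS"),
    ("s2", "s4", "UNRELATED"),
]
print(relationships_per_sample(pairs))
```

A sample that only appears in UNRELATED pairs, such as s4 here, has no entry in the result.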
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.get_relationship_filter_expr(hard_filtered_expr, related_drop_expr, relationship_set, relationship)[source]
Return case statement to populate relatedness filters in sample_filters struct.
- Parameters:
hard_filtered_expr (BooleanExpression) – Boolean for whether sample was hard filtered.
related_drop_expr (BooleanExpression) – Boolean for whether sample was filtered due to relatedness.
relationship_set (SetExpression) – Set containing all possible relationship strings for sample.
relationship (str) – Relationship to check for. One of DUPLICATE_OR_TWINS, PARENT_CHILD, SIBLINGS, or SECOND_DEGREE_RELATIVES.
- Return type:
- Returns:
Case statement used to populate sample_filters related filter field.
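The shape of such a case statement can be sketched in plain Python, with None standing in for a missing value. This is an assumption-laden sketch rather than the module's logic: it assumes a hard-filtered sample yields a missing value, and that the filter is true only when the sample was dropped for relatedness and has the given relationship. The real function may treat individual relationships differently, and the name relationship_filter is hypothetical.

```python
from typing import Optional, Set


def relationship_filter(
    hard_filtered: bool,
    related_drop: bool,
    relationship_set: Optional[Set[str]],
    relationship: str,
) -> Optional[bool]:
    """Evaluate a relatedness filter for one sample (hypothetical helper)."""
    # Case 1 (assumed): hard-filtered samples get a missing value.
    if hard_filtered:
        return None
    # Case 2 (assumed): true only if the sample was dropped for relatedness
    # and the relationship of interest is present in its relationship set.
    return bool(
        related_drop and relationship_set and relationship in relationship_set
    )


print(relationship_filter(False, True, {"SIBLINGS"}, "SIBLINGS"))
print(relationship_filter(True, True, {"SIBLINGS"}, "SIBLINGS"))
```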
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.annotate_relationships(ht, outlier_filter_ht)[source]
Get relatedness relationship annotations for the combined meta Table.
- Table has the following annotations:
relationships: A dictionary of all relationships (except UNRELATED) the sample has with other samples in the dataset. The key is the relationship and the value is a set of all samples with that relationship to the given sample.
gnomad_v3_duplicate: Sample is in the gnomAD v3.1 sample set that passed hard filtering.
gnomad_v3_release_duplicate: Sample is in the gnomAD v3.1 release.
- Parameters:
- Return type:
- Returns:
Table with related filters added and Table with relationship and gnomAD v3 overlap information.
Get relatedness filtering Table for the combined meta Table.
Add the following related filter boolean annotations to the input ht under a relatedness_filters struct:
related: Whether the sample was filtered for second-degree (or closer) relatedness in the ancestry inference PCA.
duplicate_or_twin: Whether the filtered sample has a duplicate or twin among all samples that are not hard-filtered.
parent_child: Whether the filtered sample has a parent or child among all samples that are not hard-filtered.
sibling: Whether the filtered sample has a sibling among all samples that are not hard-filtered.
Any sample in ht that is hard-filtered will have a missing value for these annotations.
These related filter annotations are also provided for release filtered samples added to the input ht under a release_relatedness_filters struct:
related: Whether the release filtered sample was filtered for second-degree (or closer) relatedness in the final release.
duplicate_or_twin: Whether the release filtered sample has a duplicate or twin among all samples that are not hard-filtered or outlier-filtered.
parent_child: Whether the release filtered sample has a parent or child among all samples that are not hard-filtered or outlier-filtered.
sibling: Whether the release filtered sample has a sibling among all samples that are not hard-filtered or outlier-filtered.
Any sample in ht that is hard-filtered or outlier-filtered will have a missing value for these annotations.
- Parameters:
ht (Table) – Sample QC filter Table to add relatedness filter annotations to.
relationship_ht (Table) – Table with relationships annotations.
hard_filtered_expr (BooleanExpression) – Boolean Expression indicating whether the sample was hard filtered.
outlier_filtered_expr (BooleanExpression) – Boolean Expression indicating whether the sample was outlier filtered.
- Return type:
- Returns:
Table with related filters added and Table with relationship and gnomAD v3 overlap information.
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.add_annotations(base_ht, ann_ht, ann_label, ann_top_level=False, global_top_level=False, base_ht_missing=None, ann_ht_missing=None, sample_count_match=True)[source]
Annotate base_ht with contents of ann_ht and optionally check that sample counts match.
- Parameters:
base_ht (Table) – Table to annotate.
ann_ht (Table) – Table with annotations to add to base_ht.
ann_label (str) – Label to use for the Struct annotation of ann_ht on base_ht if ann_top_level is False. Also used in the logging message describing the annotations being added.
ann_top_level (bool) – Whether to add all annotations on ann_ht to the top level of base_ht instead of grouping them under a new annotation, ann_label.
global_top_level (bool) – Whether to add all global annotations on ann_ht to the top level instead of grouping them under a new annotation, “ann_label_parameters”.
base_ht_missing (Optional[List[str]]) – Optional list of approved samples missing from base_ht, but present in ann_ht.
ann_ht_missing (Optional[List[str]]) – Optional list of approved samples missing from ann_ht, but present in base_ht.
sample_count_match (bool) – Check whether the sample counts match in the two input tables. Default is True.
- Return type:
Table
- Returns:
Table with additional annotations.
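To make the interaction between the parameters concrete, the following plain-Python sketch models each Table as a dict keyed by sample ID. It is not the Hail implementation: the function name add_annotations_sketch, the dict-based rows, and the example field values are hypothetical, the sample check is assumed to compare sample IDs while excluding the approved missing lists, and global annotations (global_top_level) are not modelled.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def add_annotations_sketch(
    base: Dict[str, Row],
    ann: Dict[str, Row],
    ann_label: str,
    ann_top_level: bool = False,
    base_missing: Optional[List[str]] = None,
    ann_missing: Optional[List[str]] = None,
    sample_count_match: bool = True,
) -> Dict[str, Row]:
    """Annotate rows of `base` with rows of `ann` (hypothetical helper)."""
    if sample_count_match:
        # Samples in ann but not base must be listed in base_missing.
        extra_in_ann = set(ann) - set(base) - set(base_missing or [])
        # Samples in base but not ann must be listed in ann_missing.
        extra_in_base = set(base) - set(ann) - set(ann_missing or [])
        if extra_in_ann or extra_in_base:
            raise ValueError(
                f"Sample mismatch while adding '{ann_label}': "
                f"{sorted(extra_in_ann | extra_in_base)}"
            )
    out: Dict[str, Row] = {}
    for sample, row in base.items():
        new_row = dict(row)
        ann_row = ann.get(sample)  # None models a missing annotation
        if ann_top_level:
            new_row.update(ann_row or {})  # fields land at the top level
        else:
            new_row[ann_label] = ann_row  # fields are grouped under ann_label
        out[sample] = new_row
    return out


# Illustrative, made-up rows.
base = {"s1": {"project": "A"}, "s2": {"project": "B"}}
ann = {"s1": {"sex": "XX"}, "s2": {"sex": "XY"}}
print(add_annotations_sketch(base, ann, "sex_imputation"))
```

Passing ann_missing lets an approved sample that is absent from the annotation table through the check; it then carries a missing value under ann_label.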
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.get_sample_filter_ht(base_ht, relationship_ht)[source]
Combine sample filters into a single Table to be added to the metadata Table.
Includes hard-filters, sample QC outlier filters, and relatedness filters.
- gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht.get_hard_filter_metric_ht(base_ht)[source]
Combine sample contamination, chr20 mean DP, and QC MT callrate into a single Table.