gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes

Script to create sample QC metadata HT for genomes.

The main changes from v3.1 to v4.0 stem from the following updates to the HGDP + TGP subset meta:

  • 169 samples added

  • 110 samples removed

The 169 samples added were diverse HGDP/TGP samples in v3.1 that were unintentionally removed during sample outlier detection.

The 110 samples were removed for the following reasons:

  • Contamination flagged by CHARR (CHARR score stored in freemix column).

  • Subcontinental PCA outliers within the subset.

  • Updated relatedness results within the subset, re-evaluation of relatedness between the subset and the rest of the gnomAD release samples, and removal of any samples that are duplicates of v4.0 exomes release samples.

All impacted samples (169 added, 110 removed) are releasable; the updates we made here were to their sample QC status (high quality vs. filtered).

usage: gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.py
       [-h] [--overwrite] [--slack-channel SLACK_CHANNEL]

Named Arguments

--overwrite

Overwrite the existing meta HT.

Default: False

--slack-channel

Slack channel to post results and notifications to.
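
The two documented options can be sketched with the standard-library argparse module. This is a minimal illustration, not the module's actual parser: build_parser is a hypothetical stand-in for get_script_argument_parser(), and "#example-channel" is a made-up channel name. It assumes --overwrite is a boolean flag (consistent with the documented default of False) and that --slack-channel takes a single string value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for get_script_argument_parser()."""
    parser = argparse.ArgumentParser(
        prog="gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.py"
    )
    # Assumed to be a store_true flag, so it defaults to False when omitted.
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Overwrite the existing meta HT.",
    )
    # Optional string value; None when the option is not supplied.
    parser.add_argument(
        "--slack-channel",
        help="Slack channel to post results and notifications to.",
    )
    return parser


# Parse an example command line; the channel name is illustrative only.
args = build_parser().parse_args(
    ["--overwrite", "--slack-channel", "#example-channel"]
)
print(args.overwrite, args.slack_channel)
```

argparse exposes --slack-channel as args.slack_channel, replacing the hyphen with an underscore.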

Module Functions

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.N_DIFF_SAMPLES

The number of samples that have been updated between v3.1 and v4.0.

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.import_updated_annotations(ht, ...)

Import updated annotations from HGDP/TGP subset meta HT and annotate full meta HT.

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.main(args)

Create updated v4 genomes sample QC metadata HT.

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.get_script_argument_parser()

Get script argument parser.

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.N_DIFF_SAMPLES = 59

The number of samples that have been updated between v3.1 and v4.0.
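
The value 59 coincides with the net change implied by the counts in the module description (169 samples added, 110 removed). Reading N_DIFF_SAMPLES as that net difference is an assumption made here, not something this page states; the sketch below only checks the arithmetic. N_ADDED and N_REMOVED are hypothetical names introduced for illustration.

```python
# Counts taken from the module description; the names are hypothetical.
N_ADDED = 169
N_REMOVED = 110

# Value documented for the module constant.
N_DIFF_SAMPLES = 59

# Net change in the number of high-quality samples under the assumed reading.
net_change = N_ADDED - N_REMOVED
print(net_change)
assert net_change == N_DIFF_SAMPLES
```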

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.import_updated_annotations(ht, subset_ht)

Import updated annotations from HGDP/TGP subset meta HT and annotate full meta HT.

Parameters:
  • ht (Table) – v3.1 genomes meta HT

  • subset_ht (Table) – Updated HGDP + TGP subset meta HT

Return type:

Table

Returns:

Updated v4.0 genomes meta HT
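
The general idea of carrying updated annotations from a subset table into a full table can be illustrated with plain Python dictionaries in place of Hail Tables. This is a toy sketch under stated assumptions, not the function's implementation: update_annotations, the sample IDs, the high_quality field, and all values are hypothetical, and the real function may select, key, or validate fields differently.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def update_annotations(
    meta: Dict[str, Row], subset_meta: Dict[str, Row]
) -> Dict[str, Row]:
    """Return a copy of meta with fields overridden by subset_meta where present."""
    updated: Dict[str, Row] = {}
    for sample_id, row in meta.items():
        new_row = dict(row)  # Copy so the input table is left unchanged.
        if sample_id in subset_meta:
            # Fields present in the subset row replace the old values.
            new_row.update(subset_meta[sample_id])
        updated[sample_id] = new_row
    return updated


# Hypothetical "full" meta keyed by sample ID.
meta = {
    "sample_a": {"high_quality": False, "freemix": 0.01},
    "sample_b": {"high_quality": True, "freemix": 0.02},
    "sample_c": {"high_quality": True, "freemix": 0.00},
}

# Hypothetical updated subset meta covering only some samples.
subset_meta = {
    "sample_a": {"high_quality": True},
    "sample_b": {"high_quality": False, "freemix": 0.09},
}

updated = update_annotations(meta, subset_meta)

# Samples whose annotations differ between the old and updated tables.
changed = [s for s in meta if meta[s] != updated[s]]
print(changed)
```

Samples absent from the subset (sample_c here) pass through unchanged, which mirrors the documented behaviour that only HGDP/TGP subset samples are affected.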

gnomad_qc.v4.sample_qc.create_sample_qc_metadata_ht_genomes.main(args)

Create updated v4 genomes sample QC metadata HT.