gnomad.sample_qc.platform

gnomad.sample_qc.platform.compute_callrate_mt(mt, ...)

Compute a sample/interval MT with each entry containing the call rate for that sample/interval.

gnomad.sample_qc.platform.run_platform_pca(...)

Run PCA on a sample/interval MT with each entry containing the call rate.

gnomad.sample_qc.platform.assign_platform_from_pcs(...)

Assign platforms using HDBSCAN on the results of call rate PCA.

gnomad.sample_qc.platform.compute_callrate_mt(mt, intervals_ht, bi_allelic_only=True, autosomes_only=True, match=True)[source]

Compute a sample/interval MT with each entry containing the call rate for that sample/interval.

This can be used as input for imputing exome sequencing platforms.

Note

The input interval HT should have a key of type Interval. The resulting table will have a key of the same type as the intervals_ht table and contain an interval_info field containing all non-key fields of the intervals_ht.

Parameters:
  • mt (MatrixTable) – Input MT

  • intervals_ht (Table) – Table containing the intervals. This table has to be keyed by interval (see the note above).

  • bi_allelic_only (bool) – If set, only bi-allelic sites are used for the computation

  • autosomes_only (bool) – If set, only autosomal intervals are used.

  • match (bool) – If set, returns all intervals in intervals_ht that overlap the locus in the input MT.

Return type:

MatrixTable

Returns:

Callrate MT
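To make the shape of the result concrete, the following is a minimal pure-Python sketch of a per-sample, per-interval call rate. It is not the library's Hail implementation: the function name callrate_per_interval, the toy single-contig data layout, and the assumption that "call rate" means the fraction of sites in an interval with a non-missing genotype are all illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation: a half-open interval [start, end) on one contig.
Interval = Tuple[int, int]


def callrate_per_interval(
    genotypes: Dict[str, Dict[int, Optional[str]]],
    intervals: List[Interval],
) -> Dict[Interval, Dict[str, Optional[float]]]:
    """Return {interval: {sample: call rate}} for toy genotype data.

    Assumption: call rate = defined genotypes / sites falling in the interval.
    """
    result: Dict[Interval, Dict[str, Optional[float]]] = {}
    for start, end in intervals:
        per_sample: Dict[str, Optional[float]] = {}
        for sample, calls in genotypes.items():
            # Genotypes of this sample at sites inside the interval.
            in_interval = [gt for pos, gt in calls.items() if start <= pos < end]
            n_defined = sum(gt is not None for gt in in_interval)
            # No sites in the interval: leave the call rate undefined.
            per_sample[sample] = n_defined / len(in_interval) if in_interval else None
        result[(start, end)] = per_sample
    return result


# None marks a missing genotype call.
genotypes = {
    "s1": {100: "0/1", 150: None, 210: "0/0", 260: "1/1"},
    "s2": {100: None, 150: None, 210: "0/0", 260: None},
}
callrates = callrate_per_interval(genotypes, [(100, 200), (200, 300)])
```

Samples sequenced on the same capture platform tend to show similar call rate patterns across intervals, which is why a matrix of this shape is a useful input for platform imputation.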

gnomad.sample_qc.platform.run_platform_pca(callrate_mt, binarization_threshold=0.25, n_pcs=10)[source]

Run PCA on a sample/interval MT with each entry containing the call rate.

When binarization_threshold is set, the call rate is transformed to a 0/1 value based on the threshold. For example, with the default threshold of 0.25, all entries with a call rate < 0.25 are treated as 0 and all others as 1.
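The thresholding rule described above can be sketched in plain Python. This is an illustration only, not the library's code: the helper name binarize_callrates and the list-of-lists layout are hypothetical, and passing None is assumed to leave the call rates unchanged.

```python
from typing import List, Optional


def binarize_callrates(
    callrates: List[List[float]], threshold: Optional[float] = 0.25
) -> List[List[float]]:
    """Map each call rate to 0 (below threshold) or 1 (at or above it)."""
    if threshold is None:
        # Assumption: no threshold means the raw call rates are used as-is.
        return [list(row) for row in callrates]
    return [[0 if rate < threshold else 1 for rate in row] for row in callrates]


# Rows are samples, columns are intervals (toy values).
raw = [[0.0, 0.24, 0.25], [0.9, 0.1, 1.0]]
binary = binarize_callrates(raw)
unchanged = binarize_callrates(raw, threshold=None)
```

Note that a call rate exactly equal to the threshold maps to 1, since only values strictly below the threshold become 0.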

Parameters:
  • callrate_mt (MatrixTable) – Input callrate MT

  • binarization_threshold (Optional[float]) – Threshold used to binarize the call rate. Set to None if no threshold is desired.

  • n_pcs (int) – Number of PCs to compute

Return type:

Tuple[List[float], Table, Table]

Returns:

eigenvalues, scores_ht, loadings_ht

gnomad.sample_qc.platform.assign_platform_from_pcs(platform_pca_scores_ht, pc_scores_ann='scores', hdbscan_min_cluster_size=None, hdbscan_min_samples=None)[source]

Assign platforms using HDBSCAN on the results of call rate PCA.

Parameters:
  • platform_pca_scores_ht (Table) – Input table with the PCA score for each sample

  • pc_scores_ann (str) – Field containing the scores

  • hdbscan_min_cluster_size (Optional[int]) – HDBSCAN min_cluster_size parameter. If not specified, the smaller of 500 and 0.1*n_samples will be used.

  • hdbscan_min_samples (Optional[int]) – HDBSCAN min_samples parameter

Return type:

Table

Returns:

A Table with a qc_platform annotation containing the platform based on HDBSCAN clustering
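The default for hdbscan_min_cluster_size described in the parameter list can be sketched as follows. The helper name resolve_min_cluster_size is hypothetical, and rounding 0.1*n_samples down to an integer is an assumption; the documentation above does not say how a fractional value is handled.

```python
from typing import Optional


def resolve_min_cluster_size(
    n_samples: int, hdbscan_min_cluster_size: Optional[int] = None
) -> int:
    """Return the explicit value if given, else min(500, 0.1 * n_samples)."""
    if hdbscan_min_cluster_size is not None:
        return hdbscan_min_cluster_size
    # Assumption: 0.1 * n_samples is rounded down (integer division by 10).
    return min(500, n_samples // 10)


small_cohort = resolve_min_cluster_size(2_000)       # 10% of the samples
large_cohort = resolve_min_cluster_size(10_000)      # capped at 500
explicit = resolve_min_cluster_size(10_000, 50)      # user-supplied value wins
```

Under this rule the minimum cluster size scales with the cohort for small datasets and is capped at 500 for large ones, so platforms represented by at least 500 samples can still form their own cluster.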