gnomad.sample_qc.platform
Compute a sample/interval MT with each entry containing the call rate for that sample/interval.
Run PCA on a sample/interval MT with each entry containing the call rate.
Assign platforms using HDBSCAN on the results of call rate PCA.
- gnomad.sample_qc.platform.compute_callrate_mt(mt, intervals_ht, bi_allelic_only=True, autosomes_only=True, match=True)
Compute a sample/interval MT with each entry containing the call rate for that sample/interval.
This can be used as input for imputing exome sequencing platforms.
Note
The input interval HT should have a key of type Interval. The resulting table will have a key of the same type as the intervals_ht table and contain an interval_info field containing all non-key fields of the intervals_ht.
- Parameters:
mt (MatrixTable) – Input MT.
intervals_ht (Table) – Table containing the intervals. This table has to be keyed by locus.
bi_allelic_only (bool) – If set, only bi-allelic sites are used for the computation.
autosomes_only (bool) – If set, only autosomal intervals are used.
match (bool) – If set, returns all intervals in intervals_ht that overlap the locus in the input MT.
- Return type:
MatrixTable
- Returns:
Callrate MT
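To make the entry values concrete, here is a minimal pure-Python sketch of a per-sample, per-interval call rate. It assumes that "call rate" means the fraction of sites in an interval with a non-missing genotype call. The `call_rate` helper and the toy data are hypothetical and do not use Hail or the gnomad API.

```python
from typing import Dict, List, Optional


def call_rate(calls: List[Optional[str]]) -> float:
    """Fraction of non-missing genotype calls (None marks a missing call).

    Returning 0.0 for an empty interval is a choice made for this sketch.
    """
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


# Toy data (hypothetical): genotype calls per sample, grouped by interval.
calls_by_interval: Dict[str, Dict[str, List[Optional[str]]]] = {
    "chr1:100-200": {
        "s1": ["0/1", "0/0", None, "1/1"],
        "s2": [None, None, None, "0/0"],
    },
    "chr1:300-400": {
        "s1": ["0/0", "0/0"],
        "s2": ["0/1", None],
    },
}

# One entry per (interval, sample), mirroring the sample/interval layout.
callrates = {
    interval: {sample: call_rate(calls) for sample, calls in samples.items()}
    for interval, samples in calls_by_interval.items()
}
```

In this toy example, sample `s2` has a low call rate in the first interval, which is the kind of signal the downstream PCA can pick up when capture platforms differ in coverage.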
- gnomad.sample_qc.platform.run_platform_pca(callrate_mt, binarization_threshold=0.25, n_pcs=10)
Run PCA on a sample/interval MT with each entry containing the call rate.
When binarization_threshold is set, the call rate is transformed to a 0/1 value based on the threshold. For example, with the default threshold of 0.25, all entries with a call rate < 0.25 are set to 0 and all others to 1.
- Parameters:
callrate_mt (MatrixTable) – Input callrate MT.
binarization_threshold (Optional[float]) – Binarization threshold. Set to None if no threshold is desired.
n_pcs (int) – Number of PCs to compute.
- Return type:
Tuple[List[float], Table, Table]
- Returns:
eigenvalues, scores_ht, loadings_ht
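The binarization step described above can be sketched in plain Python. This is only an illustration of the thresholding rule (values below the threshold become 0, all others 1), not the library's Hail implementation. The `binarize` helper is a hypothetical name.

```python
from typing import List, Optional


def binarize(callrates: List[float], threshold: Optional[float] = 0.25) -> List[float]:
    """Map call rates to 0/1 values; leave them unchanged when threshold is None."""
    if threshold is None:
        return list(callrates)
    # Entries strictly below the threshold become 0, all others become 1.
    return [0 if cr < threshold else 1 for cr in callrates]


binarized = binarize([0.0, 0.24, 0.25, 0.9])
unchanged = binarize([0.1, 0.8], threshold=None)
```

Note that a call rate exactly equal to the threshold maps to 1, since only entries strictly below it are treated as 0.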
- gnomad.sample_qc.platform.assign_platform_from_pcs(platform_pca_scores_ht, pc_scores_ann='scores', hdbscan_min_cluster_size=None, hdbscan_min_samples=None)
Assign platforms using HDBSCAN on the results of call rate PCA.
- Parameters:
platform_pca_scores_ht (Table) – Input table with the PCA score for each sample.
pc_scores_ann (str) – Field containing the scores.
hdbscan_min_cluster_size (Optional[int]) – HDBSCAN min_cluster_size parameter. If not specified, the smallest of 500 and 0.1*n_samples will be used.
hdbscan_min_samples (int) – HDBSCAN min_samples parameter.
- Return type:
Table
- Returns:
A Table with a qc_platform annotation containing the platform based on HDBSCAN clustering
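The default for hdbscan_min_cluster_size can be illustrated with a short sketch. It assumes the "smallest of 500 and 0.1*n_samples" rule is applied as written and that the result is truncated to an integer. The truncation and the `resolve_min_cluster_size` name are assumptions made for this example, not confirmed library behavior.

```python
from typing import Optional


def resolve_min_cluster_size(n_samples: int, min_cluster_size: Optional[int] = None) -> int:
    """Return the explicit value if given, else the smaller of 500 and 0.1 * n_samples."""
    if min_cluster_size is not None:
        return min_cluster_size
    # Truncating to int is an assumption made for this sketch.
    return int(min(500, 0.1 * n_samples))


small_cohort = resolve_min_cluster_size(1000)    # 10% of the samples
large_cohort = resolve_min_cluster_size(10000)   # capped at 500
explicit = resolve_min_cluster_size(10000, min_cluster_size=50)
```

Under this rule, cohorts smaller than 5,000 samples use 10% of the sample count, and larger cohorts are capped at 500.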