gnomad_qc.v4.sample_qc.generate_qc_mt
Script to create a dense MatrixTable filtered to a diverse set of variants for relatedness/ancestry PCA using CCDG, gnomAD v3, and UK Biobank.
The variant set was determined using this script: https://github.com/Nealelab/ccdg_qc/blob/master/scripts/pca_variant_filter.py
The full gnomAD v4 VariantDataset and v3 sparse MatrixTable are filtered to these sites and then densified. The resulting MatrixTables are merged, additional allele frequency and callrate filters are applied, and LD-pruning is then performed.
Additionally, the script creates a joint v3 and v4 metadata file for use in relatedness/ancestry PCA.
usage: gnomad_qc.v4.sample_qc.generate_qc_mt.py [-h] [--test]
[--create-v3-filtered-dense-mt]
[--create-v4-filtered-dense-mt]
[--generate-qc-mt]
[--bi-allelic-only]
[--min-af MIN_AF]
[--min-callrate MIN_CALLRATE]
[--min-inbreeding-coeff-threshold MIN_INBREEDING_COEFF_THRESHOLD]
[--ld-r2 LD_R2]
[--n-partitions N_PARTITIONS]
[--block-size BLOCK_SIZE]
[--generate-qc-meta] [-o]
[--slack-channel SLACK_CHANNEL]
Named Arguments
- --test
Runs a test on two partitions of the VDS.
Default: False
- --create-v3-filtered-dense-mt
Create a dense MatrixTable from the raw gnomAD v3.1 sparse MatrixTable filtered to predetermined QC variants.
Default: False
- --create-v4-filtered-dense-mt
Create a dense MatrixTable from the raw gnomAD v4 VariantDataset filtered to predetermined QC variants.
Default: False
- --generate-qc-mt
Create the final merged gnomAD v3 + v4 QC MatrixTable with all specified filters and LD-pruning.
Default: False
- --bi-allelic-only
Filter to variants that are bi-allelic.
Default: False
- --min-af
Minimum variant allele frequency to retain variant in QC MatrixTable.
Default: 0.0001
- --min-callrate
Minimum variant callrate to retain variant in QC MatrixTable.
Default: 0.99
- --min-inbreeding-coeff-threshold
Minimum site inbreeding coefficient to retain variant in QC MatrixTable.
Default: -0.8
- --ld-r2
LD-pruning cutoff.
Default: 0.1
- --n-partitions
Desired number of partitions for output QC MatrixTable.
Default: 5000
- --block-size
Block size parameter to use for LD pruning.
Default: 2048
- --generate-qc-meta
Create a merged gnomAD v3 + v4 metadata Table for QC purposes.
Default: False
- -o, --overwrite
Overwrite all data from this subset.
Default: False
- --slack-channel
Slack channel to post results and notifications to.
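As a rough illustration of how the flags and defaults listed above fit together, the sketch below builds a stand-in parser with Python's `argparse`. The function name `build_parser` is hypothetical and this is not the module's own parser; it only mirrors the documented options so the defaults can be inspected.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser()
    # Boolean switches; each defaults to False, as documented.
    for flag in (
        "--test",
        "--create-v3-filtered-dense-mt",
        "--create-v4-filtered-dense-mt",
        "--generate-qc-mt",
        "--bi-allelic-only",
        "--generate-qc-meta",
    ):
        parser.add_argument(flag, action="store_true")
    parser.add_argument("-o", "--overwrite", action="store_true")
    # Numeric options with the documented defaults.
    parser.add_argument("--min-af", type=float, default=0.0001)
    parser.add_argument("--min-callrate", type=float, default=0.99)
    parser.add_argument(
        "--min-inbreeding-coeff-threshold", type=float, default=-0.8
    )
    parser.add_argument("--ld-r2", type=float, default=0.1)
    parser.add_argument("--n-partitions", type=int, default=5000)
    parser.add_argument("--block-size", type=int, default=2048)
    parser.add_argument("--slack-channel")
    return parser


# Example: request the merged QC MatrixTable with a stricter AF cutoff.
args = build_parser().parse_args(
    ["--generate-qc-mt", "--bi-allelic-only", "--min-af", "0.001"]
)
print(args.generate_qc_mt, args.bi_allelic_only, args.min_af, args.ld_r2)
```

Options that are not passed keep their documented defaults, so `args.ld_r2` is 0.1 and `args.n_partitions` is 5000 in this example.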
Module Functions
- create_filtered_dense_mt(mtds[, split]): Filter a sparse MatrixTable or VariantDataset to a set of predetermined QC sites and return a dense MatrixTable.
- generate_qc_mt(v3_mt, v4_mt[, ...]): Generate combined gnomAD v3 and v4 QC MatrixTable for use in relatedness and ancestry inference.
- Combine v3 and v4 sample metadata into a single Table for relatedness and population inference.
- Create a dense MT of a diverse set of variants for relatedness/ancestry PCA.
- Get script argument parser.
- gnomad_qc.v4.sample_qc.generate_qc_mt.create_filtered_dense_mt(mtds, split=False)[source]
Filter a sparse MatrixTable or VariantDataset to a set of predetermined QC sites and return a dense MatrixTable.
- Parameters:
mtds (Union[VariantDataset, MatrixTable]) – Input MatrixTable or VariantDataset.
split (bool) – Whether mtds should have multi-allelics split before filtering variants.
- Return type:
MatrixTable
- Returns:
Filtered and densified MatrixTable.
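To make "filter to predetermined sites, then densify" concrete, here is a toy, pure-Python sketch for a single sample. It is not Hail's data model or the function's implementation: the reference-block tuples, the `densify_at_sites` name, and the per-position dictionary are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Toy single-sample sparse representation (hypothetical, not Hail's):
# reference blocks are inclusive (start, end) ranges called hom-ref, and
# variant calls map a position to its number of alt alleles (1 or 2).
RefBlock = Tuple[int, int]


def densify_at_sites(
    qc_sites: List[int],
    ref_blocks: List[RefBlock],
    variant_calls: Dict[int, int],
) -> Dict[int, Optional[int]]:
    """Return one genotype per QC site: alt-allele count, or None if uncalled."""
    dense: Dict[int, Optional[int]] = {}
    for pos in qc_sites:
        if pos in variant_calls:
            # An explicit variant call takes precedence over reference blocks.
            dense[pos] = variant_calls[pos]
        elif any(start <= pos <= end for start, end in ref_blocks):
            # Covered by a reference block: fill in a hom-ref genotype.
            dense[pos] = 0
        else:
            # No data for this sample at this site.
            dense[pos] = None
    return dense


dense = densify_at_sites(
    qc_sites=[100, 250, 400],
    ref_blocks=[(50, 150), (300, 350)],
    variant_calls={250: 1, 275: 2},
)
print(dense)  # {100: 0, 250: 1, 400: None}
```

The call at position 275 is dropped because it is not one of the QC sites, which is the point of filtering before densifying: only the predetermined sites need an explicit genotype for every sample.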
- gnomad_qc.v4.sample_qc.generate_qc_mt.generate_qc_mt(v3_mt, v4_mt, bi_allelic_only=False, min_af=0.0001, min_callrate=0.99, min_inbreeding_coeff_threshold=-0.8, ld_r2=0.1, n_partitions=1000, block_size=2048)[source]
Generate combined gnomAD v3 and v4 QC MatrixTable for use in relatedness and ancestry inference.
- Parameters:
v3_mt (MatrixTable) – Dense gnomAD v3 MatrixTable filtered to predetermined sites.
v4_mt (MatrixTable) – Dense gnomAD v4 MatrixTable filtered to predetermined sites.
bi_allelic_only (bool) – Whether to filter to bi-allelic variants.
min_af (float) – Minimum variant allele frequency to retain variant in QC MatrixTable.
min_callrate (float) – Minimum variant callrate to retain variant in QC MatrixTable.
min_inbreeding_coeff_threshold (float) – Minimum site inbreeding coefficient to retain variant in QC MatrixTable.
ld_r2 (float) – LD-pruning cutoff.
n_partitions (int) – Number of partitions to repartition the MT to before LD pruning.
block_size (int) – Block size parameter to use for LD pruning.
- Return type:
MatrixTable
- Returns:
MatrixTable of sites that pass QC filters.
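The three per-variant thresholds above can be illustrated on a toy genotype vector. The sketch below is a pure-Python approximation, not the function's Hail code: the helper names are hypothetical, the strict `>` comparisons are an assumption, and the inbreeding coefficient uses the common definition 1 - (observed het rate) / (2pq), which may differ in detail from what the script computes. LD-pruning is not covered here.

```python
from typing import Dict, List, Optional

# Alt-allele count per sample (0, 1, or 2); None marks a missing call.
Genotypes = List[Optional[int]]


def site_stats(gts: Genotypes) -> Dict[str, Optional[float]]:
    """Compute callrate, alt allele frequency, and site inbreeding coefficient."""
    called = [g for g in gts if g is not None]
    n_called = len(called)
    callrate = n_called / len(gts) if gts else 0.0
    af = sum(called) / (2 * n_called) if n_called else 0.0
    expected_het = 2 * af * (1 - af)
    if expected_het > 0:
        observed_het = sum(1 for g in called if g == 1) / n_called
        inbreeding: Optional[float] = 1 - observed_het / expected_het
    else:
        # Monomorphic or uncalled site: the coefficient is undefined.
        inbreeding = None
    return {"callrate": callrate, "af": af, "inbreeding_coeff": inbreeding}


def passes_qc(
    gts: Genotypes,
    min_af: float = 0.0001,
    min_callrate: float = 0.99,
    min_inbreeding_coeff_threshold: float = -0.8,
) -> bool:
    """Apply the three thresholds; strict comparisons are an assumption."""
    stats = site_stats(gts)
    f = stats["inbreeding_coeff"]
    return (
        stats["af"] > min_af
        and stats["callrate"] > min_callrate
        and f is not None
        and f > min_inbreeding_coeff_threshold
    )


sites = {
    "balanced": [0, 1, 0, 1, 2, 0, 0, 1, 0, 0],
    "all_het": [1] * 10,  # excess heterozygosity, coefficient of -1
    "low_callrate": [0, 1, None, None, 0, 0, 0, 0, 0, 0],
    "monomorphic": [0] * 10,
}
for name, gts in sites.items():
    print(name, passes_qc(gts))
```

With the default thresholds only the `balanced` site is retained: `all_het` falls below the inbreeding cutoff, `low_callrate` has a callrate of 0.8, and `monomorphic` has an allele frequency of 0.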