gnomad_qc.v4.sample_qc.generate_qc_mt

Script to create a dense MatrixTable filtered to a diverse set of variants, selected using CCDG, gnomAD v3, and UK Biobank, for relatedness/ancestry PCA.

The variant set was determined using this script: https://github.com/Nealelab/ccdg_qc/blob/master/scripts/pca_variant_filter.py

The full gnomAD v4 VariantDataset and the gnomAD v3.1 sparse MatrixTable are filtered to these sites and then densified. The resulting dense MatrixTables are merged; allele frequency, callrate, and site inbreeding coefficient filters (and, optionally, a bi-allelic filter) are then applied, and the remaining variants are LD-pruned.
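
The merge step can be pictured as a column-wise union on shared sites. The sketch below is plain Python, not Hail, and rests on an assumption: that the merge keeps only variants present in both dense MatrixTables and concatenates their samples, which is the default behaviour of Hail's `MatrixTable.union_cols` (`row_join_type="inner"`). Whether the script uses exactly that call is not stated on this page. The `DenseMT` type and `merge_dense` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Variant = Tuple[str, int, str, str]  # (contig, position, ref, alt)


@dataclass
class DenseMT:
    """Hypothetical toy stand-in for a dense MatrixTable."""
    samples: List[str]
    # One genotype per sample per variant: alt-allele count, None = missing call.
    rows: Dict[Variant, List[Optional[int]]]


def merge_dense(left: DenseMT, right: DenseMT) -> DenseMT:
    """Concatenate sample columns, keeping only variants present in both inputs."""
    shared = [v for v in left.rows if v in right.rows]  # inner join on variant keys
    return DenseMT(
        samples=left.samples + right.samples,
        rows={v: left.rows[v] + right.rows[v] for v in shared},
    )


v3 = DenseMT(["v3_a", "v3_b"], {("chr1", 100, "A", "G"): [0, 1], ("chr1", 200, "C", "T"): [2, 0]})
v4 = DenseMT(["v4_a"], {("chr1", 100, "A", "G"): [1], ("chr2", 50, "G", "A"): [0]})
merged = merge_dense(v3, v4)
```

Only `chr1:100 A>G` survives the merge, because it is the only site present in both inputs; the other two sites would have missing genotypes for one release's samples.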

Additionally, the script creates a joint v3 and v4 metadata Table for use in relatedness/ancestry PCA.

usage: gnomad_qc.v4.sample_qc.generate_qc_mt.py [-h] [--test]
                                                [--create-v3-filtered-dense-mt]
                                                [--create-v4-filtered-dense-mt]
                                                [--generate-qc-mt]
                                                [--bi-allelic-only]
                                                [--min-af MIN_AF]
                                                [--min-callrate MIN_CALLRATE]
                                                [--min-inbreeding-coeff-threshold MIN_INBREEDING_COEFF_THRESHOLD]
                                                [--ld-r2 LD_R2]
                                                [--n-partitions N_PARTITIONS]
                                                [--block-size BLOCK_SIZE]
                                                [--generate-qc-meta] [-o]
                                                [--slack-channel SLACK_CHANNEL]
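
For orientation, the usage line above corresponds to an `argparse` parser along the following lines. This is a minimal sketch reconstructed only from the flags and defaults documented on this page; it is not the module's `get_script_argument_parser`, whose help strings, types, and any unlisted options may differ. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags and defaults documented on this page."""
    parser = argparse.ArgumentParser(prog="gnomad_qc.v4.sample_qc.generate_qc_mt")
    parser.add_argument("--test", action="store_true", help="Run on two partitions of the VDS.")
    # Step selectors: each flag enables one stage of the workflow.
    parser.add_argument("--create-v3-filtered-dense-mt", action="store_true")
    parser.add_argument("--create-v4-filtered-dense-mt", action="store_true")
    parser.add_argument("--generate-qc-mt", action="store_true")
    # Variant QC filters applied when building the merged QC MatrixTable.
    parser.add_argument("--bi-allelic-only", action="store_true")
    parser.add_argument("--min-af", type=float, default=0.0001)
    parser.add_argument("--min-callrate", type=float, default=0.99)
    parser.add_argument("--min-inbreeding-coeff-threshold", type=float, default=-0.8)
    # LD-pruning and output layout settings.
    parser.add_argument("--ld-r2", type=float, default=0.1)
    parser.add_argument("--n-partitions", type=int, default=5000)
    parser.add_argument("--block-size", type=int, default=2048)
    parser.add_argument("--generate-qc-meta", action="store_true")
    parser.add_argument("-o", "--overwrite", action="store_true")
    parser.add_argument("--slack-channel")
    return parser


# Example: run only the QC MatrixTable step with a stricter allele frequency cutoff.
args = build_parser().parse_args(["--generate-qc-mt", "--min-af", "0.001"])
```

Because no option string looks like a negative number, `argparse` accepts a negative value such as `--min-inbreeding-coeff-threshold -0.5` without extra quoting.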

Named Arguments

--test

Runs a test on two partitions of the VDS.

Default: False

--create-v3-filtered-dense-mt

Create a dense MatrixTable from the raw gnomAD v3.1 sparse MatrixTable filtered to predetermined QC variants.

Default: False

--create-v4-filtered-dense-mt

Create a dense MatrixTable from the raw gnomAD v4 VariantDataset filtered to predetermined QC variants.

Default: False

--generate-qc-mt

Create the final merged gnomAD v3 + v4 QC MatrixTable, applying all specified filters and LD-pruning.

Default: False

--bi-allelic-only

Filter to variants that are bi-allelic.

Default: False

--min-af

Minimum variant allele frequency to retain variant in QC MatrixTable.

Default: 0.0001

--min-callrate

Minimum variant callrate to retain variant in QC MatrixTable.

Default: 0.99

--min-inbreeding-coeff-threshold

Minimum site inbreeding coefficient to retain variant in QC MatrixTable.

Default: -0.8

--ld-r2

Squared correlation (r²) cutoff for LD pruning.

Default: 0.1

--n-partitions

Desired number of partitions for output QC MatrixTable.

Default: 5000

--block-size

Block size parameter to use for LD pruning.

Default: 2048

--generate-qc-meta

Create a merged gnomAD v3 + v4 metadata Table for QC purposes.

Default: False

-o, --overwrite

Overwrite existing output data.

Default: False

--slack-channel

Slack channel to post results and notifications to.
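
The `--min-af`, `--min-callrate`, and `--min-inbreeding-coeff-threshold` options act on per-variant summary statistics. The plain-Python sketch below shows one conventional way to compute those statistics from genotype calls and apply the thresholds; it is not the script's Hail implementation. Assumptions: the site is bi-allelic, genotypes are alt-allele counts (0, 1, 2, or None for a missing call), allele frequency is the alt-allele frequency among called genotypes, the site inbreeding coefficient is F = 1 - (observed heterozygotes / heterozygotes expected under Hardy-Weinberg equilibrium), and all thresholds are inclusive. The exact comparisons the script uses (for example `>` versus `>=`) are not documented here, and `variant_stats` and `passes_variant_qc` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def variant_stats(gts: List[Optional[int]]) -> Tuple[float, float, Optional[float]]:
    """Return (alt allele frequency, callrate, site inbreeding coefficient)."""
    called = [g for g in gts if g is not None]
    n = len(called)
    callrate = n / len(gts) if gts else 0.0
    if n == 0:
        return 0.0, callrate, None
    af = sum(called) / (2 * n)
    n_het = sum(1 for g in called if g == 1)
    expected_het = 2 * af * (1 - af) * n  # Hardy-Weinberg expectation
    inbreeding = 1 - n_het / expected_het if expected_het > 0 else None
    return af, callrate, inbreeding


def passes_variant_qc(
    gts: List[Optional[int]],
    min_af: float = 0.0001,
    min_callrate: float = 0.99,
    min_inbreeding_coeff_threshold: float = -0.8,
) -> bool:
    """Apply the three thresholds (assumed inclusive) to one bi-allelic site."""
    af, callrate, inbreeding = variant_stats(gts)
    return (
        af >= min_af
        and callrate >= min_callrate
        and inbreeding is not None
        and inbreeding >= min_inbreeding_coeff_threshold
    )
```

For 90 homozygous-reference and 10 heterozygous calls, AF = 10/200 = 0.05, the expected heterozygote count is 2 × 0.05 × 0.95 × 100 = 9.5, and F = 1 - 10/9.5 ≈ -0.05, so the site passes. A site where every sample is heterozygous has F = -1, which falls below the default -0.8 and is a typical signature of a sequencing or mapping artifact.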

Module Functions

gnomad_qc.v4.sample_qc.generate_qc_mt.create_filtered_dense_mt(mtds)

Filter a sparse MatrixTable or VariantDataset to a set of predetermined QC sites and return a dense MatrixTable.

gnomad_qc.v4.sample_qc.generate_qc_mt.generate_qc_mt(...)

Generate combined gnomAD v3 and v4 QC MatrixTable for use in relatedness and ancestry inference.

gnomad_qc.v4.sample_qc.generate_qc_mt.generate_qc_meta_ht()

Combine v3 and v4 sample metadata into a single Table for relatedness and population inference.

gnomad_qc.v4.sample_qc.generate_qc_mt.main(args)

Create a dense MT of a diverse set of variants for relatedness/ancestry PCA.

gnomad_qc.v4.sample_qc.generate_qc_mt.get_script_argument_parser()

Get script argument parser.

gnomad_qc.v4.sample_qc.generate_qc_mt.create_filtered_dense_mt(mtds, split=False)

Filter a sparse MatrixTable or VariantDataset to a set of predetermined QC sites and return a dense MatrixTable.

Parameters:
  • mtds (Union[VariantDataset, MatrixTable]) – Input MatrixTable or VariantDataset.

  • split (bool) – Whether to split multi-allelic sites in mtds before filtering to the QC variants.

Return type:

MatrixTable

Returns:

Filtered and densified MatrixTable.
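
Densifying turns the sparse representation into one genotype per sample at every QC site. In a sparse MatrixTable or VariantDataset, a sample's homozygous-reference stretches are stored as reference blocks (a start position plus an `END` position) rather than as entries at every site. The plain-Python sketch below fills each QC site from the sample's variant call at that position if one exists, otherwise from a reference block covering the position, and otherwise leaves the genotype missing. It is a simplified illustration (single contig, no multi-allelic splitting as controlled by `split`, genotypes reduced to alt-allele counts), not the Hail implementation, and `densify_at_sites` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Toy per-sample sparse data on one contig (format assumed for illustration):
#   variant_calls: position -> alt-allele count at that site
#   ref_blocks: inclusive (start, end) intervals where the sample is hom-ref
SampleData = Tuple[Dict[int, int], List[Tuple[int, int]]]


def densify_at_sites(
    samples: Dict[str, SampleData], qc_sites: List[int]
) -> Dict[int, Dict[str, Optional[int]]]:
    """Return site -> {sample -> genotype}, with an entry for every sample at every QC site."""
    dense: Dict[int, Dict[str, Optional[int]]] = {}
    for pos in sorted(qc_sites):
        row: Dict[str, Optional[int]] = {}
        for sample, (variant_calls, ref_blocks) in samples.items():
            if pos in variant_calls:
                row[sample] = variant_calls[pos]  # explicit variant call
            elif any(start <= pos <= end for start, end in ref_blocks):
                row[sample] = 0  # covered by a hom-ref reference block
            else:
                row[sample] = None  # no data at this position: missing call
        dense[pos] = row
    return dense


samples = {
    "s1": ({150: 1}, [(100, 149), (151, 300)]),
    "s2": ({}, [(100, 200)]),
    "s3": ({250: 2}, []),
}
dense = densify_at_sites(samples, [150, 250, 120])
```

Here `s2` is homozygous reference at 120 and 150 because its reference block spans 100 to 200, but it has no data at 250, so that genotype stays missing and later counts against the site's callrate.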

gnomad_qc.v4.sample_qc.generate_qc_mt.generate_qc_mt(v3_mt, v4_mt, bi_allelic_only=False, min_af=0.0001, min_callrate=0.99, min_inbreeding_coeff_threshold=-0.8, ld_r2=0.1, n_partitions=1000, block_size=2048)

Generate combined gnomAD v3 and v4 QC MatrixTable for use in relatedness and ancestry inference.

Parameters:
  • v3_mt (MatrixTable) – Dense gnomAD v3 MatrixTable filtered to predetermined sites.

  • v4_mt (MatrixTable) – Dense gnomAD v4 MatrixTable filtered to predetermined sites.

  • bi_allelic_only (bool) – Whether to filter to bi-allelic variants.

  • min_af (float) – Minimum variant allele frequency to retain variant in QC MatrixTable.

  • min_callrate (float) – Minimum variant callrate to retain variant in QC MatrixTable.

  • min_inbreeding_coeff_threshold (float) – Minimum site inbreeding coefficient to retain variant in QC MatrixTable.

  • ld_r2 (float) – Squared correlation (r²) cutoff for LD pruning.

  • n_partitions (int) – Number of partitions to repartition the MT to before LD pruning.

  • block_size (int) – Block size parameter to use for LD pruning.

Return type:

MatrixTable

Returns:

MatrixTable of sites that pass QC filters.
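
The `ld_r2` and `block_size` parameters control LD pruning. The plain-Python sketch below illustrates the idea with a simple greedy rule: variants are visited in position order, and a variant is kept only if its squared genotype correlation (r²) with every already-kept variant inside a base-pair window is below `ld_r2`. It is a simplification, not the script's implementation. If the script delegates to Hail's `hl.ld_prune` (an assumption based on the parameter names), the real procedure differs: Hail collects correlated pairs and keeps a maximal independent set, and its `block_size` only sets the BlockMatrix tile size used to compute correlations, which this sketch does not model. The 1 Mb window, the helper names, and the absence of missing genotypes are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype vectors (no missing values)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic vector: treat as uncorrelated
    return cov * cov / (vx * vy)


def greedy_ld_prune(
    variants: List[Tuple[int, List[int]]], ld_r2: float = 0.1, bp_window: int = 1_000_000
) -> List[int]:
    """Return positions kept by a greedy prune over (position, genotypes) pairs."""
    kept: List[Tuple[int, List[int]]] = []
    for pos, gts in sorted(variants, key=lambda v: v[0]):
        # Only compare against kept variants close enough to be in LD.
        nearby = [k for k in kept if pos - k[0] <= bp_window]
        if all(r_squared(gts, k_gts) < ld_r2 for _, k_gts in nearby):
            kept.append((pos, gts))
    return [pos for pos, _ in kept]


variants = [
    (100, [0, 1, 2, 0, 1, 2]),
    (200, [0, 1, 2, 0, 1, 2]),        # perfect LD with position 100: pruned
    (300, [1, 0, 1, 0, 1, 0]),        # uncorrelated with position 100: kept
    (2_000_000, [0, 1, 2, 0, 1, 2]),  # outside the window: kept
]
kept_positions = greedy_ld_prune(variants, ld_r2=0.1)
```

The greedy rule depends on visiting order, which is one reason graph-based approaches such as a maximal independent set are preferred at scale.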

gnomad_qc.v4.sample_qc.generate_qc_mt.generate_qc_meta_ht()

Combine v3 and v4 sample metadata into a single Table for relatedness and population inference.

Return type:

Table

Returns:

Table with v3 and v4 sample metadata.
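
Conceptually, combining the two releases' metadata amounts to stacking per-sample records while recording which release each sample came from. The sketch below shows that idea with plain dictionaries. The field names (`s`, `high_quality`, `source`) and the compound key are hypothetical; this page does not document the schema of the Table returned by `generate_qc_meta_ht`. Keying on the pair (release, sample ID) is a design choice in the sketch so that a sample ID appearing in both releases does not collide.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def combine_meta(v3_meta: List[Record], v4_meta: List[Record]) -> Dict[Tuple[str, str], Record]:
    """Stack v3 and v4 sample records into one mapping keyed by (release, sample ID)."""
    combined: Dict[Tuple[str, str], Record] = {}
    for release, records in (("v3", v3_meta), ("v4", v4_meta)):
        for record in records:
            key = (release, str(record["s"]))
            if key in combined:
                raise ValueError(f"duplicate sample {key}")
            # Copy the record and tag it with the release it came from.
            combined[key] = {**record, "source": release}
    return combined


v3_meta = [{"s": "A", "high_quality": True}, {"s": "B", "high_quality": False}]
v4_meta = [{"s": "A", "high_quality": True}]
meta = combine_meta(v3_meta, v4_meta)
```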

gnomad_qc.v4.sample_qc.generate_qc_mt.main(args)

Create a dense MT of a diverse set of variants for relatedness/ancestry PCA.