gnomad_qc.v4.resources.sample_qc

Script containing sample QC related resources.

Module Functions

gnomad_qc.v4.resources.sample_qc.get_sample_qc_root([...])

Return path to sample QC root folder.

gnomad_qc.v4.resources.sample_qc.get_sample_qc([...])

Get sample QC annotations generated by Hail for the specified stratification.

gnomad_qc.v4.resources.sample_qc.get_ploidy_cutoff_json_path([...])

Get the sex karyotype ploidy cutoff JSON path for the indicated gnomAD version.

gnomad_qc.v4.resources.sample_qc.interval_qc_pass([...])

Get the VersionedTableResource for Table with interval QC pass annotation.

gnomad_qc.v4.resources.sample_qc.get_predetermined_qc([...])

Get the dense MatrixTableResource of all predetermined QC sites for the indicated gnomAD version.

gnomad_qc.v4.resources.sample_qc.get_joint_qc([test])

Get the dense MatrixTableResource at final joint v3 and v4 QC sites.

gnomad_qc.v4.resources.sample_qc.get_cuking_input_path([...])

Return the path containing the input files read by cuKING.

gnomad_qc.v4.resources.sample_qc.get_cuking_output_path([...])

Return the path containing the output files written by cuKING.

gnomad_qc.v4.resources.sample_qc.pc_relate_pca_scores([test])

Get VersionedTableResource for PCA scores for use in PC-Relate.

gnomad_qc.v4.resources.sample_qc.relatedness([...])

Get the VersionedTableResource for relatedness results.

gnomad_qc.v4.resources.sample_qc.ibd([test])

Get VersionedTableResource for identity-by-descent (IBD) on cuKING related pairs.

gnomad_qc.v4.resources.sample_qc.related_samples_to_drop([...])

Get the VersionedTableResource for samples to drop for release or ancestry PCA.

gnomad_qc.v4.resources.sample_qc.sample_rankings([...])

Get the VersionedTableResource for sample rankings for release or ancestry PCA.

gnomad_qc.v4.resources.sample_qc.ancestry_pca_loadings([...])

Get the ancestry PCA loadings VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.ancestry_pca_scores([...])

Get the ancestry PCA scores VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.ancestry_pca_eigenvalues([...])

Get the ancestry PCA eigenvalues VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.pop_rf_path([...])

Path to RF model used for inferring sample populations.

gnomad_qc.v4.resources.sample_qc.get_pop_ht([...])

Get the TableResource of samples' inferred population for the indicated gnomAD version.

gnomad_qc.v4.resources.sample_qc.get_pop_pr_ht([...])

Get the TableResource of ancestry inference precision and recall values.

gnomad_qc.v4.resources.sample_qc.per_pop_min_rf_probs_json_path([...])

Get path to JSON file containing per ancestry group minimum RF probabilities.

gnomad_qc.v4.resources.sample_qc.stratified_filtering([...])

Get VersionedTableResource for stratified platform/population-based metrics filtering.

gnomad_qc.v4.resources.sample_qc.regressed_filtering([...])

Get VersionedTableResource for regression platform/population-based metrics filtering.

gnomad_qc.v4.resources.sample_qc.nearest_neighbors([...])

Get VersionedTableResource for population PCA nearest neighbors.

gnomad_qc.v4.resources.sample_qc.nearest_neighbors_filtering([test])

Get VersionedTableResource for nearest neighbors platform/population-based metrics filtering.

gnomad_qc.v4.resources.sample_qc.finalized_outlier_filtering([test])

Get VersionedTableResource for the finalized outlier filtering.

gnomad_qc.v4.resources.sample_qc.duplicates()

Get the VersionedTableResource for duplicated (or twin) samples.

gnomad_qc.v4.resources.sample_qc.pedigree([...])

Get the VersionedPedigreeResource for the trio pedigree including multiple trios per family.

gnomad_qc.v4.resources.sample_qc.trios([...])

Get the VersionedPedigreeResource for finalized trio samples.

gnomad_qc.v4.resources.sample_qc.ped_mendel_errors([test])

Get the VersionedTableResource for the number of Mendel errors per trio.

gnomad_qc.v4.resources.sample_qc.ped_filter_param_json_path([...])

Get path to JSON file containing filters used to create the finalized Pedigree and trios resources.

gnomad_qc.v4.resources.sample_qc.get_sample_qc_field_def_json_path([...])

Get path to JSON file containing sample QC metadata HT field definitions.


gnomad_qc.v4.resources.sample_qc.get_sample_qc_root(version='4.0', test=False, data_type='exomes')

Return path to sample QC root folder.

Parameters:
  • version (str) – Version of sample QC path to return.

  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Return type:

str

Returns:

Root to sample QC path.
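
As a rough illustration of how the three arguments could combine into a root path, the sketch below uses a hypothetical helper, sample_qc_root_sketch, and invented bucket names (gs://example-bucket and gs://example-tmp). The real bucket layout is not given in this reference and may differ.

```python
def sample_qc_root_sketch(
    version: str = "4.0", test: bool = False, data_type: str = "exomes"
) -> str:
    """Hypothetical stand-in for get_sample_qc_root; bucket names are invented."""
    # Assumption: test runs are written under a temporary bucket.
    bucket = "gs://example-tmp" if test else "gs://example-bucket"
    # Assumption: data type and version are nested under a sample_qc folder.
    return f"{bucket}/sample_qc/{data_type}/{version}"


print(sample_qc_root_sketch())
print(sample_qc_root_sketch(test=True, data_type="joint"))
```

The point is only that test switches the storage location while version and data_type select a subfolder.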

gnomad_qc.v4.resources.sample_qc.get_sample_qc(strat='all', test=False, data_type='exomes')

Get sample QC annotations generated by Hail for the specified stratification.

Possible values for strat:
  • bi_allelic

  • multi_allelic

  • all

Parameters:
  • strat (str) – Which stratification to return.

  • test (bool) – Whether to use a tmp path for analysis of the test VDS instead of the full v4 VDS.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Return type:

VersionedTableResource

Returns:

Sample QC table.
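
A caller may want to check strat before requesting the resource. The sketch below is a hypothetical pre-check (check_strat and VALID_STRATS are illustrative names, not part of gnomad_qc); whether the library itself validates strat is not stated in this reference.

```python
# The three stratifications listed above.
VALID_STRATS = ("bi_allelic", "multi_allelic", "all")


def check_strat(strat: str = "all") -> str:
    """Return strat unchanged if it is one of the documented values."""
    if strat not in VALID_STRATS:
        raise ValueError(
            f"strat must be one of {VALID_STRATS}, got {strat!r}"
        )
    return strat


print(check_strat("bi_allelic"))
```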

gnomad_qc.v4.resources.sample_qc.get_ploidy_cutoff_json_path(version='4.0', test=False)

Get the sex karyotype ploidy cutoff JSON path for the indicated gnomAD version.

Parameters:
  • version (str) – Version of the JSON to return.

  • test (bool) – Whether to use a tmp path for a test JSON.

Return type:

str

Returns:

Path of sex karyotype ploidy cutoff JSON.

gnomad_qc.v4.resources.sample_qc.interval_qc_pass(per_platform=False, all_platforms=False)

Get the VersionedTableResource for Table with interval QC pass annotation.

Parameters:
  • per_platform (bool) – Whether to use the interval QC pass resource with interval QC pass per platform.

  • all_platforms (bool) – Whether to use the interval QC pass resource where an interval passes QC only if it passes interval QC per platform across all platforms.

Return type:

VersionedTableResource

Returns:

VersionedTableResource for Table with interval QC pass annotation.
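
The all_platforms rule described above (an interval passes only if it passes on every platform) can be sketched in plain Python. The helper name and the platform labels are hypothetical; the actual Table schema is not described in this reference.

```python
from typing import Dict


def passes_all_platforms(per_platform_pass: Dict[str, bool]) -> bool:
    """Return True only if the interval passed QC on every platform."""
    # Note: all() of an empty mapping is True, so callers should make sure
    # at least one platform is present.
    return all(per_platform_pass.values())


# Hypothetical per-platform results for a single interval.
print(passes_all_platforms({"platform_1": True, "platform_2": False}))
```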

gnomad_qc.v4.resources.sample_qc.get_predetermined_qc(version='4.0', test=False)

Get the dense MatrixTableResource of all predetermined QC sites for the indicated gnomAD version.

Parameters:
  • version (str) – Version of QC MatrixTableResource to return.

  • test (bool) – Whether to use a tmp path for a test MatrixTableResource.

Return type:

MatrixTableResource

Returns:

MatrixTableResource of predetermined QC sites.

gnomad_qc.v4.resources.sample_qc.get_joint_qc(test=False)

Get the dense MatrixTableResource at final joint v3 and v4 QC sites.

Parameters:

test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedMatrixTableResource

Returns:

MatrixTableResource of QC sites.

gnomad_qc.v4.resources.sample_qc.get_cuking_input_path(version='4.0', test=False)

Return the path containing the input files read by cuKING.

Those files correspond to Parquet tables derived from the dense QC matrix.

Parameters:
  • version (str) – gnomAD version.

  • test (bool) – Whether to return a path corresponding to a test subset.

Return type:

str

Returns:

Temporary path to hold Parquet input tables for running cuKING.

gnomad_qc.v4.resources.sample_qc.get_cuking_output_path(version='4.0', test=False)

Return the path containing the output files written by cuKING.

Those files correspond to Parquet tables containing relatedness results.

Parameters:
  • version (str) – gnomAD version.

  • test (bool) – Whether to return a path corresponding to a test subset.

Return type:

str

Returns:

Temporary path to hold Parquet output tables for running cuKING.

gnomad_qc.v4.resources.sample_qc.pc_relate_pca_scores(test=False)

Get VersionedTableResource for PCA scores for use in PC-Relate.

Parameters:

test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.relatedness(method=None, test=False)

Get the VersionedTableResource for relatedness results.

Parameters:
  • method (Optional[str]) – Optional method of relatedness inference to return VersionedTableResource for. One of ‘cuking’ or ‘pc_relate’ if set. Default is None, which will return the finalized relatedness Table.

  • test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.
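
The three documented choices for method (None, ‘cuking’, ‘pc_relate’) can be summarized with a small helper. This is a hypothetical sketch: relatedness_choice is not a gnomad_qc function, and the labels it returns are illustrative rather than actual resource names.

```python
from typing import Optional


def relatedness_choice(method: Optional[str] = None) -> str:
    """Map a method argument to a label for the table it would select."""
    if method is None:
        # Default: the finalized relatedness Table.
        return "finalized"
    if method in ("cuking", "pc_relate"):
        return method
    raise ValueError("method must be None, 'cuking', or 'pc_relate'")


print(relatedness_choice())
print(relatedness_choice("cuking"))
```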

gnomad_qc.v4.resources.sample_qc.ibd(test=False)

Get VersionedTableResource for identity-by-descent (IBD) on cuKING related pairs.

Parameters:

test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.related_samples_to_drop(test=False, release=True)

Get the VersionedTableResource for samples to drop for release or ancestry PCA.

Default to returning the VersionedTableResource for samples to drop for release. If release is set to False, retrieve the VersionedTableResource of related samples to remove for ancestry PCA.

Parameters:
  • test (bool) – Whether to use a tmp path for a test resource.

  • release (bool) – Whether to return resource for related samples to drop for the release based on outlier filtering of sample QC metrics.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.sample_rankings(test=False, release=True)

Get the VersionedTableResource for sample rankings for release or ancestry PCA.

Default to returning the VersionedTableResource for release sample rankings. If release is set to False, retrieve the VersionedTableResource of sample rankings used to remove related samples for ancestry PCA.

Parameters:
  • test (bool) – Whether to use a tmp path for a test resource.

  • release (bool) – Whether to return resource for ranking of all samples based on outlier filtering of sample QC metrics. Used to determine related samples to drop for the release.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.ancestry_pca_loadings(include_unreleasable_samples=False, test=False, data_type='joint')

Get the ancestry PCA loadings VersionedTableResource.

Parameters:
  • include_unreleasable_samples (bool) – Whether to get the PCA loadings from the PCA that used unreleasable samples.

  • test (bool) – Whether to use a temp path.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Return type:

VersionedTableResource

Returns:

Ancestry PCA loadings.

gnomad_qc.v4.resources.sample_qc.ancestry_pca_scores(include_unreleasable_samples=False, test=False, data_type='joint')

Get the ancestry PCA scores VersionedTableResource.

Parameters:
  • include_unreleasable_samples (bool) – Whether to get the PCA scores from the PCA that used unreleasable samples.

  • test (bool) – Whether to use a temp path.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Return type:

VersionedTableResource

Returns:

Ancestry PCA scores.

gnomad_qc.v4.resources.sample_qc.ancestry_pca_eigenvalues(include_unreleasable_samples=False, test=False, data_type='joint')

Get the ancestry PCA eigenvalues VersionedTableResource.

Parameters:
  • include_unreleasable_samples (bool) – Whether to get the PCA eigenvalues from the PCA that used unreleasable samples.

  • test (bool) – Whether to use a temp path.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Return type:

VersionedTableResource

Returns:

Ancestry PCA eigenvalues.

gnomad_qc.v4.resources.sample_qc.pop_rf_path(version='4.0', test=False, data_type='joint')

Path to RF model used for inferring sample populations.

Parameters:
  • version (str) – gnomAD version.

  • test (bool) – Whether the RF assignment was from a test dataset.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Return type:

str

Returns:

String path to sample pop RF model.

gnomad_qc.v4.resources.sample_qc.get_pop_ht(version='4.0', test=False, data_type='joint')

Get the TableResource of samples’ inferred population for the indicated gnomAD version.

Parameters:
  • version (str) – Version of pop TableResource to return.

  • test (bool) – Whether to use the test version of the pop TableResource.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Returns:

TableResource of sample pops.

gnomad_qc.v4.resources.sample_qc.get_pop_pr_ht(version='4.0', test=False, data_type='joint')

Get the TableResource of ancestry inference precision and recall values.

Parameters:
  • version (str) – Version of pop PR TableResource to return.

  • test (bool) – Whether to use the test version of the pop PR TableResource.

  • data_type (str) – Data type used in sample QC, e.g. “exomes” or “joint”.

Returns:

TableResource of ancestry inference PR values.

gnomad_qc.v4.resources.sample_qc.per_pop_min_rf_probs_json_path(version='4.0')

Get path to JSON file containing per ancestry group minimum RF probabilities.

Parameters:

version (str) – Version of the JSON to return.

Returns:

Path to per ancestry group minimum RF probabilities JSON.

gnomad_qc.v4.resources.sample_qc.stratified_filtering(test=False, pop_stratified=False, platform_stratified=False)

Get VersionedTableResource for stratified platform/population-based metrics filtering.

Parameters:
  • test (bool) – Whether to use a tmp path for a test resource.

  • pop_stratified (bool) – Whether to get resource that includes population stratification in stratified outlier filtering.

  • platform_stratified (bool) – Whether to get resource that includes platform stratification in stratified outlier filtering.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.regressed_filtering(test=False, pop_pc_regressed=False, platform_pc_regressed=False, platform_stratified=False, include_unreleasable_samples=False)

Get VersionedTableResource for regression platform/population-based metrics filtering.

Parameters:
  • test (bool) – Whether to use a tmp path for a test resource.

  • pop_pc_regressed (bool) – Whether to get resource that includes population PCs in regression filtering.

  • platform_pc_regressed (bool) – Whether to get resource that includes platform PCs in regression filtering.

  • platform_stratified (bool) – Whether to get resource that includes platform stratification in regression filtering.

  • include_unreleasable_samples (bool) – Whether the PCA included unreleasable samples.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.nearest_neighbors(test=False, platform_stratified=False, approximation=False, include_unreleasable_samples=False)

Get VersionedTableResource for population PCA nearest neighbors.

Parameters:
  • test (bool) – Whether to use a tmp path for a test resource.

  • platform_stratified (bool) – Whether to get resource that includes platform stratified nearest neighbors.

  • approximation (bool) – Whether to get resource that is approximate nearest neighbors.

  • include_unreleasable_samples (bool) – Whether to get resource that included unreleasable samples in nearest neighbors determination.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.nearest_neighbors_filtering(test=False)

Get VersionedTableResource for nearest neighbors platform/population-based metrics filtering.

Parameters:

test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.finalized_outlier_filtering(test=False)

Get VersionedTableResource for the finalized outlier filtering.

Parameters:

test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedTableResource

Returns:

VersionedTableResource.

gnomad_qc.v4.resources.sample_qc.duplicates()

Get the VersionedTableResource for duplicated (or twin) samples.

Return type:

VersionedTableResource

Returns:

VersionedTableResource of duplicate samples.

gnomad_qc.v4.resources.sample_qc.pedigree(finalized=True, fake=False, test=False)

Get the VersionedPedigreeResource for the trio pedigree including multiple trios per family.

Parameters:
  • finalized (bool) – Whether to return the finalized pedigree resource.

  • fake (bool) – Whether to return the fake pedigree resource.

  • test (bool) – Whether to use a tmp path for a test resource. This is only an option for the finalized pedigree, which depends on ped_mendel_errors.

Return type:

VersionedPedigreeResource

Returns:

VersionedPedigreeResource of trio pedigree including multiple trios per family.
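
The note on test (a tmp path is only an option for the finalized pedigree) suggests a simple argument check before calling. The sketch below is a hypothetical caller-side guard; check_pedigree_args is not part of gnomad_qc, and how the library itself reacts to this combination is not stated in this reference.

```python
def check_pedigree_args(finalized: bool = True, test: bool = False) -> None:
    """Raise if test is requested for a non-finalized pedigree."""
    # Per the parameter description, a test path exists only for the
    # finalized pedigree.
    if test and not finalized:
        raise ValueError("test=True is only supported when finalized=True")


# A valid combination passes silently.
check_pedigree_args(finalized=True, test=True)
```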

gnomad_qc.v4.resources.sample_qc.trios(fake=False, test=False)

Get the VersionedPedigreeResource for finalized trio samples.

Parameters:
  • fake (bool) – Whether to return the fake trio resource.

  • test (bool) – Whether to use a tmp path for a test resource. This is only an option for the finalized Pedigree, which depends on ped_mendel_errors.

Return type:

VersionedPedigreeResource

Returns:

VersionedPedigreeResource of trio samples.

gnomad_qc.v4.resources.sample_qc.ped_mendel_errors(test=False)

Get the VersionedTableResource for the number of Mendel errors per trio.

Parameters:

test (bool) – Whether to use a tmp path for a test resource.

Return type:

VersionedTableResource

Returns:

VersionedTableResource of number of Mendel errors per trio.

gnomad_qc.v4.resources.sample_qc.ped_filter_param_json_path(version='4.0', test=False)

Get path to JSON file containing filters used to create the finalized Pedigree and trios resources.

Parameters:
  • version (str) – Version of the JSON to return.

  • test (bool) – Whether to use a tmp path for a test resource.

Returns:

Path to Pedigree filter JSON.

gnomad_qc.v4.resources.sample_qc.get_sample_qc_field_def_json_path(version='4.0')

Get path to JSON file containing sample QC metadata HT field definitions.

Parameters:

version (str) – gnomAD version.

Return type:

str

Returns:

Path to sample QC field definitions JSON.