gnomad_qc.v4.resources.release

Script containing release related resources.

Module Functions

gnomad_qc.v4.resources.release.annotation_hists_params_path([...])

Return path to file containing dictionary of parameters for site metric histograms.

gnomad_qc.v4.resources.release.qual_hists_json_path([...])

Fetch filepath for qual histograms JSON.

gnomad_qc.v4.resources.release.release_ht_path([...])

Fetch filepath for release (variant-only) Hail Tables.

gnomad_qc.v4.resources.release.release_sites([...])

Retrieve versioned resource for sites-only release Table.

gnomad_qc.v4.resources.release.release_vcf_path([...])

Fetch filepath for release (sites-only) VCFs.

gnomad_qc.v4.resources.release.release_header_path([...])

Fetch path to pickle file containing VCF header dictionary.

gnomad_qc.v4.resources.release.append_to_vcf_header_path([...])

Fetch path to TSV file containing extra fields to append to VCF header.

gnomad_qc.v4.resources.release.release_coverage_path([...])

Fetch filepath for all sites coverage or allele number release Table.

gnomad_qc.v4.resources.release.release_coverage_tsv_path([...])

Fetch path to coverage TSV file.

gnomad_qc.v4.resources.release.release_all_sites_an_tsv_path([...])

Fetch path to all sites AN TSV file.

gnomad_qc.v4.resources.release.release_coverage([...])

Retrieve versioned resource for coverage release Table.

gnomad_qc.v4.resources.release.release_all_sites_an([...])

Retrieve versioned resource for all sites allele number release Table.

gnomad_qc.v4.resources.release.included_datasets_json_path([...])

Fetch filepath for the JSON containing all datasets used in the release.

gnomad_qc.v4.resources.release.validated_release_ht([...])

Retrieve versioned resource for validated sites-only release Table.

gnomad_qc.v4.resources.release.get_freq_array_readme([...])

Fetch README for freq array for the specified data_type.

gnomad_qc.v4.resources.release.get_false_dup_genes_path([...])

Retrieve path for the liftover table containing three genes of interest within a false duplication in GRCh38.


gnomad_qc.v4.resources.release.annotation_hists_params_path(release_version='4.1', data_type='exomes')[source]

Return path to file containing dictionary of parameters for site metric histograms.

The keys of the dictionary are the names of the site quality metrics while the values are: [lower bound, upper bound, number of bins]. For example, “InbreedingCoeff”: [-0.25, 0.25, 50].

Parameters:
  • release_version (str) – Release version. Defaults to CURRENT_RELEASE.

  • data_type (str) – Data type of annotation resource. e.g. “exomes” or “genomes”. Default is “exomes”.

  • test – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

Path to file with annotation histograms
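
As a rough illustration of the [lower bound, upper bound, number of bins] convention described above, the sketch below expands one entry into histogram bin edges. The helper name `bin_edges` and the in-memory dictionary are assumptions made for this example and are not part of gnomad_qc; only the “InbreedingCoeff” values come from the description above, and the file's on-disk format is not assumed.

```python
# Hypothetical in-memory copy of the parameter dictionary; only the
# InbreedingCoeff entry is taken from the documentation above.
hist_params = {"InbreedingCoeff": [-0.25, 0.25, 50]}


def bin_edges(lower, upper, n_bins):
    """Return the n_bins + 1 evenly spaced edges between lower and upper."""
    n_bins = int(n_bins)
    width = upper - lower
    # Scale before dividing so the last edge lands on `upper`.
    return [lower + width * i / n_bins for i in range(n_bins + 1)]


# Expand every metric's [lower, upper, n_bins] triple into its bin edges.
edges = {metric: bin_edges(*spec) for metric, spec in hist_params.items()}
print(len(edges["InbreedingCoeff"]))
```

A histogram with 50 bins has 51 edges, so each value in the dictionary fully determines the binning for its metric.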

gnomad_qc.v4.resources.release.qual_hists_json_path(release_version='4.1', data_type='exomes', test=False)[source]

Fetch filepath for qual histograms JSON.

Parameters:
  • release_version (str) – Release version. Defaults to CURRENT_RELEASE.

  • data_type (str) – Data type, either ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

File path for histogram JSON

gnomad_qc.v4.resources.release.release_ht_path(data_type='exomes', release_version='4.1', public=False, test=False)[source]

Fetch filepath for release (variant-only) Hail Tables.

Parameters:
  • data_type (str) – Data type of release resource to return. Should be one of ‘exomes’, ‘genomes’ or ‘joint’. Default is ‘exomes’.

  • release_version (str) – Release version. Default is CURRENT_RELEASE.

  • public (bool) – Whether release sites Table path returned is from public or private bucket. Default is False.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

File path for desired release Hail Table.

gnomad_qc.v4.resources.release.release_sites(data_type='exomes', public=False, test=False)[source]

Retrieve versioned resource for sites-only release Table.

Parameters:
  • data_type (str) – Data type of release resource to return. Should be one of ‘exomes’, ‘genomes’, or ‘joint’. Default is ‘exomes’.

  • public (bool) – Whether release sites Table path returned is from public or private bucket. Default is False.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

VersionedTableResource

Returns:

Sites-only release Table.
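
The idea of a versioned resource can be sketched with a small stand-in: several per-version resources are held together, and one version is treated as the default. Everything below (`ExampleTableResource`, `ExampleVersionedResource`, the `gs://example-bucket` paths) is hypothetical and written only for illustration; it assumes, without confirmation from this page, that attribute access falls through to the default version, and it is not the actual VersionedTableResource implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExampleTableResource:
    """Hypothetical stand-in for a single-version table resource."""

    path: str


class ExampleVersionedResource:
    """Hypothetical stand-in: a set of versions with one default."""

    def __init__(self, default_version: str, versions: Dict[str, ExampleTableResource]):
        if default_version not in versions:
            raise ValueError(f"Unknown default version: {default_version}")
        self.default_version = default_version
        self.versions = versions

    def __getattr__(self, name):
        # Only reached when normal lookup fails; skip dunder lookups.
        if name.startswith("__"):
            raise AttributeError(name)
        # Delegate to the default version's resource.
        return getattr(self.versions[self.default_version], name)


# Placeholder paths, not real gnomAD locations.
sites = ExampleVersionedResource(
    "4.1",
    {
        "4.0": ExampleTableResource("gs://example-bucket/4.0/sites.ht"),
        "4.1": ExampleTableResource("gs://example-bucket/4.1/sites.ht"),
    },
)
print(sites.path)
print(sites.versions["4.0"].path)
```

In this sketch, callers that do not care about versions read the default directly, while callers that need an older release pick it out of `versions` explicitly.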

gnomad_qc.v4.resources.release.release_vcf_path(release_version=None, test=False, data_type='exomes', contig=None)[source]

Fetch filepath for release (sites-only) VCFs.

Parameters:
  • release_version (Optional[str]) – Release version. When no release_version is supplied CURRENT_RELEASE is used.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • data_type (str) – Data type of release resource to return. Should be one of ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • contig (Optional[str]) – String containing the name of the desired reference contig. Default is the full (all contigs) sites VCF path.

Return type:

str

Returns:

Filepath for the desired VCF.
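
The argument handling described above (an optional release_version that falls back to the current release, and an optional contig that narrows the result to one contig) can be sketched with a stand-in function. The name `example_vcf_path`, the `gs://example-bucket` root, the file naming, and the value assigned to `CURRENT_RELEASE` are all placeholders assumed for this example; the real bucket layout is not shown in this documentation.

```python
from typing import Optional

# Placeholder values for illustration only; the real constant lives in
# gnomad_qc and the real bucket layout is not documented here.
CURRENT_RELEASE = "4.1"
EXAMPLE_ROOT = "gs://example-bucket"


def example_vcf_path(
    release_version: Optional[str] = None,
    test: bool = False,
    data_type: str = "exomes",
    contig: Optional[str] = None,
) -> str:
    """Hypothetical stand-in mirroring the documented argument handling."""
    # Fall back to the current release when no version is supplied.
    if release_version is None:
        release_version = CURRENT_RELEASE
    # Route test runs to a temporary location.
    root = f"{EXAMPLE_ROOT}/tmp" if test else EXAMPLE_ROOT
    # No contig means the full sites VCF; otherwise a per-contig file.
    suffix = f".{contig}" if contig else ""
    return f"{root}/{release_version}/vcf/{data_type}/sites{suffix}.vcf.bgz"


print(example_vcf_path())
print(example_vcf_path(data_type="genomes", contig="chr1", test=True))
```

Using None as the default, rather than a version string in the signature, resolves the version at call time inside the function body.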

gnomad_qc.v4.resources.release.release_header_path(release_version=None, data_type='exomes', test=False)[source]

Fetch path to pickle file containing VCF header dictionary.

Parameters:
  • release_version (Optional[str]) – Release version. When no release_version is supplied CURRENT_RELEASE is used.

  • data_type (str) – Data type of release resource to return. Should be one of ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

Filepath for header dictionary pickle.

gnomad_qc.v4.resources.release.append_to_vcf_header_path(subset=None, release_version='4.1', data_type='exomes')[source]

Fetch path to TSV file containing extra fields to append to VCF header.

Extra fields are VEP and dbSNP versions.

Parameters:
  • subset (str) – One of the possible release subsets.

  • release_version (str) – Release version. Defaults to CURRENT_RELEASE.

  • data_type (str) – Data type of release resource to return. Should be one of ‘exomes’ or ‘genomes’. Default is ‘exomes’.

Return type:

str

Returns:

Filepath for extra fields TSV file.

gnomad_qc.v4.resources.release.release_coverage_path(data_type='exomes', release_version='4.1', public=True, test=False, stratify=True, coverage_type='coverage')[source]

Fetch filepath for all sites coverage or allele number release Table.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • release_version (str) – Release version.

  • public (bool) – Determines whether release coverage Table is read from public or private bucket. Default is public.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • stratify (bool) – Whether to stratify results by platform and subset. Default is True.

  • coverage_type (str) – ‘coverage’ or ‘allele_number’. Default is ‘coverage’.

Return type:

str

Returns:

File path for desired coverage Hail Table.

gnomad_qc.v4.resources.release.release_coverage_tsv_path(data_type='exomes', release_version='4.0', test=False)[source]

Fetch path to coverage TSV file.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • release_version (str) – Release version. Default is CURRENT_COVERAGE_RELEASE[“exomes”].

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

Coverage TSV path.

gnomad_qc.v4.resources.release.release_all_sites_an_tsv_path(data_type='exomes', release_version=None, test=False)[source]

Fetch path to all sites AN TSV file.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • release_version (str) – Release version. Default is CURRENT_ALL_SITES_AN_RELEASE[data_type].

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

All sites AN TSV path.

gnomad_qc.v4.resources.release.release_coverage(data_type='exomes', public=False, test=False, stratify=True)[source]

Retrieve versioned resource for coverage release Table.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • public (bool) – Determines whether release coverage Table is read from public or private bucket. Default is private.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • stratify (bool) – Whether to stratify results by platform and subset. Default is True.

Return type:

VersionedTableResource

Returns:

Coverage release Table.

gnomad_qc.v4.resources.release.release_all_sites_an(data_type='exomes', public=False, test=False)[source]

Retrieve versioned resource for all sites allele number release Table.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • public (bool) – Determines whether release allele number Table is read from public or private bucket. Default is private.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

VersionedTableResource

Returns:

All sites allele number release Table.

gnomad_qc.v4.resources.release.included_datasets_json_path(data_type='exomes', test=False, release_version='4.1')[source]

Fetch filepath for the JSON containing all datasets used in the release.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • release_version (str) – Release version. Defaults to CURRENT_RELEASE.

Return type:

str

Returns:

File path for the release version's included datasets JSON.

gnomad_qc.v4.resources.release.validated_release_ht(test=False, data_type='exomes')[source]

Retrieve versioned resource for validated sites-only release Table.

Parameters:
  • test (bool) – Whether to use a tmp path for testing. Default is False.

  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

Return type:

VersionedTableResource

Returns:

Validated release Table

gnomad_qc.v4.resources.release.get_freq_array_readme(data_type='exomes')[source]

Fetch README for freq array for the specified data_type.

Parameters:
  • data_type (str) – ‘exomes’ or ‘genomes’. Default is ‘exomes’.

Return type:

str

Returns:

README for freq array.

gnomad_qc.v4.resources.release.get_false_dup_genes_path(release_version='4.1', test=False)[source]

Retrieve path for the liftover table containing three genes of interest within a false duplication in GRCh38.

Parameters:
  • release_version (str) – Release version. Defaults to CURRENT_RELEASE.

  • test (bool) – Whether to use a tmp path for testing. Default is False.

Return type:

str

Returns:

Combined custom liftover table path for the three genes in false duplication.