gnomad_qc.v4.resources.basics

Script containing generic resources.

Module Functions

gnomad_qc.v4.resources.basics.get_gnomad_v4_vds([...])

Get gnomAD v4 data with desired filtering and metadata annotations.

gnomad_qc.v4.resources.basics.get_gnomad_v4_genomes_vds([...])

Get gnomAD v4 genomes VariantDataset with desired filtering and metadata annotations.

gnomad_qc.v4.resources.basics.qc_temp_prefix([...])

Return path to temporary QC bucket.

gnomad_qc.v4.resources.basics.get_checkpoint_path(name)

Create a checkpoint path for Table or MatrixTable.

gnomad_qc.v4.resources.basics.get_logging_path(name)

Create a path for Hail log files.

gnomad_qc.v4.resources.basics.add_meta(mt[, ...])

Add metadata to MT in 'meta_name' column.

gnomad_qc.v4.resources.basics.calling_intervals(...)

Return path to capture intervals Table.

gnomad_qc.v4.resources.basics.get_gnomad_v4_vds(split=False, remove_hard_filtered_samples=True, remove_hard_filtered_samples_no_sex=False, high_quality_only=False, keep_controls=False, release_only=False, controls_only=False, test=False, n_partitions=None, filter_partitions=None, chrom=None, autosomes_only=False, sex_chr_only=False, filter_variant_ht=None, filter_intervals=None, split_reference_blocks=True, remove_dead_alleles=True, annotate_meta=False, entries_to_keep=None, annotate_het_non_ref=False)[source]

Get gnomAD v4 data with desired filtering and metadata annotations.

Parameters:
  • split (bool) – Whether to split the VDS. Note: the split is performed when this function is called; an already split VDS is not loaded.

  • remove_hard_filtered_samples (bool) – Whether to remove samples that failed hard filters (only relevant after hard filtering is complete).

  • remove_hard_filtered_samples_no_sex (bool) – Whether to remove samples that failed hard filters other than sex inference (only relevant after pre-sex imputation hard filtering is complete).

  • high_quality_only (bool) – Whether to filter the VDS to only high quality samples (only relevant after outlier filtering is complete).

  • keep_controls (bool) – Whether to keep control samples when filtering the VDS to a subset of samples.

  • release_only (bool) – Whether to filter the VDS to only samples available for release (can only be used if metadata is present).

  • controls_only (bool) – Whether to filter the VDS to only control samples.

  • test (bool) – Whether to use the test VDS instead of the full v4 VDS.

  • n_partitions (Optional[int]) – Optional argument to read the VDS with a specific number of partitions.

  • filter_partitions (Optional[List[int]]) – Optional argument to filter the VDS to specific partitions.

  • chrom (Union[str, List[str], Set[str], None]) – Optional argument to filter the VDS to one or more specific chromosomes.

  • autosomes_only (bool) – Whether to filter the VDS to autosomes only. Default is False.

  • sex_chr_only (bool) – Whether to filter the VDS to sex chromosomes only. Default is False.

  • filter_variant_ht (Optional[Table]) – Optional argument to filter the VDS to a specific set of variants. Only supported when splitting the VDS.

  • filter_intervals (Optional[List[Union[str, tinterval]]]) – Optional argument to filter the VDS to specific intervals.

  • split_reference_blocks (bool) – Whether to split the reference data at the edges of the intervals defined by filter_intervals. Default is True.

  • remove_dead_alleles (bool) – Whether to remove dead alleles from the VDS when removing withdrawn UKB samples. Default is True.

  • annotate_meta (bool) – Whether to annotate the VDS with the sample QC metadata. Default is False.

  • entries_to_keep (Optional[List[str]]) – Optional argument to keep only specific entries in the returned VDS. If splitting the VDS, use the global entries (e.g. ‘GT’) instead of the local entries (e.g. ‘LGT’) to keep.

  • annotate_het_non_ref (bool) – Whether to annotate non-reference heterozygotes (as ‘_het_non_ref’) in the variant data. Default is False.

Return type:

VariantDataset

Returns:

gnomAD v4 dataset with chosen annotations and filters.
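
Two of the constraints documented above (filter_variant_ht is only supported when splitting, and entries_to_keep should use global entry names when splitting) can be checked before starting a potentially expensive read. The following is a minimal sketch and not part of gnomad_qc: check_vds_kwargs and LOCAL_ENTRY_FIELDS are hypothetical names, and the set of local entry fields is an assumption based on Hail's local-allele naming convention.

```python
from typing import Any, Dict

# Assumption: local-allele entry fields as conventionally named in Hail VDSs.
LOCAL_ENTRY_FIELDS = {"LGT", "LAD", "LPL", "LPGT"}


def check_vds_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Validate keyword arguments intended for get_gnomad_v4_vds (hypothetical helper)."""
    split = kwargs.get("split", False)

    # filter_variant_ht is documented as supported only when splitting the VDS.
    if kwargs.get("filter_variant_ht") is not None and not split:
        raise ValueError("filter_variant_ht requires split=True")

    # When splitting, entries_to_keep should name global entries such as 'GT'.
    local = LOCAL_ENTRY_FIELDS.intersection(kwargs.get("entries_to_keep") or [])
    if split and local:
        raise ValueError(f"use global entry names when splitting, got {sorted(local)}")

    return kwargs


kwargs = check_vds_kwargs(split=True, chrom="chr20", entries_to_keep=["GT", "AD"])
# With gnomad_qc installed and access to the gnomAD buckets, the read would be:
#   vds = get_gnomad_v4_vds(**kwargs)
```

Keeping the check separate from the read means a mistyped option fails immediately rather than after the VDS has been opened.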

gnomad_qc.v4.resources.basics.get_gnomad_v4_genomes_vds(split=False, remove_hard_filtered_samples=True, release_only=False, annotate_meta=False, test=False, filter_partitions=None, chrom=None, autosomes_only=False, sex_chr_only=False, filter_variant_ht=None, filter_intervals=None, split_reference_blocks=True, entries_to_keep=None, annotate_het_non_ref=False)[source]

Get gnomAD v4 genomes VariantDataset with desired filtering and metadata annotations.

Parameters:
  • split (bool) – Whether to split the VDS. Note: the split is performed when this function is called; an already split VDS is not loaded.

  • remove_hard_filtered_samples (bool) – Whether to remove samples that failed hard filters (only relevant after sample QC).

  • release_only (bool) – Whether to filter the VDS to only samples available for release (can only be used if metadata is present).

  • annotate_meta (bool) – Whether to add v4 genomes metadata to VDS variant_data in ‘meta’ column.

  • test (bool) – Whether to use the test VDS instead of the full v4 genomes VDS.

  • filter_partitions (Optional[List[int]]) – Optional argument to filter the VDS to specific partitions in the provided list.

  • chrom (Union[str, List[str], Set[str], None]) – Optional argument to filter the VDS to one or more specific chromosomes.

  • autosomes_only (bool) – Whether to filter the VDS to autosomes only. Default is False.

  • sex_chr_only (bool) – Whether to filter the VDS to sex chromosomes only. Default is False.

  • filter_variant_ht (Optional[Table]) – Optional argument to filter the VDS to a specific set of variants. Only supported when splitting the VDS.

  • filter_intervals (Optional[List[Union[str, tinterval]]]) – Optional argument to filter the VDS to specific intervals.

  • split_reference_blocks (bool) – Whether to split the reference data at the edges of the intervals defined by filter_intervals. Default is True.

  • entries_to_keep (Optional[List[str]]) – Optional argument to keep only specific entries in the returned VDS. If splitting the VDS, use the global entries (e.g. ‘GT’) instead of the local entries (e.g. ‘LGT’) to keep.

  • annotate_het_non_ref (bool) – Whether to annotate non-reference heterozygotes (as ‘_het_non_ref’) in the variant data. Default is False.

Return type:

VariantDataset

Returns:

gnomAD v4 genomes VariantDataset with chosen annotations and filters.
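
Comparing the two signatures on this page, get_gnomad_v4_genomes_vds accepts a subset of the parameters of get_gnomad_v4_vds. Code that serves both data types may want to drop the exome-only options before calling the genomes function. The helper below is a hypothetical sketch (genomes_kwargs and EXOME_ONLY_PARAMS are not gnomad_qc names); the parameter list is taken from the signatures shown on this page.

```python
from typing import Any, Dict, Tuple

# Parameters present in the get_gnomad_v4_vds signature but absent from
# the get_gnomad_v4_genomes_vds signature, as listed on this page.
EXOME_ONLY_PARAMS = frozenset(
    {
        "remove_hard_filtered_samples_no_sex",
        "high_quality_only",
        "keep_controls",
        "controls_only",
        "n_partitions",
        "remove_dead_alleles",
    }
)


def genomes_kwargs(
    exome_kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (usable for genomes, dropped exome-only options)."""
    kept = {k: v for k, v in exome_kwargs.items() if k not in EXOME_ONLY_PARAMS}
    dropped = {k: v for k, v in exome_kwargs.items() if k in EXOME_ONLY_PARAMS}
    return kept, dropped


kept, dropped = genomes_kwargs(
    {"split": True, "release_only": True, "high_quality_only": True, "n_partitions": 5000}
)
# kept    -> {"split": True, "release_only": True}
# dropped -> {"high_quality_only": True, "n_partitions": 5000}
```

Returning the dropped options, rather than discarding them silently, lets the caller log or reject settings that have no genomes equivalent.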

gnomad_qc.v4.resources.basics.qc_temp_prefix(version='4.1', data_type='exomes')[source]

Return path to temporary QC bucket.

Parameters:
  • version (str) – Version of annotation path to return.

  • data_type (str) – One of ‘exomes’ or ‘genomes’. Default is ‘exomes’.

Return type:

str

Returns:

Path to bucket with temporary QC data.

gnomad_qc.v4.resources.basics.get_checkpoint_path(name, version='4.1', mt=False)[source]

Create a checkpoint path for Table or MatrixTable.

Parameters:
  • name (str) – Name of intermediate Table/MatrixTable

  • version (str) – Version of annotation path to return

  • mt (bool) – Whether the path is for a MatrixTable. Default is False.

Return type:

str

Returns:

Output checkpoint path

gnomad_qc.v4.resources.basics.get_logging_path(name, version='4.1')[source]

Create a path for Hail log files.

Parameters:
  • name (str) – Name of log file

  • version (str) – Version of annotation path to return

Return type:

str

Returns:

Output log path
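
The three path helpers above (qc_temp_prefix, get_checkpoint_path, get_logging_path) plausibly compose, with the checkpoint and log paths built on the temporary prefix. The sketch below illustrates that composition under loudly stated assumptions: the bucket name gs://example-tmp, the directory layout, and the ‘.ht’/‘.mt’/‘.log’ suffixes are all hypothetical and are not taken from this page. The sketch_ functions are stand-ins, not the gnomad_qc implementations.

```python
TMP_BUCKET = "gs://example-tmp"  # Hypothetical bucket, not the real gnomAD one.


def sketch_qc_temp_prefix(version: str = "4.1", data_type: str = "exomes") -> str:
    """Return a temporary QC prefix for the given version and data type."""
    if data_type not in ("exomes", "genomes"):
        raise ValueError("data_type must be 'exomes' or 'genomes'")
    # Assumed layout: one temporary directory per data type and version.
    return f"{TMP_BUCKET}/gnomad.{data_type}.v{version}.qc_data/"


def sketch_checkpoint_path(name: str, version: str = "4.1", mt: bool = False) -> str:
    """Return a checkpoint path under the temporary prefix."""
    # Assumed suffixes: '.mt' for a MatrixTable, '.ht' for a Table.
    return f"{sketch_qc_temp_prefix(version)}{name}.{'mt' if mt else 'ht'}"


def sketch_logging_path(name: str, version: str = "4.1") -> str:
    """Return a Hail log path under the temporary prefix."""
    return f"{sketch_qc_temp_prefix(version)}{name}.log"


table_path = sketch_checkpoint_path("sample_qc")
matrix_path = sketch_checkpoint_path("sample_qc", mt=True)
log_path = sketch_logging_path("sample_qc")
```

Routing every intermediate file through one prefix function keeps checkpoints and logs for a given version together, which makes them easy to find and to clean up.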

gnomad_qc.v4.resources.basics.add_meta(mt, version='4.0', meta_name='meta')[source]

Add metadata to MT in ‘meta_name’ column.

Parameters:
  • mt (MatrixTable) – MatrixTable to which ‘meta_name’ annotation should be added

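  • version (str) – Version of the metadata to add. Default is ‘4.0’.

  • meta_name (str) – Name of the column annotation that will hold the metadata. Default is ‘meta’.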

Return type:

MatrixTable

Returns:

MatrixTable with metadata added in the ‘meta_name’ column.

gnomad_qc.v4.resources.basics.calling_intervals(interval_name, calling_interval_padding)[source]

Return path to capture intervals Table.

Parameters:
  • interval_name (str) – One of ‘ukb’, ‘broad’, ‘intersection’ or ‘union’.

  • calling_interval_padding (int) – Padding around calling intervals. Available options are 0 or 50.

Return type:

TableResource

Returns:

Calling intervals resource.
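
Because calling_intervals documents a closed set of accepted values for both arguments, callers can validate inputs up front. The following is a small sketch under that reading of the parameter descriptions; check_calling_interval_args, INTERVAL_NAMES, and PADDINGS are hypothetical names, and whether the real function raises on other values is not stated here.

```python
# Accepted values as listed in the parameter descriptions above.
INTERVAL_NAMES = ("ukb", "broad", "intersection", "union")
PADDINGS = (0, 50)


def check_calling_interval_args(interval_name: str, calling_interval_padding: int) -> None:
    """Raise ValueError if the arguments fall outside the documented options."""
    if interval_name not in INTERVAL_NAMES:
        raise ValueError(
            f"interval_name must be one of {INTERVAL_NAMES}, got {interval_name!r}"
        )
    if calling_interval_padding not in PADDINGS:
        raise ValueError(
            f"calling_interval_padding must be one of {PADDINGS}, "
            f"got {calling_interval_padding!r}"
        )


check_calling_interval_args("intersection", 50)
# With gnomad_qc installed, the resource would then be requested as:
#   resource = calling_intervals("intersection", 50)
```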