gnomad_qc.v4.resources.basics
Script containing generic resources.
Module Functions
get_gnomad_v4_vds | Get gnomAD v4 data with desired filtering and metadata annotations.
get_gnomad_v4_genomes_vds | Get gnomAD v4 genomes VariantDataset with desired filtering and metadata annotations.
qc_temp_prefix | Return path to temporary QC bucket.
get_checkpoint_path | Create a checkpoint path for Table or MatrixTable.
get_logging_path | Create a path for Hail log files.
add_meta | Add metadata to MT in 'meta_name' column.
calling_intervals | Return path to capture intervals Table.
- gnomad_qc.v4.resources.basics.get_gnomad_v4_vds(split=False, remove_hard_filtered_samples=True, remove_hard_filtered_samples_no_sex=False, high_quality_only=False, keep_controls=False, release_only=False, controls_only=False, test=False, n_partitions=None, filter_partitions=None, chrom=None, autosomes_only=False, sex_chr_only=False, filter_variant_ht=None, filter_intervals=None, split_reference_blocks=True, remove_dead_alleles=True, annotate_meta=False, entries_to_keep=None, annotate_het_non_ref=False)
Get gnomAD v4 data with desired filtering and metadata annotations.
- Parameters:
split (bool) – Perform split on VDS. Note: this will perform a split on the VDS rather than grab an already split VDS.
remove_hard_filtered_samples (bool) – Whether to remove samples that failed hard filters (only relevant after hard filtering is complete).
remove_hard_filtered_samples_no_sex (bool) – Whether to remove samples that failed non sex inference hard filters (only relevant after pre-sex imputation hard filtering is complete).
high_quality_only (bool) – Whether to filter the VDS to only high quality samples (only relevant after outlier filtering is complete).
keep_controls (bool) – Whether to keep control samples when filtering the VDS to a subset of samples.
release_only (bool) – Whether to filter the VDS to only samples available for release (can only be used if metadata is present).
controls_only (bool) – Whether to filter the VDS to only control samples.
test (bool) – Whether to use the test VDS instead of the full v4 VDS.
n_partitions (Optional[int]) – Optional argument to read the VDS with a specific number of partitions.
filter_partitions (Optional[List[int]]) – Optional argument to filter the VDS to specific partitions.
chrom (Union[str, List[str], Set[str], None]) – Optional argument to filter the VDS to a specific chromosome(s).
autosomes_only (bool) – Whether to filter the VDS to autosomes only. Default is False.
sex_chr_only (bool) – Whether to filter the VDS to sex chromosomes only. Default is False.
filter_variant_ht (Optional[Table]) – Optional argument to filter the VDS to a specific set of variants. Only supported when splitting the VDS.
filter_intervals (Optional[List[Union[str, tinterval]]]) – Optional argument to filter the VDS to specific intervals.
split_reference_blocks (bool) – Whether to split the reference data at the edges of the intervals defined by filter_intervals. Default is True.
remove_dead_alleles (bool) – Whether to remove dead alleles from the VDS when removing withdrawn UKB samples. Default is True.
annotate_meta (bool) – Whether to annotate the VDS with the sample QC metadata. Default is False.
entries_to_keep (Optional[List[str]]) – Optional argument to keep only specific entries in the returned VDS. If splitting the VDS, use the global entries (e.g. ‘GT’) instead of the local entries (e.g. ‘LGT’) to keep.
annotate_het_non_ref (bool) – Whether to annotate non reference heterozygotes (as ‘_het_non_ref’) to the variant data. Default is False.
- Return type:
VariantDataset
- Returns:
gnomAD v4 dataset with chosen annotations and filters.
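Two constraints stated in the parameter descriptions (filter_variant_ht is only supported when splitting, and entries_to_keep should name global entries when splitting) can be checked before starting an expensive read. The following is a minimal sketch and not part of gnomad_qc: `check_vds_kwargs` and `LOCAL_TO_GLOBAL` are hypothetical names, and the local-to-global mapping lists only a few common Hail local entry fields as an assumption.

```python
from typing import Any, Dict

# Assumed mapping of Hail local entry fields to their split (global) names.
LOCAL_TO_GLOBAL = {"LGT": "GT", "LAD": "AD", "LPL": "PL", "LPGT": "PGT"}


def check_vds_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Validate documented constraints; return kwargs unchanged if they pass."""
    split = kwargs.get("split", False)
    # filter_variant_ht is documented as supported only when splitting.
    if kwargs.get("filter_variant_ht") is not None and not split:
        raise ValueError("filter_variant_ht requires split=True")
    # When splitting, entries_to_keep should name global entries (e.g. 'GT').
    if split:
        for entry in kwargs.get("entries_to_keep") or []:
            if entry in LOCAL_TO_GLOBAL:
                raise ValueError(
                    f"use '{LOCAL_TO_GLOBAL[entry]}' instead of '{entry}' "
                    "when split=True"
                )
    return kwargs


kwargs = check_vds_kwargs(split=True, chrom="chr20", entries_to_keep=["GT", "GQ"])
# With gnomad_qc installed and access to the gnomAD buckets, the call would be:
# from gnomad_qc.v4.resources.basics import get_gnomad_v4_vds
# vds = get_gnomad_v4_vds(**kwargs)
```

The helper only inspects arguments, so it can run without Hail or any cloud credentials.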
- gnomad_qc.v4.resources.basics.get_gnomad_v4_genomes_vds(split=False, remove_hard_filtered_samples=True, release_only=False, annotate_meta=False, test=False, filter_partitions=None, chrom=None, autosomes_only=False, sex_chr_only=False, filter_variant_ht=None, filter_intervals=None, split_reference_blocks=True, entries_to_keep=None, annotate_het_non_ref=False)
Get gnomAD v4 genomes VariantDataset with desired filtering and metadata annotations.
- Parameters:
split (bool) – Perform split on VDS. Note: this will perform a split on the VDS rather than grab an already split VDS.
remove_hard_filtered_samples (bool) – Whether to remove samples that failed hard filters (only relevant after sample QC).
release_only (bool) – Whether to filter the VDS to only samples available for release (can only be used if metadata is present).
annotate_meta (bool) – Whether to add v4 genomes metadata to VDS variant_data in ‘meta’ column.
test (bool) – Whether to use the test VDS instead of the full v4 genomes VDS.
filter_partitions (Optional[List[int]]) – Optional argument to filter the VDS to specific partitions in the provided list.
chrom (Union[str, List[str], Set[str], None]) – Optional argument to filter the VDS to a specific chromosome(s).
autosomes_only (bool) – Whether to filter the VDS to autosomes only. Default is False.
sex_chr_only (bool) – Whether to filter the VDS to sex chromosomes only. Default is False.
filter_variant_ht (Optional[Table]) – Optional argument to filter the VDS to a specific set of variants. Only supported when splitting the VDS.
filter_intervals (Optional[List[Union[str, tinterval]]]) – Optional argument to filter the VDS to specific intervals.
split_reference_blocks (bool) – Whether to split the reference data at the edges of the intervals defined by filter_intervals. Default is True.
entries_to_keep (Optional[List[str]]) – Optional argument to keep only specific entries in the returned VDS. If splitting the VDS, use the global entries (e.g. ‘GT’) instead of the local entries (e.g. ‘LGT’) to keep.
annotate_het_non_ref (bool) – Whether to annotate non reference heterozygotes (as ‘_het_non_ref’) to the variant data. Default is False.
- Return type:
VariantDataset
- Returns:
gnomAD v4 genomes VariantDataset with chosen annotations and filters.
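The genomes loader accepts fewer keyword arguments than get_gnomad_v4_vds (for example, it has no high_quality_only or n_partitions). When one options dictionary is shared between both loaders, a small filter avoids a TypeError on the genomes call. This is a sketch under that assumption; `GENOMES_VDS_PARAMS` and `genomes_kwargs` are hypothetical names, and the parameter set is copied from the signature documented above.

```python
from typing import Any, Dict

# Keyword arguments listed in the get_gnomad_v4_genomes_vds signature above.
GENOMES_VDS_PARAMS = frozenset({
    "split", "remove_hard_filtered_samples", "release_only", "annotate_meta",
    "test", "filter_partitions", "chrom", "autosomes_only", "sex_chr_only",
    "filter_variant_ht", "filter_intervals", "split_reference_blocks",
    "entries_to_keep", "annotate_het_non_ref",
})


def genomes_kwargs(shared: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the options that the genomes loader accepts."""
    return {k: v for k, v in shared.items() if k in GENOMES_VDS_PARAMS}


shared = {"split": True, "test": True, "high_quality_only": True, "n_partitions": 100}
print(genomes_kwargs(shared))  # {'split': True, 'test': True}
```

Dropped options are not applied at all, so a filter such as high_quality_only has no effect on the genomes VDS; review the dropped keys rather than relying on the filter silently.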
- gnomad_qc.v4.resources.basics.qc_temp_prefix(version='4.1', data_type='exomes')
Return path to temporary QC bucket.
- Parameters:
version (str) – Version of annotation path to return.
data_type – One of ‘exomes’ or ‘genomes’. Default is ‘exomes’.
- Return type:
str
- Returns:
Path to bucket with temporary QC data
- gnomad_qc.v4.resources.basics.get_checkpoint_path(name, version='4.1', mt=False)
Create a checkpoint path for Table or MatrixTable.
- Parameters:
name (str) – Name of intermediate Table/MatrixTable.
version (str) – Version of annotation path to return.
mt (bool) – Whether path is for a MatrixTable. Default is False.
- Return type:
str
- Returns:
Output checkpoint path
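The role of the mt flag can be illustrated with a small stand-in. This sketch assumes the checkpoint path is a temporary prefix followed by the name and an extension chosen by mt ('.mt' for a MatrixTable, '.ht' for a Table). The function name `sketch_checkpoint_path` and the example bucket are hypothetical; the real path layout is defined in the gnomad_qc source and may differ.

```python
def sketch_checkpoint_path(prefix: str, name: str, mt: bool = False) -> str:
    """Build an illustrative checkpoint path (assumed layout, not gnomad_qc's)."""
    # Assumption: MatrixTables use the '.mt' extension and Tables use '.ht'.
    extension = "mt" if mt else "ht"
    return f"{prefix}{name}.{extension}"


# 'gs://example-bucket/tmp/' is a placeholder, not the gnomAD temporary bucket.
print(sketch_checkpoint_path("gs://example-bucket/tmp/", "sample_qc", mt=True))
```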
- gnomad_qc.v4.resources.basics.get_logging_path(name, version='4.1')
Create a path for Hail log files.
- Parameters:
name (str) – Name of log file.
version (str) – Version of annotation path to return.
- Return type:
str
- Returns:
Output log path
- gnomad_qc.v4.resources.basics.add_meta(mt, version='4.0', meta_name='meta')
Add metadata to MT in ‘meta_name’ column.
- Parameters:
mt (MatrixTable) – MatrixTable to which ‘meta_name’ annotation should be added.
version (str) –
meta_name (str) –
- Return type:
MatrixTable
- Returns:
MatrixTable with metadata added in the column named by meta_name (‘meta’ by default).
- gnomad_qc.v4.resources.basics.calling_intervals(interval_name, calling_interval_padding)
Return path to capture intervals Table.
- Parameters:
interval_name (str) – One of ‘ukb’, ‘broad’, ‘intersection’ or ‘union’.
calling_interval_padding (int) – Padding around calling intervals. Available options are 0 or 50.
- Return type:
TableResource
- Returns:
Calling intervals resource.
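Because only four interval names and two padding values are documented, the arguments can be validated before requesting the resource. The following is a minimal sketch: `check_calling_interval_args` and the two constants are hypothetical names, and how calling_intervals itself reacts to unsupported values is not stated in this reference.

```python
from typing import Tuple

# Values documented for calling_intervals above.
VALID_INTERVAL_NAMES = ("ukb", "broad", "intersection", "union")
VALID_PADDING = (0, 50)


def check_calling_interval_args(
    interval_name: str, calling_interval_padding: int
) -> Tuple[str, int]:
    """Return the arguments unchanged if both are documented options."""
    if interval_name not in VALID_INTERVAL_NAMES:
        raise ValueError(
            f"interval_name must be one of {VALID_INTERVAL_NAMES}, got {interval_name!r}"
        )
    if calling_interval_padding not in VALID_PADDING:
        raise ValueError(
            f"calling_interval_padding must be one of {VALID_PADDING}, "
            f"got {calling_interval_padding!r}"
        )
    return interval_name, calling_interval_padding


args = check_calling_interval_args("intersection", 50)
# With gnomad_qc installed and bucket access, the resource could then be requested:
# from gnomad_qc.v4.resources.basics import calling_intervals
# resource = calling_intervals(*args)
```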