gnomad_qc.v4.variant_qc.import_variant_qc_vcf
Script to load variant QC result VCF into a Hail Table.
usage: gnomad_qc.v4.variant_qc.import_variant_qc_vcf.py [-h] [-o]
       [--slack-channel SLACK_CHANNEL]
       --vcf-path VCF_PATH
       --model-id MODEL_ID
       --compute-info-method {AS,quasi,set_long_AS_missing_info}
       --transmitted-singletons TRANSMITTED_SINGLETONS
       --sibling-singletons SIBLING_SINGLETONS
       --adj ADJ
       --interval-qc-filter INTERVAL_QC_FILTER
       --calling-interval-filter CALLING_INTERVAL_FILTER
       [--n-partitions N_PARTITIONS]
       [--header-path HEADER_PATH]
       [--array-elements-required]
       [--is-split]
       [--deduplication-check]
       [--snp-features SNP_FEATURES [SNP_FEATURES ...]]
       [--indel-features INDEL_FEATURES [INDEL_FEATURES ...]]
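The usage line above fully describes the command-line interface. As a rough illustration, the sketch below builds an `argparse` parser that accepts the same flags, with the defaults documented for each argument. It is a reconstruction from this page, not the script's actual source. In particular, the usage shows `--transmitted-singletons`, `--sibling-singletons`, `--adj`, `--interval-qc-filter` and `--calling-interval-filter` each taking a value, but this page does not say how that value is converted; the `_str_to_bool` helper is a hypothetical assumption.

```python
import argparse


def _str_to_bool(value: str) -> bool:
    """Hypothetical helper: parse 'true'/'false'-style strings into booleans."""
    normalized = value.strip().lower()
    if normalized in {"true", "t", "yes", "y", "1"}:
        return True
    if normalized in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented usage (not the real source)."""
    parser = argparse.ArgumentParser(
        description="Script to load variant QC result VCF into a Hail Table."
    )
    parser.add_argument("-o", "--overwrite", action="store_true")
    parser.add_argument("--slack-channel")
    parser.add_argument("--vcf-path", required=True)
    parser.add_argument("--model-id", required=True)
    parser.add_argument(
        "--compute-info-method",
        required=True,
        choices=["AS", "quasi", "set_long_AS_missing_info"],
    )
    # These five required flags take an explicit value in the documented usage;
    # converting that value with _str_to_bool is an assumption of this sketch.
    for flag in (
        "--transmitted-singletons",
        "--sibling-singletons",
        "--adj",
        "--interval-qc-filter",
        "--calling-interval-filter",
    ):
        parser.add_argument(flag, required=True, type=_str_to_bool)
    parser.add_argument("--n-partitions", type=int, default=5000)
    parser.add_argument("--header-path")
    parser.add_argument("--array-elements-required", action="store_true")
    parser.add_argument("--is-split", action="store_true")
    parser.add_argument("--deduplication-check", action="store_true")
    parser.add_argument(
        "--snp-features",
        nargs="+",
        default=["AS_QD", "AS_MQRankSum", "AS_ReadPosRankSum", "AS_FS", "AS_MQ"],
    )
    parser.add_argument(
        "--indel-features",
        nargs="+",
        default=["AS_QD", "AS_MQRankSum", "AS_ReadPosRankSum", "AS_FS"],
    )
    return parser
```

With this sketch, `build_parser().parse_args([...])` returns a namespace whose attribute names follow argparse's usual dash-to-underscore rule (for example `args.n_partitions`, `args.deduplication_check`).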
Named Arguments
- -o, --overwrite
Whether to overwrite data already present in the output Table.
Default: False
- --slack-channel
Slack channel to post results and notifications to.
- --vcf-path
Path to the variant QC result VCF. Can be specified as a Hadoop glob pattern.
- --model-id
Model ID for the variant QC result HT. Must start with ‘rf_’, ‘vqsr_’, or ‘if_’.
- --compute-info-method
Possible choices: AS, quasi, set_long_AS_missing_info
Compute info method used to generate the variant QC results. Options are ‘AS’, ‘quasi’ or ‘set_long_AS_missing_info’.
- --transmitted-singletons
Whether transmitted singletons were used in training the model.
- --sibling-singletons
Whether sibling singletons were used in training the model.
- --adj
Whether adj-filtered singletons were used in training the model.
- --interval-qc-filter
Whether only variants in intervals passing interval QC were used in training the model.
- --calling-interval-filter
Whether only variants in the intersection of Broad/DSP calling intervals with 50 bp of padding were used for training.
- --n-partitions
Number of desired partitions for output Table.
Default: 5000
- --header-path
Optional path to a header file to use for importing the variant QC result VCF.
- --array-elements-required
Pass to set array_elements_required=True in the call to hl.import_vcf.
Default: False
- --is-split
Whether the VCF is already split.
Default: False
- --deduplication-check
Remove duplicate variants. Useful for the v4 MVP when reading from potentially overlapping shards.
Default: False
- --snp-features
Features used in the SNP VQSR model.
Default: [‘AS_QD’, ‘AS_MQRankSum’, ‘AS_ReadPosRankSum’, ‘AS_FS’, ‘AS_MQ’]
- --indel-features
Features used in the indel VQSR model.
Default: [‘AS_QD’, ‘AS_MQRankSum’, ‘AS_ReadPosRankSum’, ‘AS_FS’]
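The `--deduplication-check` option exists because overlapping shards can contribute the same variant more than once. The pure-Python sketch below shows the idea under simplifying assumptions: each record is a hypothetical dict with `contig`, `pos` and `alleles` fields, and duplicates are defined by that (contig, position, alleles) key, mirroring Hail's usual locus-plus-alleles key. In Hail itself, a keyed Table's `distinct()` method keeps exactly one row per key; this page does not state which mechanism the script uses.

```python
from typing import Iterable, List, Tuple

# Hypothetical key layout: (contig, position, alleles), analogous to a
# Hail Table keyed by locus and alleles.
VariantKey = Tuple[str, int, Tuple[str, ...]]


def deduplicate_variants(records: Iterable[dict]) -> List[dict]:
    """Keep the first record seen for each (contig, pos, alleles) key."""
    seen: set = set()
    unique: List[dict] = []
    for rec in records:
        key: VariantKey = (rec["contig"], rec["pos"], tuple(rec["alleles"]))
        if key in seen:
            # Same variant already emitted, e.g. from an overlapping shard.
            continue
        seen.add(key)
        unique.append(rec)
    return unique
```

Note that two records at the same position with different alleles are distinct variants and are both kept.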
Module Functions
- import_variant_qc_vcf – Import variant QC result site VCF into a HT.
- Load variant QC result VCF into a Hail Table.
- Get script argument parser.
- gnomad_qc.v4.variant_qc.import_variant_qc_vcf.import_variant_qc_vcf(vcf_path, model_id, num_partitions=5000, import_header_path=None, array_elements_required=False, is_split=False, deduplicate_check=False)
Import variant QC result site VCF into a HT.
- Parameters:
  - vcf_path (str) – Path to the input variant QC result sites VCF. Can be specified as a Hadoop glob pattern.
  - model_id (str) – Model ID for the variant QC results. Must start with ‘rf_’, ‘vqsr_’, or ‘if_’.
  - num_partitions (int) – Number of partitions to use for the output HT.
  - import_header_path (Optional[str]) – Optional path to a header file to use for import.
  - array_elements_required (bool) – Value of array_elements_required to pass to hl.import_vcf.
  - is_split (bool) – Whether the VCF is already split.
  - deduplicate_check (bool) – Check for and remove duplicate variants.
- Return type:
- Returns:
HT containing variant QC results.
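The `model_id` parameter must start with ‘rf_’, ‘vqsr_’, or ‘if_’. A minimal sketch of checking that constraint up front is shown below; `check_model_id` and `VALID_MODEL_ID_PREFIXES` are hypothetical names, and whether or where the module performs such a check is not stated on this page.

```python
from typing import Tuple

# Prefixes documented for model_id.
VALID_MODEL_ID_PREFIXES: Tuple[str, ...] = ("rf_", "vqsr_", "if_")


def check_model_id(model_id: str) -> str:
    """Return the model-type prefix of model_id (without '_'), or raise ValueError.

    Hypothetical helper mirroring the documented constraint on model_id.
    """
    for prefix in VALID_MODEL_ID_PREFIXES:
        if model_id.startswith(prefix):
            return prefix.rstrip("_")
    raise ValueError(
        f"model_id {model_id!r} must start with one of {VALID_MODEL_ID_PREFIXES}"
    )
```

Failing fast on a malformed ID avoids spending a long VCF import only to write the result under an unexpected name.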