gnomad_qc.v4.subset
Script to filter the gnomAD v4 VariantDataset to a subset of specified samples.
This script subsets gnomAD using a list of samples or Terra workspaces.
usage: gnomad_qc.v4.subset.py [-h] [--test]
(--subset-samples SUBSET_SAMPLES | --subset-workspaces SUBSET_WORKSPACES)
[--include-ukb-200k] [--vds] [--vcf]
[--dense-mt] [--split-multi]
[--n-partitions N_PARTITIONS]
[--subset-call-stats] [--add-variant-qc]
[--pass-only]
[--variant-qc-annotations VARIANT_QC_ANNOTATIONS [VARIANT_QC_ANNOTATIONS ...]]
[--export-meta] [--keep-data-paths]
--output-path OUTPUT_PATH [-o]
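The usage line above shows that exactly one of --subset-samples or --subset-workspaces must be given, and that --output-path is required. As a rough illustration only, the following sketch reproduces that shape with the standard library's argparse. It is not the script's actual parser: `build_parser`, the `subset-sketch` program name, and the example paths are hypothetical, and only a handful of the documented flags are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only a few of the documented flags."""
    parser = argparse.ArgumentParser(prog="subset-sketch")

    # Exactly one of the two subset sources must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--subset-samples")
    source.add_argument("--subset-workspaces")

    # Output selection flags default to False, as in the argument list.
    parser.add_argument("--vds", action="store_true")
    parser.add_argument("--vcf", action="store_true")

    # No default, so partitioning is left unchanged unless a value is given.
    parser.add_argument("--n-partitions", type=int)

    parser.add_argument("--output-path", required=True)
    parser.add_argument("-o", "--overwrite", action="store_true")
    return parser


# Example invocation with placeholder paths (assumptions, not real locations).
args = build_parser().parse_args(
    ["--subset-samples", "samples.tsv", "--vds", "--output-path", "my_subset"]
)
```

Passing both --subset-samples and --subset-workspaces to this sketch makes argparse exit with an error, which mirrors the `( ... | ... )` notation in the usage line.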
Named Arguments
- --test
Filter to the first 2 partitions for testing.
Default: False
- --subset-samples
Path to a text file of sample IDs for subsetting. The file must have the header ‘s’.
- --subset-workspaces
Path to a text file of Terra workspaces to include in the subset. The file must have the header ‘terra_workspace’.
- --include-ukb-200k
Whether to include the 200K UK Biobank samples.
Default: False
- --vds
Whether to make a subset VDS.
Default: False
- --vcf
Whether to make a subset VCF.
Default: False
- --dense-mt
Whether to make a dense MT.
Default: False
- --split-multi
Whether to split multi-allelic variants.
Default: False
- --n-partitions
Number of desired partitions for the subset VDS (if --vds is set) and/or MT (if --dense-mt is set), and/or the number of shards in the output VCF (if --vcf is set). By default, partitioning is unchanged.
- --subset-call-stats
Add subset call statistics: AC, AN, AF, and nhomalt.
Default: False
- --add-variant-qc
Annotate the exported file with gnomAD’s variant QC annotations. Defaults to all annotations if a subset is not specified using the --variant-qc-annotations argument.
Default: False
- --pass-only
Keep only the variants that passed variant QC, i.e. the filter field is PASS.
Default: False
- --variant-qc-annotations
Variant QC annotations to add to the output file. Defaults to all annotations.
- --export-meta
Pull sample subset metadata and export it to an HT and a .tsv file.
Default: False
- --keep-data-paths
Keep CRAM and gVCF paths in the project metadata export.
Default: False
- --output-path
Output file path for the subsetted VDS/VCF/MT. Do not include a file extension.
- -o, --overwrite
Overwrite all data from this subset.
Default: False
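The two input files described under --subset-samples and --subset-workspaces are single-column text files with a fixed header. A minimal sketch of producing them is shown here. The helper names `single_column_text` and `write_inputs`, the file names, and the sample and workspace values are all hypothetical placeholders, not names used by the script.

```python
from pathlib import Path
from typing import Iterable, Tuple


def single_column_text(header: str, values: Iterable[str]) -> str:
    """Return the text of a one-column file: the header, then one value per line."""
    return "\n".join([header, *values]) + "\n"


def write_inputs(directory: Path) -> Tuple[Path, Path]:
    """Write example sample and workspace lists into `directory`."""
    samples_path = directory / "samples.tsv"
    workspaces_path = directory / "workspaces.tsv"

    # The sample list uses the header 's'.
    samples_path.write_text(single_column_text("s", ["sample_1", "sample_2"]))

    # The workspace list uses the header 'terra_workspace'.
    workspaces_path.write_text(
        single_column_text("terra_workspace", ["workspace_a"])
    )
    return samples_path, workspaces_path
```

Only one of the two files would be passed to a given run, since the two arguments are mutually exclusive.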
Module Functions
Make a dictionary of gnomAD release annotation expressions to annotate onto the subsetted data.
Filter the gnomAD v4 VariantDataset to a subset of specified samples.
Get script argument parser.
- gnomad_qc.v4.subset.make_variant_qc_annotations_dict(key_expr, vqc_annotations=None)
Make a dictionary of gnomAD release annotation expressions to annotate onto the subsetted data.
- Parameters:
key_expr (StructExpression) – Key to join annotations on.
vqc_annotations (Optional[List[str]]) – Optional list of desired annotations from the release HT.
- Return type:
Dict[str, Expression]
- Returns:
Dictionary containing Hail expressions to annotate onto subset.
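The optional vqc_annotations parameter follows the same "all annotations unless a subset is requested" rule described for --variant-qc-annotations. The sketch below illustrates that rule in plain Python, with placeholder values standing in for Hail expressions. It is not the function's implementation: `select_annotations` and the annotation names are hypothetical, and raising an error for unknown names is an assumption made for this example.

```python
from typing import Dict, List, Optional


def select_annotations(
    available: Dict[str, object],
    requested: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Return all available annotations, or only the requested ones."""
    if requested is None:
        # No subset requested: keep every annotation.
        return dict(available)

    # Assumption for this sketch: unknown names are treated as an error.
    missing = [name for name in requested if name not in available]
    if missing:
        raise KeyError(f"Unknown annotations: {missing}")

    return {name: available[name] for name in requested}


# Placeholder names and values; real entries would be Hail expressions.
available = {"annotation_a": "expr_a", "annotation_b": "expr_b"}
everything = select_annotations(available)
only_b = select_annotations(available, ["annotation_b"])
```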