Admixture Estimation Workflows (WDL)

This section describes the All of Us admixture workflows used to estimate global ancestry proportions from the ancestry pipeline outputs. There are two supported analysis paths, each consisting of two workflows run in sequence.

Background

This pipeline uses the gnomAD 3.1.2 reference panel (1KG + HGDP), which provides broad global coverage but has known limitations, including uneven population representation and limited resolution for some ancestries. Outputs are best suited for population-level summaries rather than precise individual-level ancestry inference.
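
As an illustration of what a population-level summary might look like, the sketch below averages per-sample ancestry proportions within groups. It is not part of the workflows: the function name `summarize_by_group`, the group labels, and the toy values are all hypothetical, and the proportions are assumed to be already loaded into memory as one list per sample.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_group(proportions, groups):
    """Average per-sample ancestry proportions within each group.

    proportions: list of per-sample lists, one value per ancestry component.
    groups: list of group labels, in the same order as `proportions`.
    """
    if len(proportions) != len(groups):
        raise ValueError("proportions and groups must have the same length")

    by_group = defaultdict(list)
    for row, label in zip(proportions, groups):
        by_group[label].append(row)

    # Column-wise mean of the rows that share a group label.
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_group.items()
    }


# Toy values for illustration only (not real data).
toy_q = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]
toy_groups = ["cohort_a", "cohort_a", "cohort_b"]
summary = summarize_by_group(toy_q, toy_groups)
```

Reporting group means like these, rather than individual rows, is one way to stay within the resolution the reference panel supports.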


Admixture Rye Analysis

The Admixture Rye path uses the Rye tool to estimate ancestry proportions from PCA data. Run these workflows in order:

| Step | Workflow | Description | WDL |
| --- | --- | --- | --- |
| 1 | Admixture Rye Preprocessing | Generates Rye-compatible eigenvalues, eigenvectors, and population-to-group mapping files from ancestry pipeline PCA outputs. | WDL |
| 2 | Admixture Rye | Runs Rye to estimate ancestry proportions; outputs .Q and .fam files. | WDL |

Admixture Unsupervised Analysis

The unsupervised path uses ADMIXTURE directly on genotype data, allowing for customization of the reference panel to better account for underrepresented populations. Run these workflows in order:

| Step | Workflow | Description | WDL |
| --- | --- | --- | --- |
| 1 | Admixture Unsupervised Preprocessing | Converts merged VCF inputs from the ancestry pipeline into PLINK binary format for downstream ADMIXTURE clustering. | WDL |
| 2 | Admixture Unsupervised | Runs ADMIXTURE in unsupervised mode; outputs ancestry proportion (.Q) and allele frequency (.P) matrices. | WDL |
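
For downstream use, the .Q matrix usually needs to be paired with sample IDs. The following is a minimal sketch, not part of the workflows. It assumes the standard ADMIXTURE layout (a whitespace-delimited .Q file with no header, one row per sample and one column per cluster, in the same order as the PLINK .fam file that ADMIXTURE was run on) and assumes that .fam file is available from the preprocessing step. The function names and the tolerance are hypothetical.

```python
from pathlib import Path


def read_fam_ids(fam_path):
    """Return (family ID, individual ID) pairs from a PLINK .fam file."""
    ids = []
    for line in Path(fam_path).read_text().splitlines():
        fields = line.split()
        if fields:
            # The first two .fam columns are family ID and individual ID.
            ids.append((fields[0], fields[1]))
    return ids


def read_q_matrix(q_path, tol=1e-3):
    """Read a headerless .Q file and check each row sums to about 1."""
    rows = []
    for line_number, line in enumerate(Path(q_path).read_text().splitlines(), 1):
        if not line.strip():
            continue
        values = [float(x) for x in line.split()]
        if abs(sum(values) - 1.0) > tol:
            raise ValueError(f"row {line_number} does not sum to 1: {values}")
        rows.append(values)
    return rows


def label_q_rows(fam_path, q_path):
    """Map each (family ID, individual ID) pair to its ancestry proportions."""
    ids = read_fam_ids(fam_path)
    rows = read_q_matrix(q_path)
    if len(ids) != len(rows):
        raise ValueError(
            f".fam has {len(ids)} samples but .Q has {len(rows)} rows"
        )
    return dict(zip(ids, rows))
```

The length check guards against pairing a .Q file with a .fam file from a different run, which would otherwise mislabel samples without raising an error.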

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.