Admixture Estimation Workflows (WDL)

This section describes the All of Us admixture workflows used to estimate global ancestry proportions from the ancestry pipeline outputs. There are two supported analysis paths, each consisting of two workflows run in sequence.

Background

This pipeline uses the gnomAD 3.1.2 reference panel (1KG + HGDP), which provides broad global coverage but has known limitations, including uneven population representation and limited resolution for some ancestries. Outputs are best suited for population-level summaries rather than precise individual-level ancestry inference.

Admixture Rye Analysis

The Admixture Rye path uses the Rye tool to estimate ancestry proportions from PCA data. Run these workflows in order:

Step	Workflow	Description	WDL
1	Admixture Rye Preprocessing	Generates Rye-compatible eigenvalues, eigenvectors, and population-to-group mapping files from ancestry pipeline PCA outputs.	WDL
2	Admixture Rye	Runs Rye to estimate ancestry proportions; outputs `.Q` and `.fam` files.	WDL

Admixture Unsupervised Analysis

The unsupervised path uses ADMIXTURE directly on genotype data, allowing for customization of the reference panel to better account for underrepresented populations. Run these workflows in order:

Step	Workflow	Description	WDL
1	Admixture Unsupervised Preprocessing	Converts merged VCF inputs from the ancestry pipeline into PLINK binary format for downstream ADMIXTURE clustering.	WDL
2	Admixture Unsupervised	Runs ADMIXTURE in unsupervised mode; outputs ancestry proportion (`.Q`) and allele frequency (`.P`) matrices.	WDL
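As an illustration of how the `.Q` output might be consumed downstream, here is a minimal Python sketch. It assumes the standard ADMIXTURE `.Q` layout (one row per sample, K whitespace-separated proportions, no header) and that rows follow the sample order of the PLINK `.fam` file produced during preprocessing. The function names and the example records are hypothetical and are not part of the workflows.

```python
from typing import Dict, Iterable, List


def read_fam_ids(lines: Iterable[str]) -> List[str]:
    """Return the within-family sample ID (column 2) from PLINK .fam lines."""
    ids = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ids.append(fields[1])
    return ids


def read_q_matrix(lines: Iterable[str], tol: float = 1e-3) -> List[List[float]]:
    """Parse .Q lines into rows of floats, checking shape and row sums."""
    rows: List[List[float]] = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = [float(x) for x in fields]
        if rows and len(row) != len(rows[0]):
            raise ValueError(f"line {n}: expected {len(rows[0])} columns, got {len(row)}")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"line {n}: proportions sum to {sum(row):.4f}, not 1")
        rows.append(row)
    return rows


def label_proportions(sample_ids: List[str], q_rows: List[List[float]]) -> Dict[str, List[float]]:
    """Pair each sample ID with its row of ancestry proportions (same order assumed)."""
    if len(sample_ids) != len(q_rows):
        raise ValueError("sample count in .fam does not match row count in .Q")
    return dict(zip(sample_ids, q_rows))


def mean_proportions(q_rows: List[List[float]]) -> List[float]:
    """Average each component across samples for a population-level summary."""
    k = len(q_rows[0])
    return [sum(row[j] for row in q_rows) / len(q_rows) for j in range(k)]


# Hypothetical two-sample, K=2 example; real inputs would be read from files.
fam_lines = ["FAM1 S1 0 0 0 -9", "FAM2 S2 0 0 0 -9"]
q_lines = ["0.90 0.10", "0.25 0.75"]

q_rows = read_q_matrix(q_lines)
by_sample = label_proportions(read_fam_ids(fam_lines), q_rows)
cohort_mean = mean_proportions(q_rows)
```

In unsupervised mode the K columns carry no population labels and their order is arbitrary, so any mapping from a column to a named ancestry has to be established separately, for example by inspecting reference samples included in the run.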

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.
