Admixture Unsupervised

Pipeline Version	Date Updated	Documentation Author	Questions or Feedback
aou_9.0.2	January, 2026	WARP Pipelines	File an issue

Introduction to the run_admixture workflow

run_admixture is a WDL workflow that runs ADMIXTURE in unsupervised mode to estimate global ancestry proportions from PLINK binary genotype data. It is the second step in the Admixture Unsupervised analysis path used in All of Us processing.

The workflow accepts PLINK binary files (.bed/.bim/.fam) produced by convert_vcf_to_plink_bed and runs ADMIXTURE to identify K ancestry components without reference population supervision. Because the components are inferred from the data rather than from pre-labeled reference populations, the reference panel can be customized to account for underrepresented populations better than supervised methods allow.

ADMIXTURE outputs an ancestry proportion matrix (.Q) and a population allele frequency matrix (.P), which can be used for downstream population-level inference or construction of a pruned reference panel. For the ADMIXTURE manual and full parameter documentation, see the ADMIXTURE documentation.

Quickstart table

Workflow Feature	Description	Source
Analysis type	Unsupervised admixture estimation
Workflow language	WDL 1.0	openWDL
Input data type	PLINK binary (`.bed`, `.bim`, `.fam`)
Output file format	`.Q` (ancestry proportions), `.P` (allele frequencies)
Primary tool	ADMIXTURE	ADMIXTURE
Docker image	`mussmann/admixpipe:3.0`
Part of analysis path	Admixture Unsupervised (Step 2 of 2)

Set-up

run_admixture installation and requirements

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the run_admixture changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.

Inputs

Input descriptions

Input variable name	Description	Type
`bed`	PLINK binary genotype file produced by `convert_vcf_to_plink_bed`.	File
`bim`	PLINK variant information file.	File
`fam`	PLINK sample metadata file.	File

The following values are currently set at the task level in the WDL and are not exposed as workflow inputs: K_in (number of ancestry components K, default 6), num_cpus_in (CPU threads, default 4), and mem_gb (memory in GB, default 120).
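As a rough illustration of how the three inputs might be supplied to Cromwell, the sketch below builds an inputs JSON in Python. It assumes the workflow is named run_admixture in the WDL, so that Cromwell's `<workflow>.<input>` key convention yields keys such as `run_admixture.bed`. The helper name `build_inputs` and the bucket paths are hypothetical.

```python
import json


def build_inputs(bed: str, bim: str, fam: str, workflow: str = "run_admixture") -> dict:
    """Return a Cromwell-style inputs mapping for the three PLINK files.

    Assumption: the WDL workflow name matches `workflow`; Cromwell expects
    fully qualified keys of the form "<workflow>.<input name>".
    """
    return {
        f"{workflow}.bed": bed,
        f"{workflow}.bim": bim,
        f"{workflow}.fam": fam,
    }


# Hypothetical paths, for illustration only.
example = build_inputs(
    "gs://example-bucket/cohort.bed",
    "gs://example-bucket/cohort.bim",
    "gs://example-bucket/cohort.fam",
)
print(json.dumps(example, indent=2))
```

K_in, num_cpus_in, and mem_gb do not appear in this mapping because they are not exposed as workflow inputs.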

run_admixture tasks and tools

This workflow calls a single task to perform unsupervised admixture clustering.

Run ADMIXTURE unsupervised clustering

To see specific tool parameters, select the task WDL link in the table; then view the command {} section of the task in the WDL script.

Task name and WDL link	Tool	Software	Description
run_admixture	ADMIXTURE	`/app/bin/admixture` (admixpipe:3.0)	Runs ADMIXTURE in unsupervised mode on PLINK binary files. Estimates K ancestry components using maximum likelihood. Outputs ancestry proportion and allele frequency matrices.

1. Run ADMIXTURE unsupervised clustering

The task invokes ADMIXTURE on the input .bed file with the specified number of ancestry components (K) and CPU threads. ADMIXTURE uses maximum likelihood to estimate the proportion of each sample's ancestry attributable to each of the K inferred populations. Output files follow ADMIXTURE's standard naming convention based on the input file basename:

<basename>.<K>.Q — ancestry proportion matrix (one row per sample, one column per ancestry component)
<basename>.<K>.P — population allele frequency matrix (one row per variant, one column per ancestry component)
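The naming convention above can be sketched in Python. This is not the task's actual command section, which should be read from the WDL. It assumes a plain unsupervised call of the form `admixture <file.bed> <K> -j<threads>`, and the defaults mirror the task-level values listed under Inputs. The helper names are hypothetical.

```python
from pathlib import Path


def admixture_command(bed, k=6, num_cpus=4, binary="/app/bin/admixture"):
    """Build an argument list for an unsupervised ADMIXTURE run.

    Assumption: the task runs ADMIXTURE as `admixture <bed> <K> -j<threads>`;
    the real WDL command may add or change options.
    """
    return [binary, str(bed), str(k), f"-j{num_cpus}"]


def expected_outputs(bed, k=6):
    """Derive the output filenames from the input .bed basename and K."""
    stem = Path(bed).stem  # "data/cohort.bed" -> "cohort"
    return {"Q": f"{stem}.{k}.Q", "P": f"{stem}.{k}.P"}


print(admixture_command("data/cohort.bed"))
print(expected_outputs("data/cohort.bed"))
```

ADMIXTURE writes both files to its working directory, so only the basename of the input path carries over into the output names.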

Outputs

Output variable name	Filename	Output format and description
`admixture_Q`	`<basename>.<K>.Q`	Ancestry proportion matrix. Each row is a sample; each column is an inferred ancestry component. Values sum to 1 per row.
`admixture_P`	`<basename>.<K>.P`	Population allele frequency matrix. Each row is a variant; each column is the allele frequency in the corresponding inferred ancestry component.
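For downstream use, the .Q matrix usually needs to be paired with sample IDs, since the file itself contains only numbers. The sketch below is one possible way to do that in Python. It assumes the .Q rows are in the same order as the .fam rows, that the .Q file is whitespace-delimited, and that the sample ID is the second column of the .fam file. The function names and the tolerance are illustrative choices, not part of the workflow.

```python
def read_q_matrix(text):
    """Parse whitespace-delimited .Q content into a list of float rows."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]


def read_fam_ids(text):
    """Return the within-family sample ID (second column) of each .fam row."""
    return [line.split()[1] for line in text.splitlines() if line.strip()]


def label_ancestry(q_text, fam_text, tol=1e-3):
    """Map each sample ID to its ancestry proportions, with basic checks.

    Assumption: .Q rows follow .fam row order. `tol` absorbs rounding in
    the printed proportions and is an illustrative value.
    """
    rows = read_q_matrix(q_text)
    ids = read_fam_ids(fam_text)
    if len(rows) != len(ids):
        raise ValueError(f"{len(rows)} .Q rows but {len(ids)} .fam samples")
    for sample_id, row in zip(ids, rows):
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"proportions for {sample_id} do not sum to 1")
    return dict(zip(ids, rows))


# Toy data with K=3 and two samples, for illustration only.
q_text = "0.7 0.2 0.1\n0.1 0.1 0.8\n"
fam_text = "F1 S1 0 0 0 -9\nF2 S2 0 0 0 -9\n"
print(label_ancestry(q_text, fam_text))
```

The same row-count check can be applied to the .P matrix against the .bim file, since .P has one row per variant.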

Versioning

All run_admixture releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.
