
Admixture Unsupervised

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| --- | --- | --- | --- |
| aou_9.0.2 | January, 2026 | WARP Pipelines | File an issue |

Introduction to the run_admixture workflow

run_admixture is a WDL workflow that runs ADMIXTURE in unsupervised mode to estimate global ancestry proportions from PLINK binary genotype data. It is the second step in the Admixture Unsupervised analysis path used in All of Us processing.

The workflow accepts PLINK binary files (.bed/.bim/.fam) produced by convert_vcf_to_plink_bed and runs ADMIXTURE to identify K ancestry components without reference population supervision. Compared with supervised methods, this approach allows the reference panel to be customized to better account for underrepresented populations.

ADMIXTURE outputs an ancestry proportion matrix (.Q) and a population allele frequency matrix (.P), which can be used for downstream population-level inference or for constructing a pruned reference panel. For full parameter descriptions, see the ADMIXTURE documentation.

Quickstart table

| Workflow Feature | Description | Source |
| --- | --- | --- |
| Analysis type | Unsupervised admixture estimation | |
| Workflow language | WDL 1.0 | openWDL |
| Input data type | PLINK binary (.bed, .bim, .fam) | |
| Output file format | .Q (ancestry proportions), .P (allele frequencies) | |
| Primary tool | ADMIXTURE | ADMIXTURE |
| Docker image | mussmann/admixpipe:3.0 | |
| Part of analysis path | Admixture Unsupervised (Step 2 of 2) | |

Set-up

run_admixture installation and requirements

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the run_admixture changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.
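
As a rough illustration of a local deployment, the sketch below assembles a command line for Cromwell's single-workflow `run` mode. The file names `cromwell.jar`, `run_admixture.wdl`, and `inputs.json` are placeholders for this example, not paths taken from the WARP repository; substitute the locations on your own system.

```python
import shlex


def cromwell_run_command(cromwell_jar, wdl_path, inputs_json):
    """Build the argv list for Cromwell's single-workflow `run` mode."""
    return ["java", "-jar", cromwell_jar, "run", wdl_path, "--inputs", inputs_json]


# Placeholder paths (assumptions for this example only).
cmd = cromwell_run_command("cromwell.jar", "run_admixture.wdl", "inputs.json")

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` if you prefer to launch Cromwell from Python rather than from a shell.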

Inputs

Input descriptions

| Input variable name | Description | Type |
| --- | --- | --- |
| bed | PLINK binary genotype file produced by convert_vcf_to_plink_bed. | File |
| bim | PLINK variant information file. | File |
| fam | PLINK sample metadata file. | File |

The following values are currently set at the task level in the WDL and are not exposed as workflow inputs: K_in (default 6), num_cpus_in (default 4), and mem_gb (default 120).
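
A minimal sketch of how an inputs JSON for the three workflow inputs might be generated. It assumes the WDL workflow is named `run_admixture`, so that Cromwell expects keys of the form `run_admixture.<input>`; check the workflow declaration in the WDL before relying on this. The `gs://example-bucket/...` paths are hypothetical. The task-level values (K_in, num_cpus_in, mem_gb) are left out because they are not exposed as workflow inputs.

```python
import json


def build_inputs(bed, bim, fam, workflow_name="run_admixture"):
    """Map the three PLINK files to fully qualified workflow input keys."""
    files = {"bed": bed, "bim": bim, "fam": fam}
    return {f"{workflow_name}.{name}": path for name, path in files.items()}


# Hypothetical file locations, for illustration only.
inputs = build_inputs(
    bed="gs://example-bucket/cohort.bed",
    bim="gs://example-bucket/cohort.bim",
    fam="gs://example-bucket/cohort.fam",
)

# Serialize in the layout Cromwell reads from an inputs file.
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```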

run_admixture tasks and tools

This workflow calls a single task to perform unsupervised admixture clustering.

  1. Run ADMIXTURE unsupervised clustering

To see specific tool parameters, select the task WDL link in the table; then view the command {} section of the task in the WDL script.

| Task name and WDL link | Tool | Software | Description |
| --- | --- | --- | --- |
| run_admixture | ADMIXTURE | /app/bin/admixture (admixpipe:3.0) | Runs ADMIXTURE in unsupervised mode on PLINK binary files. Estimates K ancestry components using maximum likelihood. Outputs ancestry proportion and allele frequency matrices. |

1. Run ADMIXTURE unsupervised clustering

The task invokes ADMIXTURE on the input .bed file with the specified number of ancestry components (K) and CPU threads. ADMIXTURE uses maximum likelihood to estimate the proportion of each sample's ancestry attributable to each of the K inferred populations. Output files follow ADMIXTURE's standard naming convention based on the input file basename:

  • <basename>.<K>.Q — ancestry proportion matrix (one row per sample, one column per ancestry component)
  • <basename>.<K>.P — population allele frequency matrix (one row per variant, one column per ancestry component)
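
The naming convention above can be sketched as follows. The command shape (`admixture <bed> <K> -j<threads>`) follows ADMIXTURE's usual command-line usage, but the exact invocation in the task's command section may differ, so treat this as an approximation. The helper names and the `data/cohort.bed` path are made up for the example; the defaults mirror the K_in and num_cpus_in values noted earlier.

```python
from pathlib import Path


def admixture_command(bed, k=6, threads=4, binary="/app/bin/admixture"):
    """Approximate an unsupervised ADMIXTURE call: <binary> <bed> <K> -j<threads>."""
    return [binary, str(bed), str(k), f"-j{threads}"]


def expected_outputs(bed, k=6):
    """Derive the .Q and .P file names from the .bed basename and K."""
    stem = Path(bed).stem  # "data/cohort.bed" -> "cohort"
    return f"{stem}.{k}.Q", f"{stem}.{k}.P"


# Hypothetical input path, for illustration only.
print(admixture_command("data/cohort.bed"))
print(expected_outputs("data/cohort.bed"))
```

ADMIXTURE writes these files to the working directory, not next to the input, which is why only the basename appears in the output names.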

Outputs

| Output variable name | Filename | Output format and description |
| --- | --- | --- |
| admixture_Q | <basename>.<K>.Q | Ancestry proportion matrix. Each row is a sample; each column is an inferred ancestry component. Values sum to 1 per row. |
| admixture_P | <basename>.<K>.P | Population allele frequency matrix. Each row is a variant; each column is the allele frequency in the corresponding inferred ancestry component. |
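
For downstream use, a .Q file can be read as a plain whitespace-delimited numeric matrix with no header. The sketch below checks the properties described above (K columns, each row summing to 1) and reports the largest component per sample. The two-sample, K=3 matrix is invented for the example, and the tolerance of 1e-4 is an assumption chosen to absorb rounding in the printed values.

```python
import io


def read_matrix(handle):
    """Parse a whitespace-delimited numeric matrix, skipping blank lines."""
    return [[float(x) for x in line.split()] for line in handle if line.strip()]


def check_q(rows, k, tol=1e-4):
    """Raise ValueError if a row lacks K columns or does not sum to 1."""
    for i, row in enumerate(rows):
        if len(row) != k:
            raise ValueError(f"row {i}: expected {k} columns, found {len(row)}")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {i}: proportions sum to {sum(row)}")


def dominant_component(rows):
    """Return the 0-based index of the largest proportion in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Invented example data standing in for a real <basename>.3.Q file.
example_q = io.StringIO("0.70 0.20 0.10\n0.05 0.90 0.05\n")
rows = read_matrix(example_q)
check_q(rows, k=3)
print(dominant_component(rows))
```

To read a real output, replace the `io.StringIO` object with `open(path)`. Note that component indices are arbitrary labels: they carry no population names and can be ordered differently between runs.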

Versioning

All run_admixture releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.