
Admixture Rye

Pipeline Version | Date Updated | Documentation Author | Questions or Feedback
aou_9.0.0 | September 2025 | WARP Pipelines | File an issue

Introduction to the run_admixture_est_rye workflow

run_admixture_est_rye is a WDL workflow that runs the Rye admixture estimation tool to estimate global ancestry proportions from PCA-derived data. It is the second step in the Admixture Rye analysis path used in All of Us processing.

The workflow accepts Rye-format eigenvalues, eigenvectors, and a population-to-group mapping file produced by run_preprocess_admixture_est_rye and runs the Rye R tool to produce ancestry proportion estimates. Outputs are analogous in format to the .Q and .fam files produced by the classic ADMIXTURE tool, enabling direct comparison between methods.
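Because Rye and ADMIXTURE both emit a per-sample ancestry matrix, a simple way to compare them is the mean absolute difference between the two matrices. The sketch below is illustrative only and assumes that both matrices list the same samples in the same row order and that their columns have already been matched one-to-one (ADMIXTURE components are unlabeled clusters, so this matching must be done first). The function name `mean_abs_difference` and the toy values are hypothetical.

```python
def mean_abs_difference(q_a: list[list[float]], q_b: list[list[float]]) -> float:
    """Mean absolute difference between two ancestry-proportion matrices.

    Assumes identical sample order and columns already matched one-to-one.
    """
    if len(q_a) != len(q_b):
        raise ValueError("matrices have different numbers of samples")
    diffs = []
    for row_a, row_b in zip(q_a, q_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows have different numbers of components")
        diffs.extend(abs(a - b) for a, b in zip(row_a, row_b))
    if not diffs:
        raise ValueError("matrices are empty")
    return sum(diffs) / len(diffs)


rye_q = [[0.5, 0.5], [1.0, 0.0]]        # toy values, not real pipeline output
admixture_q = [[0.6, 0.4], [1.0, 0.0]]  # toy values, columns already matched
print(round(mean_abs_difference(rye_q, admixture_q), 3))  # 0.05
```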

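Rye estimates ancestry proportions by iterative optimization, controlled by three parameters: the number of principal components (pcs), the number of optimization rounds (rounds), and the number of iterations per round (iter). These settings trade accuracy against runtime.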
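Since iter is the number of iterations per round, the total optimization work is rounds × iter, which is 200 × 100 = 20,000 iterations at the workflow defaults. The sketch below makes that arithmetic explicit. The relative-cost helper assumes runtime scales roughly linearly with total iterations; that is a simplifying assumption, not a measured property of Rye, and both helper names are hypothetical.

```python
DEFAULT_ROUNDS = 200  # workflow default for `rounds`
DEFAULT_ITER = 100    # workflow default for `iter` (iterations per round)


def total_iterations(rounds: int = DEFAULT_ROUNDS, iter_per_round: int = DEFAULT_ITER) -> int:
    """Total optimization iterations: `rounds` rounds of `iter_per_round` iterations each."""
    if rounds < 1 or iter_per_round < 1:
        raise ValueError("rounds and iterations must be positive")
    return rounds * iter_per_round


def relative_cost(rounds: int, iter_per_round: int) -> float:
    """Rough runtime relative to the defaults, ASSUMING linear scaling with total iterations."""
    return total_iterations(rounds, iter_per_round) / total_iterations()


print(total_iterations())      # 20000
print(relative_cost(50, 100))  # 0.25
```

For a quick test run, lowering rounds is the most direct lever under this assumption; the resulting estimates should not be treated as equivalent to a full-default run.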

Quickstart table

Workflow Feature | Description | Source
Analysis type | Rye-based admixture estimation |
Workflow language | WDL 1.0 | openWDL
Input data type | Rye-format eigenvalues, eigenvectors, pop2group mapping |
Output file format | .Q (ancestry proportions), .fam (sample metadata) |
Primary tool | Rye | Rye GitHub
Part of analysis path | Admixture Rye (Step 2 of 2) |

Set-up

run_admixture_est_rye installation and requirements

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the run_admixture_est_rye changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.
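For a local test, Cromwell's single-workflow "run" mode takes a WDL file and an inputs JSON. The sketch below only assembles that command line; the jar name, WDL filename, and inputs filename are hypothetical placeholders, so substitute the actual WDL path from your WARP checkout and your Cromwell release.

```python
import shlex


def cromwell_run_command(cromwell_jar: str, wdl_path: str, inputs_json: str) -> list[str]:
    """Build a Cromwell single-workflow ("run" mode) command line."""
    return ["java", "-jar", cromwell_jar, "run", wdl_path, "--inputs", inputs_json]


# Hypothetical file names; replace with real paths.
cmd = cromwell_run_command("cromwell.jar", "run_admixture_est_rye.wdl", "inputs.json")
print(shlex.join(cmd))
# To launch it (requires Java and a Cromwell jar): subprocess.run(cmd, check=True)
```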

Inputs

Input descriptions

Input variable name | Description | Type
eigenvalues_file | Rye-compatible eigenvalues file produced by run_preprocess_admixture_est_rye. | File
eigenvec_file | Rye-compatible eigenvectors file containing merged training and testing PCA projections. | File
pop2group_file | Population-to-group mapping file used by Rye to define reference ancestry groups. | File
prefix | Output file prefix (e.g., aou_delta). Prepended to all output filenames. | String
pcs | Number of principal components to use in estimation. Default: 20. | Int
rounds | Number of optimization rounds. Higher values can improve accuracy but lengthen runtime. Default: 200. | Int
iter | Number of iterations per optimization round. Default: 100. | Int
cpus | Number of CPU threads. Default: 16. | Int
docker_image | Docker image with Rye installed. Default: us-central1-docker.pkg.dev/broad-dsde-methods/aou-auxiliary/rye-admixture-estimation-tool:v1.0. | String
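Cromwell reads inputs from a JSON file whose keys take the form `<workflow name>.<input name>`. The sketch below builds such a mapping from the inputs listed above. It assumes the WDL workflow is named `run_admixture_est_rye` (confirm this against the `workflow` block in the WDL), and the gs:// paths are hypothetical. Optional inputs are included only when overridden, so anything omitted falls back to the default declared in the WDL.

```python
import json

WORKFLOW_NAME = "run_admixture_est_rye"  # ASSUMPTION: verify against the WDL `workflow` block
OPTIONAL_INPUTS = {"pcs", "rounds", "iter", "cpus", "docker_image"}


def build_inputs(eigenvalues_file: str, eigenvec_file: str, pop2group_file: str,
                 prefix: str, **overrides) -> dict:
    """Return a Cromwell inputs mapping keyed as '<workflow>.<input>'."""
    unknown = set(overrides) - OPTIONAL_INPUTS
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    inputs = {
        "eigenvalues_file": eigenvalues_file,
        "eigenvec_file": eigenvec_file,
        "pop2group_file": pop2group_file,
        "prefix": prefix,
        **overrides,  # only the optional inputs you explicitly set
    }
    return {f"{WORKFLOW_NAME}.{name}": value for name, value in inputs.items()}


inputs = build_inputs(
    "gs://my-bucket/aou_delta.eigenval",  # hypothetical paths
    "gs://my-bucket/aou_delta.eigenvec",
    "gs://my-bucket/pop2group.txt",
    prefix="aou_delta",
    pcs=20,
)
print(json.dumps(inputs, indent=2))
```

Rejecting unknown keys up front catches typos (for example `iters` instead of `iter`) before a Cromwell submission fails on them.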

run_admixture_est_rye tasks and tools

This workflow calls a single task to run Rye ancestry estimation.

  1. Run Rye admixture estimation

To see specific tool parameters, select the task WDL link in the table, then view the command {} section of the task in the WDL script.

Task name and WDL link | Tool | Software | Description
run_rye | Rye | R | Runs the Rye R script to estimate ancestry proportions from PCA data. Outputs .Q and .fam files with admixture proportions and sample metadata.

1. Run Rye admixture estimation

The run_rye task invokes the Rye R script (/rye/rye.R) with the eigenvalues, eigenvectors, and population mapping inputs. Rye performs iterative optimization to estimate the proportion of ancestry each sample derives from each reference population group. After the Rye run completes, the output .Q file is renamed to follow the pattern <prefix>-<pcs>.Q for consistent delocalization. See the Rye documentation for details on file formats and examples.
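The renaming step can be pictured with the Python sketch below. It mirrors the logic described above rather than reproducing the task's actual command block, and because the raw filename Rye writes is not given here, it simply expects exactly one .Q file in the output directory. The function name and the demo filename `rye_raw_output.Q` are hypothetical.

```python
import tempfile
from pathlib import Path


def rename_q_file(out_dir: str, prefix: str, pcs: int) -> Path:
    """Rename the single .Q file in out_dir to '<prefix>-<pcs>.Q' and return its new path."""
    q_files = sorted(Path(out_dir).glob("*.Q"))
    if len(q_files) != 1:
        raise RuntimeError(f"expected exactly one .Q file in {out_dir}, found {len(q_files)}")
    return q_files[0].rename(Path(out_dir) / f"{prefix}-{pcs}.Q")


# Demo in a throwaway directory with a hypothetical raw output name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "rye_raw_output.Q").write_text("0.7\t0.3\n")
    renamed = rename_q_file(tmp, "aou_delta", 20)
    print(renamed.name)  # aou_delta-20.Q
```

Failing loudly when zero or several .Q files are present is safer than picking one arbitrarily, since a wrong match would silently delocalize the wrong result.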

Outputs

Output variable name | Filename | Output format and description
qFile | <prefix>-<pcs>.Q | Tab-delimited admixture proportion matrix. Each row is a sample; each column is an ancestry component. Analogous to ADMIXTURE .Q output.
famFile | <prefix>-<pcs>.fam | Sample metadata file in PLINK .fam format.
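Downstream, the two outputs are typically read together so that each row of proportions is labeled with its sample. The sketch below does this under stated assumptions: the .Q rows appear in the same order as the .fam lines, the .Q columns are purely numeric (set `skip_header=True` if your file carries a header row), and each row's proportions sum to approximately 1. The sample ID is taken from column 2 (IID) of the whitespace-delimited PLINK .fam file. All function names and demo data are hypothetical.

```python
import tempfile
from pathlib import Path


def read_fam_ids(fam_path: str) -> list[str]:
    """Return sample IDs (IID, column 2) from a whitespace-delimited PLINK .fam file."""
    with open(fam_path) as fh:
        return [line.split()[1] for line in fh if line.strip()]


def read_q(q_path: str, skip_header: bool = False) -> list[list[float]]:
    """Read a .Q matrix: one row per sample, one column per ancestry component."""
    with open(q_path) as fh:
        lines = [line for line in fh if line.strip()]
    if skip_header:
        lines = lines[1:]
    return [[float(x) for x in line.split()] for line in lines]


def load_ancestry(q_path: str, fam_path: str, skip_header: bool = False,
                  tol: float = 1e-3) -> dict[str, list[float]]:
    """Pair .Q rows with .fam sample IDs, ASSUMING both files list samples in the same order."""
    ids = read_fam_ids(fam_path)
    rows = read_q(q_path, skip_header)
    if len(ids) != len(rows):
        raise ValueError(f"{len(ids)} samples in .fam but {len(rows)} rows in .Q")
    for iid, row in zip(ids, rows):
        if abs(sum(row) - 1.0) > tol:  # proportions are expected to sum to ~1
            raise ValueError(f"proportions for {iid} sum to {sum(row):.4f}, not 1")
    return dict(zip(ids, rows))


with tempfile.TemporaryDirectory() as tmp:
    fam = Path(tmp) / "aou_delta-20.fam"
    q = Path(tmp) / "aou_delta-20.Q"
    fam.write_text("FAM1 S1 0 0 0 -9\nFAM2 S2 0 0 0 -9\n")  # toy samples
    q.write_text("0.7\t0.3\n0.25\t0.75\n")                 # toy proportions
    ancestry = load_ancestry(str(q), str(fam))
    print(ancestry["S2"])  # [0.25, 0.75]
```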

Versioning

All run_admixture_est_rye releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.