Admixture Rye Preprocessing

Pipeline Version	Date Updated	Documentation Author	Questions or Feedback
aou_9.0.0	September, 2025	WARP Pipelines	File an issue

Introduction to the run_preprocess_admixture_est_rye workflow

run_preprocess_admixture_est_rye is a WDL workflow that generates the input files required by the Rye admixture estimation tool. It is the first step in the Admixture Rye analysis path used in All of Us processing.

The workflow consumes PCA outputs produced upstream by the ancestry pipeline (eigenvalues, training PCA projections, and ancestry prediction data) and reshapes them into the file formats expected by Rye. It uses Hail and pandas to transform and merge the training and testing PCA projections and to export a population-to-group mapping file.

This workflow must be run before run_admixture_est_rye.

Quickstart table

Workflow Feature	Description	Source
Analysis type	Admixture preprocessing (PCA → Rye inputs)
Workflow language	WDL 1.0	openWDL
Input data type	PCA eigenvalues, training PCA table, ancestry prediction TSV
Output file format	TSV (eigenvalues, eigenvectors, pop2group)
Primary tool	Hail, pandas	Hail, pandas
Part of analysis path	Admixture Rye (Step 1 of 2)

Set-up

run_preprocess_admixture_est_rye installation and requirements

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the run_preprocess_admixture_est_rye changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.
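
As a rough illustration of a local Cromwell run, the sketch below assembles the command line in Python. The JAR path, WDL filename, and inputs filename are placeholders chosen for this example and are not taken from the WARP repository; adjust them to match your checkout and Cromwell installation.

```python
import shlex


def build_cromwell_command(cromwell_jar, wdl_path, inputs_json):
    """Return the argument list for a Cromwell `run` invocation."""
    return ["java", "-jar", cromwell_jar, "run", wdl_path, "--inputs", inputs_json]


# All three paths below are hypothetical placeholders.
command = build_cromwell_command(
    "cromwell.jar",
    "run_preprocess_admixture_est_rye.wdl",
    "inputs.json",
)

# Print a shell-quoted form of the command for inspection.
print(shlex.join(command))
```

To launch the run rather than print it, the same list can be passed to `subprocess.run(command, check=True)`.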

Inputs

Input descriptions

Input variable name	Description	Type
`eigenvalues_url`	Path to the PCA eigenvalues file (Hail Table format). Computed using Hail HWE-normalized PCA.	String
`training_pca_url`	Path to the training PCA projections TSV. Computed from reference panel data (e.g., HGDP + 1KG) using Hail PCA.	String
`ancestry_data_url`	Path to the All of Us ancestry prediction file containing projected PCA features for test samples.	String
`prefix`	Output file prefix (e.g., `aou_delta`). Prepended to all output filenames.	String
`cpus`	Number of CPU threads to allocate to the VM. Default: `16`.	Int
`docker_image`	Docker image with Hail installed. Default: `hailgenetics/hail:0.2.67`.	String
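
The inputs above could be collected into a Cromwell inputs JSON along the lines of the sketch below. It assumes every variable is a workflow-level input addressed as `<workflow name>.<input name>`, which is the usual Cromwell convention but has not been checked against this WDL. The bucket paths and file extensions are placeholders invented for the example; only the variable names and default values come from the table.

```python
import json

WORKFLOW = "run_preprocess_admixture_est_rye"


def build_inputs(eigenvalues_url, training_pca_url, ancestry_data_url, prefix,
                 cpus=16, docker_image="hailgenetics/hail:0.2.67"):
    """Build a Cromwell-style inputs mapping keyed by '<workflow>.<input>'."""
    values = {
        "eigenvalues_url": eigenvalues_url,
        "training_pca_url": training_pca_url,
        "ancestry_data_url": ancestry_data_url,
        "prefix": prefix,
        "cpus": cpus,
        "docker_image": docker_image,
    }
    return {f"{WORKFLOW}.{name}": value for name, value in values.items()}


# Hypothetical locations; replace with the real upstream PCA outputs.
inputs = build_inputs(
    eigenvalues_url="gs://example-bucket/pca/eigenvalues.ht",
    training_pca_url="gs://example-bucket/pca/training_pca.tsv",
    ancestry_data_url="gs://example-bucket/ancestry/ancestry_preds.tsv",
    prefix="aou_delta",
)

print(json.dumps(inputs, indent=2))
```

Leaving `cpus` and `docker_image` at their defaults keeps the file short; they can be omitted from the JSON entirely if the WDL declares the same defaults.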

run_preprocess_admixture_est_rye tasks and tools

This workflow calls a single task to transform upstream PCA outputs into Rye-compatible input files.

Preprocess PCA data for Rye

To see specific tool parameters, select the task WDL link in the table; then view the command {} section of the task in the WDL script.

Task name and WDL link	Tool	Software	Description
run_preprocess	Hail, pandas	Python	Merges training and testing PCA projections and exports Rye-format eigenvalues, eigenvectors, and population-to-group mapping files.

1. Preprocess PCA data for Rye

The run_preprocess task runs a Python script inside the Hail Docker container. It reads the eigenvalues, training PCA, and ancestry prediction data; transforms both the training and testing PCA projections into a standardized eigenvector format; concatenates them; and writes three output files that are consumed by run_admixture_est_rye. The population-to-group mapping is derived from the distinct population labels in the training data, excluding the `oth` (other) category.
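
The merge and mapping steps described above can be sketched with pandas as follows. This is a minimal illustration, not the task's actual script: the column names (`s` for sample ID, `pop` for population label, `PC1`, `PC2`, ... for projections), the `test` placeholder label for test samples, and the `Pop`/`Group` mapping columns are all assumptions made for the example and may not match the real inputs or the layout Rye expects.

```python
import pandas as pd


def merge_projections(training, testing, test_label="test"):
    """Stack training and testing projections into one eigenvector table."""
    pc_cols = [c for c in training.columns if c.startswith("PC")]
    train = training[["pop", "s"] + pc_cols]
    # Assumption: test samples carry a placeholder population label.
    test = testing[["s"] + pc_cols].assign(pop=test_label)[["pop", "s"] + pc_cols]
    return pd.concat([train, test], ignore_index=True)


def build_pop2group(training, excluded="oth"):
    """Map each distinct training population to a group, skipping `excluded`."""
    pops = sorted(p for p in training["pop"].unique() if p != excluded)
    # Assumption: each population maps to a group of the same name.
    return pd.DataFrame({"Pop": pops, "Group": pops})


# Toy data standing in for the reference panel and the test samples.
training = pd.DataFrame({
    "s": ["ref1", "ref2", "ref3"],
    "pop": ["afr", "eur", "oth"],
    "PC1": [0.10, 0.20, 0.30],
    "PC2": [0.01, 0.02, 0.03],
})
testing = pd.DataFrame({"s": ["test1"], "PC1": [0.50], "PC2": [0.05]})

eigenvec = merge_projections(training, testing)
pop2group = build_pop2group(training)

# Write tab-separated files, mirroring the TSV outputs of the task.
eigenvec.to_csv("example_rye.eigenvec", sep="\t", index=False)
pop2group.to_csv("example_rye.pop2group", sep="\t", index=False)
```

In this sketch the `oth` samples stay in the merged eigenvector table and are dropped only from the mapping, which is one reading of the description above.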

Outputs

Output variable name	Filename	Output format and description
`rye_eigenval`	`<prefix>_rye.eigenvalues`	Tab-separated eigenvalues file in Rye format.
`rye_eigenvec`	`<prefix>_rye.eigenvec`	Tab-separated eigenvectors file containing merged training and testing PCA projections in Rye format.
`rye_pop2group`	`<prefix>_rye.pop2group`	Tab-separated population-to-group mapping file derived from reference panel labels.
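
For downstream scripting, the filename pattern in the table can be expressed as a small helper. The function name is hypothetical; the patterns are copied from the table above.

```python
def expected_outputs(prefix):
    """Return the output filenames the workflow writes for a given prefix."""
    return {
        "rye_eigenval": f"{prefix}_rye.eigenvalues",
        "rye_eigenvec": f"{prefix}_rye.eigenvec",
        "rye_pop2group": f"{prefix}_rye.pop2group",
    }


outputs = expected_outputs("aou_delta")
for name, filename in outputs.items():
    print(f"{name}\t{filename}")
```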

Versioning

All run_preprocess_admixture_est_rye releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.
