Admixture Rye Preprocessing

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| --- | --- | --- | --- |
| aou_9.0.0 | September, 2025 | WARP Pipelines | File an issue |

Introduction to the run_preprocess_admixture_est_rye workflow

run_preprocess_admixture_est_rye is a WDL workflow that generates the input files required by the Rye admixture estimation tool. It is the first step in the Admixture Rye analysis path used in All of Us processing.

The workflow consumes PCA outputs produced upstream by the ancestry pipeline — specifically eigenvalues, training PCA projections, and ancestry prediction data — and reshapes them into the file formats expected by Rye. It uses Hail and pandas to transform and merge training and testing PCA projections and to export a population-to-group mapping file.

This workflow must be run before run_admixture_est_rye.

Quickstart table

| Workflow Feature | Description | Source |
| --- | --- | --- |
| Analysis type | Admixture preprocessing (PCA → Rye inputs) | |
| Workflow language | WDL 1.0 | openWDL |
| Input data type | PCA eigenvalues, training PCA table, ancestry prediction TSV | |
| Output file format | TSV (eigenvalues, eigenvectors, pop2group) | |
| Primary tool | Hail, pandas | Hail, pandas |
| Part of analysis path | Admixture Rye (Step 1 of 2) | |

Set-up

run_preprocess_admixture_est_rye installation and requirements

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the run_preprocess_admixture_est_rye changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.

Inputs

Input descriptions

| Input variable name | Description | Type |
| --- | --- | --- |
| eigenvalues_url | Path to the PCA eigenvalues file (Hail Table format). Computed using Hail HWE-normalized PCA. | String |
| training_pca_url | Path to the training PCA projections TSV. Computed from reference panel data (e.g., HGDP + 1KG) using Hail PCA. | String |
| ancestry_data_url | Path to the All of Us ancestry prediction file containing projected PCA features for test samples. | String |
| prefix | Output file prefix (e.g., aou_delta). Prepended to all output filenames. | String |
| cpus | Number of CPU threads to allocate to the VM. Default: 16. | Int |
| docker_image | Docker image with Hail installed. Default: hailgenetics/hail:0.2.67. | String |
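
As an illustration of how these inputs might be supplied to Cromwell, the sketch below builds an inputs JSON in Python. It assumes the inputs are declared at the workflow level, so each key follows Cromwell's `<workflow name>.<input name>` convention. The `gs://example-bucket/...` paths are hypothetical placeholders, and `build_inputs` is a helper invented for this example, not part of the pipeline.

```python
import json

# Workflow name as given in this documentation.
WORKFLOW = "run_preprocess_admixture_est_rye"


def build_inputs(eigenvalues_url, training_pca_url, ancestry_data_url, prefix,
                 cpus=16, docker_image="hailgenetics/hail:0.2.67"):
    """Return a Cromwell-style inputs dict keyed as '<workflow>.<input>'.

    The defaults for cpus and docker_image mirror the defaults listed
    in the input table above.
    """
    values = {
        "eigenvalues_url": eigenvalues_url,
        "training_pca_url": training_pca_url,
        "ancestry_data_url": ancestry_data_url,
        "prefix": prefix,
        "cpus": cpus,
        "docker_image": docker_image,
    }
    return {f"{WORKFLOW}.{name}": value for name, value in values.items()}


# All three paths below are hypothetical placeholders.
inputs = build_inputs(
    eigenvalues_url="gs://example-bucket/pca/eigenvalues.ht",
    training_pca_url="gs://example-bucket/pca/training_pca.tsv",
    ancestry_data_url="gs://example-bucket/ancestry/ancestry_preds.tsv",
    prefix="aou_delta",
)

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

The printed JSON can be saved to a file and passed to Cromwell alongside the workflow WDL. Only the four required inputs need to be set explicitly when the defaults are acceptable.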

run_preprocess_admixture_est_rye tasks and tools

This workflow calls a single task to transform upstream PCA outputs into Rye-compatible input files.

  1. Preprocess PCA data for Rye

To see specific tool parameters, select the task WDL link in the table; then view the command {} section of the task in the WDL script.

| Task name and WDL link | Tool | Software | Description |
| --- | --- | --- | --- |
| run_preprocess | Hail, pandas | Python | Merges training and testing PCA projections and exports Rye-format eigenvalues, eigenvectors, and population-to-group mapping files. |

1. Preprocess PCA data for Rye

The run_preprocess task runs a Python script inside the Hail Docker container. It reads eigenvalues, training PCA, and ancestry prediction data; transforms both training and testing PCA projections into a standardized eigenvector format; concatenates them; and writes three output files consumed by run_admixture_est_rye. The population-to-group mapping is derived from distinct population labels in the training data, excluding the oth (other) category.
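
The data flow described above can be sketched in a few lines of Python. This is a dependency-free illustration using only the standard library, not the task's actual Hail and pandas script. The input column names (`s`, `pop`, `pc1`, `research_id`, `ancestry_pred`, `f1`, and so on), the output column names, the sample values, and the choice to map each population to itself as its group are all assumptions made for the example.

```python
import csv
import io


def to_eigenvec_rows(rows, id_key, pop_key, pc_keys):
    """Map source-specific column names onto a shared layout:
    sample, pop, PC1..PCn (output column names are assumptions)."""
    return [
        {
            "sample": row[id_key],
            "pop": row[pop_key],
            **{f"PC{i}": row[key] for i, key in enumerate(pc_keys, start=1)},
        }
        for row in rows
    ]


def pop2group_rows(training_rows, excluded=("oth",)):
    """Build a mapping from the distinct training labels, skipping 'oth'.
    Mapping each population to itself is an assumption for illustration."""
    pops = sorted({row["pop"] for row in training_rows} - set(excluded))
    return [{"pop": pop, "group": pop} for pop in pops]


def to_tsv(rows):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical training (reference panel) and testing projections.
training = [
    {"s": "ref1", "pop": "afr", "pc1": 0.12, "pc2": -0.03},
    {"s": "ref2", "pop": "eur", "pc1": -0.08, "pc2": 0.05},
    {"s": "ref3", "pop": "oth", "pc1": 0.01, "pc2": 0.02},
]
testing = [
    {"research_id": "t1", "ancestry_pred": "eur", "f1": -0.07, "f2": 0.04},
]

# Standardize both tables, then concatenate training followed by testing.
train_vec = to_eigenvec_rows(training, "s", "pop", ["pc1", "pc2"])
test_vec = to_eigenvec_rows(testing, "research_id", "ancestry_pred", ["f1", "f2"])
eigenvec = train_vec + test_vec

# Only the mapping file drops 'oth'; this sketch keeps 'oth' samples
# in the merged eigenvector rows.
pop2group = pop2group_rows(train_vec)

print(to_tsv(eigenvec))
print(to_tsv(pop2group))
```

Standardizing each table before concatenation keeps the merge step trivial, since both inputs share one column layout by the time they are combined.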

Outputs

| Output variable name | Filename | Output format and description |
| --- | --- | --- |
| rye_eigenval | <prefix>_rye.eigenvalues | Tab-separated eigenvalues file in Rye format. |
| rye_eigenvec | <prefix>_rye.eigenvec | Tab-separated eigenvectors file containing merged training and testing PCA projections in Rye format. |
| rye_pop2group | <prefix>_rye.pop2group | Tab-separated population-to-group mapping file derived from reference panel labels. |
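
For scripting around the workflow, the naming pattern in the table can be expressed as a small helper. This is a minimal sketch: `rye_output_names` is a hypothetical function written for this example, and it only reproduces the filename patterns listed above for a given `prefix`.

```python
def rye_output_names(prefix):
    """Return the expected output filename for each output variable,
    following the '<prefix>_rye.<suffix>' patterns in the outputs table."""
    return {
        "rye_eigenval": f"{prefix}_rye.eigenvalues",
        "rye_eigenvec": f"{prefix}_rye.eigenvec",
        "rye_pop2group": f"{prefix}_rye.pop2group",
    }


# 'aou_delta' is the example prefix given in the input table.
names = rye_output_names("aou_delta")
for variable, filename in names.items():
    print(f"{variable}: {filename}")
```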

Versioning

All run_preprocess_admixture_est_rye releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.