Admixture Unsupervised Preprocessing

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| --- | --- | --- | --- |
| aou_9.0.1 | November, 2025 | WARP Pipelines | File an issue |

convert_vcf_to_plink_bed is a WDL workflow that converts a merged VCF file into PLINK binary format (.bed/.bim/.fam). It is the first step in the Admixture Unsupervised analysis path used in All of Us processing.

The workflow accepts the merged VCF produced by the ancestry pipeline (the merged_vcf_shards input) together with its index, and runs PLINK to generate the binary genotype files required by the downstream run_admixture workflow. PLINK is run with double-ID assignment (--double-id) and permissive chromosome handling (--allow-extra-chr) to accommodate multi-ancestry cohort data.

This workflow must be run before run_admixture.

Quickstart table

| Workflow Feature | Description | Source |
| --- | --- | --- |
| Analysis type | VCF-to-PLINK format conversion | |
| Workflow language | WDL 1.0 | openWDL |
| Input data type | Merged VCF + index | |
| Output file format | PLINK binary (.bed, .bim, .fam) | |
| Primary tool | PLINK | PLINK 1.9 |
| Docker image | mussmann/admixpipe:3.0 | |
| Part of analysis path | Admixture Unsupervised (Step 1 of 2) | |

Set-up

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the convert_vcf_to_plink_bed changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.

Inputs

Input descriptions

| Input variable name | Description | Type |
| --- | --- | --- |
| prefix | Output file prefix. Used as the base name for all PLINK output files. | String |
| merged_vcf_shards | Merged VCF file from the ancestry pipeline. | File |
| merged_vcf_shards_idx | Index file for the merged VCF. | File |
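
As an illustration of how these three inputs might be supplied, the sketch below writes a Cromwell-style inputs file. It assumes the inputs are declared at the workflow level and that the workflow is named convert_vcf_to_plink_bed in the WDL, so each key is prefixed with that name. The bucket paths, the prefix value, and the .tbi index extension are hypothetical placeholders, not values taken from the pipeline.

```shell
# Write a hypothetical inputs file; replace every value with your own data.
# Key names assume workflow-level inputs on a workflow named convert_vcf_to_plink_bed.
cat > inputs.json <<'EOF'
{
  "convert_vcf_to_plink_bed.prefix": "aou_admixture",
  "convert_vcf_to_plink_bed.merged_vcf_shards": "gs://example-bucket/ancestry/merged.vcf.gz",
  "convert_vcf_to_plink_bed.merged_vcf_shards_idx": "gs://example-bucket/ancestry/merged.vcf.gz.tbi"
}
EOF

# With a local Cromwell jar, a run would then look roughly like this
# (the WDL path is a placeholder; use its location in your WARP clone):
#   java -jar cromwell.jar run <path/to/convert_vcf_to_plink_bed.wdl> --inputs inputs.json
```

Check the input section of the WDL itself before relying on these key names, since the fully qualified names depend on how the workflow declares its inputs.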

This workflow calls a single task to convert VCF input to PLINK binary format.

  1. Convert VCF to PLINK binary format

To see specific tool parameters, select the task's WDL link in the table below, then view the command {} section of the task in the WDL script.

| Task name and WDL link | Tool | Software | Description |
| --- | --- | --- | --- |
| convert_vcf_to_plink_bed | PLINK | /app/bin/plink (admixpipe:3.0) | Converts a merged VCF to PLINK binary format using --make-bed. Applies --double-id and --allow-extra-chr for compatibility with AoU cohort data. |

The task invokes PLINK 1.9 with the following key flags:

| PLINK flag | Value | Notes |
| --- | --- | --- |
| --vcf | <merged_vcf_shards> | Input VCF file |
| --make-bed | | Outputs binary PLINK format |
| --double-id | | Sets both family ID and individual ID to the sample ID |
| --allow-extra-chr | | Permits non-standard chromosome names |
| --out | <prefix> | Sets output file base name |
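
Put together, the flags above correspond to a command along the following lines. This is a sketch assembled from the flag table, not a copy of the task's command {} section: the flag order and the two example values (merged.vcf.gz and aou_admixture) are assumptions, and the task may pass further options not listed here.

```shell
# Hypothetical input values; in the workflow these come from the WDL inputs.
merged_vcf_shards="merged.vcf.gz"
prefix="aou_admixture"

# Assemble the PLINK 1.9 call from the flags listed above.
# /app/bin/plink is the binary path inside the admixpipe:3.0 image.
plink_cmd="/app/bin/plink --vcf ${merged_vcf_shards} --make-bed --double-id --allow-extra-chr --out ${prefix}"

# Print the command for review; inside the container it could be run with: eval "$plink_cmd"
echo "$plink_cmd"
```

Run inside the container, a command of this shape would write aou_admixture.bed, aou_admixture.bim, and aou_admixture.fam, alongside the aou_admixture.log file that PLINK produces for each run.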

Outputs

| Output variable name | Filename | Output format and description |
| --- | --- | --- |
| bed | <prefix>.bed | PLINK binary genotype file. |
| bim | <prefix>.bim | Variant information file (chromosome, variant ID, position, alleles). |
| fam | <prefix>.fam | Sample metadata file (family ID, individual ID, parental IDs, sex, phenotype). |
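
Before passing these files to run_admixture, a quick sanity check on the .fam file can confirm that the conversion behaved as described. The sketch below is one possible check, not part of the workflow. It uses a two-sample stand-in file (example.fam, with hypothetical sample IDs S1 and S2) so that it runs on its own; point the awk commands at your real <prefix>.fam instead. It relies on the .fam layout of six whitespace-separated columns and on --double-id making the family ID equal to the individual ID.

```shell
# Stand-in .fam content for two hypothetical samples; real files come from the workflow.
printf 'S1 S1 0 0 0 -9\nS2 S2 0 0 0 -9\n' > example.fam

# One line per sample, so the line count is the sample count.
n_samples=$(awk 'END { print NR }' example.fam)

# With --double-id, column 1 (family ID) should equal column 2 (individual ID).
n_mismatch=$(awk '$1 != $2 { n++ } END { print n + 0 }' example.fam)

# Count rows that do not have exactly six whitespace-separated fields.
n_bad_cols=$(awk 'NF != 6 { n++ } END { print n + 0 }' example.fam)

echo "samples=$n_samples fid_iid_mismatches=$n_mismatch malformed_rows=$n_bad_cols"
```

A nonzero mismatch or malformed-row count on a real file would suggest that the file did not come from this workflow unchanged, or that sample IDs contain characters PLINK handled unexpectedly.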

Versioning

All convert_vcf_to_plink_bed releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.