Run Sample Outlier QC

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| --- | --- | --- | --- |
| aou_9.0.0 | September, 2025 | WARP Pipelines | File an issue |

Introduction to the Run Sample Outlier QC workflow

run_sample_outlier_qc is a WDL workflow that joins ancestry predictions with cohort callset summary statistics and identifies outlier samples using ancestry-stratified quality-control residuals.

The workflow first merges ancestry outputs with requested QC metrics, then computes residual-based filters using gnomAD QC utilities. It outputs both full-sample QC annotations and a filtered table containing only flagged samples.

Quickstart table

| Pipeline Feature | Description | Source |
| --- | --- | --- |
| Analysis type | Sample-level outlier QC with ancestry-aware stratification | |
| Workflow language | WDL 1.0 | openWDL |
| Data input file format | TSV/CSV ancestry and callset summary inputs | |
| Data output file format | TSV + tarred Hail tables (.ht.tar.gz) | |
| Primary software | Hail + gnomAD QC utilities | Hail, gnomAD methods |

Set-up

Run Sample Outlier QC installation and requirements

The workflow code can be downloaded by cloning the WARP GitHub repository. For the latest release, please see the run_sample_outlier_qc changelog.

The pipeline can be deployed using Cromwell, a GA4GH-compliant workflow management system.

Inputs

Input descriptions

| Input variable name | Description | Type |
| --- | --- | --- |
| callset_summary_csv | Cohort callset summary statistics CSV (e.g., GVS summary metrics). | File |
| ancestry_results_tsv | Ancestry prediction table output from run_ancestry.wdl. | File |
| output_prefix | Prefix applied to all output artifacts. | String |
| metrics_to_check_in | (Optional) Python-list string of metrics to evaluate. Defaults to a preset list of variant/QC metrics. | String? |
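
Because metrics_to_check_in is passed as a string holding a Python list, it can help to build the inputs file programmatically. The following is a minimal sketch, assuming the Cromwell input namespace matches the workflow name run_sample_outlier_qc; the bucket paths and metric names are hypothetical placeholders, not values taken from the pipeline.

```python
import ast
import json

# Assumed workflow name for the Cromwell input namespace; confirm against the WDL.
WORKFLOW = "run_sample_outlier_qc"


def build_inputs(callset_summary_csv, ancestry_results_tsv, output_prefix, metrics=None):
    """Return a Cromwell-style inputs dict; metrics is an optional list of metric names."""
    inputs = {
        f"{WORKFLOW}.callset_summary_csv": callset_summary_csv,
        f"{WORKFLOW}.ancestry_results_tsv": ancestry_results_tsv,
        f"{WORKFLOW}.output_prefix": output_prefix,
    }
    if metrics is not None:
        # The optional input is a string containing a Python list literal.
        inputs[f"{WORKFLOW}.metrics_to_check_in"] = repr(list(metrics))
    return inputs


def parse_metrics(metrics_string):
    """Parse a Python-list string back into a list of metric names."""
    parsed = ast.literal_eval(metrics_string)
    if not isinstance(parsed, list) or not all(isinstance(m, str) for m in parsed):
        raise ValueError("expected a Python list of strings")
    return parsed


example = build_inputs(
    "gs://example-bucket/callset_summary.csv",   # hypothetical path
    "gs://example-bucket/ancestry_results.tsv",  # hypothetical path
    "my_cohort",
    metrics=["snp_count", "ins_del_ratio"],      # hypothetical metric names
)
print(json.dumps(example, indent=2))
```

The printed JSON could be saved to a file and supplied to Cromwell as the workflow inputs. Omitting the metrics argument leaves metrics_to_check_in unset, so the workflow falls back to its preset list.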

Run Sample Outlier QC tasks and tools

The workflow runs two tasks to create a full ancestry+QC table and then identify outliers.

  1. Join ancestry and summary metrics
  2. Compute stratified outlier filters

To see specific tool parameters, select the task's WDL link in the table, then view the command {} section of that task in the WDL script.

| Task name and WDL link | Tool | Software | Description |
| --- | --- | --- | --- |
| join_ancestry_to_stats | Hail table joins | hailgenetics/hail:0.2.67 | Joins ancestry results with callset summary metrics and writes the full ancestry Hail table artifact. |
| determine_outlier_qc | gnomAD residual filtering | hailgenetics/hail:0.2.67 | Computes residuals/threshold filters and exports flagged and full-sample outputs. |

1. Join ancestry and summary metrics

join_ancestry_to_stats imports ancestry and callset summary tables, joins selected metrics by sample ID, and writes <output_prefix>.full_ancestry.ht.tar.gz.
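
The join itself can be illustrated without Hail. The sketch below uses only the Python standard library and is not the task's implementation: the column names (sample_id, ancestry_pred, snp_count) are hypothetical, and an inner join (keeping only samples present in both tables) is an assumption.

```python
import csv
import io


def join_ancestry_to_metrics(ancestry_tsv, summary_csv, metrics,
                             ancestry_key="sample_id", summary_key="sample_id"):
    """Inner-join ancestry rows with the selected metric columns by sample ID.

    Both inputs are open text file objects; key column names are assumptions.
    """
    # Index the summary table by sample ID, keeping only the requested metrics.
    summary = {
        row[summary_key]: {m: float(row[m]) for m in metrics}
        for row in csv.DictReader(summary_csv)
    }
    joined = []
    for row in csv.DictReader(ancestry_tsv, delimiter="\t"):
        sample = row[ancestry_key]
        if sample in summary:  # drop samples missing from the summary table
            joined.append({**row, **summary[sample]})
    return joined


# Tiny in-memory stand-ins for the two input files (made-up values).
ancestry = io.StringIO("sample_id\tancestry_pred\nS1\teur\nS2\tafr\nS3\teas\n")
summary = io.StringIO("sample_id,snp_count,ins_del_ratio\nS1,100,1.0\nS2,120,0.9\n")

rows = join_ancestry_to_metrics(ancestry, summary, ["snp_count"])
for row in rows:
    print(row)
```

Only the metrics named in the list are carried over, which mirrors the idea of joining "selected metrics"; S3 is dropped here because it has no summary row.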

2. Compute stratified outlier filters

determine_outlier_qc computes ancestry-stratified residual metrics, applies QC thresholds, and exports full and flagged sample tables as TSV + Hail artifacts.
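
To make the idea of ancestry-stratified filtering concrete, here is a simplified standard-library sketch. It is not the gnomAD utility the task calls: it replaces the residual computation with each sample's deviation from its ancestry group's median, and the cutoff rule (a multiple of the median absolute deviation, n_mads) is an assumption chosen for illustration. Column names and values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(rows, metrics, stratum_key="ancestry_pred", n_mads=4.0):
    """Annotate every row with failed metrics and return (all_rows, flagged_rows)."""
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[stratum_key]].append(row)

    # Per-stratum, per-metric centre (median) and spread (MAD).
    stats = {}
    for stratum, members in by_stratum.items():
        for metric in metrics:
            values = [r[metric] for r in members]
            center = median(values)
            mad = median([abs(v - center) for v in values])
            stats[(stratum, metric)] = (center, mad)

    annotated = []
    for row in rows:
        failed = []
        for metric in metrics:
            center, mad = stats[(row[stratum_key], metric)]
            # Note: a MAD of zero flags any sample that differs from the median.
            if abs(row[metric] - center) > n_mads * mad:
                failed.append(metric)
        annotated.append({**row, "failed_metrics": failed})

    flagged = [r for r in annotated if r["failed_metrics"]]
    return annotated, flagged


samples = [
    {"sample_id": "S1", "ancestry_pred": "eur", "snp_count": 100.0},
    {"sample_id": "S2", "ancestry_pred": "eur", "snp_count": 102.0},
    {"sample_id": "S3", "ancestry_pred": "eur", "snp_count": 98.0},
    {"sample_id": "S4", "ancestry_pred": "eur", "snp_count": 101.0},
    {"sample_id": "S5", "ancestry_pred": "eur", "snp_count": 150.0},
    {"sample_id": "S6", "ancestry_pred": "afr", "snp_count": 200.0},
    {"sample_id": "S7", "ancestry_pred": "afr", "snp_count": 205.0},
    {"sample_id": "S8", "ancestry_pred": "afr", "snp_count": 195.0},
]

all_samples, flagged_samples = flag_outliers(samples, ["snp_count"])
print([r["sample_id"] for r in flagged_samples])
```

Stratifying matters in this toy data: the afr samples have much higher counts than the eur samples, yet none are flagged because each is compared only against its own group. The two return values correspond loosely to the full-sample and flagged-sample outputs.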

Outputs

| Output variable name | Filename, if applicable | Output format and description |
| --- | --- | --- |
| flagged_samples_tsv | <output_prefix>.flagged_samples.tsv | TSV of samples with one or more QC metric filter failures. |
| all_samples_tsv | <output_prefix>.full.tsv | TSV containing full sample set with QC residual/filter annotations. |
| ancestry_with_flagged_samples_tar_gz | <output_prefix>.full.ht.tar.gz | Tarred Hail table containing full ancestry/QC annotations for all samples. |
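
The tarred Hail table has to be unpacked before Hail can read it. The helper below is a minimal sketch using Python's tarfile module; the function name is hypothetical, and the internal layout of the archive (for example, a single top-level .ht directory) is an assumption to check against a real output.

```python
import tarfile
from pathlib import Path


def unpack_hail_table(archive_path, dest_dir):
    """Extract a .ht.tar.gz archive and return the extracted top-level paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the distinct top-level entries stored in the archive.
        top_level = sorted(
            {Path(name).parts[0] for name in tar.getnames() if Path(name).parts}
        )
        # Use the safer "data" extraction filter where this Python version offers it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return [dest / name for name in top_level]


# Example usage (paths are hypothetical):
#   tables = unpack_hail_table("my_cohort.full.ht.tar.gz", "unpacked")
#   import hail as hl                      # requires a Hail installation
#   ht = hl.read_table(str(tables[0]))
```

Returning the top-level paths avoids hard-coding the directory name stored inside the archive.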

Versioning

All run_sample_outlier_qc releases are documented in the changelog.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.