
All of Us RNA-seq eQTL and sQTL Analysis Pipeline

This section documents the All of Us RNA-seq QTL workflows for expression/splicing phenotype generation, QTL analysis preparation, and SuSiE fine-mapping aggregation.

Quick Summary

  • Purpose: Run reproducible eQTL and sQTL analyses from genotype and RNA-derived inputs.
  • Primary Outputs: RNA-derived phenotype artifacts, TensorQTL-ready inputs, SuSiE fine-mapping results, and aggregated annotations.

Input Requirements

To run the workflows described below, you generally need:

  • A joint-called VCF containing the relevant samples
  • Research IDs partitioned by ancestry or subpopulation
  • RNA expression quantifications (for eQTL)
  • BAM/CRAM files for splice junction extraction (for sQTL)
  • Sample-level metadata tables
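
The sketch below shows one way to build the per-ancestry sample partitions used in step 0. It groups research IDs by ancestry label and keeps only IDs that are present in the joint-called VCF. The input shapes (a dict of research ID to ancestry label, plus a list of VCF sample names) and the hypothetical one-ID-per-line `<label>_samples.txt` output files are assumptions for illustration. They are not the format the workflows are documented to require.

```python
from collections import defaultdict
from pathlib import Path


def partition_samples(ancestry_by_id, vcf_samples, min_size=1):
    """Group research IDs by ancestry label, keeping only IDs present in the VCF.

    ancestry_by_id: dict mapping research ID -> ancestry label (assumed shape).
    vcf_samples: sample names from the joint-called VCF header.
    Groups with fewer than min_size retained samples are dropped.
    """
    in_vcf = set(vcf_samples)
    groups = defaultdict(list)
    for research_id, label in ancestry_by_id.items():
        if research_id in in_vcf:
            groups[label].append(research_id)
    return {
        label: sorted(ids)
        for label, ids in sorted(groups.items())
        if len(ids) >= min_size
    }


def write_sample_lists(groups, out_dir):
    """Write one hypothetical <label>_samples.txt file per group, one ID per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for label, ids in groups.items():
        path = out / f"{label}_samples.txt"
        path.write_text("\n".join(ids) + "\n")
        paths.append(path)
    return paths


# Example: "1004" is not in the VCF, so it is dropped.
groups = partition_samples(
    {"1001": "AFR", "1002": "EUR", "1003": "EUR", "1004": "AMR"},
    vcf_samples=["1001", "1002", "1003"],
)
# groups == {"AFR": ["1001"], "EUR": ["1002", "1003"]}
```

Filtering against the VCF header at this stage helps prevent genotype and phenotype steps from later failing on sample IDs that have no genotypes.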

Ordered Analysis Flow

The table below reflects the original end-to-end run order, including steps that do not yet have dedicated docs pages in this folder.

| Order | Stage | Workflow / Component | Documentation | WDL | Run next |
| --- | --- | --- | --- | --- | --- |
| 0 | Cohort setup | Ancestry grouping and sample lists | No dedicated WDL page (notebook/table prep) | N/A | Use ancestry/sample partitions as inputs to genotype and phenotype prep. |
| 1 | Genotype prep | Prepare genotypes (pruning + PLINK + PCs) | No dedicated page yet | PrepareGenotypes.wdl | Run dosage generation per ancestry/population. |
| 2 | Genotype prep | Calculate genotype dosage | No dedicated page yet | calculateGenotypeDosage.wdl | Feed dosages into TensorQTL and SuSiE inputs later. |
| 3 | RNA processing | RNA-seq AoU processing (alignment/quant/QC) | RNA-seq AoU Processing | rnaseq_aou.wdl | Branch into eQTL phenotype prep and/or sQTL junction extraction. |
| 4 | RNA processing | Aggregate cohort-level RNA outputs (RSEM in WARP; RNA-SeQC2 external for now) | No dedicated page yet | aggregate_rsem_results.wdl | Use aggregated expression/QC summaries for downstream phenotype prep and cohort QC review. |
| 5 | eQTL phenotypes | Prepare eQTL phenotype BED + phenotype PCs | No dedicated page yet | prepare_eQTL.wdl | Merge covariates for eQTL TensorQTL run. |
| 6 | sQTL phenotypes | Extract junctions from BAM | Leafcutter BAM to Junctions | leafcutter_bam_to_junc.wdl | Cluster junctions to build sQTL phenotype matrices. |
| 7 | sQTL phenotypes | Cluster junctions + generate leafcutter outputs | Leafcutter Clustering | leafcutter_cluster.wdl | If needed, run separate phenotype-group generation before covariate merge. |
| 8 | sQTL phenotypes | Prepare sQTL phenotype BED + PCs | No dedicated page yet | prepare_sQTL.wdl | Use splicing BED/PC outputs for sQTL covariate merge and TensorQTL. |
| 9 | sQTL metadata | Calculate phenotype groups | Calculate Phenotype Groups | CalculatePhenotypeGroups.wdl | Merge covariates for sQTL TensorQTL run. |
| 10 | Covariates | Merge covariates (genotype PCs + phenotype PCs ± groups) | No dedicated page yet | MergeCovariates.wdl | Run TensorQTL cis permutations for eQTL/sQTL. |
| 11 | Association | TensorQTL cis permutations | No dedicated page yet | tensorqtl_cis_permutations.wdl | Recalculate FDR and prepare significant loci for fine-mapping. |
| 12 | Fine-mapping prep | FDR recalculation + SuSiE input preparation, including required AF calculation and genotype dosage checks for downstream aggregation/annotation | No dedicated page yet | calculateAF.wdl | Run SuSiE per phenotype window. |
| 13 | Fine-mapping | SuSiE fine-mapping | SuSiE Fine-Mapping Workflow | susieR_workflow.wdl | Aggregate SuSiE outputs across phenotypes. |
| 14 | Aggregation | Aggregate SuSiE outputs and annotate | Aggregate SuSiE Workflow | AggregateSusieWorkflow.wdl | Consume required AF outputs from step 12 for interpretation/reporting. |
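
Step 10 in the table above merges genotype PCs and phenotype PCs into a single covariate table for TensorQTL. The following is a minimal sketch of that merge. It assumes a GTEx-style layout, with one covariate per row, one sample per column, and a first column named `ID`. It also assumes that sample columns must follow the phenotype BED column order. MergeCovariates.wdl may use different inputs, covariate names, or file formats, and phenotype groups are not modeled here.

```python
def merge_covariates(genotype_pcs, phenotype_pcs, sample_order):
    """Combine genotype and phenotype PCs into a covariates-by-samples table.

    genotype_pcs / phenotype_pcs: dict mapping covariate name -> {sample ID: value}.
    sample_order: sample IDs in the (assumed) phenotype BED column order.
    Raises ValueError on duplicate covariate names or samples without a value.
    """
    rows = [["ID", *sample_order]]
    seen = set()
    for source in (genotype_pcs, phenotype_pcs):
        for name, values in source.items():
            if name in seen:
                raise ValueError(f"duplicate covariate name: {name}")
            seen.add(name)
            missing = [s for s in sample_order if s not in values]
            if missing:
                raise ValueError(f"{name} has no value for samples: {missing}")
            # Format with up to 6 significant digits, in phenotype sample order.
            rows.append([name, *(f"{values[s]:.6g}" for s in sample_order)])
    return rows


def to_tsv(rows):
    """Serialize table rows as tab-separated text with a trailing newline."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


rows = merge_covariates(
    {"geno_PC1": {"S1": 0.1, "S2": -0.2}},
    {"pheno_PC1": {"S1": 1.5, "S2": 2.0}},
    sample_order=["S2", "S1"],
)
# rows == [["ID", "S2", "S1"], ["geno_PC1", "-0.2", "0.1"], ["pheno_PC1", "2", "1.5"]]
```

Failing loudly on a missing sample is deliberate. Silently dropping or reordering samples would misalign covariates with phenotype columns.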

Practical Run Notes

  • eQTL path: 0 → 1 → 2 → 3 → 4 → 5 → 10 → 11 → 12 → 13 → 14.
  • sQTL path: 0 → 1 → 2 → 3 → 4 → 6 → 7 → 8 → 9 (if required) → 10 → 11 → 12 → 13 → 14.
  • Key dependency: susieR_workflow expects TensorQTL-derived significant loci plus dosage inputs from earlier genotype steps.
  • Phenotype groups: required for many sQTL TensorQTL configurations; for eQTL they are typically not required.
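
The eQTL and sQTL paths above can be derived from a step dependency graph. The following Python sketch encodes prerequisite edges inferred from the table's "Run next" column. These edges are an assumption, since the workflows do not ship such a graph. The sketch then orders steps with Kahn's topological sort, breaking ties by the lowest step number so the result matches the documented paths.

```python
import heapq


def build_dependencies(branch, use_phenotype_groups=False):
    """Return {step: [prerequisite steps]} for the 'eqtl' or 'sqtl' branch.

    Edges are inferred from the run-order table, not taken from the WDLs.
    """
    deps = {
        0: [], 1: [0], 2: [1], 3: [0], 4: [3],
        11: [10], 12: [11, 2], 13: [12], 14: [13, 12],
    }
    if branch == "eqtl":
        deps[5] = [4]
        deps[10] = [2, 5]
    elif branch == "sqtl":
        deps[6] = [3]
        deps[7] = [6]
        deps[8] = [7]
        deps[10] = [2, 8]
        if use_phenotype_groups:
            deps[9] = [8]
            deps[10].append(9)
    else:
        raise ValueError(f"unknown branch: {branch}")
    return deps


def run_order(deps):
    """Kahn's algorithm; a min-heap picks the lowest-numbered ready step first."""
    indegree = {step: len(prereqs) for step, prereqs in deps.items()}
    children = {step: [] for step in deps}
    for step, prereqs in deps.items():
        for prereq in prereqs:
            children[prereq].append(step)
    ready = [step for step, degree in indegree.items() if degree == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        step = heapq.heappop(ready)
        order.append(step)
        for child in children[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)
    if len(order) != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return order


# run_order(build_dependencies("eqtl"))
#   -> [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
# run_order(build_dependencies("sqtl", use_phenotype_groups=True))
#   -> [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

In this model, the genotype branch (1, 2) and the RNA branch (3 onward) do not depend on each other. They could run in parallel, and the tie-breaking only fixes a single serial order.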

Additional Processing Notes

  • FDR, AF, and SuSiE prep: after TensorQTL, recalculate FDR, filter significant loci (commonly at an FDR threshold of 0.05), calculate allele frequencies (AFs), and format SuSiE-ready inputs.
  • SuSiE runtime guidance: preemptible VMs can reduce cost; pinned Docker SHAs improve reproducibility.
  • Aggregation inputs: for AggregateSusieWorkflow, use fine-mapped SusieParquet outputs and required AF resources from step 12; do not use full/all-tested parquet outputs.
  • RNA-level aggregation status: aggregate_rsem_results.wdl is available in WARP for cohort-level RSEM aggregation. An equivalent RNA-SeQC2 aggregation workflow is not yet available in WARP, so RNA-SeQC2 outputs currently require external processing.
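
The post-TensorQTL filtering and AF steps can be sketched in Python as follows. The source does not specify which FDR method is used. As a stand-in, this sketch applies Benjamini-Hochberg adjusted p-values to one p-value per phenotype, which may differ from what the workflow actually applies. It computes the ALT allele frequency from diploid dosages in the range 0 to 2, treating `None` as a missing call. All function names are hypothetical and are not taken from calculateAF.wdl or the FDR scripts.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def significant_phenotypes(phenotype_ids, pvalues, fdr=0.05):
    """Keep phenotypes whose BH q-value is at or below the FDR threshold."""
    qvalues = benjamini_hochberg(pvalues)
    return [pid for pid, q in zip(phenotype_ids, qvalues) if q <= fdr]


def alt_allele_frequency(dosages):
    """ALT allele frequency from diploid dosages (0-2); None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


# With p-values [0.01, 0.04, 0.03, 0.20], the q-values are about
# [0.04, 0.0533, 0.0533, 0.20], so only "g1" passes at FDR 0.05.
hits = significant_phenotypes(["g1", "g2", "g3", "g4"], [0.01, 0.04, 0.03, 0.20])
# hits == ["g1"]
```

Missing calls are excluded from the denominator, so the AF reflects only samples with a called genotype.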

Acknowledgements

The original versions of these workflows were created either by the GTEx Consortium (see their GTEx GitHub repository) or by the lab of Dr. Stephen Montgomery at Stanford University. Most of the eQTL scripts originated from the AoU-Multiomics-Analysis repository.

This pipeline builds upon extensive work by the Stephen Montgomery Lab at Stanford University. Special thanks to:

  • Evin Padhi
  • Jon Nguyen

for developing foundational versions of many scripts and workflows used in this analysis.

Additional integration, optimization, and workflow migration were performed by the All of Us Multiomics and the Broad Pipeline Development teams as part of the WARP workflow suite.

Feedback

Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.