All of Us RNA-seq eQTL and sQTL Analysis Pipeline
This section documents the All of Us RNA-seq QTL workflows for expression/splicing phenotype generation, QTL analysis preparation, and SuSiE fine-mapping aggregation.
Quick Summary
- Purpose: Run reproducible eQTL and sQTL analyses from genotype and RNA-derived inputs.
- Primary Outputs: RNA-derived phenotype artifacts, TensorQTL-ready inputs, SuSiE fine-mapping results, and aggregated annotations.
Input Requirements
To run the workflows described below, you generally need:
- A joint-called VCF containing the relevant samples
- Research IDs partitioned by ancestry or subpopulation
- RNA expression quantifications (for eQTL)
- BAM/CRAM files for splice junction extraction (for sQTL)
- Sample-level metadata tables
Ordered Analysis Flow
The table below reflects the original end-to-end run order, including steps that do not yet have dedicated docs pages in this folder.
| Order | Stage | Workflow / Component | Documentation | WDL | Run next |
|---|---|---|---|---|---|
| 0 | Cohort setup | Ancestry grouping and sample lists | No dedicated WDL page (notebook/table prep) | N/A | Use ancestry/sample partitions as inputs to genotype and phenotype prep. |
| 1 | Genotype prep | Prepare genotypes (pruning + PLINK + PCs) | No dedicated page yet | PrepareGenotypes.wdl | Run dosage generation per ancestry/population. |
| 2 | Genotype prep | Calculate genotype dosage | No dedicated page yet | calculateGenotypeDosage.wdl | Feed dosages into TensorQTL and SuSiE inputs later. |
| 3 | RNA processing | RNA-seq AoU processing (alignment/quant/QC) | RNA-seq AoU Processing | rnaseq_aou.wdl | Branch into eQTL phenotype prep and/or sQTL junction extraction. |
| 4 | RNA processing | Aggregate cohort-level RNA outputs (RSEM in WARP; RNA-SeQC2 external for now) | No dedicated page yet | aggregate_rsem_results.wdl | Use aggregated expression/QC summaries for downstream phenotype prep and cohort QC review. |
| 5 | eQTL phenotypes | Prepare eQTL phenotype BED + phenotype PCs | No dedicated page yet | prepare_eQTL.wdl | Merge covariates for eQTL TensorQTL run. |
| 6 | sQTL phenotypes | Extract junctions from BAM | Leafcutter BAM to Junctions | leafcutter_bam_to_junc.wdl | Cluster junctions to build sQTL phenotype matrices. |
| 7 | sQTL phenotypes | Cluster junctions + generate leafcutter outputs | Leafcutter Clustering | leafcutter_cluster.wdl | Prepare the sQTL phenotype BED and PCs; if needed, run separate phenotype-group generation before covariate merge. |
| 8 | sQTL phenotypes | Prepare sQTL phenotype BED + PCs | No dedicated page yet | prepare_sQTL.wdl | Use splicing BED/PC outputs for sQTL covariate merge and TensorQTL. |
| 9 | sQTL metadata | Calculate phenotype groups | Calculate Phenotype Groups | CalculatePhenotypeGroups.wdl | Merge covariates for sQTL TensorQTL run. |
| 10 | Covariates | Merge covariates (genotype PCs + phenotype PCs ± groups) | No dedicated page yet | MergeCovariates.wdl | Run TensorQTL cis permutations for eQTL/sQTL. |
| 11 | Association | TensorQTL cis permutations | No dedicated page yet | tensorqtl_cis_permutations.wdl | Recalculate FDR and prepare significant loci for fine-mapping. |
| 12 | Fine-mapping prep | FDR recalculation + SuSiE input preparation, including required AF calculation and genotype dosage checks for downstream aggregation/annotation | No dedicated page yet | calculateAF.wdl | Run SuSiE per phenotype window. |
| 13 | Fine-mapping | SuSiE fine-mapping | SuSiE Fine-Mapping Workflow | susieR_workflow.wdl | Aggregate SuSiE outputs across phenotypes. |
| 14 | Aggregation | Aggregate SuSiE outputs and annotate | Aggregate SuSiE Workflow | AggregateSusieWorkflow.wdl | Consume required AF outputs from step 12 for interpretation/reporting. |
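As an illustration of what step 10 does conceptually, the sketch below joins genotype PCs and phenotype PCs on shared sample IDs. This page does not describe the file formats that MergeCovariates.wdl reads or writes, so the in-memory layout, the function name `merge_covariates`, and the sample and covariate names are all hypothetical.

```python
def merge_covariates(genotype_pcs, phenotype_pcs):
    """Join two {sample_id: {covariate_name: value}} tables on shared samples.

    Hypothetical helper for illustration; not the MergeCovariates.wdl logic.
    """
    shared_samples = sorted(set(genotype_pcs) & set(phenotype_pcs))
    merged = {}
    for sample in shared_samples:
        row = dict(genotype_pcs[sample])
        # Refuse to silently overwrite a covariate that appears in both tables.
        clash = set(row) & set(phenotype_pcs[sample])
        if clash:
            raise ValueError(f"duplicate covariate names: {sorted(clash)}")
        row.update(phenotype_pcs[sample])
        merged[sample] = row
    return merged


# Made-up example values; sample IDs and PC names are placeholders.
genotype_pcs = {
    "S1": {"geno_PC1": 0.12, "geno_PC2": -0.40},
    "S2": {"geno_PC1": -0.05, "geno_PC2": 0.31},
    "S3": {"geno_PC1": 0.22, "geno_PC2": 0.08},
}
phenotype_pcs = {
    "S1": {"pheno_PC1": 1.5},
    "S2": {"pheno_PC1": -0.7},
}

covariates = merge_covariates(genotype_pcs, phenotype_pcs)
for sample, row in covariates.items():
    print(sample, row)
```

Only samples present in both tables survive the join, which is why sample lists from step 0 should be consistent across the genotype and phenotype branches.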
Practical Run Notes
- eQTL path: 0 → 1 → 2 → 3 → 4 → 5 → 10 → 11 → 12 → 13 → 14.
- sQTL path: 0 → 1 → 2 → 3 → 4 → 6 → 7 → 8 → 9 (if required) → 10 → 11 → 12 → 13 → 14.
- Key dependency: susieR_workflow expects TensorQTL-derived significant loci plus dosage inputs from earlier genotype steps.
- Phenotype groups: required for many sQTL TensorQTL configurations; for eQTL they are typically not required.
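The two run paths can be written down as plain lists to see where they share stages and where they branch. This is only a bookkeeping sketch: the stage numbers come from the Order column of the table, while the variable and function names are made up for illustration.

```python
# Stage numbers follow the "Order" column of the analysis flow table.
EQTL_PATH = [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
SQTL_PATH = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14]  # step 9 only if required


def only_in(path, other):
    """Stages that appear in `path` but not in `other`, kept in run order."""
    other_stages = set(other)
    return [stage for stage in path if stage not in other_stages]


def in_both(path, other):
    """Stages common to both paths, kept in the run order of `path`."""
    other_stages = set(other)
    return [stage for stage in path if stage in other_stages]


shared_stages = in_both(EQTL_PATH, SQTL_PATH)
eqtl_only = only_in(EQTL_PATH, SQTL_PATH)
sqtl_only = only_in(SQTL_PATH, EQTL_PATH)

print("shared:", shared_stages)
print("eQTL only:", eqtl_only)
print("sQTL only:", sqtl_only)
```

Read this way, the genotype and RNA processing stages (0 to 4) and everything from covariate merging onward (10 to 14) are common to both analyses; only the phenotype preparation stages differ.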
Additional Processing Notes
- FDR, AF, and SuSiE prep: after TensorQTL, recalculate FDR, keep only significant loci (commonly at an FDR threshold of 0.05), calculate allele frequencies (AFs), and format SuSiE-ready inputs.
- SuSiE runtime guidance: preemptible VMs can reduce cost; pinned Docker SHAs improve reproducibility.
- Aggregation inputs: for AggregateSusieWorkflow, use fine-mapped SusieParquet outputs and required AF resources from step 12; do not use full/all-tested parquet outputs.
- RNA-level aggregation status: aggregate_rsem_results.wdl is available in WARP for cohort-level RSEM aggregation; an equivalent aggregate rnaseqc2 workflow is not yet available in WARP and currently requires external processing.
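The FDR filtering and AF calculation mentioned in the notes above can be sketched in a few lines. This page does not say which FDR procedure the workflows use, so the example substitutes the Benjamini-Hochberg adjustment as a stand-in; the function names, phenotype names, p-values, and dosages are all hypothetical, and the AF formula assumes diploid alternate-allele dosages between 0 and 2.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def allele_frequency(dosages):
    """Alternate-allele frequency from diploid dosages; None marks missing calls."""
    observed = [d for d in dosages if d is not None]
    if not observed:
        return None
    return sum(observed) / (2 * len(observed))


# Made-up per-phenotype p-values standing in for TensorQTL permutation output.
pvals = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.04, "geneD": 0.5}
names = list(pvals)
qvals = dict(zip(names, bh_qvalues([pvals[n] for n in names])))

FDR_THRESHOLD = 0.05  # the commonly used cutoff noted above
significant = [n for n in names if qvals[n] <= FDR_THRESHOLD]

# Made-up dosages for one variant across five samples, one of them missing.
af = allele_frequency([0, 1, 2, 1, None])

print(qvals)
print("significant:", significant)
print("AF:", af)
```

In this toy input, geneC has a nominal p-value below 0.05 but does not pass after adjustment, which is the reason the filter is applied to the recalculated FDR rather than to raw p-values.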
Acknowledgements
The original versions of these workflows were created either by the GTEx Consortium (see their GTEx GitHub repository) or by Dr. Stephen Montgomery's lab at Stanford University. Most of the eQTL scripts originated from the AoU-Multiomics-Analysis repository.
This pipeline builds upon extensive work by the Stephen Montgomery Lab at Stanford University. Special thanks to:
- Evin Padhi
- Jon Nguyen
for developing foundational versions of many scripts and workflows used in this analysis.
Additional integration, optimization, and workflow migration were performed by the All of Us Multiomics and the Broad Pipeline Development teams as part of the WARP workflow suite.
Feedback
Please help us make our tools better by filing an issue in WARP; we welcome pipeline-related suggestions or questions.