HDL PRS example

Here we show an example of our PRS pipeline for HDL on UK Biobank samples. We use both the effect estimates from the MVP lipid traits analysis and the posterior effects generated by the mashr package.

Data used

Reference panel

Obtained via download_1000G() in bigsnpr.

The panel includes 503 (mostly unrelated) European individuals and ~1.7M SNPs in common with either HapMap3 or the UK Biobank. The classification of European populations can be found at IGSR, and the IDs of the European individuals are taken from the IGSR data portal.

GWAS summary statistics data

From MVP. We have the original GWAS summary data as well as multivariate posterior estimates of HDL effects obtained using mashr. In brief, we have two versions of the summary statistics (effect estimates) for HDL.

Target test data: UK Biobank

We randomly select 2000 individuals from UK Biobank who have covariates and the HDL phenotype (medication adjusted, inverse normalized), and extract their genotypes. See the UKB.QC.* PLINK file bundle.

PRS Models

The auto model runs the algorithm with 30 different initial values of $p$ (the proportion of causal variants), ranging from $10^{-4}$ to 0.9, and uses the heritability $h^2$ estimated by LD score regression as the initial value.
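As a rough illustration of how such a set of 30 starting values for $p$ could be generated, the sketch below builds a sequence between the two bounds. Log-even spacing is an assumption here (the text only gives the range and the count), the lower bound is read as $10^{-4}$, and the helper name `seq_log` is illustrative rather than a call into any package.

```python
import math

def seq_log(start, stop, length):
    """Return `length` values evenly spaced on a log scale from start to stop."""
    log_start, log_stop = math.log(start), math.log(stop)
    step = (log_stop - log_start) / (length - 1)
    return [math.exp(log_start + i * step) for i in range(length)]

# Assumed bounds: 1e-4 and 0.9, with 30 values in total.
p_init = seq_log(1e-4, 0.9, 30)
```

Each value in `p_init` would then seed one run of the auto model.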

The grid model tries a grid of parameters: values of $p$ ranging from 0 to 1, and three values of $h^2$ equal to 0.7, 1 and 1.4 times the initial $h^2$ estimated by LD score regression.
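A minimal sketch of how the parameter grid might be assembled is given below. The value of `h2_init` is a placeholder (in the pipeline it comes from LD score regression), and the specific `p_grid` values are assumptions chosen only to span the stated range.

```python
from itertools import product

# Placeholder heritability; in practice this is the LD score regression estimate.
h2_init = 0.3

# Three candidate heritabilities: 0.7, 1 and 1.4 times the initial estimate.
h2_grid = [round(h2_init * m, 4) for m in (0.7, 1.0, 1.4)]

# Assumed candidate proportions of causal variants (illustrative values only).
p_grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]

# Every (p, h2) combination is one model to fit.
grid = [{"p": p, "h2": h2} for p, h2 in product(p_grid, h2_grid)]
```

The grid model is then fitted once per entry of `grid`, and the best-performing combination is chosen afterwards.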

Analysis of MVP GWAS data

Step 1: QC on reference panel

Here we assume that QC of the target data has already been performed. We perform QC for the reference panel here.

Step 2: Intersect SNPs among summary stats, reference panel and target data
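The intersection can be pictured as a set operation on SNP identifiers. The sketch below assumes SNPs are keyed by (chromosome, position); the toy coordinates are made up, and the real pipeline may match on rsID instead.

```python
# Toy SNP keys (chromosome, position) for each of the three data sets.
sumstats_snps = {(1, 1000), (1, 2000), (2, 500), (3, 42)}
reference_snps = {(1, 1000), (2, 500), (2, 900), (3, 42)}
target_snps = {(1, 1000), (1, 2000), (2, 500)}

# Keep only SNPs present in all three sources, in genomic order.
shared_snps = sorted(sumstats_snps & reference_snps & target_snps)
```

Only the SNPs in `shared_snps` are carried forward to the later steps.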

Step 3: Harmonize alleles for shared SNPs

This step handles major/minor allele coding and strand flips, and the sign flips in the summary statistics that may result from them.
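The logic of allele harmonization can be sketched for a single SNP as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the function name and argument order are hypothetical, and dropping strand-ambiguous SNPs (A/T and C/G) is a common convention assumed here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def harmonize(beta, a1, a0, ref_a1, ref_a0):
    """Align a summary-statistic effect to the reference alleles.

    Returns the (possibly sign-flipped) beta, or None if the SNP is
    strand-ambiguous or its alleles cannot be matched to the reference.
    """
    # A/T and C/G SNPs look identical on both strands, so they are dropped.
    if {a1, a0} == {COMPLEMENT[a1], COMPLEMENT[a0]}:
        return None
    for strand_flip in (False, True):
        x1, x0 = (COMPLEMENT[a1], COMPLEMENT[a0]) if strand_flip else (a1, a0)
        if (x1, x0) == (ref_a1, ref_a0):
            return beta       # same effect allele: keep the sign
        if (x1, x0) == (ref_a0, ref_a1):
            return -beta      # effect allele swapped: reverse the sign
    return None
```

For instance, an effect reported for alleles A/G is kept as is when the reference also codes A/G (or T/C on the opposite strand), and its sign is reversed when the reference codes G/A (or C/T).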

Step 4: Calculate LD matrix and fit LDSC model
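The LD matrix is a matrix of pairwise correlations between SNP genotypes in the reference panel. The toy sketch below computes it in plain Python for three made-up SNPs; a real analysis would use a windowed, sparse computation on the full panel, and the LD score regression fit is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy genotype dosages (0/1/2), one list per SNP across six individuals.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 2, 0, 1],
    [2, 1, 0, 1, 2, 0],
]

# Symmetric LD (correlation) matrix with ones on the diagonal.
ld = [[pearson(a, b) for b in snps] for a in snps]
```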

Step 6: Estimate posterior effect sizes and PRS

This step uses the original GWAS summary statistics.
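Once posterior effect sizes are available, an individual's PRS is the weighted sum of their allele dosages. The sketch below shows that calculation on made-up numbers; the effect sizes and genotypes are purely illustrative, and the function name is hypothetical.

```python
def polygenic_score(dosages, betas):
    """Weighted sum of allele dosages, one weight (effect size) per SNP."""
    return sum(g * b for g, b in zip(dosages, betas))

# Illustrative posterior effect sizes for three SNPs.
betas = [0.2, -0.1, 0.05]

# Illustrative dosages (0/1/2) for three individuals, SNPs in the same order.
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 1],
]

prs = [polygenic_score(g, betas) for g in genotypes]
```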

Step 7: Predict phenotypes

Baseline model: Traits ~ Sex + Age

Inf/grid/auto model: Traits ~ Sex + Age + PRS
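One way to compare the two models is the gain in $R^2$ when the PRS is added to the baseline covariates. The sketch below is a self-contained illustration on simulated data: the simulated effect sizes, the sample size, and the choice of $R^2$ as the metric are all assumptions, not the pipeline's reported setup.

```python
import random

def ols_r2(X, y):
    """R^2 of a least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    coef = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Simulated data (illustrative only): sex, age, PRS and a trait depending on all three.
rng = random.Random(0)
n = 200
sex = [rng.choice([0, 1]) for _ in range(n)]
age = [rng.uniform(40, 70) for _ in range(n)]
prs = [rng.gauss(0, 1) for _ in range(n)]
trait = [0.5 * s + 0.02 * a + 0.8 * p + rng.gauss(0, 1)
         for s, a, p in zip(sex, age, prs)]

r2_baseline = ols_r2(list(zip(sex, age)), trait)        # Traits ~ Sex + Age
r2_with_prs = ols_r2(list(zip(sex, age, prs)), trait)   # Traits ~ Sex + Age + PRS
r2_gain = r2_with_prs - r2_baseline
```

Because the baseline model is nested in the PRS model, `r2_gain` cannot be negative in-sample; evaluating on held-out individuals gives a fairer comparison between the inf, grid and auto scores.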