ANALYSIS OF SINGLE CELL RNA-SEQ DATA
1
Introduction
1.1
COURSE OVERVIEW
1.2
TARGETED AUDIENCE & ASSUMED BACKGROUND
1.3
COURSE FORMAT
1.4
Getting Started
1.5
SESSION CONTENT
1.5.1
Monday – Classes from 09:30 to 17:30 (1 hr lunch break, 40 min total of coffee breaks)
1.5.2
Tuesday – Classes from 09:30 to 17:30
1.5.3
Wednesday – Classes from 09:30 to 17:30
1.5.4
Thursday – Classes from 09:30 to 17:30
1.5.5
Friday – Classes from 09:30 to 17:30
2
scRNA-Seq Experimental Design
3
Understanding Sequencing Raw Data
3.1
Class Environment
3.1.1
Getting into AWS Instance
3.2
Shell and Unix commands
3.2.1
Common Linux Commands
3.3
File formats
3.3.1
View FASTQ Files
3.3.2
View BAM Files
3.4
Public data repositories
3.4.1
Cellranger/10x
3.4.2
GEO
3.4.3
Single Cell Portal
4
Data Preprocessing
5
Processing scRNAseq Data
5.1
Goal
5.2
Further reading
5.3
FastQC
5.3.1
FASTQ file format
5.4
Align the reads
5.4.1
STAR align
5.4.2
BAM file format
5.5
Visualization
6
Transcriptome Quantification
7
Introduction to R/Bioconductor
7.1
Start Environment
7.2
Installing packages
7.2.1
CRAN
7.2.2
Github
7.2.3
Bioconductor
7.2.4
Source
7.3
Installation instructions:
7.3.1
Classes/Types
7.3.2
Data structures
7.3.3
Detour to S3/S4
7.4
More information
7.4.1
Checking for help for any function!
7.5
Grammar of Graphics (ggplot2)
7.5.1
What is ggplot2?
7.5.2
Principles of ggplot2
7.6
Reference
8
Expression QC and Normalization
9
Data Wrangling scRNAseq
9.1
Goal
9.2
Introduction
9.2.1
Load necessary packages
9.2.2
Read in NSCLC counts matrix.
9.2.3
Let’s examine the sparse counts matrix
9.2.4
How big is the matrix?
9.2.5
How much memory does a sparse matrix take up relative to a dense matrix?
9.3
Filtering low-quality cells
9.3.1
Look at the summary counts for genes and cells
9.3.2
Plot cells ranked by their number of detected genes.
9.4
Beginning with Seurat:
http://satijalab.org/seurat/
9.4.1
Creating a Seurat object
9.5
Preprocessing step 1: Filter out low-quality cells
9.6
Examine contents of Seurat object
9.6.1
Preprocessing step 2: Expression normalization
9.7
Detection of variable genes across the single cells
9.8
Gene set expression across cells
10
Identifying Cell Populations
10.1
Google Slides
11
Feature Selection and Cluster Analysis
11.1
Abstract
11.2
Seurat Tutorial
11.2.1
Preprocessing Steps
11.2.2
Start of Identifying Cell Types
11.3
Feature Selection
11.3.1
Differential Expression Analysis
11.3.2
Dimensionality Reduction
11.3.3
Independent Components Analysis (ICA)
11.3.4
Clustering
11.3.5
Check Clusters
11.3.6
Practice Visualizing/Embedding
12
Batch Effects
13
Correcting Batch Effects
13.1
Load settings and packages
13.2
Preparing the individual Seurat objects for each pancreas dataset without batch correction
13.3
Cluster pancreatic datasets without batch correction
13.3.1
Batch correction: canonical correlation analysis (CCA) using Seurat
13.3.2
Batch correction: integrative non-negative matrix factorization (NMF) using LIGER
13.4
Additional exploration: Regressing out unwanted covariates
13.5
Additional exploration: kBET
13.6
Additional exploration: Seurat 3
13.7
Acknowledgements
14
Functional Analysis
14.1
Google Slides
14.2
Gene sets and signatures
14.2.1
Cell Cycle
14.3
Pathway analysis
14.4
inferCNV / honeybadger
14.4.1
Create the InferCNV Object
14.4.2
Filtering genes
14.4.3
Normalize each cell’s counts for sequencing depth
14.4.4
Perform Anscombe normalization
14.4.5
Log transform the normalized counts:
14.4.6
Apply maximum bounds to the expression data to reduce outlier effects
14.4.7
Initial view, before inferCNV operations:
14.4.8
Perform smoothing across chromosomes
14.4.9
Subtract the reference values from observations, now have log(fold change) values
14.4.10
Invert log values
14.4.11
Removing noise
14.4.12
Remove outlier data points
14.4.13
Find DE genes by comparing the mutant types to normal types, BASIC
14.4.14
Additional Information
15
Pseudotime Cell Trajectories
15.1
Google Slides
15.2
Comparison Abstract
16
Functional Pseudotime Analysis
16.1
Load settings and packages
16.2
First look at the differentiation data from Deng et al.
16.3
Diffusion map pseudotime
16.4
Slingshot map pseudotime
16.5
Find temporally expressed genes
16.6
Comparison of the different trajectory inference methods
16.7
Plots of gene expression over time.
16.8
Acknowledgements
17
Single Cell Multiomic Technologies
18
CITE-seq and scATAC-seq
18.1
Load settings and packages
18.2
Load in the data
18.3
Set up a Seurat object, and cluster cells based on RNA expression
18.4
Add the protein expression levels to the Seurat object
18.5
Visualize protein levels on RNA clusters
18.6
Identify differentially expressed proteins between clusters
18.7
Cluster directly on protein levels
18.8
Additional exploration: another example of multi-modal analysis
18.9
Acknowledgements
19
Single Cell Resources
19.1
Comprehensive list of single-cell resources
19.2
Computational packages for single-cell analysis
19.3
eLife Commentary on the Human Cell Atlas
19.4
Online courses
References
6
Transcriptome Quantification