1 Introduction

1.1 COURSE OVERVIEW

In recent years, single-cell RNA-seq (scRNA-seq) has become widely used for transcriptome analysis in many areas of biology. In contrast to bulk RNA-seq, scRNA-seq provides quantitative measurements of the expression of every gene in a single cell. However, analyzing scRNA-seq data requires novel methods, because some of the assumptions underlying the methods developed for bulk RNA-seq experiments are no longer valid. In this course we will cover all steps of scRNA-seq processing, starting from the raw reads coming off the sequencer. The course covers common analysis strategies using state-of-the-art methods, and we also discuss the central biological questions that can be addressed using scRNA-seq.

1.2 TARGET AUDIENCE & ASSUMED BACKGROUND

This course is aimed at researchers and technical workers who are or will be analyzing scRNA-seq data. The material is suitable both for experimentalists who want to learn more about data analysis and for computational biologists who want to learn about scRNA-seq methods. The examples demonstrated in this course can be applied to any experimental protocol or biological system.

The requirements for this course are:
1. Working knowledge of Unix (managing files, running programs)
2. Programming experience in R (writing a function, basic I/O operations, variable types, using packages). Bioconductor experience is a plus.
3. Familiarity with next-generation sequencing data and its analysis (using alignment and quantification tools for bulk sequencing data)

1.3 COURSE FORMAT

The course will be delivered over five days. Each day will include a lecture and a laboratory component. The lectures will introduce the topics of discussion, and the laboratory sessions will focus on practical, hands-on analysis of scRNA-seq data. These sessions will combine mirroring exercises, in which you follow the instructor to learn a skill, with individual exercises in which you apply these skills on your own. During and after each exercise, interpretation of the results will be discussed as a group. Computing will be done using a combination of tools installed on the attendees' laptop computers and web resources accessed via a web browser.

1.4 GETTING STARTED

1.5 SESSION CONTENT

1.5.1 Monday – Classes from 09:30 to 17:30 (1 hr lunch break, 40 min of coffee breaks in total)

1.5.1.1 Lecture 1 – scRNA-Seq experimental design

Overview of course
General introduction: HCA/KCO overview
Comparison of bulk and single-cell RNA-seq
Overview of available scRNA-seq technologies (10x) and experimental protocols
scRNA-Seq experimental design and analysis workflow?

1.5.1.2 Lab 1 – Understanding sequencing raw data

Data wrangling from public data repositories: getting data from the 10x website, the Single Cell Portal, and GEO (FASTQ files, count matrices)
Shell and Unix commands to navigate directories, create folders, open files
Raw file formats
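
As an illustration of the raw file formats covered in this lab, the sketch below reads FASTQ records, which store each read as four lines: an `@` header, the sequence, a `+` separator, and a quality string. It assumes Phred+33 quality encoding (the usual encoding for current Illumina output); the function names and the two toy reads are made up for the example and are not part of the course material.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a text-mode FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Keep only the read name, dropping the '@' and any trailing comment.
        yield header[1:].split()[0], seq, qual


def mean_phred(qual, offset=33):
    """Mean base quality of one read, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


# Two toy reads held in memory so the example runs without any files.
example = io.StringIO(
    "@read1 1:N:0\nACGT\n+\nIIII\n"
    "@read2 1:N:0\nGGCA\n+\n!!II\n"
)
records = list(read_fastq(example))
for read_id, seq, qual in records:
    print(read_id, seq, mean_phred(qual))
```

For a compressed file, the same generator can be fed a handle from `gzip.open(path, "rt")`, where `path` is whichever `.fastq.gz` file you downloaded.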

1.5.1.3 Lecture 2 - Intro to Data processing: from bcl file to bam file

scRNA-Seq processing workflow starting with choice of sequencer (NextSeq, HiSeq, MiSeq) / barcode swapping and bcl files
Overview of popular tools and algorithms
Common single-cell analyses and interpretation
Sequencing data: alignment and quality control
Exploring interesting features of alignments: where reads map, mutations, splicing

1.5.1.4 Lab 2 – Processing raw scRNA-Seq data

Data outputs from different scRNA-seq technologies (10x, Smart-seq2) - process both?
Demultiplexing sequencing data
Read quality control (CellRanger, dropEst, FastQC)
Run bowtie2 on 2 wells to demonstrate alignment
Read alignment and visualization (kallisto, RSEM, IGV)
Demultiplexing
FastQC
Align (STAR/TopHat/kallisto)
IGV - what do we want here? I use it for mutation detection, copying sequences, and searching for alternative splicing.

1.5.1.5 Flash talks (1.5 hr, break into 2 groups of 13)

A short presentation about your genome assembly and annotation project, ideally 3 slides in 2-3 min (PowerPoint or similar), so that you can introduce yourselves and we can get to know each other.

1.5.2 Tuesday – Classes from 09:30 to 17:30

1.5.2.1 Lecture 3 – Transcriptome quantification: from bam file to counts

Read & UMI counting (including kallisto alignment-free pseudocounts), how RSEM works (length dependence, sequencing depth, multimapping reads), CellRanger (dropEst), bustools
10x barcode structure and links to Perturb-seq
Gene length & coverage
Gene expression units (count data, i.e. Smart-seq2 read counts or 10x UMI counts, vs expression data)
Some R overview slides, https://r4ds.had.co.nz/
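
To make the difference between UMI counts and length-normalized expression units concrete, here is a minimal sketch in plain Python. It assumes a 10x read 1 layout of a 16 bp cell barcode followed by a 12 bp UMI (the v3 chemistry layout; v2 uses a 10 bp UMI), and it assumes each read has already been assigned to a gene by an aligner. The barcodes, gene assignments, and gene lengths are toy values, and no barcode or UMI error correction is attempted, unlike what real pipelines do.

```python
from collections import defaultdict

BARCODE_LEN = 16  # 10x cell barcode length
UMI_LEN = 12      # assumption: v3 chemistry (v2 uses 10)


def split_read1(seq):
    """Split a 10x read 1 sequence into (cell barcode, UMI)."""
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def count_umis(assignments):
    """Count distinct UMIs per (cell barcode, gene).

    `assignments` is an iterable of (read1_sequence, gene) pairs.
    Reads sharing barcode, UMI, and gene are PCR duplicates and count once.
    """
    seen = defaultdict(set)
    for read1, gene in assignments:
        barcode, umi = split_read1(read1)
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


def tpm(read_counts, gene_lengths):
    """Transcripts per million for full-length read counts (e.g. Smart-seq2)."""
    rates = {g: c / gene_lengths[g] for g, c in read_counts.items()}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}


cell_a, cell_b = "AAACCCAAGAAACACT", "AAACCCAAGAAACCAT"
umi_1, umi_2 = "AAAAAAAAAAAA", "CCCCCCCCCCCC"
toy_reads = [
    (cell_a + umi_1, "GAPDH"),
    (cell_a + umi_1, "GAPDH"),  # PCR duplicate of the read above
    (cell_a + umi_2, "GAPDH"),
    (cell_b + umi_1, "ACTB"),
]
umi_counts = count_umis(toy_reads)

# Same read count, but g2 is twice as long, so it gets half the TPM of g1.
toy_tpm = tpm({"g1": 100, "g2": 100}, {"g1": 1000, "g2": 2000})
```

Length normalization such as TPM is meant for full-length protocols, where longer transcripts yield more reads; UMI counts from 3' tag protocols are generally not divided by gene length.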

1.5.2.2 Lab 3 - Introduction to R

Installing packages
Data types
Data manipulation, slicing
String manipulation
Introducing object-oriented programming / S4 objects
Visualization tools
Bonus: recreate the Seurat FeaturePlot using ggplot2 directly
Bonus: run RSEM on Dana’s BAM files if you are bored

1.5.2.3 Lecture 4 - Expression QC, normalisation and batch correction

What CellRanger does for quality filtering
PBMC data
Normalisation methods https://www.nature.com/articles/nmeth.4292
Doublets, empty droplets
Barcode swapping
Regression with technical covariates
What about imputation?

1.5.2.4 Lab 4 – Data wrangling for scRNAseq data

Data structures and file formats for single-cell data
Quality control of cells and genes (doublets, ambient, empty drops)
Data exploration: violin plots…
Introducing the Seurat object
- Genes
- Housekeeping genes
- Mitochondrial genes (never used these ones)
- Filter - Do we remove both cells and genes here?
- Normalize (introduce more options, other than log transform?)
- Find variable genes (Is it a first reduction? Why the binning?)
- Scaling
- Regression
- Heatmap of desired genes?
- Signatures?
- Bonus - imputation (MAGIC? One of the two Gocken recommended?)
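
The filtering and normalization steps in the list above can be sketched without any packages. The example below computes per-cell QC metrics, filters cells, and applies a log-normalization of the form log(1 + count / total × 10,000), which is the same form as Seurat's default "LogNormalize" method. The thresholds, gene names, and counts are toy assumptions chosen so that the example is easy to follow; real thresholds depend on the dataset. The lab itself uses Seurat in R, so this is only an illustration of the arithmetic.

```python
import math


def qc_metrics(counts):
    """Per-cell QC metrics from a {gene: count} dictionary."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Human mitochondrial genes are conventionally prefixed with "MT-".
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"total": total, "n_genes": n_genes, "pct_mito": pct_mito}


def passes_qc(metrics, min_genes=2, max_pct_mito=20.0):
    """Toy thresholds (assumptions): enough detected genes, not mostly mitochondrial."""
    return metrics["n_genes"] >= min_genes and metrics["pct_mito"] <= max_pct_mito


def log_normalize(counts, scale=1e4):
    """Scale each cell to `scale` total counts, then take log(1 + x)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale) for gene, c in counts.items()}


cells = {
    "cell_1": {"CD3E": 6, "LYZ": 3, "MT-CO1": 1},  # 10% mitochondrial: kept
    "cell_2": {"CD3E": 1, "LYZ": 0, "MT-CO1": 9},  # 90% mitochondrial: removed
    "cell_3": {"CD3E": 0, "LYZ": 4, "MT-CO1": 0},  # one detected gene: removed
}
kept = {name: c for name, c in cells.items() if passes_qc(qc_metrics(c))}
normalized = {name: log_normalize(c) for name, c in kept.items()}
```

This sketch filters cells only; gene filtering (for example, dropping genes detected in very few cells) would be a separate pass over the columns of the count matrix.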

1.5.2.5 Flash talks (1.5 hr, break into 2 groups of 13)

A short presentation about your genome assembly and annotation project, ideally 3 slides in 2-3 min (PowerPoint or similar), so that you can introduce yourselves and we can get to know each other.

1.5.3 Wednesday – Classes from 09:30 to 17:30

1.5.3.1 Lecture 5 - Identifying cell populations

Feature selection
Dimensionality reduction
Clustering and assigning identity (Louvain, NMF, topic models, variational autoencoder)
Differential expression tests

1.5.3.2 Lab 5 – Feature selection & Clustering analysis

Parameters and clustering
Comparison of feature selection methods

1.5.3.3 Lecture 6 - Introduction to batch effects

Batch correction methods (regress out batch, scaling within batch, CCA, MNN, LIGER, scVI, scGen)
Evaluation methods for batch correction (ARI, average silhouette width, kBET…)
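
Of the evaluation metrics listed above, the adjusted Rand index (ARI) is simple enough to compute by hand. The sketch below implements the standard pair-counting formula in plain Python; the function name and the toy label vectors are assumptions made for the example. In a batch-correction setting, one common reading is that a high ARI between clusters and known cell types is good, while a high ARI between clusters and batch labels suggests residual batch effects.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Pairs of cells grouped together in both labelings, and in each one alone.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (pairs_both - expected) / (max_index - expected)


clusters = [0, 0, 1, 1]
cell_types = ["T", "T", "B", "B"]  # same partition, different names
batches = ["b1", "b2", "b1", "b2"]  # batches evenly mixed within each cluster

print(adjusted_rand_index(clusters, cell_types))  # 1.0
print(adjusted_rand_index(clusters, batches))     # -0.5
```

The ARI is invariant to how clusters are named, which is why the first comparison scores 1.0 even though the labels differ.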

1.5.3.4 Lab 6 - Correcting batch effects

Comparison of batch correction methods, Seurat pancreas

1.5.3.5 Poster session with beer & wine (time?)

Bring a poster of a current or proposed single-cell genomics study. A printout would be good (recent posters are fine); if you don’t have a recent one, a quick one pulled together in PowerPoint and printed on A4 is okay too. You shouldn’t stress too much about this: it’s just to connect the flash talks to the posters, and it will be entirely informal (unlike a poster session at a conference!). Take notes on how to do single-cell analysis, ideas, challenges, things you find interesting, and directions you would like to explore.

1.5.4 Thursday – Classes from 09:30 to 17:30

Post-poster-session discussion groups: a 30 min discussion wrapping up the poster session and keeping track of ideas in a shared Google doc

1.5.4.1 Lecture 7 - Advanced topics: Pseudotime cell trajectories

Waddington Landscape
Pseudotime inference
Differential expression through pseudotime

1.5.4.2 Lab 8 - Functional and Pseudotime analysis

Popular tools and packages for functional analysis (https://github.com/dynverse/dynmethods#list-of-included-methods)
Review concepts from papers
Comparison of pseudotime methods

1.5.4.3 Lecture 8 - Single-cell multiomic technologies

Introduction to other omic data types
Integrating scRNA-seq with other single-cell modalities (CITE, Perturb, ATAC, methylation…)

1.5.4.4 Lab 9 - Analysis of CITE-seq, scATAC-seq

1.5.5 Friday – Classes from 09:30 to 17:30

1.5.5.1 Lab 10 - small dataset for analysis

Karthik has a good starting point: Krumlov_lab2_guidelines.html, and the Hemberg lab has http://hemberg-lab.github.io/scRNA.seq.course/advanced-exercises.html
Present a set of methods and the order in which to try them
- IDH-mutated glioma
- Goal for first half: get to clustering and identify malignant cells (inferCNV possible)
- Goal for second half: cell states
- For our other courses, on the last day we usually divide the participants into groups of 3-4 people and assign each group a small dataset, so that they can repeat all the analyses they have learnt during the week, think about the best strategy for carrying out the analyses, and present the results to the other participants. I believe this is very productive and helpful because it serves as a wrap-up and brainstorming session. Normally, people love to talk and present something, and this way they and we can be sure they really understood the different steps of the pipeline.

1.5.5.2 Lecture 9 - Group presentations

Review, Questions and Answers

ANALYSIS OF SINGLE CELL RNA-SEQ DATA