1 Introduction

1.1 COURSE OVERVIEW

In recent years single cell RNA-seq (scRNA-seq) has become widely used for transcriptome analysis in many areas of biology. In contrast to bulk RNA-seq, scRNA-seq provides quantitative measurements of the expression of every gene in a single cell. However, to analyze scRNA-seq data, novel methods are required and some of the underlying assumptions for the methods developed for bulk RNA-seq experiments are no longer valid. In this course we will cover all steps of the scRNA-seq processing, starting from the raw reads coming off the sequencer. The course includes common analysis strategies, using state-of-the-art methods and we also discuss the central biological questions that can be addressed using scRNA-seq.

1.2 TARGETED AUDIENCE & ASSUMED BACKGROUND

This course is aimed at researchers and technical workers who are or will be analyzing scRNA-seq data. The material is suitable both for experimentalists who want to learn more about data-analysis as well as computational biologists who want to learn about scRNASeq methods. Examples demonstrated in this course can be applied to any experimental protocol or biological system.

The requirements for this course are: 1. Working knowledge of unix (managing files, running programs) 2. Programming experience in R (writing a function, basic I/O operations, variable types, using packages). Bioconductor experience is a plus. 3. Familiarity with next-generation sequencing data and its analyses (using alignment and quantification tools for bulk sequencing data)

1.3 COURSE FORMAT

The course will be delivered over the course of five days. Each day will include a lecture and laboratory component. The lecture will introduce the topics of discussion and the laboratory sessions will be focused on practical hands-on analysis of scRNA-seq data. These sessions will involve a combination of both mirroring exercises with the instructor to demonstrate a skill as well as applying these skills on your own to complete individual exercises. After and during each exercise, interpretation of results will be discussed as a group. Computing will be done using a combination of tools installed on the attendees laptop computer and web resources accessed via web browser.

1.4 Getting Started

1.5 SESSION CONTENT

1.5.1 Monday – Classes from 08:00 to 16:00 (lunch break-1 hr, 40 min of total coffee breaks)

Shared Google doc - course notes, ideas/questions/challenges/interesting topics you would like to explore.

1.5.1.1 Lecture 1 – scRNA-Seq experimental design (Orr)

Overview of course
General introduction: cell atlas overviews
Comparison of Bulk and single cell RNA-Seq
Overview of available scRNA-seq technologies (10x) and experimental protocols
scRNA-Seq experimental design and analysis workflow?

1.5.1.2 Lab 1 – Understanding sequencing raw data, downloading Docker if not done already (Kirk)

1.5.1.3 Lab based around data wrangling from public data repositories: get data from 10x website, single cell portal, from GEO (fastqs, counts)

Shell and Unix commands to navigate directories, create folders, open files
Raw file formats

1.5.1.4 Lecture 2 - Intro to Data processing: from bcl file to bam file, Transcriptome quantification: from bam file to counts (Dana)

scRNA-Seq processing workflow starting with choice of sequencer (NextSeq, HiSeq, MiSeq) / barcode swapping and bcl files
Overview of Popular tools and algorithms
Common single-cell analyses and interpretation
Sequencing data: alignment and quality control
Looking at cool things in alignment like where reads are, mutations, splicing
Read & UMI counting (Kallisto alignment-free pseudocounts as well), how RSEM works (length dependence, sequencing depth, multimapping reads), CellRanger (dropest), bustools
10x barcode structure and links to Perturb-seq
Gene length & coverage
Gene expression units (count data Smart-seq2 counts or 10x UMIs vs expression data)

1.5.1.5 Lab 2 – Processing raw scRNA-Seq data (Dana), Docker setup (Kirk)

Data outputs from different scRNAseq technologies (10x, Smart-seq2) - process both?
Demultiplexing sequencing data
Read Quality Control (CellRanger, dropEst, fastqc)
Run bowtie2 on 2 wells to demonstrate alignment
Read alignment and visualization (kallisto, RSEM, Igviewer)
Demultiplexing
FastQC
Align (STAR/TOPHAT/Kallisto)
IGViewer - what do we want here? I use it for mutation detections, copying sequences, searching for alternative splicing.

1.5.1.6 Flash talks (1.5 hr, break into 2 groups of 13) placed into a dropbox

1 slide advertising or summarizing the poster. So you can introduce yourselves and we can get to know each other. No questions, 2 minutes. Two sessions, 15 people each.

1.5.2 Tuesday – Classes from 08:00 to 16:00

1.5.2.1 Lab 3 - Introduction to R (Kirk)

Some R overview slides, https://r4ds.had.co.nz/
Installing packages
Data-types
Data manipulation, slicing
Strings manipulations
Introducing object oriented programming / S4 objects
Visualization tools
Bonus create FeaturePlot from Seurat in base ggplot
Bonus: run RSEM on Dana’s bam files if you are bored

1.5.2.2 Lecture 3 - Expression QC, normalisation and gene-level batch correction (Orr)

What CellRanger does for quality filtering
PBMC data
Normalisation methods https://www.nature.com/articles/nmeth.4292
Doublets, empty droplets, CellBender
Barcode swapping
Regression with technical covariates
What about imputation?

1.5.2.3 Lab 4 – Data wrangling for scRNAseq data (Dana)

Data structures and file formats for single-cell data
Quality control of cells and genes (doublets, ambient, empty drops)
Data exploration: violin plots…
Introducing Seurat object
Genes
House keeping genes
Mitochondrial genes
Filter
Normalize
Find variable genes
Scaling
Regression
Calculate a signature

1.5.2.4 Flash talks (1.5 hr, break into 2 groups of 13) placed into a dropbox

1 slide advertising or summarizing the poster. So you can introduce yourselves and we can get to know each other. No questions, 2 minutes. Two sessions, 15 people each.

1.5.3 Wednesday – Classes from 08:00 to 16:00

1.5.3.1 Lecture 4 (may start late Tuesday) - Identifying cell populations (Kirk)

Feature selection
Dimensionality reduction
Clustering and assigning identity (Louvain, NMF, topic models, variational autoencoder)
Differential expression tests

1.5.3.2 Lab 5 – Feature selection & Clustering analysis (Kirk)

Parameters and clustering
Comparison of feature selection methods

1.5.3.3 Lecture 5 - Batch effects correction (Orr)

Batch correction methods (regress out batch, scaling within batch, Seurat v3, MNN, Liger, Harmony, scvi, scgen)
Evaluation methods for batch correction (ARI, average silhouette width, kBET…)

1.5.3.4 Lab 6 - Correcting batch effects (Orr)

Comparison of batch correction methods, Seurat pancreas Use Seurat Wrappers?

1.5.4 Thursday – Classes from 08:00 to 16:00

Deciding on discussion topics for next day based on shared google doc.

1.5.4.1 Lecture 6 - Advanced topics (Kirk)

Pseudotime inference
Differential expression through pseudotime
Deep learning or spatial data depending on questionnaire (20ish min, autoencoder as nonlinear dimension reduction, scvi, what questions to ask to evaluate whether a more advanced model helps, how to decide it’s safe to use a method, tradeoffs between method complexity and interpretability)

1.5.4.2 Lab 7 - Functional and Pseudotime analysis (Orr)

Popular tools and packages for functional analysis (https://github.com/dynverse/dynmethods#list-of-included-methods)
Review concepts from papers
Comparison of pseudotime methods

1.5.4.3 Lecture 7 - Single-cell multi-omic technologies (Dana)

Introduction to other omic data types
Integrating scRNA-seq with other single-cell modalities (CITE, Perturb, ATAC, methylation…)

1.5.4.4 Lab 8 - Analysis of CITE-seq, scATAC-seq (Orr)

1.5.5 Friday – Classes from 08:00 to 16:00

Small group discussion on selected topics through hangouts . #### Lab 10 - small dataset for analysis and office hours focused on select topics (Dana)

For project on last day (plan for whole day), Dana will prepare datasets for 3 or more ish mut glioma tumors that they will download beforehand. The datasets may need to be subsampled to save time. Can do pseudotime, can run scvi, nmf. Groups of 3 students.

ANALYSIS OF SINGLE CELL RNA-SEQ DATA