3 Understanding Sequencing Raw Data

3.1 Class Environment

3.1.1 Getting into the AWS Instance

## Example
ssh -i berlin.pem ubuntu@<PUBLIC IP ADDRESS>    # e.g. 34.219.254.245

## Actual Command
ssh -i berlin.pem ubuntu@34.213.180.241

3.2 Shell and Unix commands

3.2.1 Common Linux Commands

3.2.1.1 Lab 1a

  • check your present working directory
pwd
  • check your command history
history
  • pipe history to grep to search for cd commands
history | grep cd
  • redirect history into a file called history.txt
history > history.txt
  • make a directory called data
mkdir data
  • change into the data directory
cd data
  • move history.txt from the parent directory into the data directory
mv ../history.txt ./
  • check the manual page of the wget command
man wget
  • redirect the wget manual page output into a file called wget.txt
man wget > wget.txt
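  • return the lines in wget.txt that contain the word output
cat wget.txt | grep output
grep -i output wget.txt    # same search without cat; -i also matches Output, OUTPUT, ...
  • compress the wget.txt file (gzip replaces it with wget.txt.gz)
gzip wget.txt
  • view the compressed file
cat wget.txt.gz            # prints unreadable compressed bytes
zcat wget.txt.gz           # decompresses to the terminal; the file on disk stays compressed
zcat wget.txt.gz | less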
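The same search-and-compress steps can be tried on a small throwaway file, which is handy if man is not installed (for example inside a minimal container). This is a minimal sketch; the file name sample.txt and its contents are made up for illustration.

```shell
# Work in a scratch directory so nothing in your home directory is touched
workdir=$(mktemp -d)
cd "$workdir"

# Made-up stand-in for wget.txt (hypothetical contents)
printf '%s\n' \
  "-O file: write documents to file" \
  "Output options control logging" \
  "-q: quiet (no output)" \
  "--help: print help" > sample.txt

grep -c output sample.txt     # count matching lines, case-sensitive: 1
grep -ci output sample.txt    # -i also matches "Output": 2

gzip sample.txt               # replaces sample.txt with sample.txt.gz
ls                            # only sample.txt.gz is left

# Search the compressed file without unpacking it on disk
zcat sample.txt.gz | grep -ci output
```

zgrep, which ships with gzip, combines the last line into a single command, e.g. zgrep -i output wget.txt.gz.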

3.2.1.2 Docker Commands

Docker provides a consistent compute environment, so all the software you need is installed on the machine and ready to use.

  • change to your home directory
  • run the following command to start the Docker container
## remove --rm if you want to keep the container for later use
## run from your home directory
cd 
docker run --rm -it -v $PWD/Share:/Share -v $PWD:/mydir kdgosik/scellbern2019 bash

Explanation of the command

  - docker: the Docker command-line tool
  - run: asks Docker to start a new container from an image
  - --rm: removes the container when you exit from it
      - changes made inside the container are not saved for later
      - files written under /Share or /mydir are stored on the AWS instance, so they are kept
      - remove this flag to keep the container
  - -it: runs the container interactively (-i keeps input open, -t allocates a terminal)
      - your session inside the container is displayed in this terminal
      - to leave the container, type exit or press Ctrl+D; Ctrl+C only interrupts the command currently running inside it
  - -v $PWD/Share:/Share: mounts the Share directory on the AWS instance at /Share inside the container
  - -v $PWD:/mydir: mounts your home directory at /mydir inside the container
  - kdgosik/scellbern2019: the image to run. If it is not already on your computer, Docker downloads it from Docker Hub and then starts a container from it
      - [image link](https://hub.docker.com/r/kdgosik/scellbern2019)

3.3 File formats

  • bcl
  • fastq
  • bam
  • mtx, tsv
  • hdf5 (.h5, .h5ad)
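Some of these formats are plain text (fastq, mtx, tsv) and can be read with cat or less, while others are binary: BAM is compressed with BGZF, a gzip-compatible block format, and .h5/.h5ad files are HDF5 containers. A minimal sketch for telling them apart by their first bytes is below; sniff_format is a hypothetical helper, not a standard command, and the files it is tried on are toy stand-ins rather than real data.

```shell
# sniff_format (hypothetical helper): print "gzip", "hdf5" or "other"
# based on a file's first 8 bytes
sniff_format() {
  local magic
  # first 8 bytes as one lowercase hex string, e.g. 1f8b0804...
  magic=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    1f8b*)            echo gzip ;;   # .gz files, and BAM (BGZF is gzip-compatible)
    894844460d0a1a0a) echo hdf5 ;;   # the 8-byte HDF5 signature (.h5, .h5ad)
    *)                echo other ;;  # e.g. plain-text fastq, mtx, tsv
  esac
}

fmt_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$fmt_dir/reads.fastq"
gzip -c "$fmt_dir/reads.fastq" > "$fmt_dir/reads.fastq.gz"
# only the signature bytes, not a real HDF5 file
printf '\211HDF\r\n\032\n' > "$fmt_dir/signature_only.h5"

sniff_format "$fmt_dir/reads.fastq"          # other
sniff_format "$fmt_dir/reads.fastq.gz"       # gzip
sniff_format "$fmt_dir/signature_only.h5"    # hdf5
```

Run on the course BAM file, sniff_format should report gzip, which is why samtools view is used to read it instead of cat.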

3.3.1 View FASTQ Files

3.3.1.1 Viewing entire file

cat /Share/data/Teichmann_2i_2_2_2.fastq

3.3.1.2 Viewing first 10 lines

head /Share/data/Teichmann_2i_2_2_2.fastq
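FASTQ stores each read as four lines: a header starting with @, the sequence, a + separator line, and one quality character per base. The default 10 lines from head therefore stop partway through the third read. The sketch below uses three made-up reads to show how to view whole records; paste - - - - joins every four lines into one tab-separated line.

```shell
fq_dir=$(mktemp -d)
# three made-up reads, four lines each
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGCA'    '+' 'FFFFF' \
  '@read3' 'GGGCCCAA' '+' '########' > "$fq_dir/toy.fastq"

# exactly two complete records (2 x 4 lines)
head -n 8 "$fq_dir/toy.fastq"

# one record per line, keeping only the read name and sequence
paste - - - - < "$fq_dir/toy.fastq" | cut -f 1,2
```

On the course file the same idea is head -n 8 /Share/data/Teichmann_2i_2_2_2.fastq, or paste - - - - < /Share/data/Teichmann_2i_2_2_2.fastq | cut -f 1,2 | head.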

3.3.1.3 Stream Viewing with less command

less /Share/data/Teichmann_2i_2_2_2.fastq
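Because every read takes exactly four lines, reads can be counted and summarised from the command line without scrolling through the file. This is a minimal sketch, assuming an uncompressed FASTQ whose sequences are not wrapped over several lines; count_reads and read_length_summary are hypothetical helper names, and the toy reads are made up.

```shell
# count_reads (hypothetical helper): number of reads = number of lines / 4
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# read_length_summary (hypothetical helper): read count, shortest, longest
# and mean length; the sequence is line 2 of every 4-line record
read_length_summary() {
  awk 'NR % 4 == 2 {
         len = length($0); n++; total += len
         if (n == 1 || len < min) min = len
         if (len > max) max = len
       }
       END { if (n) printf "reads=%d min=%d max=%d mean=%.1f\n", n, min, max, total / n }' "$1"
}

len_dir=$(mktemp -d)
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGCA'    '+' 'FFFFF' \
  '@read3' 'GGGCCCAA' '+' '########' > "$len_dir/toy.fastq"

count_reads "$len_dir/toy.fastq"          # 3
read_length_summary "$len_dir/toy.fastq"  # reads=3 min=5 max=8 mean=7.0
```

For example, count_reads /Share/data/Teichmann_2i_2_2_2.fastq; for a gzipped FASTQ, pipe zcat output into the same awk program instead.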

3.3.2 View BAM Files

3.3.2.1 Viewing first 10 lines

samtools view /Share/data/pbmc_1k_protein_v3_possorted_genome_bam.bam | head

3.3.2.2 Stream Viewing with less command

samtools view /Share/data/pbmc_1k_protein_v3_possorted_genome_bam.bam | less
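samtools view prints each alignment as one line of SAM text: 11 tab-separated mandatory columns (read name, FLAG, reference, 1-based position, MAPQ, CIGAR, mate reference, mate position, template length, sequence, qualities) followed by optional TAG:TYPE:VALUE fields. Cell Ranger BAMs carry tags such as CB (corrected cell barcode) and UB (corrected UMI). The sketch below applies ordinary text tools to three made-up SAM lines standing in for samtools view output; the read names, barcodes and UMIs are invented.

```shell
sam_dir=$(mktemp -d)
# three made-up alignments; fields are tab-separated
printf 'read1\t0\tchr1\t100\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tCB:Z:AAACCCAAGAAACACT-1\tUB:Z:ACGTACGTACGT\n' >  "$sam_dir/toy.sam"
printf 'read2\t16\tchr1\t250\t255\t5M\t*\t0\t0\tTTGCA\tFFFFF\tCB:Z:TTTGTTGGTTTGCTCT-1\tUB:Z:TTTTGGGGCCCC\n'    >> "$sam_dir/toy.sam"
printf 'read3\t0\tchr2\t75\t255\t8M\t*\t0\t0\tGGGCCCAA\t########\tCB:Z:AAACCCAAGAAACACT-1\tUB:Z:CCCCAAAATTTT\n' >> "$sam_dir/toy.sam"

# first six columns: name, FLAG, reference, position, MAPQ, CIGAR
cut -f 1-6 "$sam_dir/toy.sam"

# reads per cell barcode, wherever the CB tag sits on the line
grep -o 'CB:Z:[^[:space:]]*' "$sam_dir/toy.sam" | cut -d: -f3 | sort | uniq -c
```

On the course BAM the same pipelines read from samtools view, e.g. samtools view /Share/data/pbmc_1k_protein_v3_possorted_genome_bam.bam | head -n 1000 | cut -f 1-6. samtools view -H prints the header and samtools view -c counts the records.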

3.4 Public data repositories

3.4.1 Cellranger/10x

3.4.1.1 Lab 1b

The 10x PBMC data are hosted at https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz

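  • change directory into the data directory
  • download the 10x PBMC data
  • extract the archive
  • explore the directory
  • explore the files
mkdir -p data    # -p: no error if data already exists
wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
cd data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
ls -R filtered_gene_bc_matrices                  # explore the directory
head filtered_gene_bc_matrices/hg19/genes.tsv    # explore the files
cd ..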
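The extracted matrix is in Matrix Market coordinate format: lines starting with % are comments, the first other line gives the number of rows, columns and nonzero entries, and each following line is one "row column value" triple. In 10x output the rows are the genes listed in genes.tsv and the columns are the barcodes listed in barcodes.tsv. A minimal consistency check is sketched below; check_mtx is a hypothetical helper, and the toy files it runs on are made up.

```shell
# check_mtx (hypothetical helper): print the matrix dimensions and check
# them against genes.tsv and barcodes.tsv in the same directory
check_mtx() {
  local dir=$1
  local rows cols nnz entries
  # first non-comment line: <rows> <cols> <nonzeros>
  rows=$(awk '!/^%/ { print $1; exit }' "$dir/matrix.mtx")
  cols=$(awk '!/^%/ { print $2; exit }' "$dir/matrix.mtx")
  nnz=$(awk '!/^%/ { print $3; exit }' "$dir/matrix.mtx")
  # every non-comment line after the size line is one entry
  entries=$(awk '!/^%/ { n++ } END { print n - 1 }' "$dir/matrix.mtx")
  echo "genes=$rows barcodes=$cols nonzero=$nnz"
  [ "$rows" -eq "$(wc -l < "$dir/genes.tsv")" ] &&
    [ "$cols" -eq "$(wc -l < "$dir/barcodes.tsv")" ] &&
    [ "$nnz" -eq "$entries" ]
}

mtx_dir=$(mktemp -d)
printf '%s\n' '%%MatrixMarket matrix coordinate integer general' '%' \
  '3 2 4' '1 1 5' '3 1 2' '2 2 1' '3 2 7' > "$mtx_dir/matrix.mtx"
printf 'GENE_ID_1\tGeneA\nGENE_ID_2\tGeneB\nGENE_ID_3\tGeneC\n' > "$mtx_dir/genes.tsv"
printf 'BARCODE_1-1\nBARCODE_2-1\n' > "$mtx_dir/barcodes.tsv"

check_mtx "$mtx_dir" && echo "dimensions consistent"
```

From inside the data directory, check_mtx filtered_gene_bc_matrices/hg19 runs the same check on the PBMC 3k files.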

3.4.2 GEO

3.4.2.1 Lab 1c

Get GEO data:
  • ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL19057_series_matrix.txt.gz
  • ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL17021_series_matrix.txt.gz

  • make a directory for the files, or use the data directory
  • go into that directory
  • download the files into that directory
  • view the files (try keeping them compressed and viewing them with zcat)

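cd data
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL19057_series_matrix.txt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL17021_series_matrix.txt.gz
zcat GSE81905-GPL19057_series_matrix.txt.gz | less    # .txt.gz is gzip only, not a tar archive
cd ..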
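A GEO series matrix is a gzipped, tab-separated text file, so it can be read without unpacking it. Assuming the usual layout, metadata lines start with ! (for example !Series_title or !Sample_title) and any data table sits between !series_matrix_table_begin and !series_matrix_table_end. The sketch below builds a small made-up file in that layout and pulls out each part; the titles and GSM accessions are invented.

```shell
geo_dir=$(mktemp -d)
geo_file="$geo_dir/toy_series_matrix.txt.gz"

# made-up series matrix (assumed layout), gzipped the way GEO serves it
{
  printf '!Series_title\t"Toy series"\n'
  printf '!Sample_title\t"cell_1"\t"cell_2"\n'
  printf '!Sample_geo_accession\t"GSM0000001"\t"GSM0000002"\n'
  printf '!series_matrix_table_begin\n'
  printf '"ID_REF"\t"GSM0000001"\t"GSM0000002"\n'
  printf '!series_matrix_table_end\n'
} | gzip > "$geo_file"

# series-level metadata
zcat "$geo_file" | grep '^!Series_'

# sample titles, one per line
zcat "$geo_file" | grep '^!Sample_title' | cut -f 2- | tr '\t' '\n'

# lines of the data table, without the begin/end markers
zcat "$geo_file" | awk '/^!series_matrix_table_end/ { inside = 0 } inside; /^!series_matrix_table_begin/ { inside = 1 }'
```

On the downloaded file, for example: zcat GSE81905-GPL19057_series_matrix.txt.gz | grep '^!Sample_title' | cut -f 2- | tr '\t' '\n' | head.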

3.4.3 Single Cell Portal

3.4.3.1 Lab 1d

  • Get the R2 fastq file from the Salk Institute study
  • Look at the files

3.4.3.2 Lab 1e

  • Install Docker on your local computer
  • Explore Single Cell Portal
  • Explore GEO