3 Understanding Sequencing Raw Data
3.1 Class Environment
3.1.1 Getting into AWS Instance
## Example (replace <PUBLIC IP ADDRESS> with your instance's address, e.g. 34.219.254.245)
ssh -i berlin.pem ubuntu@<PUBLIC IP ADDRESS>
## Actual Command
ssh -i berlin.pem ubuntu@34.213.180.241
3.2 Shell and Unix commands
3.2.1 Common Linux Commands
3.2.1.1 Lab 1a
- check your present working directory
pwd
- check history
history
- pipe history to grep to search for the cd command
history | grep cd
- put history into a history.txt file
history > history.txt
- make a directory called data
mkdir data
- change into data directory
cd data
- move history.txt file into data directory
mv ../history.txt ./
- check manual page of wget command
man wget
- redirect the wget manual page output into a file called wget.txt
man wget > wget.txt
- return the lines in wget.txt that contain the word output (the second command skips cat and ignores case)
cat wget.txt | grep output
grep -i output wget.txt
- Compress wget.txt file
gzip wget.txt
- View compressed file (cat prints the raw compressed bytes; zcat decompresses to the terminal)
cat wget.txt.gz
zcat wget.txt.gz
zcat wget.txt.gz | less
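The compress-and-view steps can be rehearsed on a small throwaway file before touching real data. This is a minimal sketch: the file name demo.txt and its contents are made up for illustration, and it assumes a Linux machine such as the class instance, where zcat reads .gz files directly.

```shell
# Work in a temporary directory so nothing in your data directory changes.
workdir=$(mktemp -d)

# Hypothetical stand-in for wget.txt.
printf 'first line\n-O file: write output to file\nlast line\n' > "$workdir/demo.txt"

# gzip replaces demo.txt with demo.txt.gz.
gzip "$workdir/demo.txt"
ls "$workdir"

# Search the compressed file without unpacking it on disk.
matches=$(zcat "$workdir/demo.txt.gz" | grep -c output)
echo "lines containing 'output': $matches"

# gunzip restores the original demo.txt.
gunzip "$workdir/demo.txt.gz"
```

The same zcat-into-a-pipe pattern is what makes it practical to keep large sequencing files compressed.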
3.2.1.2 Docker Commands
Docker provides a consistent compute environment, so all the software you need is installed on the machine and ready to use.
- change to your home directory
- run the following command to start Docker
## optional: drop --rm if you want to keep the container for later
## run from your home directory
cd
docker run --rm -it -v $PWD/Share:/Share -v $PWD:/mydir kdgosik/scellbern2019 bash
Explanation of commands
- docker: command to run docker
- run: asking docker to run a container
- --rm: flag to remove the container when you exit from it
  - nothing will be saved from your session to access again later
  - this flag can be removed to keep the container
- -it: flag to run the container interactively
  - this will keep all session output displaying on the terminal
  - to leave the container, type exit or press Ctrl+d (Ctrl+c only interrupts the command that is currently running)
- -v $PWD/Share:/Share: map the Share directory on the AWS instance to /Share inside the docker container
- -v $PWD:/mydir: map your current directory (your home directory, since you ran cd first) to a directory inside the docker container called /mydir
- kdgosik/scellbern2019: the image to run. Docker pulls the image from Docker Hub if it is not already on your computer, then starts a container from it
- bash: the command to run inside the container
- [image link](https://hub.docker.com/r/kdgosik/scellbern2019)
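If you want to keep a container between sessions, one option is to drop --rm and give the container a name. The sketch below wraps the relevant docker commands in shell functions so that nothing runs until you call them. The container name scell and the function names are hypothetical choices, not part of the course setup.

```shell
# Hypothetical container name; pick any name you like.
container_name=scell

# Start a named container without --rm so it still exists after you exit.
start_class_container() {
    docker run -it --name "$container_name" \
        -v "$PWD/Share:/Share" -v "$PWD:/mydir" \
        kdgosik/scellbern2019 bash
}

# Re-enter the stopped container later, with its files intact.
resume_class_container() {
    docker start -ai "$container_name"
}

# Delete the container once you no longer need it.
remove_class_container() {
    docker rm "$container_name"
}
```

Run start_class_container once from your home directory, then resume_class_container in later sessions. Files written under /Share or /mydir live on the host either way, so they survive even when --rm is used.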
3.3 File formats
- bcl
- fastq
- bam
- mtx, tsv
- hdf5 (.h5, .h5ad)
3.3.1 View FASTQ Files
3.3.1.1 Viewing entire file
cat /Share/data/Teichmann_2i_2_2_2.fastq
3.3.1.2 Viewing first 10 lines
head /Share/data/Teichmann_2i_2_2_2.fastq
3.3.1.3 Stream Viewing with less command
less /Share/data/Teichmann_2i_2_2_2.fastq
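Beyond paging through a FASTQ file, it often helps to count reads or pull out only the sequences. The sketch below uses a tiny made-up FASTQ file and assumes the usual layout of exactly four lines per read (header, sequence, "+", qualities). Files with wrapped sequence lines would need a different approach.

```shell
# Hypothetical two-read FASTQ file, written to a temporary location.
fastq=$(mktemp)
cat > "$fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGCA
+
IIIII
EOF

# Each read occupies four lines, so reads = lines / 4.
lines=$(wc -l < "$fastq")
reads=$((lines / 4))
echo "reads: $reads"

# Print only the sequence lines (line 2 of every 4-line record).
awk 'NR % 4 == 2' "$fastq"

# Keep the first sequence for a quick look.
first_seq=$(awk 'NR % 4 == 2' "$fastq" | head -n 1)
```

On the class machine the same wc and awk commands could be pointed at /Share/data/Teichmann_2i_2_2_2.fastq.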
3.3.2 View BAM Files
3.3.2.1 Viewing first 10 lines
samtools view /Share/data/pbmc_1k_protein_v3_possorted_genome_bam.bam | head
3.3.2.2 Stream Viewing with less command
samtools view /Share/data/pbmc_1k_protein_v3_possorted_genome_bam.bam | less
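samtools view prints each alignment as one line of tab-separated SAM text, so ordinary text tools can slice it. The sketch below works on a single made-up record, so it runs without samtools. Columns 1, 3 and 4 of a SAM record are the read name, the reference name and the 1-based mapping position.

```shell
# Hypothetical alignment record in the text form that samtools view prints.
sam_line=$(printf 'read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII')

# Keep read name, reference name and position (columns 1, 3 and 4).
summary=$(printf '%s\n' "$sam_line" | cut -f 1,3,4)
echo "$summary"

# On the class machine the same cut could follow the real command (needs samtools):
# samtools view /Share/data/pbmc_1k_protein_v3_possorted_genome_bam.bam | cut -f 1,3,4 | head
```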
3.4 Public data repositories
3.4.1 Cellranger/10x
3.4.1.1 Lab 1b
The 10x PBMC data are hosted at https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
- change directory into the data directory
- get 10x PBMC data
- unzip data
- explore directory
- explore files
mkdir -p data
wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
cd data
tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
ls -R
cd ..
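When exploring the extracted files, the .mtx file is a good place to start. It is in Matrix Market format: comment lines begin with %, and the first line after them gives the number of rows, columns and non-zero entries. In 10x output the rows are genes and the columns are cell barcodes. The sketch below reads those dimensions from a small made-up matrix, not from the real download.

```shell
# Hypothetical 3-gene by 2-cell sparse matrix in Matrix Market format.
mtx=$(mktemp)
cat > "$mtx" <<'EOF'
%%MatrixMarket matrix coordinate integer general
%
3 2 4
1 1 5
2 1 1
3 2 7
1 2 2
EOF

# The first line that is not a comment holds: rows columns non-zero entries.
dims=$(grep -v '^%' "$mtx" | head -n 1)
n_genes=$(echo "$dims" | cut -d ' ' -f 1)
n_cells=$(echo "$dims" | cut -d ' ' -f 2)
n_entries=$(echo "$dims" | cut -d ' ' -f 3)
echo "genes: $n_genes, cells: $n_cells, non-zero entries: $n_entries"
```

The same grep and head commands could be pointed at the matrix.mtx file inside the extracted directory.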
3.4.2 GEO
3.4.2.1 Lab 1c
Get GEO Data
- ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL19057_series_matrix.txt.gz
- ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL17021_series_matrix.txt.gz
- make a directory for the files or use data directory
- go into that directory
- get files and place them in the directory
- view the files (try keeping them compressed and viewing them that way)
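mkdir -p data
cd data
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL19057_series_matrix.txt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE81nnn/GSE81905/matrix/GSE81905-GPL17021_series_matrix.txt.gz
## these are gzip-compressed text files, not tar archives, so view them with zcat
zcat GSE81905-GPL19057_series_matrix.txt.gz | less
cd ..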
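A GEO series matrix file starts with metadata lines that begin with "!", followed by the expression table between the !series_matrix_table_begin and !series_matrix_table_end markers. The sketch below separates the two parts while keeping the file compressed. The toy file and the GSM0 sample name are made up for illustration, and zcat is assumed to behave as it does on Linux.

```shell
# Build a hypothetical, tiny series matrix file and compress it.
workdir=$(mktemp -d)
toy="$workdir/toy_series_matrix.txt.gz"
printf '%s\n' \
    '!Series_title	"toy"' \
    '!Sample_geo_accession	"GSM0"' \
    '!series_matrix_table_begin' \
    '"ID_REF"	"GSM0"' \
    '"gene1"	1.5' \
    '!series_matrix_table_end' | gzip > "$toy"

# Count metadata lines (those that start with "!") without decompressing on disk.
meta_lines=$(zcat "$toy" | grep -c '^!')

# Everything else is the expression table, including its header row.
table_lines=$(zcat "$toy" | grep -vc '^!')

echo "metadata lines: $meta_lines, table lines: $table_lines"
```

Replacing the toy path with one of the downloaded GSE81905 files gives a quick idea of how much of each file is metadata.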
3.4.3 Single Cell Portal
- https://portals.broadinstitute.org/single_cell
- Study: Salk Institute - Single-cell Methylome Sequencing Identifies Distinct Neuronal Populations in Mouse Frontal Cortex
3.4.3.1 Lab 1d
- Get R2 fastq file from the Salk Institute study
- Look at files
3.4.3.2 Lab 1e
- Install Docker on your local computer so that you have it available
- Explore Single Cell Portal
- Explore GEO