3 Understanding Sequencing Raw Data

3.1 Class Environment

3.1.1 Getting into AWS Instance

Another Physalia course provides a good breakdown of how to access AWS from different operating systems, titled "Connection to the Amazon EC2 service". Follow it to connect to the AWS instance on which you will run Docker.

3.2 Shell and Unix commands

3.2.1 Common Linux Commands

3.2.1.1 Lab 1a

  • check your present working directory
  • check history
  • pipe history to grep to search for the cd command
  • put history into a history.txt file
  • make a directory called data
  • change into data directory
  • move history.txt file into data directory
  • check manual page of wget command
  • redirect wget manual page output into a file called wget.txt
  • return the lines in the wget.txt file that contain the word "output"
  • compress the wget.txt file
  • view the compressed file
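
One possible walk-through of Lab 1a is sketched below. It assumes a bash shell with gzip available. The scratch directory from mktemp and the `|| true` guards are additions for illustration: they let the block run as a non-interactive script, where the history is empty and the wget manual page may not be installed.

```shell
# Scratch directory (an addition for this sketch) so no real files are touched.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                                      # check your present working directory
history || true                          # check history (empty when run as a script)
history | grep "cd" || true              # pipe history to grep to find cd commands
history > history.txt || true            # put history into history.txt
mkdir data                               # make a directory called data
cd data                                  # change into the data directory
mv ../history.txt .                      # move history.txt into data
# man wget                               # read the wget manual interactively (q quits)
man wget > wget.txt 2>/dev/null || true  # redirect the manual page into wget.txt
grep "output" wget.txt || true           # lines of wget.txt that contain "output"
gzip wget.txt                            # compress; wget.txt becomes wget.txt.gz
gzip -dc wget.txt.gz | head              # view the start of the compressed file
```

In an interactive session the guards can be dropped, and `zcat wget.txt.gz` or `zless wget.txt.gz` are common alternatives for the last step on Linux.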

3.2.1.2 Git Commands

Git is a distributed version-control system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Its goals include speed, data integrity, and support for distributed, non-linear workflows.

Go to your user directory and run the following git command. It creates a directory containing all the course material inside your user directory. Once cloning finishes, change into the 2020_scWorkshop directory, where the course material is. The commands are below.
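
The clone-then-change-directory pattern can be sketched as follows. The repository address is not given in this section, so the sketch uses a stand-in: `REPO_URL` points at an empty local repository created only for illustration (an assumption, not the course remote). In class, set `REPO_URL` to the address supplied by the instructors and run the commands from your user directory (`cd ~`).

```shell
# Scratch directory used here in place of your user directory.
scratch="$(mktemp -d)"
cd "$scratch"

# Stand-in remote (hypothetical): an empty bare repository on the local disk.
git init --quiet --bare course_remote.git
REPO_URL="$scratch/course_remote.git"    # replace with the course repository URL

# Clone into a directory named 2020_scWorkshop, then change into it.
git clone --quiet "$REPO_URL" 2020_scWorkshop
cd 2020_scWorkshop
pwd                                      # confirm you are inside the course directory
```

Cloning the empty stand-in prints a warning that the repository is empty; that is expected here and does not happen with a populated remote.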

3.3 File formats

  • bcl
  • fastq
  • bam
  • mtx, tsv
  • hdf5 (.h5, .h5ad)

3.3.1 View FASTQ Files
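
As a minimal sketch of the three ways of viewing a FASTQ file covered in this section, the block below builds a tiny two-read file and inspects it. The file name `sample.fastq` and its reads are made up for illustration; real course files are much larger and usually gzip-compressed.

```shell
scratch="$(mktemp -d)"
cd "$scratch"

# Hypothetical two-read FASTQ file; each record is four lines:
# @identifier, sequence, + separator, per-base quality string.
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'TTGCAAGGCT' '+' 'FFFFFFFFFF' > sample.fastq

cat sample.fastq              # view the entire file (only sensible for small files)
head -n 10 sample.fastq       # view the first 10 lines (this sample has only 8)
# less sample.fastq           # stream the file interactively; press q to quit

# Sequencing files are usually gzip-compressed, so decompress to a pipe first.
gzip -c sample.fastq > sample.fastq.gz
gzip -dc sample.fastq.gz | head -n 4     # first record of the compressed file

# Four lines per record, so reads = lines / 4.
lines="$(grep -c '' sample.fastq)"
reads=$((lines / 4))
echo "reads: $reads"
```

Streaming with `less` or `head` avoids printing millions of lines to the terminal, which is why `cat` is rarely the right choice for real sequencing data.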

3.3.1.1 Viewing entire file

3.3.1.2 Viewing first 10 lines

3.3.1.3 Stream Viewing with less command

3.3.2 View BAM Files
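
BAM files are binary, so they need a tool such as samtools to turn them into readable text. The sketch below assumes samtools is installed and skips the BAM steps otherwise. The file `tiny.sam` and its single alignment are invented for illustration.

```shell
scratch="$(mktemp -d)"
cd "$scratch"

# Hypothetical one-read alignment written as SAM text (tab-separated fields).
printf '@HD\tVN:1.6\tSO:unsorted\n'  > tiny.sam
printf '@SQ\tSN:chr1\tLN:1000\n'    >> tiny.sam
printf 'read1\t0\tchr1\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n' >> tiny.sam

if command -v samtools >/dev/null 2>&1; then
  samtools view -b -o tiny.bam tiny.sam     # convert SAM text to binary BAM
  samtools view -h tiny.bam | head -n 10    # first 10 lines, header included
  # samtools view tiny.bam | less -S        # stream interactively; -S stops line wrapping
else
  echo "samtools not found; install it to view BAM files"
fi
```

Without `-h`, `samtools view` prints only the alignment records and omits the `@` header lines.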

3.3.2.1 Viewing first 10 lines

3.3.2.2 Stream Viewing with less command

3.4 Public data repositories

3.4.3 Single Cell Portal

3.4.3.1 Lab 1d

  • Get R2 fastq file from the Salk Institute study
  • Look at files

3.4.3.2 Lab 1e

  • Install Docker on your local computer
  • Explore Single Cell Portal
  • Explore GEO

3.5 Docker Commands

Docker provides a consistent compute environment, ensuring that all the software you need is on the machine and ready to use. It gives you the software versions you need and helps reduce software conflicts that may arise.

  • make sure you are in the cloned repository directory
  • run the following command to start the Docker script

The full command inside the script is below. There is also an explanation of each part for your reference.

Explanation of commands
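
The course script itself is not reproduced in this section, so the following is only a sketch of what a typical `docker run` invocation looks like, with each part explained in the comments. The image name, port numbers, and mount path are hypothetical placeholders, not the values used by the course. By default the block only prints the command; set `DRY_RUN=0` to execute it.

```shell
IMAGE="example/scworkshop:latest"   # hypothetical image name, not the course image
HOST_PORT=8787                      # hypothetical port on your machine

# docker run        start a new container from an image
# --rm              remove the container when it exits
# -it               keep stdin open and allocate a terminal (interactive use)
# -v HOST:CONTAINER mount the current directory inside the container
#                   (/workspace is a hypothetical mount point)
# -p HOST:CONTAINER forward a port on your machine to a port in the container
# IMAGE             the image to run
set -- docker run \
  --rm \
  -it \
  -v "$PWD":/workspace \
  -p "$HOST_PORT":8787 \
  "$IMAGE"

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$*"          # show the assembled command without running it
else
  "$@"               # run the command for real
fi
```

Mounting the cloned repository directory is what makes the course files visible inside the container, which is why the command should be run from that directory.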