Summary and Schedule

The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.

Scenario: A Miracle Arthritis Inflammation Cure

Our imaginary colleague “Dr. Maverick” has invented a new miracle drug that promises to cure arthritis inflammation flare-ups after only 3 weeks since initially taking the medication! Naturally, we wish to see the clinical trial data, and after months of asking for the data they have finally provided us with a CSV spreadsheet containing the clinical trial data.

The CSV file contains the number of inflammation flare-ups per day for the 60 patients in the initial clinical trial, with the trial lasting 40 days. Each row corresponds to a patient, and each column corresponds to a day in the trial. Once a patient has their first inflammation flare-up they take the medication and wait a few weeks for it to take effect and reduce flare-ups.

To see how effective the treatment is we would like to:

Calculate the average inflammation per day across all patients.
Plot the result to discuss and share with colleagues.

3-step flowchart shows inflammation data records for patients moving to the Analysis step where a heat map of provided data is generated moving to the Conclusion step that asks the question, How does the medication affect patients?

Data Format

The data sets are stored in comma-separated values (CSV) format:

each row holds information for a single patient,
columns represent successive days.

The first three rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1

Each number represents the number of inflammation bouts that a particular patient experienced on a given day.

For example, value “6” at row 3 column 7 of the data set above means that the third patient was experiencing inflammation six times on the seventh day of the clinical study.

In order to analyze this data and report to our colleagues, we’ll have to learn a little bit about programming.

Prerequisites

You need to understand the concepts of files and directories and how to start a Python interpreter before tackling this lesson. This lesson sometimes references Jupyter Notebook although you can use any Python interpreter mentioned in the Setup.

The commands in this lesson pertain to any officially supported Python version, currently Python 3.8+. Newer versions usually have better error printouts, so using newer Python versions is recommend if possible.

Setup Instructions

Download files required for the lesson

00h 00m

1. Python Fundamentals

What basic data types can I work with in Python?
How can I create a new variable in Python?
How do I use a function?
Can I change the value associated with a variable after I create it?

00h 30m

2. Analyzing Patient Data

How can I process tabular data files in Python?

01h 30m

3. Visualizing Tabular Data

How can I visualize tabular data in Python?
How can I group several plots together?

02h 20m

4. Storing Multiple Values in Lists

How can I store many values together?

03h 05m

5. Repeating Actions with Loops

How can I do the same operations on many different values?

03h 35m

6. Analyzing Data from Multiple Files

How can I do the same operations on many different files?

03h 55m

7. Making Choices

How can my programs do different things based on data values?

04h 25m

8. Creating Functions

How can I define new functions?
What’s the difference between defining and calling a function?
What happens when I call a function?

04h 55m

9. Errors and Exceptions

How does Python report errors?
How can I handle errors in Python programs?

05h 25m

10. Defensive Programming

How can I make my programs more reliable?

06h 05m

11. Debugging

How can I debug my program?

06h 55m

12. Command-Line Programs

How can I write Python programs that will work like Unix command-line tools?

07h 25m

Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

This lesson is designed to be run on a personal computer. All of the software and data used in this lesson are freely available online, and instructions on how to obtain them are provided below.

Setup

To participate in this lesson, we will use Colab notebooks. Colab is a cloud-based environment where you can run scripting languages such as Python, create small segments of code, annotate them with notes and rich text, and return to your work later.

We will need to import our own resources into Colab, such as our simulated data. In order to do this, we will use Google drive. Because we will need to grant Colab access to our Google Drive data, we strongly recommend creating an ad-hoc (“throwaway”) Gmail account. That way, we do not risk sharing or altering any sensitive data that we might have on our Broad Institute Google accounts.

Obtain lesson materials

Download python-novice-inflammation-data.zip
Unzip the file, which will create a directory called data.
Create a folder called swc-python in your computer’s home directory.
Move the data folder into swc-python.
Log into Google Drive using the Gmail account you created for this course. You may want to use a private browsing window to do this, if you usually use your Broad account when logging in to Drive.
Upload the lesson content by clicking “+ New” followed by “Folder upload.” Select and upload the swc-python folder you just created.

You should see a swc-python folder in “My Drive”, with a subfolder called data.

Python interfaces

To start working with Python, we need to launch a program that will interpret and execute our Python commands. For our purposes today, we will use Colab, so you do not need to read further right now. However, if you want to do more with Python, you have a variety of options. It is a good idea to try out several of these interfaces to get a sense of which you prefer.

Jupyter Notebook

A Jupyter Notebook provides a browser-based interface for working with Python. Colab is one example of this. If you install Anaconda, you can launch a notebook in two ways:

Anaconda Navigator

Launch Anaconda Navigator. It might ask you if you’d like to send anonymized usage information to Anaconda developers: Make your choice and click “Ok, and don’t show again” button.
Find the “Notebook” tab and click on the “Launch” button: Anaconda will open a new browser window or tab with a Notebook Dashboard showing you the contents of your Home (or User) folder.
Navigate to the data directory by clicking on the directory names leading to it: Desktop, swc-python, then data:
Launch the notebook by clicking on the “New” button and then selecting “Python 3”:

Command line (Terminal)

1. Navigate to the data directory:

Unix shell

If you’re using a Unix shell application, such as Terminal app in macOS, Console or Terminal in Linux, or Git Bash on Windows, execute the following command:

BASH

cd ~/Desktop/swc-python/data

Command Prompt (Windows)

On Windows, you can use its native Command Prompt program. The easiest way to start it up is pressing Windows Logo Key+R, entering cmd, and hitting Return. In the Command Prompt, use the following command to navigate to the data folder:

cd /D %userprofile%\Desktop\swc-python\data

2. Start Jupyter server

Unix shell

BASH

jupyter notebook

Command Prompt (Windows)

python -m notebook

3. Launch the notebook by clicking on the “New” button on the right and selecting “Python 3” from the drop-down menu:

IPython interpreter

IPython is an alternative solution situated somewhere in between the plain-vanilla Python interpreter and Jupyter Notebook. It provides an interactive command-line based interpreter with various convenience features and commands. You should have IPython on your system if you installed Anaconda.

To start using IPython, execute:

ipython

plain-vanilla Python interpreter

To launch a plain-vanilla Python interpreter, execute:

python

If you are using Git Bash on Windows, you have to call Python via winpty:

winpty python

Summary and Schedule

Scenario: A Miracle Arthritis Inflammation Cure

Data Format

Prerequisites

Setup

Obtain lesson materials

Further reading: colab alternatives

Python interfaces

Jupyter Notebook

Anaconda Navigator

Command line (Terminal)

Unix shell

BASH

Command Prompt (Windows)

Unix shell

BASH

Command Prompt (Windows)

IPython interpreter

plain-vanilla Python interpreter