Using JUMP data on Terra
Terra is a scalable and secure platform for biomedical researchers to access data, run analysis tools, and collaborate. It was co-developed by the Broad Institute of MIT and Harvard, Microsoft, and Verily. You can use Terra to explore JUMP Cell Painting data interactively via Jupyter notebooks in a cloud environment, without needing to set up a local computing environment.
Overview
The JUMP Cell Painting dataset includes 116k chemical and >15k genetic perturbations (cpg0016), split across 12 data-generating centers, using human U2OS osteosarcoma cells. All data is hosted on the Cell Painting Gallery on the Registry of Open Data on AWS.
The JUMP Hub tutorial scripts fetch profiles via HTTP from the Cell Painting Gallery, so no local data or cloud credentials are needed. They work anywhere you can run a Jupyter notebook, including Terra.
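As a rough illustration of what fetching a profile over HTTP can look like, the sketch below builds a URL for one plate's profile file and reads it with pandas. The bucket host, the `cpg0016-jump` prefix, and the `workspace/profiles/<batch>/<plate>/<plate>.parquet` layout are assumptions about how the Cell Painting Gallery is organized, not something stated in this guide. The helper names `profile_url` and `load_profile` and the batch and plate values are hypothetical. The tutorial scripts may do this differently.

```python
# Minimal sketch, not the tutorial's own code.
# ASSUMPTION: public HTTP endpoint of the Cell Painting Gallery bucket.
GALLERY_HTTP_ROOT = "https://cellpainting-gallery.s3.amazonaws.com"


def profile_url(source, batch, plate, dataset="cpg0016-jump"):
    """Build the URL of one plate's profile file.

    ASSUMPTION: the <dataset>/<source>/workspace/profiles/<batch>/<plate>/
    layout is inferred and may not match every source or release.
    """
    return (
        f"{GALLERY_HTTP_ROOT}/{dataset}/{source}/workspace/profiles/"
        f"{batch}/{plate}/{plate}.parquet"
    )


def load_profile(url):
    """Read a parquet profile over HTTP into a DataFrame.

    Needs network access, pandas, and a parquet engine such as pyarrow.
    """
    import pandas as pd  # imported lazily so profile_url works without pandas

    return pd.read_parquet(url)


# Hypothetical identifiers; replace them with real batch and plate names.
example = profile_url("source_7", "EXAMPLE_BATCH", "EXAMPLE_PLATE")
print(example)
```

Because only plain HTTP reads are involved, a call such as `load_profile(profile_url(...))` with real identifiers should behave the same in a Terra cloud environment as on a laptop, provided the assumed layout holds.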
Prerequisites
- A Terra account with an active billing project.
Running a tutorial on Terra
1. Create or open a workspace on Terra.
2. Start a cloud environment from the workspace. Recommended minimum settings:

   | Setting   | Value   |
   |-----------|---------|
   | CPUs      | 1       |
   | Disk Size | 50 GB   |
   | Memory    | 3.75 GB |

3. Pick a tutorial script from scripts/ and convert it to a notebook locally. For example, starting with Retrieve JUMP profiles:

       pip install jupytext
       jupytext --to notebook 11_retrieve_profiles.py

4. Upload the resulting .ipynb to the Terra workspace Analyses tab.
5. Add a cell at the top of the notebook to install dependencies:

       !pip install jump_deps

6. Run the notebook.
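If you prefer to add the dependency-install cell before uploading rather than in the Terra editor, something like the following sketch could do it. It uses only the standard library and edits the notebook's JSON directly. The function name `prepend_install_cell` is hypothetical, and the cell structure follows the nbformat 4 layout as understood here, so treat this as an illustration rather than a supported workflow.

```python
import json
import uuid
from pathlib import Path

# Install command taken from the steps above.
INSTALL_COMMAND = "!pip install jump_deps"


def prepend_install_cell(notebook_path, command=INSTALL_COMMAND):
    """Insert a code cell running `command` as the first cell of an .ipynb file.

    Returns True if a cell was added, False if the first cell already
    contains exactly that command.
    """
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cells = notebook.setdefault("cells", [])

    # "source" may be a string or a list of strings; join handles both.
    if cells and "".join(cells[0].get("source", [])).strip() == command:
        return False

    cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [command],
    }
    # Notebooks at nbformat 4.5 or later expect every cell to carry an id.
    if notebook.get("nbformat", 4) >= 4 and notebook.get("nbformat_minor", 0) >= 5:
        cell["id"] = uuid.uuid4().hex[:8]

    cells.insert(0, cell)
    path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return True
```

For example, `prepend_install_cell("11_retrieve_profiles.ipynb")` run in the directory holding the converted notebook would modify that file in place. Running it a second time leaves the notebook unchanged.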
About the data
- Most data components (images, raw CellProfiler output, single-cell profiles, aggregated CellProfiler profiles) are available from 12 sources for the principal dataset. Each source corresponds to a unique data-generating center (except source_7 and source_13, which were from the same center).
- See jump-cellpainting/datasets for full details on available and planned data releases.
History
This guide was originally contributed by Nicole Deflaux (Verily) in March 2023 as part of the datasets repository and has been adapted for the JUMP Hub.