- Frequently Asked Questions
- Analyses
- Data
- Does JUMP contain X compound/gene?
- Where are the dataset specifications?
- Why do some plates have images but no downstream analysis?
- Why do some perturbations have so many replicates?
- How were the profiles created?
- Do we expect one JCP to have multiple targets?
- Do JCPs within either the CRISPR or ORF share the same gene?
- Web interfaces
Frequently Asked Questions and links to their answers. They are grouped based on whether they pertain to analyses, data or web interfaces.
Frequently Asked Questions
Analyses
How can I reproduce an environment to explore JUMP data?
(WIP) The easiest way to set things up will be to install from pip in your environment of choice:
pip install jump-deps
Data
Does JUMP contain X compound/gene?
The easiest way to find out is to query the dataset using this web tool. Alternatively, you can explore the metadata tables in the datasets repository.
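For a programmatic check, a lookup against a metadata table can be sketched as follows. This is a minimal sketch and not how the web tool works: the column names (`Metadata_JCP2022`, `Metadata_Symbol`), the identifiers and the inline rows are all hypothetical stand-ins, so check the actual headers of the tables in the datasets repository before adapting it.

```python
import csv
import io

# Toy stand-in for a metadata table. The rows, identifiers and column
# names are hypothetical and only illustrate the lookup.
TOY_TABLE = """Metadata_JCP2022,Metadata_Symbol
JCP2022_900001,GENE_A
JCP2022_900002,GENE_B
JCP2022_900003,GENE_A
"""


def find_entries(table_text, column, query):
    """Return the rows whose `column` equals `query`, ignoring case."""
    reader = csv.DictReader(io.StringIO(table_text))
    wanted = query.strip().lower()
    return [row for row in reader if row[column].strip().lower() == wanted]


hits = find_entries(TOY_TABLE, "Metadata_Symbol", "gene_a")
jcp_ids = [row["Metadata_JCP2022"] for row in hits]
print(jcp_ids)  # ['JCP2022_900001', 'JCP2022_900003']
```

An empty result means the queried name is absent from that table under the chosen column; it may still be present under a synonym or in a different table.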
Where are the dataset specifications?
The main resource for understanding how the JUMP datasets were collected and assembled is this repo.
Why do some plates have images but no downstream analysis?
Some plates failed Quality Control (QC) but we kept them because they may be useful for developing QC methods.
Why do some perturbations have so many replicates?
Most plates contain 16 negative control wells, while some have as many as 28. One replicate of each of four of the compound positive controls is added to wells O23, O24, P23 and P24. The remaining wells contain ORF treatments, with a single replicate of each per plate map and five replicate plates produced per plate map (issue).
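The replicate counts follow from the layout arithmetic, which can be sketched as below. The sketch assumes 384-well plates (16 rows A to P by 24 columns, consistent with O23 to P24 being corner wells) and the common case of 16 negative control wells; the variable names are illustrative and not taken from the JUMP code.

```python
# Assumption: 384-well plates (rows A-P, columns 1-24).
ROWS, COLS = 16, 24
wells_per_plate = ROWS * COLS

negcon_wells = 16       # most plates; some have as many as 28
poscon_wells = 4        # wells O23, O24, P23 and P24
replicate_plates = 5    # plates produced per plate map

# Wells left for ORF treatments on one plate.
orf_wells = wells_per_plate - negcon_wells - poscon_wells

# Replicates per plate map = wells per plate x replicate plates.
orf_replicates = 1 * replicate_plates
negcon_replicates = negcon_wells * replicate_plates

print(orf_wells, orf_replicates, negcon_replicates)  # 364 5 80
```

Under these assumptions each ORF is seen 5 times per plate map, while the negative control is seen 80 times, which is why controls have far more replicates than treatments.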
How were the profiles created?
We used snakemake and pycytominer to generate these. The details can be found in this repo.
Do we expect one JCP to have multiple targets?
Yes, there will be many with multiple targets. For instance, JCP2022_050797 (quinidine/quinine) has the targets KCNK1 and KCNN4.
These two were considered to be different compounds because they had different names and broad_sample names, but after all the data cleanup steps they ended up being the same, hence the two different entries.
Web interfaces
What is the source of the replicability metric?
These two files (ORF and CRISPR) contain the mAP and corrected p values for replicate retrieval. They won’t contain all ORF and CRISPR reagents because some of them were filtered out for QC reasons.
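For readers unfamiliar with the metric, the average precision behind mAP can be sketched as follows. This is a generic illustration and not the code that produced the files: the use of cosine similarity, the toy two-dimensional profiles and the function names are assumptions, and the corrected p values are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_precision(query, others):
    """Average precision of retrieving replicates of `query`.

    `others` is a list of (profile, is_replicate) pairs. Profiles are
    ranked by similarity to the query, and precision is averaged over
    the ranks at which a replicate is found.
    """
    ranked = sorted(others, key=lambda item: cosine(query, item[0]), reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_replicate) in enumerate(ranked, start=1):
        if is_replicate:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy case: one non-replicate outranks both replicates.
query = [1.0, 0.0]
others = [([0.9, 0.1], False), ([0.5, 0.5], True), ([0.0, 1.0], True)]
ap = average_precision(query, others)
print(round(ap, 4))  # 0.5833
```

Averaging such values over the replicates of a perturbation gives a mean average precision; a value of 1.0 means every replicate ranks above every non-replicate.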
X_Feature: For each row, is the Feature value an average for all the cells in the Metadata_image using the listed Mask? Or is it associated with a single cell in that image?
Any Feature is the average of all cells and all replicates (typically four in total) for the specific mask and feature.
How are Statistic and Median calculated for each row? Are they calculated in relation to the average of the “Feature” values for the negative controls in the same plate?
Statistic is the probability of a given distribution (four replicates) to occur relative to their negative controls (in the four plates, typically each replicate is in an independent plate).
Median is the median feature across all (~4) replicates. Each of these replicates’ value was in turn the mean of all the sites and cells in a given well.
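The two-step aggregation described for Median (mean within each well, then median across replicates) can be sketched as below. The per-cell values, plate and well names are made up for illustration, and the sketch does not reproduce how Statistic is computed.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical per-cell measurements:
# (perturbation, plate, well, feature value).
cells = [
    ("pert_A", "plate_1", "B02", 1.0),
    ("pert_A", "plate_1", "B02", 3.0),
    ("pert_A", "plate_2", "B02", 4.0),
    ("pert_A", "plate_2", "B02", 6.0),
    ("pert_A", "plate_3", "B02", 10.0),
    ("pert_A", "plate_4", "B02", 7.0),
    ("pert_A", "plate_4", "B02", 9.0),
]

# Step 1: mean over all cells (from every site) within each well.
by_well = defaultdict(list)
for pert, plate, well, value in cells:
    by_well[(pert, plate, well)].append(value)
well_means = {key: mean(values) for key, values in by_well.items()}

# Step 2: median of the well means across a perturbation's replicates.
by_pert = defaultdict(list)
for (pert, _plate, _well), value in well_means.items():
    by_pert[pert].append(value)
medians = {pert: median(values) for pert, values in by_pert.items()}

print(medians)  # {'pert_A': 6.5}
```

Here the four well means are 2.0, 5.0, 10.0 and 8.0, so the median across replicates is 6.5 even though one well holds a single cell.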