Creating a Project Manifest
Have custom-processed JUMP profiles? Here’s how to share them with others.
Prerequisites
- Profiles processed using jump-profiling-recipe
- A GitHub repository for your project
- AWS CLI configured (for S3 upload) or appropriate CLI for your cloud provider
- jq and curl installed
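Before starting, you can confirm that the command-line tools listed above are on your PATH. This is a minimal sketch: check_prereqs is a hypothetical helper that relies only on the shell builtin command -v.

```bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns non-zero if at least one is missing.
check_prereqs() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (swap aws for your cloud provider's CLI if you are not using S3):
# check_prereqs aws jq curl git
```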
Overview
You will create a manifest file that documents your processed profiles:
[
{
"subset": "compound_no_source7",
"url": "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect_harmony.parquet",
"recipe_permalink": "https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0",
"config_permalink": "https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json",
"etag": ""
},
{
"subset": "compound_no_source7_interpretable",
"url": "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect.parquet",
"recipe_permalink": "https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0",
"config_permalink": "https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json",
"etag": ""
}
]
This manifest provides:
- Centralized profile registry - All processed profile sets in one place
- Provenance tracking - Recipe version and config file URLs record how each profile set was produced (Note: complete reproducibility would also require versioning the recipe's input files, which is outside the current system’s scope)
- Standardized paths - URLs follow the Cell Painting Gallery folder structure convention source_all/workspace/profiles_assembled/subset/version/pipeline_filename.parquet:
  - source_all/workspace/profiles_assembled/ - Standard JUMP dataset path structure. The position of source_all is typically an institution identifier; keep source_all even if the data come from a single source. You may store data elsewhere, but we recommend following this structure for compatibility.
  - subset/ - Data description (compound_no_source7, orf_combined, crispr, etc.)
  - version/ - Dataset version (v1.0, v1.1, v2.0, etc.)
  - pipeline_filename.parquet - Filename that preserves the pipeline string (e.g., profiles_var_mad_int_featselect_harmony.parquet)
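To check that a URL follows this convention, you can split off its last three path components with shell parameter expansion. This is a minimal sketch. parse_profile_url is a hypothetical helper, and its two checks (a v<major>.<minor> version folder and a .parquet file) are assumptions drawn from the examples above rather than a formal rule.

```bash
# Hypothetical helper: split a profile URL into subset, version and filename,
# then apply two loose checks taken from this guide's examples (assumption).
parse_profile_url() {
  local url=$1 rest filename version subset
  filename=${url##*/}   # last path component
  rest=${url%/*}
  version=${rest##*/}   # second-to-last component
  rest=${rest%/*}
  subset=${rest##*/}    # third-to-last component

  case $version in
    v[0-9]*.[0-9]*) ;;
    *) echo "unexpected version folder: $version" >&2; return 1 ;;
  esac
  case $filename in
    *.parquet) ;;
    *) echo "expected a .parquet file, got: $filename" >&2; return 1 ;;
  esac

  printf '%s %s %s\n' "$subset" "$version" "$filename"
}

# Example: the first entry of the manifest above.
parse_profile_url "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect_harmony.parquet"
```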
Step-by-Step Guide
We’ll use the 2024_Chandrasekaran_Production project as an example, specifically the compound_no_source7_interpretable subset.
Step 1: Define your dataset parameters
SUBSET="compound_no_source7" # Descriptive name for this data subset
VERSION="v1.0" # Dataset version
PROFILES_FILE="profiles_var_mad_int_featselect_harmony.parquet" # Final processed profiles
INTERPRETABLE_PROFILES_FILE="profiles_var_mad_int_featselect.parquet" # Interpretable profiles
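These parameters combine with your storage root to give the destination of each file. This is a sketch that assumes PROJECT_ROOT, a hypothetical variable, holds your bucket and prefix up to the profiles_assembled folder. The value shown is the example project's Cell Painting Gallery location, and profile_s3_uri is a hypothetical helper.

```bash
# Reuse the Step 1 parameters if they are already set; otherwise use the example values.
: "${SUBSET:=compound_no_source7}"
: "${VERSION:=v1.0}"
: "${INTERPRETABLE_PROFILES_FILE:=profiles_var_mad_int_featselect.parquet}"

# Hypothetical: storage root for your project, up to the profiles_assembled folder.
PROJECT_ROOT="s3://cellpainting-gallery/cpg0042-chandrasekaran-jump/source_all/workspace/profiles_assembled"

# Join <root>/<subset>/<version>/<pipeline_filename>, dropping a trailing slash on the root.
profile_s3_uri() {
  printf '%s/%s/%s/%s\n' "${1%/}" "$2" "$3" "$4"
}

DEST_URI=$(profile_s3_uri "$PROJECT_ROOT" "$SUBSET" "$VERSION" "$INTERPRETABLE_PROFILES_FILE")
echo "$DEST_URI"
```

You can then pass DEST_URI as the destination of aws s3 cp in Step 2.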
Step 2: Upload your profiles to storage
This example shows uploading to S3, but adapt the commands for your storage location.
Note for Cell Painting Gallery uploads: Please follow the contribution guidelines, which require creating a unique prefix (e.g., cpg0042-chandrasekaran-jump).
aws s3 cp /path/to/${INTERPRETABLE_PROFILES_FILE} \
  s3://cellpainting-gallery/cpg0042-chandrasekaran-jump/source_all/workspace/profiles_assembled/${SUBSET}/${VERSION}/${INTERPRETABLE_PROFILES_FILE}
# Verify upload succeeded (list the same prefix you uploaded to)
aws s3 ls s3://cellpainting-gallery/cpg0042-chandrasekaran-jump/source_all/workspace/profiles_assembled/${SUBSET}/${VERSION}/ --human-readable
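The manifest's url field uses the public HTTPS form of the object you uploaded. This sketch converts one form into the other. It assumes a publicly readable bucket served through the https://<bucket>.s3.amazonaws.com/<key> endpoint used in this guide's examples, and s3_to_https is a hypothetical helper.

```bash
# Hypothetical helper: turn s3://<bucket>/<key> into https://<bucket>.s3.amazonaws.com/<key>.
# Assumes a public bucket reachable through the virtual-hosted-style endpoint.
s3_to_https() {
  local uri=$1 without_scheme bucket key
  case $uri in
    s3://*/*) ;;
    *) echo "not an s3://bucket/key URI: $uri" >&2; return 1 ;;
  esac
  without_scheme=${uri#s3://}
  bucket=${without_scheme%%/*}   # everything before the first slash
  key=${without_scheme#*/}       # everything after the first slash
  printf 'https://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
}

s3_to_https "s3://cellpainting-gallery/cpg0042-chandrasekaran-jump/source_all/workspace/profiles_assembled/compound_no_source7/v1.0/profiles_var_mad_int_featselect.parquet"
```

Running curl -sI on the resulting URL sends a HEAD request. This is a quick way to confirm the object is publicly reachable before you list it in the manifest.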
Step 3: Create the manifest file
In your project repository, create manifests/profile_index.json with one entry per uploaded file. Each url must point to the exact object you uploaded in Step 2:
[
{
"subset": "compound_no_source7",
"url": "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect_harmony.parquet",
"recipe_permalink": "https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0",
"config_permalink": "https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json",
"etag": ""
}
]
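Instead of editing the JSON by hand, you can use jq (already listed as a prerequisite) to write the entries with correct quoting. This is a sketch. manifest_entry is a hypothetical helper whose arguments map, in order, to the manifest keys, and etag defaults to an empty string as in the examples above. jq -s collects the separate objects into one array.

```bash
# Hypothetical helper: print one manifest entry as a JSON object.
# Arguments: subset, url, recipe_permalink, config_permalink, [etag]
manifest_entry() {
  jq -n \
    --arg subset "$1" \
    --arg url "$2" \
    --arg recipe_permalink "$3" \
    --arg config_permalink "$4" \
    --arg etag "${5:-}" \
    '{subset: $subset, url: $url, recipe_permalink: $recipe_permalink,
      config_permalink: $config_permalink, etag: $etag}'
}

RECIPE="https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0"
CONFIG="https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json"
BASE="https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0"

mkdir -p manifests
# Emit one object per profile file, then slurp them into a single JSON array.
{
  manifest_entry "compound_no_source7" \
    "$BASE/profiles_var_mad_int_featselect_harmony.parquet" "$RECIPE" "$CONFIG"
  manifest_entry "compound_no_source7_interpretable" \
    "$BASE/profiles_var_mad_int_featselect.parquet" "$RECIPE" "$CONFIG"
} | jq -s '.' > manifests/profile_index.json
```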
Note on recipe versioning:
- If using a tagged version (e.g., v0.6.0), use the tag URL: https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0
- If using an untagged version, use the commit hash: https://github.com/broadinstitute/jump-profiling-recipe/tree/522aa81cad73d5776f62745fd0cd19336d4cfff3
- If using your own fork, point to your fork instead
- The goal is to provide a permanent link to the exact recipe version used
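Both permalinks follow GitHub's URL patterns. A repository at a tag or commit uses /tree/<ref>, and a single file uses /blob/<commit>/<path>. This sketch defines two hypothetical helpers, recipe_permalink and config_permalink.

```bash
# Hypothetical helper: permalink to a repository at a tag or commit.
# $1 = tag or commit hash; $2 = repository URL (defaults to the upstream recipe)
recipe_permalink() {
  local ref=$1 repo=${2:-https://github.com/broadinstitute/jump-profiling-recipe}
  printf '%s/tree/%s\n' "${repo%/}" "$ref"
}

# Hypothetical helper: permalink to one file at a specific commit.
# $1 = repository URL, $2 = commit hash, $3 = path of the file inside the repository
config_permalink() {
  printf '%s/blob/%s/%s\n' "${1%/}" "$2" "${3#/}"
}

recipe_permalink v0.6.0
config_permalink https://github.com/broadinstitute/2025_jump_addon_orchestrator \
  a15dedb35383cb342cd010106615f99939178126 1.convert/input/compound_no_source7.json
```

To pin the exact commit of a local checkout, pass the output of git -C /path/to/jump-profiling-recipe rev-parse HEAD as the ref. For a fork, pass your fork's repository URL as the second argument.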
Step 4: Commit and push your manifest
git add manifests/profile_index.json
git commit -m "Add profile manifest"
git push
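Before running git add, you can run a quick sanity check with jq. The manifest must be a non-empty JSON array whose entries all carry the five keys used in this guide. This is a sketch. validate_manifest is a hypothetical helper, and its key list comes from the examples above, so adjust it if your consumer expects different fields.

```bash
# Hypothetical helper: exit status 0 only if the file is a non-empty array of
# entries that have all five keys, with non-empty string subset/url/permalinks.
# etag is only required to be present (it is empty in this guide's examples).
validate_manifest() {
  jq -e '
    type == "array" and length > 0 and
    all(.[];
      has("subset") and has("url") and has("recipe_permalink")
      and has("config_permalink") and has("etag")
      and ([.subset, .url, .recipe_permalink, .config_permalink]
           | all(type == "string" and length > 0)))
  ' "$1" > /dev/null
}

# Usage, from the repository root:
# validate_manifest manifests/profile_index.json && echo "manifest OK"
```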
Using Your Manifest
Your profiles are now documented and ready to share! See scripts/11_retrieve_profiles.py for an example of how to consume manifest files.
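For a quick look without Python, you can also read a manifest from the shell. This sketch is not a port of 11_retrieve_profiles.py. manifest_url and fetch_profile are hypothetical helpers: the first looks up a subset's url with jq, and the second downloads it with curl.

```bash
# Hypothetical helper: print the url(s) recorded for one subset.
# $1 = manifest path, $2 = subset name
manifest_url() {
  jq -r --arg subset "$2" '.[] | select(.subset == $subset) | .url' "$1"
}

# Hypothetical helper: download the first matching profile file.
# $1 = manifest path, $2 = subset name, $3 = output directory (default: current directory)
fetch_profile() {
  local url
  url=$(manifest_url "$1" "$2" | head -n 1)
  if [ -z "$url" ]; then
    echo "subset not found in manifest: $2" >&2
    return 1
  fi
  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
  curl -fsSL --create-dirs -o "${3:-.}/${url##*/}" "$url"
}

# Usage:
# fetch_profile manifests/profile_index.json compound_no_source7_interpretable data/
```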
Reference Examples
See these projects for reference implementations:
- Main JUMP datasets
- 2024_Chandrasekaran_Production