Creating a Project Manifest
Have custom-processed JUMP profiles? Here’s how to share them with others.
Prerequisites
- Profiles processed using jump-profiling-recipe
- A GitHub repository for your project
- AWS CLI configured (for S3 upload) or appropriate CLI for your cloud provider
jqandcurlinstalled
Overview
You will create a manifest file that documents your processed profiles:
[
{
"subset": "compound_no_source7",
"url": "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect_harmony.parquet",
"recipe_permalink": "https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0",
"config_permalink": "https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json",
"etag": ""
},
{
"subset": "compound_no_source7_interpretable",
"url": "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect.parquet",
"recipe_permalink": "https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0",
"config_permalink": "https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json",
"etag": ""
}
]This manifest provides:
- Centralized profile registry - All processed profile sets in one place
- Provenance tracking - Recipe version and config file URLs enable reproducibility (Note: versioning of input files to the recipe would be needed for complete reproducibility, but that is outside the current system’s scope)
- Standardized paths - URLs follow the Cell Painting Gallery folder structure convention:
source_all/workspace/profiles_assembled/- Standard JUMP dataset path structure. Thesource_allis typically an institution identifier and should be present even if data is from a single source. While you may store data elsewhere, we recommend following this structure for compatibility.subset/- Data description (compound_no_source7, orf_combined, crispr, etc.)version/- Dataset version (v1.0, v1.1, v2.0, etc.)pipeline_filename.parquet- Filename preserves the pipeline string (e.g.,profiles_var_mad_int_featselect_harmony.parquet)
Step-by-Step Guide
We’ll use the 2024_Chandrasekaran_Production project as an example, specifically the compound_no_source7_interpretable subset.
Step 1: Define your dataset parameters
SUBSET="compound_no_source7" # Descriptive name for this data subset
VERSION="v1.0" # Dataset version
PROFILES_FILE="profiles_var_mad_int_featselect_harmony.parquet" # Final processed profiles
INTERPRETABLE_PROFILES_FILE="profiles_var_mad_int_featselect.parquet" # Interpretable profilesStep 2: Upload your profiles to storage
This example shows uploading to S3, but adapt the commands for your storage location.
Note for Cell Painting Gallery uploads: Please follow the contribution guidelines which will require creating a unique prefix (e.g., cpg0042-chandrasekaran-jump).
aws s3 cp /path/to/${INTERPRETABLE_PROFILES_FILE} \
s3://cellpainting-gallery/cpg0042-chandrasekaran-jump/source_all/workspace/profiles_assembled/${SUBSET}/${VERSION}/${INTERPRETABLE_PROFILES_FILE}
# Verify upload succeeded
aws s3 ls s3://cellpainting-gallery/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/${SUBSET}/${VERSION}/ --human-readableStep 3: Create the manifest file
In your project repository, create manifests/profile_index.json:
[
{
"subset": "compound_no_source7",
"url": "https://cellpainting-gallery.s3.amazonaws.com/cpg0042-chandrasekaran-jump/source_all/workspace/profiles/compound_no_source7/v1.0/profiles_var_mad_int_featselect_harmony.parquet",
"recipe_permalink": "https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0",
"config_permalink": "https://github.com/broadinstitute/2025_jump_addon_orchestrator/blob/a15dedb35383cb342cd010106615f99939178126/1.convert/input/compound_no_source7.json",
"etag": ""
}
]Note on recipe versioning:
- If using a tagged version (e.g.,
v0.6.0), use the tag URL:https://github.com/broadinstitute/jump-profiling-recipe/tree/v0.6.0 - If using an untagged version, use the commit hash:
https://github.com/broadinstitute/jump-profiling-recipe/tree/522aa81cad73d5776f62745fd0cd19336d4cfff3 - If using your own fork, point to your fork instead
- The goal is to provide a permanent link to the exact recipe version used
Step 5: Commit and push your manifest
git add manifests/profile_index.json
git commit -m "Add profile manifest"
git pushUsing Your Manifest
Your profiles are now documented and ready to share! See scripts/11_retrieve_profiles.py for an example of how to consume manifest files.
Reference Examples
See these projects for reference implementations: - Main JUMP datasets - 2024_Chandrasekaran_Production