CellProfiler Integration

CellProfiler is the workhorse of this pipeline. It’s an open-source tool designed specifically for high-throughput image analysis in biology. If you’re new to CellProfiler, think of it as a flexible image processing engine that you configure using “pipeline” files (.cppipe).

This page explains how the pipeline uses CellProfiler and how to customize the integration for your own data.

Overview

The pipeline uses CellProfiler v4.2.8 for all image analysis tasks. The individual stages are listed under Key Processes below.

CellProfiler Processes

Process Modules

All CellProfiler processes follow a similar pattern:

process CELLPROFILER_ILLUMCALC {
    container 'wave.seqera.io/cellprofiler/cellprofiler:4.2.8'

    input:
    tuple val(meta), path(images)
    path(cppipe)

    script:
    """
    # Generate load_data.csv
    generate_load_data_csv.py \\
        --pipeline_type illumcalc \\
        --channels ${meta.channels.join(',')} \\
        --output load_data.csv

    # Run CellProfiler
    cellprofiler \\
        -c -r \\
        -p ${cppipe} \\
        -o . \\
        --data-file=load_data.csv
    """
}

Key Processes

| Process | Purpose | Grouping | Key Outputs |
|---|---|---|---|
| ILLUMCALC | Calculate illumination functions | Per plate (or plate+cycle) | .npy files |
| ILLUMAPPLY | Apply corrections | Per well or site | Corrected TIFF images |
| SEGCHECK | Segmentation QC | Per well (subsampled) | PNG previews, CSV stats |
| PREPROCESS | Barcode calling | Per site | Preprocessed TIFF, CSV |
| COMBINEDANALYSIS | Final segmentation | Per site | Masks, overlays, CSV |
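
The Grouping column determines which images are handed to a single CellProfiler task. As a rough illustration of the three grouping levels, written in Python rather than Nextflow, the sketch below buckets image records by plate, well, or site. The record fields and the function name `group_images` are hypothetical and not part of the pipeline.

```python
from collections import defaultdict

# Metadata fields that define each grouping level (illustrative assumption).
GROUPING_KEYS = {
    "plate": ("plate",),
    "well": ("plate", "well"),
    "site": ("plate", "well", "site"),
}


def group_images(records, level):
    """Bucket image records so each bucket corresponds to one task."""
    keys = GROUPING_KEYS[level]
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in keys)].append(record)
    return dict(groups)
```

With three records spanning two wells of one plate, grouping by plate yields one bucket, by well two, and by site three, which is why per-site processes produce many more (and smaller) tasks than per-plate ones.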

Pipeline Files (.cppipe)

Structure

CellProfiler pipelines are text files (.cppipe) that define the ordered sequence of analysis modules and the settings of each module.

Required Pipeline Files

| File | Purpose | Inputs | Outputs | Requirements |
|---|---|---|---|---|
| painting_illumcalc.cppipe | Calculate painting illumination | Multi-channel raw images | .npy illumination functions | |
| painting_illumapply.cppipe | Apply painting illumination | Raw images + illumination functions | Corrected TIFF images | |
| painting_segcheck.cppipe | Segmentation QC | Corrected images | Segmentation previews | |
| barcoding_illumcalc.cppipe | Calculate barcoding illumination | Multi-cycle raw images | Cycle-specific illumination functions | |
| barcoding_illumapply.cppipe | Apply barcoding illumination | Raw cycle images + illumination functions | Corrected cycle images | |
| barcoding_preprocess.cppipe | Barcode calling | Corrected cycle images | Barcode-called images (requires plugins) | callbarcodes and compensatecolors plugins |
| combinedanalysis.cppipe | Final analysis | Painting + barcoding images | Segmentation masks, measurements | callbarcodes plugin |

Load Data CSV Generation

Purpose

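CellProfiler requires load_data.csv files that specify, for each image set, its metadata (plate, well, site, frame and, for barcoding, cycle) together with the file name and directory of every channel image.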

Generation Script

The generate_load_data_csv.py script creates these files:

generate_load_data_csv.py \
    --pipeline_type illumcalc \
    --channels DAPI,GFP,RFP,Cy5,Cy3 \
    --frames 0,1,2,3 \
    --output load_data.csv

Pipeline Types

Different CellProfiler stages require different CSV formats:

1. illumcalc - Illumination Calculation

| Metadata_Plate | Metadata_Well | Metadata_Site | Metadata_Frame | FileName_DAPI | PathName_DAPI | ... |
|---|---|---|---|---|---|---|
| P001 | A01 | 1 | 0 | P001_A01_1_0_DAPI.tif | /path/to/images | ... |
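
To make the illumcalc layout concrete, the following sketch builds such rows from a list of image paths. It is not the pipeline's generate_load_data_csv.py: the function names and the assumed file naming `{plate}_{well}_{site}_{frame}_{channel}.tif` (inferred from the example row) are assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def illumcalc_rows(image_paths, channels):
    """Group per-channel files into one row per (plate, well, site, frame).

    Assumes names like P001_A01_1_0_DAPI.tif (hypothetical convention
    inferred from the example row, not the pipeline's actual parser).
    """
    grouped = defaultdict(dict)
    for raw in image_paths:
        path = PurePosixPath(raw)
        plate, well, site, frame, channel = path.stem.split("_")
        grouped[(plate, well, int(site), int(frame))][channel] = path

    rows = []
    for (plate, well, site, frame), files in sorted(grouped.items()):
        row = {
            "Metadata_Plate": plate,
            "Metadata_Well": well,
            "Metadata_Site": site,
            "Metadata_Frame": frame,
        }
        for channel in channels:
            # A missing channel raises KeyError instead of writing a partial row.
            path = files[channel]
            row[f"FileName_{channel}"] = path.name
            row[f"PathName_{channel}"] = str(path.parent)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Each channel contributes a FileName_/PathName_ column pair, so the CSV width grows with the number of channels while the number of rows tracks the number of image sets.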

2. illumapply - Illumination Correction

| Metadata_Plate | Metadata_Well | Metadata_Site | Metadata_Frame | FileName_OrigDAPI | PathName_OrigDAPI | FileName_IllumDAPI | PathName_IllumDAPI | ... |
|---|---|---|---|---|---|---|---|---|
| P001 | A01 | 1 | 0 | P001_A01_1_0_DAPI.tif | /orig/ | P001_IllumDAPI.npy | /illum/ | ... |
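
The illumapply layout pairs every raw image with the illumination function of its plate and channel. A minimal sketch of building one such row is shown below. The function `illumapply_row` is hypothetical, and the file names simply follow the example values on this page rather than the pipeline's actual generator.

```python
def illumapply_row(plate, well, site, frame, channels, orig_dir, illum_dir):
    """Build one illumapply row pairing raw images with illumination files.

    Hypothetical helper for illustration; naming follows the examples
    shown in this section.
    """
    row = {
        "Metadata_Plate": plate,
        "Metadata_Well": well,
        "Metadata_Site": site,
        "Metadata_Frame": frame,
    }
    for channel in channels:
        row[f"FileName_Orig{channel}"] = f"{plate}_{well}_{site}_{frame}_{channel}.tif"
        row[f"PathName_Orig{channel}"] = orig_dir
        # One illumination function per plate and channel, shared by every site.
        row[f"FileName_Illum{channel}"] = f"{plate}_Illum{channel}.npy"
        row[f"PathName_Illum{channel}"] = illum_dir
    return row
```

Because the illumination file name depends only on plate and channel, every row for the same plate points at the same .npy file.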

3. preprocess - Barcoding with Cycles

| Metadata_Plate | Metadata_Well | Metadata_Site | Metadata_Frame | Metadata_Cycle | FileName_Cycle1_Cy3 | PathName_Cycle1_Cy3 | ... |
|---|---|---|---|---|---|---|---|
| P001 | A01 | 1 | 0 | 1 | P001_A01_1_0_Cycle1_Cy3.tif | /path/ | ... |

4. combined - Painting + Barcoding

| Metadata_Plate | Metadata_Well | Metadata_Site | Metadata_Frame | FileName_CorrDAPI | PathName_CorrDAPI | FileName_Cycle1_Cy3 | PathName_Cycle1_Cy3 | ... |
|---|---|---|---|---|---|---|---|---|
| P001 | A01 | 1 | 0 | P001_A01_1_CorrDAPI.tif | /painting/ | P001_A01_1_0_Cycle1_Cy3.tif | /barcoding/ | ... |

Execution Details

Command-Line Invocation

CellProfiler is run in headless mode:

# -c           run headless (without the GUI)
# -r           run the pipeline on startup
# -p           pipeline file
# -o           output directory
# --data-file  input CSV
cellprofiler \
    -c \
    -r \
    -p pipeline.cppipe \
    -o output_dir/ \
    --data-file=load_data.csv
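
When the same invocation has to be launched from a script, it can help to assemble the arguments as a list instead of a shell string. The sketch below does this using only the flags shown on this page. The function name `build_cellprofiler_command` is an assumption for illustration and is not part of the pipeline.

```python
def build_cellprofiler_command(pipeline, output_dir, data_file, plugins_dir=None):
    """Return the argument list for a headless CellProfiler run."""
    command = [
        "cellprofiler",
        "-c",  # headless, no GUI
        "-r",  # run the pipeline on startup
        "-p", pipeline,
        "-o", output_dir,
        f"--data-file={data_file}",
    ]
    if plugins_dir is not None:
        # Only needed for pipelines that use plugin modules.
        command.append(f"--plugins-directory={plugins_dir}")
    return command
```

Passing the resulting list to `subprocess.run(command, check=True)` would start the run, assuming the `cellprofiler` executable is on the PATH. Using a list avoids shell quoting problems with paths that contain spaces.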

Plugin Loading

For processes that require plugins, Nextflow stages the plugin files into the process from the configured plugin paths. By default, plugins are fetched from https://github.com/CellProfiler/CellProfiler-plugins via raw GitHub URLs, but a local path or any other source can be specified instead.

# Stage plugins in nf-pooled-cellpainting.nf
file(params.callbarcodes_plugin)

# Stage plugins as input into process
path plugins, stageAs: "plugins/"

# Use the staged plugins when running CellProfiler inside the process
cellprofiler -c -r \\
    -p combinedanalysis_patched.cppipe \\
    -o . \\
    --data-file=load_data.csv \\
    --image-directory ./images/ \\
    --plugins-directory=./plugins/

Default URLs are configured in nextflow.config:

callbarcodes_plugin = "https://raw.githubusercontent.com/CellProfiler/CellProfiler-plugins/refs/heads/master/active_plugins/callbarcodes.py"
compensatecolors_plugin = "https://raw.githubusercontent.com/CellProfiler/CellProfiler-plugins/refs/heads/master/active_plugins/compensatecolors.py"
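
These defaults follow the master branch, so the plugin code can change between runs. One possible way to derive a pinned URL is sketched below, assuming that raw.githubusercontent.com serves the same file path when the branch ref is replaced by a commit SHA. The function name `pin_plugin_url` is hypothetical, and the commit value must be supplied by the user; none is suggested here.

```python
DEFAULT_REF = "refs/heads/master"


def pin_plugin_url(url, commit):
    """Swap the moving branch ref in a raw GitHub URL for a fixed commit."""
    if DEFAULT_REF not in url:
        raise ValueError(f"URL does not contain {DEFAULT_REF!r}: {url}")
    # Replace only the first occurrence so the file path is left untouched.
    return url.replace(DEFAULT_REF, commit, 1)
```

The pinned URL could then be set as `callbarcodes_plugin` or `compensatecolors_plugin` in place of the default.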

Output Organization

File Naming Conventions

CellProfiler outputs follow structured naming:

| Type | Pattern | Example |
|---|---|---|
| Corrected images | {plate}_{well}_{site}_Corr{channel}.tif | Plate1_A01_1_CorrDNA.tif |
| Illumination functions | {plate}_Illum{channel}.npy | Plate1_IllumDNA.npy |
| Preprocessed barcoding | {plate}_{well}_{site}_Cycle{cycle}_{channel}.tif | Plate1_A01_1_Cycle01_A.tif |
| Segmentation masks | {plate}_{well}_{site}_Nuclei.tif | Plate1_A01_1_Nuclei.tif |
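
Structured names like these can be parsed back into metadata, which is useful for sanity checks on an output directory. The sketch below transcribes the four patterns into regular expressions. The expressions are an illustrative assumption (for example, they assume plate and well names contain no underscores) and are not taken from the pipeline's code.

```python
import re

# One pattern per output type; character classes are assumptions.
PATTERNS = (
    ("corrected", re.compile(
        r"^(?P<plate>[^_]+)_(?P<well>[^_]+)_(?P<site>\d+)_Corr(?P<channel>\w+)\.tif$")),
    ("illumination", re.compile(
        r"^(?P<plate>[^_]+)_Illum(?P<channel>\w+)\.npy$")),
    ("barcoding", re.compile(
        r"^(?P<plate>[^_]+)_(?P<well>[^_]+)_(?P<site>\d+)"
        r"_Cycle(?P<cycle>\d+)_(?P<channel>\w+)\.tif$")),
    ("mask", re.compile(
        r"^(?P<plate>[^_]+)_(?P<well>[^_]+)_(?P<site>\d+)_Nuclei\.tif$")),
)


def classify_output(filename):
    """Return (output type, parsed fields), or None if no pattern matches."""
    for kind, pattern in PATTERNS:
        match = pattern.match(filename)
        if match:
            return kind, match.groupdict()
    return None
```

For instance, `classify_output("Plate1_A01_1_Cycle01_A.tif")` is classified as a barcoding image with cycle `"01"` and channel `"A"`. Parsed fields are returned as strings.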

CSV Outputs

Feature measurements are saved as CSV files.

Best Practices

  1. Validate pipelines: Test .cppipe files in the CellProfiler GUI first

  2. Test on a subset first: Run the pipeline on a single well before processing a full plate, since a well is the smallest unit the pipeline can process. If the pipeline works on one well, it should work on a full plate.

  3. Resource tuning: Profile memory usage and adjust allocation (to save time and cost)

  4. Plugin versioning: Pin plugin versions for reproducibility

  5. Output validation: Check output file counts match expectations
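
The output validation step can be as simple as counting files. A minimal sketch follows, assuming outputs are collected in a single directory and can be selected with a glob pattern. The function name `check_output_count` and the example pattern are hypothetical.

```python
from pathlib import Path


def check_output_count(output_dir, pattern, expected):
    """Compare the number of files matching a glob pattern with an expected count.

    Returns (matches_expected, number_found).
    """
    found = list(Path(output_dir).glob(pattern))
    return len(found) == expected, len(found)
```

For corrected images, the expected count would typically be wells × sites × channels, so a call such as `check_output_count("results", "*_Corr*.tif", n_wells * n_sites * n_channels)` flags runs where some sites failed silently.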