Using Your Own Data: Run

Running the Pipeline with CLI

Once your inputs are ready, run the pipeline and point it to your files:

```bash
nextflow run broadinstitute/nf-pooled-cellpainting \
    --input samplesheet.csv \
    --barcodes barcodes.csv \
    --outdir results \
    --painting_illumcalc_cppipe your_painting_illumcalc_cppipe.cppipe \
    --painting_illumapply_cppipe your_painting_illumapply_cppipe.cppipe \
    --painting_segcheck_cppipe your_painting_segcheck_cppipe.cppipe \
    --barcoding_illumcalc_cppipe your_barcoding_illumcalc_cppipe.cppipe \
    --barcoding_illumapply_cppipe your_barcoding_illumapply_cppipe.cppipe \
    --barcoding_preprocess_cppipe your_barcoding_preprocess_cppipe.cppipe \
    --combinedanalysis_cppipe your_combinedanalysis_cppipe.cppipe \
    -profile docker
```
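If you launch from a script, building the argument list in one place keeps the first run and the post-QC rerun consistent. The sketch below is a hypothetical helper, not part of the pipeline: it reuses the parameter names from the command above, and for the rerun it relies on Nextflow's standard `-resume` option (the CLI counterpart of Seqera's "Resume") together with the `qc_painting_passed` / `qc_barcoding_passed` flags described later on this page. It only builds and prints the command; it does not execute Nextflow.

```python
import shlex

# Parameters from the command above; the .cppipe paths are placeholders.
PARAMS = {
    "input": "samplesheet.csv",
    "barcodes": "barcodes.csv",
    "outdir": "results",
    "painting_illumcalc_cppipe": "your_painting_illumcalc_cppipe.cppipe",
    "painting_illumapply_cppipe": "your_painting_illumapply_cppipe.cppipe",
    "painting_segcheck_cppipe": "your_painting_segcheck_cppipe.cppipe",
    "barcoding_illumcalc_cppipe": "your_barcoding_illumcalc_cppipe.cppipe",
    "barcoding_illumapply_cppipe": "your_barcoding_illumapply_cppipe.cppipe",
    "barcoding_preprocess_cppipe": "your_barcoding_preprocess_cppipe.cppipe",
    "combinedanalysis_cppipe": "your_combinedanalysis_cppipe.cppipe",
}


def build_nextflow_command(params, profile="docker", resume=False, extra=None):
    """Return the `nextflow run` argument list (hypothetical helper)."""
    cmd = ["nextflow", "run", "broadinstitute/nf-pooled-cellpainting"]
    merged = {**params, **(extra or {})}
    for name, value in merged.items():
        # Nextflow reads `true` / `false` on the command line as booleans.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd += [f"--{name}", str(value)]
    cmd += ["-profile", profile]
    if resume:
        # Reuse cached results of tasks that already completed.
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    first_run = build_nextflow_command(PARAMS)
    after_qc = build_nextflow_command(
        PARAMS,
        resume=True,
        extra={"qc_painting_passed": True, "qc_barcoding_passed": True},
    )
    print(shlex.join(first_run))
    print(shlex.join(after_qc))
```

Passing the whole list to `subprocess.run` (rather than a single shell string) avoids quoting problems with paths that contain spaces.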

Running the Pipeline with Seqera Platform

Configuring the Pipeline in Seqera Platform

Navigate to Launchpad → Add Pipeline.

Pipeline Settings

| Setting | Value |
|---|---|
| Name | `nf-pooled-cellpainting` or a name describing your run |
| Pipeline to launch | https://github.com/broadinstitute/nf-pooled-cellpainting |
| Revision | `dev` (latest updates), `main` (latest versioned code), or a specific commit |
| Compute environment | Your AWS Batch environment |
| Work directory | `s3://your-bucket/prefix/to/scratch/output` |
| Config profiles | (leave empty for a custom run) |

Pipeline Parameters

In the Launchpad, select “Launch” for your pipeline.

In the “Run Parameters” tab, fill in all of the required Input/Output options. You can enter each value manually in the “Input form view”, or add the following parameters to the JSON or YAML in the “Params file view”. All other parameters have default values, but you may need to adjust some of them to match your dataset.

```yaml
input: "s3://your-bucket/samplesheet.csv"
outdir: "s3://your-bucket/results"
barcodes: "s3://your-bucket/barcodes.csv"
painting_illumcalc_cppipe: "s3://your-bucket/pipelines/painting_illumcalc.cppipe"
painting_illumapply_cppipe: "s3://your-bucket/pipelines/painting_illumapply.cppipe"
painting_segcheck_cppipe: "s3://your-bucket/pipelines/painting_segcheck.cppipe"
barcoding_illumcalc_cppipe: "s3://your-bucket/pipelines/barcoding_illumcalc.cppipe"
barcoding_illumapply_cppipe: "s3://your-bucket/pipelines/barcoding_illumapply.cppipe"
barcoding_preprocess_cppipe: "s3://your-bucket/pipelines/barcoding_preprocess.cppipe"
combinedanalysis_cppipe: "s3://your-bucket/pipelines/combinedanalysis.cppipe"
```
Keep `qc_barcoding_passed: false` and `qc_painting_passed: false` for the first launch of the pipeline. This pauses the pipeline after the barcoding and painting QC steps, before the final steps run, so you can review the QC outputs first.
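If you keep run settings under version control, a short script can generate the params file for the first launch with both QC gates closed. This is a sketch under stated assumptions: `BUCKET` and the `pipelines/` prefix mirror the placeholders above, `first_run_params` is a hypothetical helper, and it emits JSON, which the “Params file view” accepts (as does Nextflow's `-params-file` option on the CLI).

```python
import json

BUCKET = "s3://your-bucket"  # placeholder: replace with your bucket/prefix

# Each step has a matching `<step>_cppipe` parameter.
CPPIPE_STEPS = [
    "painting_illumcalc",
    "painting_illumapply",
    "painting_segcheck",
    "barcoding_illumcalc",
    "barcoding_illumapply",
    "barcoding_preprocess",
    "combinedanalysis",
]


def first_run_params(bucket):
    """Required inputs plus closed QC gates for the first launch."""
    params = {
        "input": f"{bucket}/samplesheet.csv",
        "outdir": f"{bucket}/results",
        "barcodes": f"{bucket}/barcodes.csv",
        # Pause after the QC steps on the first launch.
        "qc_painting_passed": False,
        "qc_barcoding_passed": False,
    }
    for step in CPPIPE_STEPS:
        # e.g. painting_illumcalc_cppipe -> <bucket>/pipelines/painting_illumcalc.cppipe
        params[f"{step}_cppipe"] = f"{bucket}/pipelines/{step}.cppipe"
    return params


if __name__ == "__main__":
    # Paste the output into the "Params file view", or save it for -params-file.
    print(json.dumps(first_run_params(BUCKET), indent=2))
```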

Select “Launch”.

Launching and Monitoring Runs

  1. Launch: Click Launch from the pipeline page

  2. Monitor: View real-time task execution in the Runs tab

  3. QC Review: Check outputs in the S3 bucket or via the Reports tab

  4. Resume: After QC review, click Resume (not Relaunch!) so that tasks that already completed are reused from the cache, and update these parameters:

```yaml
qc_painting_passed: true
qc_barcoding_passed: true
```
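When preparing the resumed run, it is safest to change only the two QC flags and leave every other parameter exactly as it was, so that completed tasks can still be reused from the cache. A minimal sketch, assuming your parameters are held in a plain dictionary (for example, loaded with `json.load` from the params file you launched with); `approve_qc` is a hypothetical helper.

```python
import json

QC_FLAGS = ("qc_painting_passed", "qc_barcoding_passed")


def approve_qc(params):
    """Return a copy of `params` with both QC gates opened.

    Only the QC flags change; all other values are copied unchanged so the
    resumed run sees the same inputs as the first launch.
    """
    updated = dict(params)  # shallow copy: the original dict is not modified
    for flag in QC_FLAGS:
        updated[flag] = True
    return updated


if __name__ == "__main__":
    # Illustrative first-run params; in practice, load your own params file.
    first = {
        "input": "s3://your-bucket/samplesheet.csv",
        "qc_painting_passed": False,
        "qc_barcoding_passed": False,
    }
    resumed = approve_qc(first)
    changed = sorted(k for k in resumed if first.get(k) != resumed[k])
    print("changed:", changed)
    print(json.dumps(resumed, indent=2))
```

Printing the list of changed keys is a quick check that nothing besides the QC flags was edited before you click Resume.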

Cost Optimization Tips

  1. Use Spot Instances: typically 60-90% cheaper than on-demand instances for fault-tolerant workloads

  2. Enable Fusion Snapshots: Automatically recover from spot interruptions

  3. Right-size Max CPUs: Start with 500-1000 in the compute environment and increase it if tasks spend a long time waiting in the queue

  4. Use Appropriate Instance Types: Memory-optimized (r6id) for Combined Analysis; compute-optimized (c6id) for illumination steps

  5. Clean Up Work Directory: Periodically delete old work directories from S3 once you no longer need to resume those runs

  6. Route Long Tasks to On-Demand: See below for avoiding spot reclaim losses on multi-hour tasks
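To see what the 60-90% range means for a given workload, multiply the on-demand cost by the fraction that remains after the discount. The hourly rate and instance-hours below are hypothetical placeholders, not AWS prices; substitute figures from your own account.

```python
def spot_cost_range(on_demand_rate, instance_hours, savings=(0.60, 0.90)):
    """Return (on_demand_cost, best_case_spot_cost, worst_case_spot_cost).

    `savings` is the fractional discount range (60-90%) from the tip above.
    """
    on_demand = on_demand_rate * instance_hours
    low_savings, high_savings = savings
    best = on_demand * (1 - high_savings)   # largest discount
    worst = on_demand * (1 - low_savings)   # smallest discount
    return on_demand, best, worst


if __name__ == "__main__":
    # Hypothetical figures: $1.00 per instance-hour, 200 instance-hours.
    od, best, worst = spot_cost_range(1.00, 200)
    print(f"on-demand ${od:.2f}; spot ${best:.2f}-${worst:.2f}")
    # prints: on-demand $200.00; spot $20.00-$80.00
```

These figures ignore work lost to spot reclaims, which is why the long tasks discussed next are better placed on on-demand instances.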

Routing Long-Running Tasks to On-Demand Instances

Long-running tasks such as FIJI_STITCHCROP (which can take 4-6 hours) and CELLPROFILER_COMBINEDANALYSIS risk losing hours of work if their spot instances are reclaimed. To avoid this:

  1. Create an on-demand compute environment in Seqera Platform (for example, duplicate your spot environment and disable Fusion Snapshots, which are unnecessary on on-demand instances)

  2. Route specific processes to the on-demand queue by adding the following to your Nextflow config:

```groovy
process {
    withName: 'FIJI_STITCHCROP' {
        queue = '<on-demand-queue-name>'
    }
    withName: 'CELLPROFILER_COMBINEDANALYSIS' {
        queue = '<on-demand-queue-name>'
    }
}
```

The queue name is visible in your Seqera Platform compute environment under “Manual config attributes”.
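If you route or tune more than a couple of processes, generating the `withName` blocks from a single mapping helps avoid typos in process names. This is a sketch: `render_process_overrides` is a hypothetical helper, `<on-demand-queue-name>` is the same placeholder as above, and the output is plain Nextflow config text to paste into your config (or save to a file passed to Nextflow with `-c`).

```python
def _render_value(value):
    """Format a Python value as a Nextflow config literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return f"'{value}'"
    return str(value)  # numbers, e.g. cpus = 8


def render_process_overrides(overrides):
    """Render `process { withName: ... }` blocks.

    `overrides` maps a process name to a dict of directive -> value.
    """
    lines = ["process {"]
    for name, directives in overrides.items():
        lines.append(f"    withName: '{name}' {{")
        for directive, value in directives.items():
            lines.append(f"        {directive} = {_render_value(value)}")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    queue = "<on-demand-queue-name>"  # placeholder: copy from Seqera Platform
    long_running = ["FIJI_STITCHCROP", "CELLPROFILER_COMBINEDANALYSIS"]
    print(render_process_overrides({p: {"queue": queue} for p in long_running}))
```

The same mapping can carry other directives, for example `{"memory": "64.GB"}`, which renders as `memory = '64.GB'`.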

Resource Requirements by Process

| Process | CPUs | Memory | Notes |
|---|---|---|---|
| CELLPROFILER_ILLUMCALC | 1 | 2 GB | Per plate |
| CELLPROFILER_ILLUMAPPLY | 1-2 | 6 GB | Per well/site |
| CELLPROFILER_PREPROCESS | 4 | 8 GB | Per site |
| FIJI_STITCHCROP | 6 | 36 GB | Memory-intensive |
| CELLPROFILER_COMBINEDANALYSIS | 4 | 12-32 GB | Most demanding |
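The table can also be used to estimate how many tasks of each process fit on one instance at the same time, which helps when choosing instance types. The sketch below takes the upper end of each CPU and memory range from the table; the instance size is an input you supply (the 16 vCPU / 128 GB shape is an illustrative assumption), and it ignores memory reserved by the operating system and container agent, so real capacity will be somewhat lower.

```python
# (cpus, memory_gb) per task, upper end of each range in the table above.
REQUIREMENTS = {
    "CELLPROFILER_ILLUMCALC": (1, 2),
    "CELLPROFILER_ILLUMAPPLY": (2, 6),
    "CELLPROFILER_PREPROCESS": (4, 8),
    "FIJI_STITCHCROP": (6, 36),
    "CELLPROFILER_COMBINEDANALYSIS": (4, 32),
}


def tasks_per_instance(process, instance_cpus, instance_memory_gb):
    """Concurrent tasks of `process` that fit on one instance.

    The result is limited by whichever resource runs out first.
    """
    cpus, memory_gb = REQUIREMENTS[process]
    return min(instance_cpus // cpus, instance_memory_gb // memory_gb)


if __name__ == "__main__":
    # Hypothetical instance shape: 16 vCPUs, 128 GB memory.
    for name in REQUIREMENTS:
        print(f"{name}: {tasks_per_instance(name, 16, 128)}")
```

On a compute-optimized shape such as 16 vCPUs / 32 GB, the same function returns 1 for CELLPROFILER_COMBINEDANALYSIS (memory-bound) instead of 4, which illustrates why a memory-optimized family is suggested for that step.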

To override these defaults, add a block like the following to your Nextflow config:

```groovy
process {
    withName: 'CELLPROFILER_COMBINEDANALYSIS' {
        memory = '64.GB'
    }
}
```