Running the Pipeline with CLI¶
Once your inputs are ready, run the pipeline pointing to your files:
nextflow run broadinstitute/nf-pooled-cellpainting \
--input samplesheet.csv \
--barcodes barcodes.csv \
--outdir results \
--painting_illumcalc_cppipe your_painting_illumcalc_cppipe.cppipe \
--painting_illumapply_cppipe your_painting_illumapply_cppipe.cppipe \
--painting_segcheck_cppipe your_painting_segcheck_cppipe.cppipe \
--barcoding_illumcalc_cppipe your_barcoding_illumcalc_cppipe.cppipe \
--barcoding_illumapply_cppipe your_barcoding_illumapply_cppipe.cppipe \
--barcoding_preprocess_cppipe your_barcoding_preprocess_cppipe.cppipe \
--combinedanalysis_cppipe your_combinedanalysis_cppipe.cppipe \
-profile docker
Running the Pipeline with Seqera Platform¶
Configuring the Pipeline in Seqera Platform¶
Navigate to Launchpad → Add Pipeline.
Pipeline Settings¶
| Setting | Value |
|---|---|
| Name | nf-pooled-cellpainting or a name describing your run |
| Pipeline to launch | https://github.com/broadinstitute/nf-pooled-cellpainting |
| Revision | dev (for latest updates), main (for latest versioned code), or a specific commit |
| Compute environment | Your AWS Batch environment |
| Work directory | s3://your-bucket/prefix/to/scratch/output |
| Config profiles | (leave empty for a custom run) |
Pipeline Parameters¶
In the Launchpad, select “Launch” for your pipeline.
In the “Run Parameters” tab, fill in all of the required Input/Output options. You can enter each value manually in the “Input form view”, or add the following parameters to the JSON or YAML in the “Params file view”. All other parameters have default values, but you may need to edit them to match your dataset.
input: "s3://your-bucket/samplesheet.csv"
outdir: "s3://your-bucket/results"
barcodes: "s3://your-bucket/barcodes.csv"
painting_illumcalc_cppipe: "s3://your-bucket/pipelines/painting_illumcalc.cppipe"
painting_illumapply_cppipe: "s3://your-bucket/pipelines/painting_illumapply.cppipe"
painting_segcheck_cppipe: "s3://your-bucket/pipelines/painting_segcheck.cppipe"
barcoding_illumcalc_cppipe: "s3://your-bucket/pipelines/barcoding_illumcalc.cppipe"
barcoding_illumapply_cppipe: "s3://your-bucket/pipelines/barcoding_illumapply.cppipe"
barcoding_preprocess_cppipe: "s3://your-bucket/pipelines/barcoding_preprocess.cppipe"
combinedanalysis_cppipe: "s3://your-bucket/pipelines/combinedanalysis.cppipe"
Keep `qc_barcoding_passed: false` and `qc_painting_passed: false` for the first launch of the pipeline. This pauses the pipeline after these important QC steps, before the final steps are run.
Select “Launch”.
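Outside Seqera Platform, the same kind of params file can be passed to the CLI command shown earlier through Nextflow's `-params-file` option. The following is a minimal sketch under stated assumptions: the file name `params.yaml` and the local paths are placeholders, and it assumes the QC parameters behave on the command line as they do in the Platform. Parameters given on the command line take precedence over those in the params file, which is what the commented resume step relies on.

```shell
#!/usr/bin/env bash
# Sketch only: the file name and paths are placeholders, not pipeline defaults.

# Write the run parameters to a YAML file, keeping both QC gates closed
# for the first launch.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
barcodes: "barcodes.csv"
outdir: "results"
painting_illumcalc_cppipe: "your_painting_illumcalc_cppipe.cppipe"
painting_illumapply_cppipe: "your_painting_illumapply_cppipe.cppipe"
painting_segcheck_cppipe: "your_painting_segcheck_cppipe.cppipe"
barcoding_illumcalc_cppipe: "your_barcoding_illumcalc_cppipe.cppipe"
barcoding_illumapply_cppipe: "your_barcoding_illumapply_cppipe.cppipe"
barcoding_preprocess_cppipe: "your_barcoding_preprocess_cppipe.cppipe"
combinedanalysis_cppipe: "your_combinedanalysis_cppipe.cppipe"
qc_painting_passed: false
qc_barcoding_passed: false
EOF

# Launch only if Nextflow is installed on this machine.
if command -v nextflow >/dev/null 2>&1; then
    nextflow run broadinstitute/nf-pooled-cellpainting \
        -params-file params.yaml \
        -profile docker
else
    echo "nextflow not found; wrote params.yaml without launching" >&2
fi

# After reviewing the QC outputs, the assumed CLI equivalent of "Resume" is
# to rerun with -resume and override the QC flags on the command line:
#
#   nextflow run broadinstitute/nf-pooled-cellpainting \
#       -params-file params.yaml \
#       -profile docker \
#       -resume \
#       --qc_painting_passed true \
#       --qc_barcoding_passed true
```

Keeping the QC flags out of the edited file and overriding them on the command line leaves `params.yaml` unchanged between the first launch and the resumed run.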
Launching and Monitoring Runs¶
Launch: Click Launch from the pipeline page
Monitor: View real-time task execution in the Runs tab
QC Review: Check outputs in the S3 bucket or via the Reports tab
Resume: After QC review, click Resume (not Relaunch!) with updated parameters:
qc_painting_passed: true
qc_barcoding_passed: true
Cost Optimization Tips¶
Use Spot Instances: 60-90% cost savings for fault-tolerant workloads
Enable Fusion Snapshots: Automatically recover from spot interruptions
Right-size Max CPUs: Start with 500-1000, increase based on queue times
Use Appropriate Instance Types: Memory-optimized (r6id) for Combined Analysis; compute-optimized (c6id) for illumination steps
Clean Up Work Directory: Periodically delete old work directories from S3
Route Long Tasks to On-Demand: See below for avoiding spot reclaim losses on multi-hour tasks
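For the work-directory cleanup tip, one possible approach is a small wrapper around the AWS CLI's `aws s3 rm --recursive`, previewed first with `--dryrun`. This is a sketch, not part of the pipeline: the function name `clean_workdir` is hypothetical, and the S3 URI is the placeholder work directory from the settings table.

```shell
#!/usr/bin/env bash
# Sketch only: clean_workdir is a hypothetical helper, not a pipeline command.

# Remove everything under an S3 work-directory prefix.
# Usage: clean_workdir <s3-uri> [--dryrun]
clean_workdir() {
    local uri="$1"
    local mode="${2:-}"

    # Refuse anything that is not s3://bucket/prefix, so a bare bucket
    # name cannot be wiped by accident.
    if [[ ! "$uri" =~ ^s3://[^/]+/.+ ]]; then
        echo "refusing to clean '$uri': expected s3://bucket/prefix" >&2
        return 1
    fi

    if [[ "$mode" == "--dryrun" ]]; then
        # List what would be deleted without deleting anything.
        aws s3 rm "$uri" --recursive --dryrun
    else
        aws s3 rm "$uri" --recursive
    fi
}
```

A typical call would be `clean_workdir s3://your-bucket/prefix/to/scratch/output --dryrun`, followed by the same call without `--dryrun` once the listing looks right. Because resuming a run depends on the cached results in the work directory, cleanup is best left until the run no longer needs to be resumed.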
Routing Long-Running Tasks to On-Demand Instances¶
Long-running tasks such as FIJI_STITCHCROP (which can take 4-6 hours) and CELLPROFILER_COMBINEDANALYSIS can lose hours of work if their spot instances are reclaimed. To avoid this:
Create an on-demand compute environment in Seqera Platform (duplicate your spot environment, disable Fusion Snapshots since they’re unnecessary for on-demand)
Route specific processes to the on-demand queue by adding to your Nextflow config:
process {
    withName: 'FIJI_STITCHCROP' {
        queue = '<on-demand-queue-name>'
    }
    withName: 'CELLPROFILER_COMBINEDANALYSIS' {
        queue = '<on-demand-queue-name>'
    }
}
The queue name is visible in your Seqera Platform compute environment under “Manual config attributes”.
Resource Requirements by Process¶
| Process | CPU | Memory | Notes |
|---|---|---|---|
| CELLPROFILER_ILLUMCALC | 1 | 2 GB | Per plate |
| CELLPROFILER_ILLUMAPPLY | 1-2 | 6 GB | Per well/site |
| CELLPROFILER_PREPROCESS | 4 | 8 GB | Per site |
| FIJI_STITCHCROP | 6 | 36 GB | Memory-intensive |
| CELLPROFILER_COMBINEDANALYSIS | 4 | 12-32 GB | Most demanding |
To override defaults, add to your Nextflow config:
process {
    withName: 'CELLPROFILER_COMBINEDANALYSIS' {
        memory = '64.GB'
    }
}
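When running from the CLI, an override like this can be kept in its own file and passed to Nextflow with the `-c` option. The sketch below only writes the file; `custom.config` is a hypothetical name, and the 64 GB value is the example from above, not a recommendation.

```shell
#!/usr/bin/env bash
# Sketch only: custom.config is a hypothetical file name.

# Save the resource override to a standalone Nextflow config file.
cat > custom.config <<'EOF'
process {
    withName: 'CELLPROFILER_COMBINEDANALYSIS' {
        memory = '64.GB'
    }
}
EOF

# Then append the file to your usual run command, for example:
#   nextflow run broadinstitute/nf-pooled-cellpainting <your usual options> -c custom.config
echo "wrote custom.config"
```

Keeping overrides in a separate file means the same settings can be reused across runs without editing the pipeline itself.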