Samplesheet Dependencies & Channel Naming

This document outlines the critical dependencies between the input samplesheet, the Nextflow pipeline logic, and the Python scripts that generate load_data.csv files for CellProfiler. The pipeline depends on a correctly formatted samplesheet and consistently named files.

Samplesheet Requirements

The samplesheet is the single source of truth for experimental metadata. The pipeline expects specific columns to be present.

Required Columns

| Column | Description | Critical Dependency |
| --- | --- | --- |
| batch | Batch identifier (e.g., Batch1) | Used for grouping images for illumination calculation. |
| plate | Plate identifier (e.g., Plate1) | CRITICAL: Must match the plate name used in filenames for the Combined Analysis step. |
| well | Well identifier (e.g., A01) | CRITICAL: Used to map images to metadata. |
| site | Site/Field number (e.g., 1) | CRITICAL: Used to map images to metadata. |
| channels | Comma-separated list of channels | CRITICAL: Used as column headers in load_data.csv. Must match the channel names parsed from filenames (see below). |
| arm | painting or barcoding | Determines which subworkflow processes the image. |
| cycle | Cycle number (Barcoding only) | CRITICAL: Used for grouping barcoding cycles. |
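
The column requirements above can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function name validate_samplesheet and its error messages are hypothetical, and it assumes the samplesheet is a CSV file whose header uses exactly the column names listed above.

```python
import csv
import io

# Column names taken from the table above; "cycle" is only needed for barcoding rows.
REQUIRED_COLUMNS = ["batch", "plate", "well", "site", "channels", "arm"]
VALID_ARMS = {"painting", "barcoding"}


def validate_samplesheet(handle):
    """Return a list of human-readable problems found in a samplesheet CSV.

    Hypothetical helper for illustration; the pipeline may validate differently.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing required columns: {', '.join(missing)}"]

    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        arm = (row["arm"] or "").strip()
        if arm not in VALID_ARMS:
            problems.append(
                f"row {row_number}: arm must be painting or barcoding, got {arm!r}"
            )
        if arm == "barcoding" and not (row.get("cycle") or "").strip():
            problems.append(f"row {row_number}: barcoding rows need a cycle number")
        if not (row["channels"] or "").strip():
            problems.append(f"row {row_number}: channels is empty")
    return problems


# Example: a barcoding row with no cycle value is reported.
sheet = io.StringIO(
    "batch,plate,well,site,channels,arm,cycle\n"
    'Batch1,Plate1,A01,1,"A,C,G,T",barcoding,\n'
)
print(validate_samplesheet(sheet))
```

Because channels is itself a comma-separated list, the value has to be quoted in the CSV, as in the example row.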

Metadata Flow

  1. Ingestion: The samplesheet is read by main.nf.

  2. Channel Creation: Nextflow creates channels carrying [meta, image] tuples. meta contains all the columns above.

  3. Processing:

    • Illumination Calculation/Correction: Metadata (plate, channels, cycle) is passed explicitly to the Python script via CLI arguments.

    • Preprocessing & Combined Analysis: Metadata is implicitly derived from filenames in some legacy paths, but the modern implementation relies on the meta map passed from Nextflow.


Channel Naming Constraints

The Python script (bin/generate_load_data_csv.py) uses regular expressions to parse filenames and extract Channel and Cycle information. This is where most user errors occur.

Cell Painting Arm

Input Images (Raw)

Corrected Images (Intermediate)

Barcoding Arm

Input Images (Raw)

Preprocessing & Alignment


Combined Analysis Dependencies

The Combined Analysis step merges data from both arms. It is the step most sensitive to naming mistakes.

The “Plate Name” Trap

The Python script groups files by (Plate, Well, Site).

The Constraint: The input files for combined analysis are generated by previous steps (IllumApply). These files are named using the metadata from those previous steps, so the plate value in the samplesheet must match the plate name embedded in those filenames.
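
The grouping behaviour can be illustrated with a small sketch. The filename layout used here (Plate_<plate>_Well_<well>_Site_<site>_...) is a hypothetical stand-in, since the real layout is defined by the earlier pipeline steps, and the helper names group_by_site and missing_keys are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical filename layout, used only for illustration:
#   Plate_<plate>_Well_<well>_Site_<site>_<rest>.tiff
KEY_PATTERN = re.compile(
    r"Plate_(?P<plate>[^_]+)_Well_(?P<well>[^_]+)_Site_(?P<site>\d+)_"
)


def group_by_site(filenames):
    """Group filenames by the (plate, well, site) key parsed from each name."""
    groups = defaultdict(list)
    for name in filenames:
        match = KEY_PATTERN.match(name)
        if match is None:
            continue  # names that do not follow the assumed layout are skipped
        key = (match["plate"], match["well"], int(match["site"]))
        groups[key].append(name)
    return dict(groups)


def missing_keys(samplesheet_keys, groups):
    """Return samplesheet (plate, well, site) keys that have no matching files."""
    return sorted(set(samplesheet_keys) - set(groups))


files = [
    "Plate_Plate1_Well_A01_Site_1_CorrDNA.tiff",
    "Plate_Plate1_Well_A01_Site_1_Cycle01_A.tiff",
]
groups = group_by_site(files)

# A samplesheet that spells the plate "plate1" finds no files for that site.
print(missing_keys([("plate1", "A01", 1)], groups))
```

Under these assumptions, both files land in the group ("Plate1", "A01", 1), and a samplesheet row whose plate is spelled differently, even only by case, matches nothing.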

Channel Matching

The generate_load_data_csv.py script in combined mode uses regex to identify if a file is “Cell Painting” or “Barcoding” based on its filename pattern:

  1. Barcoding Pattern: Looks for Cycle(\d+).

    • Matches: ..._Cycle01_A.tiff

  2. Cell Painting Pattern: Looks for Corr(.+).

    • Matches: ..._CorrDNA.tiff

Impact: If you name a Cell Painting channel Cycle1 (e.g., CorrCycle1.tiff), the script might mistakenly try to parse it as a barcoding image because of the Cycle keyword.
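
The collision can be reproduced with the two patterns quoted above. This is a sketch rather than the script's actual code: the classify function, the order in which the patterns are tried (barcoding first), and the Plate1_A01_1_ filename prefix are all assumptions made for illustration.

```python
import re

# Patterns as described above; the exact expressions in the real script may differ.
BARCODING_PATTERN = re.compile(r"Cycle(\d+)")
PAINTING_PATTERN = re.compile(r"Corr(.+)")


def classify(filename):
    """Classify a filename, trying the barcoding pattern first (an assumed order)."""
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    barcoding = BARCODING_PATTERN.search(stem)
    if barcoding:
        return ("barcoding", int(barcoding.group(1)))
    painting = PAINTING_PATTERN.search(stem)
    if painting:
        return ("painting", painting.group(1))
    return ("unknown", None)


print(classify("Plate1_A01_1_Cycle01_A.tiff"))    # ('barcoding', 1)
print(classify("Plate1_A01_1_CorrDNA.tiff"))      # ('painting', 'DNA')
print(classify("Plate1_A01_1_CorrCycle1.tiff"))   # ('barcoding', 1)
```

The last call shows the trap: a Cell Painting channel named Cycle1 satisfies the barcoding pattern, so under this ordering the file is never checked against the Cell Painting pattern. Keeping the word Cycle out of Cell Painting channel names avoids the ambiguity regardless of how the real script orders its checks.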


Summary Checklist

Before running the pipeline: