Core Concepts

This guide explains the fundamental concepts behind the StarryNight platform.

System Overview

StarryNight is a platform for processing, analyzing, and managing optical pooled screening (OPS) image data, with a particular focus on Cell Painting and sequencing-based assays.

For a detailed description of the platform's components and architecture, see the Architecture Overview.

Data Organization and Parsing

StarryNight provides flexibility in data organization through its inventory and index system, which extracts metadata from file paths using configurable path parsers.

Inventory and Index Concepts

The foundation of data organization in StarryNight consists of two key concepts:

Inventory: A catalog of all files in a dataset

Contains basic file information: path, name, extension
Created by scanning a data directory recursively
Stored as a Parquet file for efficient querying

Index: Structured metadata extracted from file paths

Contains rich metadata: dataset, batch, plate, well, site, channel info
Created by parsing file paths using a grammar-based parser
Enables sophisticated filtering and selection of images
Stored as a structured Parquet file
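As a rough illustration of the inventory concept, the sketch below scans a directory recursively and records the path, name, and extension of each file. The helper name `build_inventory` and the record keys are assumptions made for this example, not StarryNight's actual API or schema. Writing the result to Parquet is omitted so the example needs only the standard library; with pandas installed, `pandas.DataFrame(records).to_parquet(...)` would be one way to do it.

```python
from pathlib import Path


def build_inventory(data_dir: Path) -> list[dict[str, str]]:
    """Recursively catalog every file under data_dir (hypothetical helper)."""
    records = []
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        records.append(
            {
                # Path relative to the dataset root, with forward slashes.
                "path": path.relative_to(data_dir).as_posix(),
                "name": path.name,
                # Join all suffixes so "img.ome.tiff" yields "ome.tiff".
                "extension": "".join(path.suffixes).lstrip("."),
            }
        )
    return records
```

Calling `build_inventory(Path("/data/my_dataset"))` returns one dictionary per file, which can then be handed to a path parser to build the index.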

Directory Structure

StarryNight uses a standardized workspace structure for processing:

workspace/
├── inventory/                  # File inventory
│   └── inventory.parquet       # Master inventory file
├── index/                      # Structured metadata
│   └── index.parquet           # Index file with extracted metadata
├── cellprofiler/               # CellProfiler-related files
│   ├── loaddata/               # CSV files for loading images
│   └── cppipe/                 # Pipeline files
├── illum/                      # Illumination correction files
├── aligned/                    # Aligned images
└── results/                    # Analysis results

This structure applies only to the processing workspace. Source data can follow any organization pattern that the path parser can interpret.
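For orientation, the layout above can be reproduced with a few lines of Python. This is only a sketch: `create_workspace` is a hypothetical helper, and StarryNight may well create these directories itself as each step runs.

```python
from pathlib import Path

# Subdirectories taken from the workspace layout shown above.
WORKSPACE_DIRS = [
    "inventory",
    "index",
    "cellprofiler/loaddata",
    "cellprofiler/cppipe",
    "illum",
    "aligned",
    "results",
]


def create_workspace(root: Path) -> Path:
    """Create the workspace skeleton under root (hypothetical helper)."""
    for sub in WORKSPACE_DIRS:
        # exist_ok makes the call safe to repeat on an existing workspace.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```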

Path Parsing System

StarryNight uses a grammar-based path parsing system that:

Takes file paths from the inventory
Applies grammar rules to extract structured metadata
Creates index records with rich, queryable information

The default parser ("vincent") expects paths like:

[dataset]/Source[source_id]/Batch[batch_id]/images/[plate_id]/[experiment_id]/Well[well_id]_Point[site_id]_[index]_Channel[channels]_Seq[sequence].ome.tiff

Example:

starrynight_example/Source1/Batch1/images/Plate1/20X_CP_Plate1_20240319_122800_179/WellA2_PointA2_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1025.ome.tiff
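To make the mapping from path to metadata concrete, the sketch below pulls the fields out of the example path. It uses a regular expression as a stand-in for the grammar-based parser, so it is not the "vincent" implementation. The names `VINCENT_LIKE_PATTERN` and `parse_path` are hypothetical, and the field names simply follow the bracketed placeholders in the template above.

```python
import re

# One named group per bracketed placeholder in the path template.
# Assumption: well, site, index, and channel values contain no underscores.
VINCENT_LIKE_PATTERN = re.compile(
    r"^(?P<dataset>[^/]+)"
    r"/Source(?P<source_id>[^/]+)"
    r"/Batch(?P<batch_id>[^/]+)"
    r"/images/(?P<plate_id>[^/]+)"
    r"/(?P<experiment_id>[^/]+)"
    r"/Well(?P<well_id>[^_/]+)"
    r"_Point(?P<site_id>[^_/]+)"
    r"_(?P<index>[^_/]+)"
    r"_Channel(?P<channels>[^_/]+)"
    r"_Seq(?P<sequence>[^_/.]+)"
    r"\.ome\.tiff$"
)


def parse_path(path: str) -> dict:
    """Extract metadata from a path that follows the template above."""
    match = VINCENT_LIKE_PATTERN.match(path)
    if match is None:
        raise ValueError(f"path does not match the expected layout: {path}")
    record = match.groupdict()
    # Channel names are comma-separated within a single path segment.
    record["channels"] = record["channels"].split(",")
    return record


EXAMPLE_PATH = (
    "starrynight_example/Source1/Batch1/images/Plate1/"
    "20X_CP_Plate1_20240319_122800_179/"
    "WellA2_PointA2_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1025.ome.tiff"
)
record = parse_path(EXAMPLE_PATH)
```

For the example path, `record["plate_id"]` is `"Plate1"`, `record["well_id"]` is `"A2"`, and `record["channels"]` is a list of three channel names. A grammar-based parser is easier to extend to new layouts than a single expression like this one.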

Flexible Data Organization

The path parsing approach provides significant flexibility:

Dataset Structure: Your raw data can follow different organization patterns
File Naming: Different file naming conventions can be supported
Customization: Custom parsers can be created for specific needs

The parsing system extracts key metadata including:

Dataset, batch, and plate identifiers
Well and site information
Channel details
Sequence/cycle information for sequencing-based screens
Other experiment-specific metadata

See Parser Configuration for details on customizing the parser for your own data organization.
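Once metadata has been extracted, selecting images reduces to filtering records by field value. The sketch below shows the idea on a handful of in-memory records. The field names (`plate_id`, `well_id`, `cycle`) and the `select` helper are assumptions for illustration and do not reflect StarryNight's actual index schema, which is stored in Parquet.

```python
# Hypothetical index records; field names are assumptions based on the
# metadata listed above, not the real index schema.
records = [
    {"plate_id": "Plate1", "well_id": "A1", "cycle": 1},
    {"plate_id": "Plate1", "well_id": "A2", "cycle": 1},
    {"plate_id": "Plate1", "well_id": "A2", "cycle": 2},
    {"plate_id": "Plate2", "well_id": "A2", "cycle": 1},
]


def select(rows: list[dict], **criteria) -> list[dict]:
    """Keep the rows whose fields equal every given criterion."""
    return [
        row
        for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# All images from well A2 on Plate1, across sequencing cycles.
plate1_a2 = select(records, plate_id="Plate1", well_id="A2")
```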

Workflow Concepts

Basic Workflow

A typical StarryNight workflow involves:

Inventory Generation: Creating a catalog of all image files (one-time step)
Index Generation: Extracting metadata from file paths (one-time step)
Module Processing: For each processing module (e.g., illumination correction):
- Generate LoadData: Creating CellProfiler LoadData CSV files
- Generate Pipeline: Creating CellProfiler pipeline (.cppipe) files
- Execute Pipeline: Running CellProfiler with the generated files

Module System

StarryNight is built around a modular architecture where each module represents a specific processing task (illumination correction, alignment, etc.). Modules follow a consistent three-step pattern:

LoadData Generation: Creating input specifications for processing
Pipeline Generation: Creating processing pipeline definitions
Pipeline Execution: Running the defined pipeline on the data

Modules can be used independently via CLI or composed into complete workflows through PipeCraft.
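The three-step pattern can be pictured as a small interface that every module implements. The sketch below is hypothetical: the class name `ProcessingModule`, its method names, and their signatures are invented for illustration and are not StarryNight's actual module API.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class ProcessingModule(ABC):
    """Hypothetical interface mirroring the three-step module pattern."""

    @abstractmethod
    def generate_load_data(self, index_path: Path, out_dir: Path) -> Path:
        """Step 1: write the input specification and return its path."""

    @abstractmethod
    def generate_pipeline(self, load_data: Path, out_dir: Path) -> Path:
        """Step 2: write the pipeline definition and return its path."""

    @abstractmethod
    def execute(self, load_data: Path, pipeline: Path, out_dir: Path) -> Path:
        """Step 3: run the pipeline and return the output location."""

    def run(self, index_path: Path, workspace: Path) -> Path:
        """Chain the three steps in order for one module."""
        cp_dir = workspace / "cellprofiler"
        load_data = self.generate_load_data(index_path, cp_dir / "loaddata")
        pipeline = self.generate_pipeline(load_data, cp_dir / "cppipe")
        return self.execute(load_data, pipeline, workspace)
```

Keeping the steps separate means each one can be run or inspected on its own, which matches the option of invoking modules individually from the CLI or composing them into a larger workflow.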

Processing Approaches

StarryNight supports different processing approaches:

CLI-based: Direct command-line execution of individual modules
Pipeline-based: Using PipeCraft to define and automate complete workflows
UI-based: Using the Canvas interface for intuitive configuration and execution

Key Abstractions

Inventory and Index

These are the foundation of all StarryNight workflows:

Inventory: A catalog of all files in a dataset
Index: Structured metadata extracted from file paths

Processing Modules

Each module handles a specific image processing task:

Illumination Correction: Normalizes uneven illumination
Alignment: Registers images across channels/cycles
Preprocessing: Applies filters and quality control
Cell Painting Analysis: Segments cells and extracts features
Sequencing Analysis: Processes sequencing-based images

Next Steps

Learn about all Processing Modules