StarryNight Architecture

Experimental - Not Reviewed

Content may be incomplete or inaccurate.

StarryNight is a layered framework for scientific image processing that transforms raw microscopy data into quantitative measurements. Its architecture separates concerns across six distinct layers, enabling scalable, reproducible analysis of Optical Pooled Screening experiments.

System Overview

StarryNight processes terabytes of microscopy images through a pipeline that progresses from simple functions to complex workflows:

flowchart TB
        A[Algorithm Layer<br/>Standalone Python Functions<br/>starrynight/algorithms/]
        B[CLI Layer<br/>Command-line Access<br/>starrynight/cli/]
        C[Module Layer<br/>Specs & Compute Graphs<br/>starrynight/modules/]
        D[Pipeline Layer<br/>Workflow Composition<br/>starrynight/pipelines/]
        E[Execution Layer<br/>Backend Runtime<br/>pipecraft/backend/]

        A -->|exposed as| B
        B -->|invoked by| C
        C -->|composed into| D
        D -->|translated to| E

Note: The Configuration Layer (starrynight/experiments/) operates as a cross-cutting concern, providing experiment parameters and settings that influence behavior across all layers.

Layer Summary

Algorithm Layer (Foundation)

Standalone Python functions implementing core image processing logic. No dependencies on other StarryNight components. Organized into algorithm sets that handle specific pipeline stages.

Key characteristics:

Complete independence from other layers
Clear input/output contracts
Support for both local and cloud storage via cloudpathlib

CLI Layer (Direct Access)

Command-line interfaces wrapping algorithm functions. Uses Click to provide user-friendly access with parameter validation and path handling.

Key characteristics:

Direct algorithm imports
Consistent command structure
Automatic path normalization

Module Layer (Abstraction)

Standardized components combining specifications (via Bilayers) and compute graphs (via Pipecraft). Enables backend-agnostic execution and workflow composition.

Key characteristics:

Dual nature: specs define "what", compute graphs define "how"
Container-based execution
Three-function pattern: LoadData → Pipeline → Execution

Pipeline Layer (Composition)

Combines modules into complete workflows. Integrates with Pipecraft to create executable compute graphs with defined execution patterns.

Key characteristics:

Sequential and parallel execution blocks
Backend independence
End-to-end workflow definition

Execution Layer (Runtime)

Manages actual execution via backends like Snakemake. Handles resource allocation, dependency management, and parallel processing.

Key characteristics:

Backend configuration
Workflow translation
Container orchestration

Configuration Layer (Cross-cutting)

Provides experiment configuration and parameter inference. Unlike other layers, it operates orthogonally, influencing behavior across all layers.

Key characteristics:

Parameter inference from data
Experiment-specific settings
Module configuration generation

Key Design Principles

Separation of Concerns: Each layer has distinct responsibilities
Backend Independence: Define once, run anywhere
Progressive Enhancement: Each layer adds capabilities
Explicit Over Implicit: Clear contracts and dependencies
Composability: Components combine into larger workflows

Common Patterns

Three-Function Pattern (CellProfiler)

CellProfiler-based algorithm sets implement:

LoadData Generation: Create CSV files for image loading
Pipeline Generation: Generate .cppipe pipeline files
Execution: Run CellProfiler pipelines

Other algorithm types (indexing, inventory, quality control) follow different patterns suited to their purpose.

Specification vs Implementation

Specifications define interfaces and contracts
Implementations provide concrete execution
This separation enables multiple backends and inspection

Container-Based Execution

All processing runs in containers for:

Reproducibility across environments
Dependency isolation
Scalable cloud execution

Usage Flow

A typical StarryNight workflow:

Index Generation: Scan and catalog available images
Experiment Configuration: Define experimental parameters
Module Selection: Choose appropriate processing modules
Pipeline Composition: Connect modules into workflows
Execution: Run pipelines on compute infrastructure
Result Collection: Gather and organize outputs

Technology Stack

Core: Python with type hints
CLI: Click framework
Paths: cloudpathlib for storage abstraction
Specs: Bilayers for interface definition
Pipelines: Pipecraft for compute graphs
Execution: Snakemake for workflow orchestration
Containers: Docker/Singularity for isolation

Next Steps

Principles - Fundamental principles and patterns
Layer Documentation - See individual layer files (01-algorithm.md through 06-configuration.md)