Architecture Flow in Action: Detailed Code Examples
Experimental Documentation
This is an experimental document that offers a very detailed view of the architecture flow. Some readers may find this level of detail overwhelming. If you're new to StarryNight, we recommend starting with the Architecture Overview and Practical Integration before diving into this detailed mapping.
This document provides concrete code examples showing how data flows through StarryNight's architectural layers. It complements the Architecture Overview by mapping the abstract sequence diagrams to actual code implementation, and builds on the foundational concepts from the Practical Integration walkthrough.
Purpose of This Document
While the architecture overview explains the conceptual flow and the practical integration document shows the overall structure of a pipeline, this document:
- Maps Diagrams to Code: Shows exactly how each arrow in the sequence diagrams maps to concrete code
- Demonstrates Transformations: Illustrates how data transforms at each step between layers
- Provides Implementation Details: Goes deeper into the technical implementation of each layer
- Shows Two-Phase Flow: Demonstrates how the Pipeline Composition and Runtime Execution phases work in practice
By studying this document, you'll gain a precise understanding of how StarryNight's architectural components interact in a real implementation.
Tracing a Single Pipeline Step
We'll trace a single pipeline step (CP illumination calculation) through the complete architecture flow, showing exactly how data transforms at each step between layers.
Pipeline Composition Phase
Config→Module: Configuration flows into module
# Configuration setup
data_config = DataConfig(
dataset_path=dataset_path,
storage_path=dataset_path,
workspace_path=workspace_path,
)
pcp_exp_init = PCPGenericInitConfig(
cp_nuclei_channel="DAPI",
# other parameters...
)
# The experiment object (pcp_experiment) is derived from the init config above
# (derivation step omitted here); configuration then flows into module creation
cp_calc_illum_load_data_mod = CPCalcIllumGenLoadDataModule.from_config(
data_config, pcp_experiment
)
What happens: Configuration parameters flow into module creation
Input → Output: Parameters (paths, channels) → Module instance
Module→Module: Generate compute graphs
# Inside from_config method (not visible in example)
# This happens within the module's initialization
compute_graph = ComputeGraph([container])
return cls(compute_graph)
What happens: Module internally generates compute graph with container specification
Input → Output: Configuration → Compute graph with inputs/outputs
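The from_config method itself is not shown in the notebooks. The following is a minimal sketch of the idea, assuming the Container, ContainerConfig, and ComputeGraph types used elsewhere in this document; the class name, paths, and CLI arguments are illustrative, not the actual CPCalcIllumGenLoadDataModule source:
# Illustrative sketch of a module's from_config method (simplified)
class ExampleLoadDataModule:
    def __init__(self, compute_graph):
        self._compute_graph = compute_graph

    @property
    def pipe(self):
        """Compute graph consumed by the pipeline/execution backend."""
        return self._compute_graph

    @classmethod
    def from_config(cls, data_config, experiment):
        # Resolve concrete input/output paths from the configuration
        out_dir = data_config.workspace_path / "cp_illum_calc"
        container = Container(
            name="cp_calc_illum_gen_load_data",
            input_paths={"images_dir": [data_config.dataset_path]},
            output_paths={"load_data_path": [out_dir / "load_data.csv"]},
            config=ContainerConfig(
                image="ghcr.io/leoank/starrynight:dev",
                cmd=["starrynight", "..."],  # CLI subcommand and flags elided
                env={},
            ),
        )
        # Wrap the container specification in a compute graph
        return cls(ComputeGraph([container]))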
Module→Pipeline: Pass module specifications
# Module pipe passed to backend
exec_backend = SnakeMakeBackend(
cp_calc_illum_load_data_mod.pipe,
backend_config,
exec_runs / "run003",
exec_mounts,
)
What happens: Module's compute graph becomes available to pipeline/execution
Input → Output: Module compute graph → Pipeline component
Pipeline→Execution: Submit workflow
# Alternative approach shows this more clearly
return module_list, Seq(
[
Parallel(
[
Seq([cp_illum_calc_loaddata.pipe, ...]),
# other parallel sequences
]
),
# sequential steps
]
)
# In step-by-step approach:
run = exec_backend.run()
What happens: Pipeline submits workflow for execution
Input → Output: Pipeline specification → Execution plan
Execution→Execution: Translate to Snakemake rules
# Inside SnakeMakeBackend.run() (not visible in example)
# Translates compute graph to Snakemake rules
What happens: Backend translates compute graph into Snakemake rules
Input → Output: Compute graph → Snakemake rules
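The translation is internal to SnakeMakeBackend.run() and is not shown in the example notebooks. As a rough sketch of the idea only, each container specification maps onto one Snakemake rule; the render_rule function and rule template below are hypothetical, not the actual backend code:
# Hypothetical sketch: rendering one compute-graph container as Snakemake rule text
RULE_TEMPLATE = """\
rule {name}:
    input: {inputs}
    output: {outputs}
    container: "{image}"
    shell: "{command}"
"""

def render_rule(container) -> str:
    """Render a container specification as Snakemake rule text (illustrative)."""
    def fmt(paths_by_name):
        flat = [str(p) for paths in paths_by_name.values() for p in paths]
        return ", ".join(repr(p) for p in flat)

    return RULE_TEMPLATE.format(
        name=container.name,
        inputs=fmt(container.input_paths),
        outputs=fmt(container.output_paths),
        image=container.config.image,
        command=" ".join(str(part) for part in container.config.cmd),
    )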
Execution→Runtime: Schedule container execution
This step transitions us from the Pipeline Composition Phase to the Runtime Execution Phase.
# Container definition from modules/cp_illum_calc/calc_cp.py
Container(
name="cp_calc_illum_invoke_cp",
input_paths={
"cppipe_path": [...],
"load_data_path": [...],
},
output_paths={
"cp_illum_calc_dir": [...]
},
config=ContainerConfig(
image="ghrc.io/leoank/starrynight:dev",
cmd=["starrynight", "cp", "-p", spec.inputs[0].path, ...],
env={},
),
)
What happens: Snakemake executes rules in container environment
Input → Output: Snakemake rule → Container execution
Runtime Execution Phase
Runtime→CLI→Algorithm: Command Execution Flow
When the container executes, the CLI layer bridges between runtime containers and algorithm functions:
# Container definition invokes the starrynight CLI command
cmd=["starrynight", "cp", "-p", spec.inputs[0].path, "-l", spec.inputs[1].path,
"-o", spec.outputs[0].path]
When this container executes:
- The starrynight command invokes the main CLI entrypoint
- The cp subcommand selects the specific command
- The CLI parses arguments and validates paths
- The CLI then calls the corresponding algorithm function:
# Inside starrynight/cli/cp.py
import click

from cloudpathlib import AnyPath  # import shown for completeness; exact source of AnyPath may differ
@click.command("cp")
@click.option("-p", "--pipeline", required=True, type=click.Path(exists=True))
@click.option("-l", "--loaddata", required=True, type=click.Path(exists=True))
@click.option("-o", "--output-dir", required=True, type=click.Path())
def cp_command(pipeline, loaddata, output_dir):
"""Run CellProfiler on a pipeline with a loaddata file."""
from starrynight.algorithms.cellprofiler import run_cellprofiler
# Convert string paths to standardized path objects (simplified)
pipeline_path = AnyPath(pipeline)
loaddata_path = AnyPath(loaddata)
output_path = AnyPath(output_dir)
# CLI command translates parameters and calls algorithm function
run_cellprofiler(
pipeline_path=pipeline_path,
loaddata_path=loaddata_path,
output_dir=output_path
)
This in turn calls the pure algorithm function:
# Inside starrynight/algorithms/cellprofiler.py
def run_cellprofiler(pipeline_path, loaddata_path, output_dir):
"""Run CellProfiler with specified pipeline and load data."""
# Prepare environment and input files
prepare_input_files(loaddata_path)
# Execute core CellProfiler functionality
result = execute_cellprofiler_pipeline(pipeline_path, output_dir)
# Post-process results if needed
post_process_results(result, output_dir)
return result
What happens: Container command invokes CLI, which parses arguments and calls algorithm
Input → Output: Container command line → CLI argument parsing → Algorithm function call → Processing results
This three-layer approach (Container→CLI→Algorithm) provides several benefits:
- Algorithms remain pure functions without CLI or container dependencies
- CLI provides standardized interfaces and path handling
- Modules can compose different CLI commands into complex workflows
- The same algorithm can be invoked from different contexts (container, direct CLI, notebooks)
The CLI layer is the essential bridge that allows containerized execution to access the underlying algorithm functionality while maintaining clean separation of concerns.
What happens: Algorithm function executes core image processing logic
Input → Output: Function parameters → Processed data
Algorithm→CLI: Return results to CLI
# Continues from previous code block
def cp_command(pipeline, loaddata, output_dir):
# Call algorithm and get results
result = run_cellprofiler(
pipeline_path=pipeline,
loaddata_path=loaddata,
output_dir=output_dir
)
    # CLI reports the outcome; Click turns an exception into a non-zero exit code
    if result.success:
        click.echo(f"CellProfiler execution successful. Output in {output_dir}")
    else:
        # Raising ClickException prints the message and exits with status 1
        raise click.ClickException(f"CellProfiler execution failed: {result.error}")
What happens: Algorithm function returns results to CLI command
Input → Output: Algorithm result → CLI output/exit code
CLI→Runtime: CLI process completes
# Container execution completes when CLI process exits
# Exit code from CLI determines container success/failure
What happens: CLI process exits, container execution completes
Input → Output: CLI exit code → Container exit status
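As a minimal sketch of the idea, assuming a direct subprocess call purely for illustration (in the real flow Snakemake and the container engine manage this, and the file paths below are placeholders):
import subprocess

# Illustration only: run the same CLI command the container would run and
# observe its exit code; a non-zero code marks the job as failed
completed = subprocess.run(
    ["starrynight", "cp", "-p", "illum_calc.cppipe", "-l", "load_data.csv", "-o", "cp_illum_calc/"],
    check=False,
)
container_succeeded = completed.returncode == 0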
Runtime→Storage: Write results
# Container writes to output paths
output_paths={
"cp_illum_calc_dir": [workspace_path / "cp_illum_calc"]
}
What happens: Container processes execute CLI commands that write results
Input → Output: Container processing → Files on disk
Storage→Runtime: Read previous outputs
# Next module reads previous outputs
# (Not directly visible but implied in dependencies)
cp_calc_illum_cppipe_mod = CPCalcIllumGenCPPipeModule.from_config(
data_config, pcp_experiment
)
What happens: Next phase reads outputs from previous phase
Input → Output: Files from previous step → Input for next step
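One way to picture the implied dependency: the next step's container lists the previous step's output path among its own inputs, which is what lets the backend order the steps. The sketch below is illustrative only; the names, paths, and elided CLI subcommand are simplified rather than the actual module source:
# Illustrative sketch: the CPPipe-generation container consumes the LoadData CSV
# written by the previous step
Container(
    name="cp_calc_illum_gen_cppipe",
    input_paths={
        # Output of the previous (LoadData) step becomes an input here
        "load_data_path": [workspace_path / "cp_illum_calc" / "load_data.csv"],
    },
    output_paths={
        "cppipe_path": [workspace_path / "cp_illum_calc" / "illum_calc.cppipe"],
    },
    config=ContainerConfig(
        image="ghcr.io/leoank/starrynight:dev",
        cmd=["starrynight", "..."],  # cppipe-generation subcommand elided
        env={},
    ),
)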
Flow Patterns in Three-Phase Execution
Each three-phase pattern (LoadData → CPipe → Invoke) demonstrates the complete flow through all architecture layers. These phases map to the two architectural phases as follows:
Pipeline Composition Phase steps in each CellProfiler phase:
- Config→Module: Configuration flows into module
- Module→Module: Generate compute graphs
- Module→Pipeline: Pass module specifications
- Pipeline→Execution: Submit workflow
- Execution→Execution: Translate to Snakemake rules
Runtime Execution Phase steps in each CellProfiler phase:
- Execution→Runtime: Schedule container execution
- Runtime→CLI: Invoke CLI commands
- CLI→Algorithm: Call algorithm functions
- Algorithm→Storage: Write results
The three CellProfiler-specific phases each execute this full cycle but with different inputs/outputs:
- LoadData Phase:
- Pipeline Composition: Configuration flows into module through to Snakemake rules
- Runtime Execution: Container executes, CLI generates LoadData CSV
- Result: CSV file written to disk
- CPipe Phase:
- Pipeline Composition: Same flow but with new module
- Runtime Execution: Container executes, reads LoadData CSV, CLI generates pipeline
- Result: Pipeline file written to disk
- Invoke Phase:
- Pipeline Composition: Same flow but with new module
- Runtime Execution: Container executes, reads both CSV and pipeline file, CLI invokes algorithm
- Result: Processed data written to disk
When using the pipeline composition approach shown in the "Pipeline Composition (Alternative Approach)" section in the Practical Integration document, this flow becomes more explicit since modules are composed in advance rather than executed one by one.
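As a hedged sketch of that composed form for this single step (the invoke module's class name below is illustrative, and the exact composition in the Practical Integration document may differ):
# Sketch: composing the LoadData → CPipe → Invoke phases of illumination
# calculation into one sequence, then handing the sequence to the backend
cp_illum_loaddata_mod = CPCalcIllumGenLoadDataModule.from_config(data_config, pcp_experiment)
cp_illum_cppipe_mod = CPCalcIllumGenCPPipeModule.from_config(data_config, pcp_experiment)
cp_illum_invoke_mod = CPCalcIllumInvokeCPModule.from_config(data_config, pcp_experiment)  # class name illustrative

illum_calc_pipeline = Seq(
    [
        cp_illum_loaddata_mod.pipe,  # writes LoadData CSV
        cp_illum_cppipe_mod.pipe,    # writes CellProfiler pipeline file
        cp_illum_invoke_mod.pipe,    # runs CellProfiler, writes illumination functions
    ]
)

exec_backend = SnakeMakeBackend(
    illum_calc_pipeline,
    backend_config,
    exec_runs / "run003",
    exec_mounts,
)
run = exec_backend.run()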
CLI Layer as the Bridge Between Container and Algorithm
The CLI layer serves as a critical bridge between the containerized execution environment and the pure algorithm functions:
# In container definition
cmd=["starrynight", "cp", "-p", "${inputs.cppipe_path}",
"-l", "${inputs.load_data_path}",
"-o", "${outputs.cp_illum_calc_dir}"]
# Inside CLI implementation (starrynight/cli/main.py)
import click

from starrynight.cli import cp  # module containing cp_command (import path follows the file layout above)
@click.group()
def cli():
"""StarryNight CLI."""
pass
cli.add_command(cp.cp_command, name="cp")
# Other commands...
# Inside algorithm module (starrynight/algorithms/cellprofiler.py)
def run_cellprofiler(pipeline_path, loaddata_path, output_dir):
"""
Run CellProfiler with the specified pipeline and loaddata.
This function encapsulates the core image processing logic.
"""
# Algorithm implementation
What happens:
- The container executes the starrynight cp command with its inputs/outputs
- The CLI parses arguments and provides a standardized interface to algorithms
- Algorithm functions contain the pure implementation without CLI/container concerns
Benefits of this approach:
- Separation of concerns: Algorithms focus on core functionality without UI/execution details
- Testability: Pure algorithm functions can be tested independently from CLI/containers
- Flexibility: Same algorithms can be accessed through different interfaces (CLI, API, notebook)
- Composability: CLI commands can combine multiple algorithm functions in useful ways
- Containerization: CLI provides a standard entrypoint for container execution
This CLI layer pattern is consistent across all StarryNight modules, creating a clean separation between:
- Algorithm layer: Pure implementation of image processing functionality
- CLI layer: Command-line interfaces that parse arguments and call algorithms
- Module layer: Compute graph specifications that invoke CLI commands in containers
When extending StarryNight with new capabilities, maintaining this separation through well-defined CLI interfaces ensures that algorithms remain reusable across different execution contexts.
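For example, the same run_cellprofiler function shown above can be called directly from a notebook or a test, bypassing the CLI and container layers entirely (the paths below are placeholders):
from pathlib import Path

from starrynight.algorithms.cellprofiler import run_cellprofiler

# Direct invocation: no argument parsing, no container runtime
result = run_cellprofiler(
    pipeline_path=Path("workspace/cp_illum_calc/illum_calc.cppipe"),
    loaddata_path=Path("workspace/cp_illum_calc/load_data.csv"),
    output_dir=Path("workspace/cp_illum_calc"),
)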