Execution Layer
Experimental - Not Reviewed
Content may be incomplete or inaccurate.
Purpose
The Execution Layer manages the runtime execution of StarryNight pipelines on computational infrastructure. It is implemented primarily in the Pipecraft package and orchestrated by the Conductor service. Its job is to translate abstract workflow definitions into concrete executions across environments ranging from local workstations to cloud platforms. Pipecraft's backend system hides backend-specific details, so researchers can focus on scientific workflows rather than infrastructure management while remaining free to use different computational resources as needed.
Responsibilities
- Translate Pipecraft graphs into backend-specific execution plans
- Manage container-based execution for reproducibility
- Handle resource allocation, parallelization, and failure recovery
- NOT responsible for: Defining workflows, implementing algorithms, composing modules, or making scientific decisions
Key Design Decisions
- Backend Abstraction (Decision #10): Execution functionality is provided through Pipecraft's backend system (pipecraft/src/pipecraft/backend/), which offers a uniform interface to multiple execution backends. Currently implemented backends include Snakemake (fully functional) and AWS Batch (partial implementation), with room to add others. This abstraction keeps pipeline definitions portable and lets infrastructure choices evolve without affecting scientific workflows. The translation from Pipecraft graphs to backend-specific formats happens within Pipecraft's backend implementations.
- Container-Based Execution (Decision #6): All computational work executes within containers (Docker/Singularity/Apptainer) to support reproducibility across environments. Running every analysis in the same packaged software environment means results do not depend on what is installed on the host system. Container specifications are managed at the module level but orchestrated by the Execution Layer.
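To make the container decision concrete, here is a minimal sketch of how a command might be wrapped for each container platform. The function name `container_command` and its parameters are hypothetical and not taken from Pipecraft; only the CLI flags used (`docker run --rm -v`, `singularity exec --bind`, `apptainer exec --bind`, and the `docker://` image URI) are standard options of those tools.

```python
from typing import Dict, List


def container_command(
    platform: str, image: str, command: List[str], mounts: Dict[str, str]
) -> List[str]:
    """Hypothetical helper: wrap `command` so it runs inside `image`.

    `mounts` maps host paths to paths inside the container.
    """
    if platform == "docker":
        args = ["docker", "run", "--rm"]
        for host, target in mounts.items():
            args += ["-v", f"{host}:{target}"]
        return args + [image] + command
    if platform in ("singularity", "apptainer"):
        # Both CLIs accept `exec --bind` and can pull from a Docker registry.
        args = [platform, "exec"]
        for host, target in mounts.items():
            args += ["--bind", f"{host}:{target}"]
        return args + [f"docker://{image}"] + command
    raise ValueError(f"unsupported container platform: {platform!r}")
```

The returned list can be passed to `subprocess.run` unchanged. Keeping the platform as a parameter is what lets the same module specification run under Docker on a workstation and under Apptainer on a cluster.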
Interfaces
Inputs
- Pipecraft compute graphs from the Pipeline Layer
- Execution configuration (backend selection, resource limits)
- Runtime parameters (parallelism settings, retry policies)
Outputs
- Execution status and progress information
- Workflow artifacts as defined by pipeline outputs
- Execution logs and metrics for debugging
Dependencies
- Internal dependencies: Pipeline Layer for workflow definitions (following upward-only rule)
- External dependencies: Workflow orchestrators (e.g., Snakemake), container runtimes
Patterns
The Execution Layer follows a standard pattern for backend abstraction and pipeline execution:
Backend Pattern
Each backend implementation in pipecraft/src/pipecraft/backend/ follows this structure:
- Inherits from a base Backend class that defines the interface
- Implements compile() to translate Pipecraft graphs to backend-specific formats
- Implements run() to execute the compiled workflow
- Returns a run object for monitoring and log access
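The structure above can be sketched as follows. This is an illustration under stated assumptions, not Pipecraft's actual code: only the names `Backend`, `compile()`, and `run()` come from the text, while the `Run` dataclass, the `EchoBackend` class, and the use of a plain list of node names in place of a Pipecraft graph are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical run handle; the real run object may look different."""

    status: str = "pending"
    logs: list = field(default_factory=list)


class Backend(ABC):
    """Sketch of a base class defining the interface all backends share."""

    def __init__(self, pipeline: list) -> None:
        # Stand-in for a Pipecraft graph: an ordered list of node names.
        self.pipeline = pipeline

    @abstractmethod
    def compile(self) -> str:
        """Translate the pipeline into a backend-specific format."""

    @abstractmethod
    def run(self) -> Run:
        """Execute the compiled workflow and return a run handle."""


class EchoBackend(Backend):
    """Toy backend that 'executes' by recording one log line per node."""

    def compile(self) -> str:
        return "\n".join(f"step {i}: {name}" for i, name in enumerate(self.pipeline))

    def run(self) -> Run:
        run = Run(status="running")
        run.logs.extend(self.compile().splitlines())
        run.status = "success"
        return run
```

Because callers only depend on the abstract `Backend` interface, adding a new backend means adding a new subclass rather than changing existing ones.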
Execution Flow Pattern
The Execution Layer follows a four-phase pattern for translating abstract pipelines into concrete executions:
- Backend Selection: The system instantiates the appropriate backend based on configuration, with each backend providing a uniform interface despite different underlying orchestrators.
- Pipeline Compilation: Backends translate Pipecraft graphs into their native formats by iterating through pipeline nodes and generating backend-specific execution instructions.
- Execution Management: The backend configures the runtime environment, manages resource allocation, and launches workflows with appropriate parameters.
- Monitoring and Control: Standardized run objects provide interfaces for checking execution status, accessing logs, and controlling running workflows.
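The four phases can be traced in a small, self-contained sketch. Everything here is hypothetical and chosen for illustration: the `BACKENDS` registry, the `"dry"` backend name, the node dictionaries, and the `RunHandle` type are assumptions, and the "execution" is a dry run that only records what would be launched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunHandle:
    """Hypothetical run object exposing status and logs (phase 4)."""

    status: str = "pending"
    logs: List[str] = field(default_factory=list)


def compile_nodes(nodes: List[dict]) -> List[str]:
    # Phase 2: iterate over pipeline nodes, emitting one instruction per node.
    return [f"{node['image']}: {node['command']}" for node in nodes]


def run_dry(instructions: List[str], cores: int) -> RunHandle:
    # Phase 3: apply resource settings and "launch" (here: only record).
    handle = RunHandle(status="running")
    for line in instructions:
        handle.logs.append(f"[cores={cores}] {line}")
    handle.status = "success"
    return handle


# Phase 1: a registry maps the configured backend name to its runner.
BACKENDS: Dict[str, Callable[[List[str], int], RunHandle]] = {"dry": run_dry}


def execute(nodes: List[dict], config: dict) -> RunHandle:
    name = config["backend"]
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    instructions = compile_nodes(nodes)
    return BACKENDS[name](instructions, config.get("cores", 1))
```

A caller would inspect `handle.status` and `handle.logs` after `execute(...)` returns. A real backend would hand the compiled instructions to an orchestrator instead of appending log lines.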
Integration Pattern
The Conductor service (conductor/src/conductor/handlers/execute.py) orchestrates execution:
- Creates data configuration from project/job settings
- Instantiates the appropriate module or pipeline
- Creates a backend instance with the compiled pipeline
- Executes and tracks the run in the database
- Provides status updates through the API
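As a rough illustration of these steps, the sketch below tracks runs in an in-memory SQLite table. It does not reproduce Conductor's handler: the table schema, the job dictionary keys, and the functions `init_db`, `execute_job`, and `get_status` are all hypothetical, and the backend is reduced to a callable that returns log text or raises.

```python
import sqlite3
from typing import Callable, List


def init_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `runs` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE runs (id INTEGER PRIMARY KEY, job_name TEXT, status TEXT, log TEXT)"
    )
    return db


def execute_job(
    db: sqlite3.Connection, job: dict, backend_run: Callable[[List[str]], str]
) -> int:
    # Step 1: create a data configuration from project/job settings.
    data_config = {"dataset": job["dataset_path"], "workdir": job["workdir"]}
    # Step 2: instantiate the module or pipeline (stand-in: a list of steps).
    steps = [f"{job['module']} on {data_config['dataset']}"]
    # Steps 3-4: record the run, execute through the backend, store the outcome.
    cur = db.execute(
        "INSERT INTO runs (job_name, status, log) VALUES (?, 'running', '')",
        (job["name"],),
    )
    run_id = cur.lastrowid
    try:
        log, status = backend_run(steps), "success"
    except Exception as exc:  # any backend failure is recorded, not re-raised
        log, status = str(exc), "failed"
    db.execute("UPDATE runs SET status = ?, log = ? WHERE id = ?", (status, log, run_id))
    db.commit()
    return run_id


def get_status(db: sqlite3.Connection, run_id: int) -> str:
    # Step 5: what a status endpoint would look up.
    row = db.execute("SELECT status FROM runs WHERE id = ?", (run_id,)).fetchone()
    if row is None:
        raise KeyError(run_id)
    return row[0]
```

Recording the run before launching it means a crash during execution still leaves a row that a status query can find.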
Configuration Pattern
Execution configuration typically includes:
- Resource limits (cores, memory, time)
- Container platform selection
- Retry and failure handling policies
- Output and scratch directory paths
- Backend-specific options
This pattern-based approach ensures that new backends can be added without modifying existing code, and that execution details can evolve without breaking the abstraction.
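The configuration fields listed under Configuration Pattern could be grouped into a single validated object. The following is a minimal sketch; the class name `ExecutionConfig`, its field names, and its default values are assumptions made for illustration and are not taken from Pipecraft or Conductor.

```python
from dataclasses import dataclass, field
from pathlib import Path

SUPPORTED_PLATFORMS = {"docker", "singularity", "apptainer"}


@dataclass(frozen=True)
class ExecutionConfig:
    """Hypothetical container for execution settings."""

    # Resource limits
    cores: int = 1
    memory_gb: float = 4.0
    time_limit_minutes: int = 60
    # Container platform selection
    container_platform: str = "docker"
    # Retry and failure handling
    max_retries: int = 0
    # Output and scratch directory paths
    output_dir: Path = Path("output")
    scratch_dir: Path = Path("scratch")
    # Backend-specific options, passed through untouched
    backend_options: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail early on values no backend could honor.
        if self.cores < 1:
            raise ValueError("cores must be at least 1")
        if self.memory_gb <= 0 or self.time_limit_minutes <= 0:
            raise ValueError("memory and time limits must be positive")
        if self.max_retries < 0:
            raise ValueError("max_retries cannot be negative")
        if self.container_platform not in SUPPORTED_PLATFORMS:
            raise ValueError(f"unsupported platform: {self.container_platform!r}")
```

Validating in `__post_init__` surfaces a bad setting when the configuration is built rather than partway through a long-running workflow.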
Implementation Location
- Primary location: pipecraft/src/pipecraft/backend/
- Secondary location: conductor/src/conductor/handlers/execute.py
- Tests: pipecraft/tests/backend/
See Also
- Previous: Pipeline Layer, Next: Configuration Layer