
Modules Design Principles and Assumptions

WorkFlow Launcher (WFL) is responsible for preparing the required workflow WDLs, inputs, and options for Cromwell at scale. This work involves input validation, pipeline WDL orchestration, and Cromwell workflow management. Like other WFL modules, the aou-arrays module uses the workload concept to manage workflows efficiently.

In general, WFL classifies all workloads into two categories: continuous and fixed. For instance, the aou-arrays module implements the arrays workload as a continuous workload: samples arrive as a continuous stream, and WFL makes no assumption about how many samples the workload will contain or how to group them. Instead, it hands off workload creation and starting to its caller. The wgs module, by contrast, implements External Whole Genome workloads as fixed (discrete) workloads: WFL has full knowledge of the number and properties of the samples it is going to process, and the samples can be grouped into batches (workloads) by a set of properties.
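As a rough illustration of grouping samples into batches by a set of properties, the following Python sketch partitions sample records by the values they share. It is not WFL's implementation: the function name `group_into_workloads` and the sample fields (`id`, `project`, `reference`) are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def group_into_workloads(samples, keys):
    """Group sample records into batches that share the same values for `keys`."""
    batches = defaultdict(list)
    for sample in samples:
        # The tuple of selected property values identifies the batch.
        batch_key = tuple(sample[k] for k in keys)
        batches[batch_key].append(sample)
    return dict(batches)


# Hypothetical sample records; the property names are illustrative only.
samples = [
    {"id": "s1", "project": "p1", "reference": "hg38"},
    {"id": "s2", "project": "p1", "reference": "hg38"},
    {"id": "s3", "project": "p2", "reference": "hg38"},
]

workloads = group_into_workloads(samples, keys=("project", "reference"))
```

This kind of grouping only works for a fixed workload, where every sample and its properties are known up front. A continuous workload has no complete sample list to partition.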

To learn more about an individual module, see its own section in this documentation.

Create a workload

Defining a workload type usually requires these top-level parameters.

Parameter   Type         Required
executor    URL
output      URL prefix
pipeline    pipeline
project     text
common      object
input       URL prefix
items       object

The parameters are used this way.

  • The executor URL specifies the Cromwell instance or other execution engine to service the workload.
  • The output URL prefix specifies the path where WFL should write the results. It is usually a Google Cloud Storage (gs://) bucket.
  • The pipeline enumeration implicitly identifies a data schema for the inputs to and outputs from the workload. You can think of it as the kind of workflow specified for the workload. People sometimes refer to this as the tag in that it is a well-known name for a Cromwell pipeline defined in WDL. You might also think of pipeline as the external or official name of a WFL processing module.
  • The project is just some text to identify a researcher, billing entity, or cost object responsible for the workload.
  • The common object holds settings shared by all of the samples, such as the workflow options. For more details, check the docs for the specific type of workload you are trying to submit.
  • The input URL prefix specifies the path from which WFL should read a sample or a batch of samples. It is usually a Google Cloud Storage (gs://) bucket.
  • The items object configures individual units of a workload. You can use it to tell WFL to treat arbitrary parts of the workload specially. For more details, check the docs for the specific type of workload you are trying to submit.
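To make the parameter list concrete, here is a minimal Python sketch that assembles a workload description from the top-level parameters above and serializes it to JSON. Only the seven parameter names come from this page. The helper `build_workload_request`, every value shown (URLs, bucket names, the pipeline name), and the contents of `common` and `items` are hypothetical placeholders, and the sketch does not show how the request is actually submitted to WFL.

```python
import json

# Top-level parameter names as listed in the table above.
TOP_LEVEL_PARAMETERS = (
    "executor", "output", "pipeline", "project", "common", "input", "items",
)


def build_workload_request(**params):
    """Return a dict of the given top-level parameters, rejecting unknown names."""
    unknown = sorted(set(params) - set(TOP_LEVEL_PARAMETERS))
    if unknown:
        raise ValueError(f"unknown workload parameters: {unknown}")
    # Emit keys in the documented order so the payload is easy to compare.
    return {name: params[name] for name in TOP_LEVEL_PARAMETERS if name in params}


# All values below are placeholders, not real endpoints or pipeline names.
request = build_workload_request(
    executor="https://cromwell.example.org",
    output="gs://example-bucket/output/",
    pipeline="ExamplePipeline",
    project="example-project",
    common={"options": {}},  # assumed shape; depends on the workload type
    input="gs://example-bucket/input/",
    items={},                # assumed shape; depends on the workload type
)

payload = json.dumps(request, indent=2)
```

Rejecting unknown keys early catches misspelled parameter names before anything reaches the execution engine. Which parameters are mandatory depends on the workload type, so the sketch deliberately does not enforce any as required.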