Welcome to WorkFlow Launcher⚓︎

Overview⚓︎

WorkFlow Launcher (WFL) is a workload manager.

For example, a workload could be a set of Whole Genome samples to be reprocessed in a given project/bucket, the workflow is the processing of an individual sample in that workload running WGS reprocessing; a workload could also be a queue of incoming notifications that describe all of the required inputs to launch Arrays scientific pipelines in Cromwell.

Most recent efforts leverage the general applicability of a staged workload model which automates fetching data from a source, pushing it into a workflow executor for analysis, and delivering the results of the analysis to an output location (also known as a sink).

WFL is designed to be deployed to run as a service in the cloud, primarily on Kubernetes clusters.

For more on Workflow Launcher's role in the Terra infrastructure see Workflow Launcher's role in Terra.

Quickstart⚓︎

Tip

This is the Quickstart section, which should cover the most frequent uses cases that interact with WFL. For more detailed information, please check other sections such as the development guide or modules design principles.

Build⚓︎

The easiest way to build WFL is via make, in addition, the following prerequisites are needed:

The Docker daemon
Clojure (brew install clojure on macOS)
Python3 (brew install python@3.9 on macOS)
NodeJS (brew install node on macOS)
Google Cloud SDK (brew install --cask google-cloud-sdk on macOS)

Arch Linux tips

Install clojure from the official repository.
Install google-cloud-sdk from the AUR.

You could then invoke make at the project level to test and build all workflow-launcher modules:

bash $ make -j8 where 8 can be replaced by any number that represents the concurrent jobs you wish to run.

Info

If the version of your make is above GNU Make 4.0 (you could check by running make --version), it's highly recommended to use --output-sync along with -j so the standard outputs are sorted, i.e. $ make -j8 --output-sync

make will build each module in workflow-launcher, run tests and generate Docker images. All generated files go into a derived directory under the project root.

You can also invoke make on a module from the top level directory by

bash $ make [MODULE] TARGET={prebuild|build|check|images|clean|distclean}

where currently available MODULEs are {api functions/aou docs helm ui}

For most of the time, you would want to run something like:

bash $ make clean

to clean up the built modules (-j8 is also available for make clean).

and then run:

bash $ make ui api TARGET=images -j8

to only build the WFL and its docker images without running tests.

Info

Note if you updated the second party repositories such as pipeline-config or gotc-deploy, you might have to run: bash $ make distclean to remove them. This is not always needed but can help completely purge the local derived files.

Test⚓︎

If you only want to run tests on specific modules, you could run:

bash $ make [MODULE] TARGET=check

such as make api TARGET=check or make functions/aou TARGET=check. Note this automatically makes all of check's prerequisites.

Clojure Test⚓︎

When it comes to clojure tests, sometimes it's useful to only run a subset of tests to save time and filter out noise. You can do this by directly invoke clojure cli from within the api directory, for example, cd api and:

bash $ clojure -M:test integration --focus wfl.integration.modules.copyfile-test

In general, we implement Clojure tests under the test/ root directory and use the kaocha test runner. Test suites use a -test namespace suffix. You can pass extra command line arguments to kaocha, such as the above --focus flag. You can see the full list of options with the following:

shell clojure -M:test --help

At present, wfl api has three kinds of test, unit, integration, and system. These can be run via the deps.edn, optionally specifying the kind:

shell clojure -M:test [unit|integration|system]

Note that the integration tests currently require a little more configuration before they can be run, namely, they require a wfl server running locally:

shell ./ops/server.sh

Additionally, there is a custom parallel test runner that can be invoked to help speed up the system tests. Rather than clojure -M:test system you'd just specify the namespace(s) to try to parallelize.

shell clojure -M:parallel-test wfl.system.v1-endpoint-test

Info

Note for system tests, no matter it's kicked off through clojure -M:test system or clojure -M:parallel-test wfl.system.v1-endpoint-test, you can use an environment variable WFL_CROMWELL_URL to override the default Cromwell instance that's used in the test. For example:

bash WFL_CROMWELL_URL=https://cromwell-gotc-auth.gotc-prod.broadinstitute.org/ clojure -M:parallel-test wfl.system.v1-endpoint-test

will tell the test to submit workflows to the "gotc-prod" Cromwell instance no matter what the default instance was defined in the test. However, you need to make sure the validity of the Cromwell URL you passed in; certain IAM permissions will also be required in order for Cromwell to execute the testing workflows smoothly.

Deploy⚓︎

Currently, we mainly deploy WFL to broad-gotc-dev and broad-gotc-prod projects. When it's time to deploy WFL, for most of the time developers need to release a new version following the steps in Release Guide

After which, the developers who have broad VPN connected can go to the Jenkins Page to deploy applicable versions of WFL to various available cloud projects.

Implementation⚓︎

Top-level files⚓︎

After cloning a new WFL repo, the top-level files are: . ├── api/ - `workflow-launcher` backend ├── functions/ - cloud functions deployed separately ├── database/ - database scheme migration changelog and changeset ├── derived/ - generated artifacts ├── docs/ - ancillary documentation ├── helm/ - helm-managed k8s configuration ├── LICENSE.txt ├── Makefile - project level` Makefile` ├── makerules/ - common `Makefile` functionality ├── ops/ - scripts to support Operations ├── README.md - symbolic link to docs/md/README.md └── version - holds the current semantic version Tip: Run make at least once after cloning the repo to make sure all the necessary files are in place.

`api` Module⚓︎

Source code⚓︎

The Clojure source code is in the api/src/ directory.

The entry point for the WFL executable is the -main function in main.clj. It takes the command line arguments as strings, validates the arguments, then launches the appropriate process.

The server.clj file implements the WFL server. The server_debug.clj file adds some tools to aid in debugging the server.

Some hacks specific to WFL are in wfl.clj.

The build.clj file includes build and deployment code.

The debug.clj file defines some macros useful when debugging or logging.

The util.clj file contains a few functions and macros used in WFL that are not specific to its function.

The environments.clj file defines configuration parameters for different execution contexts. It's a placeholder in this repo but will be loaded in build/deploy time from a private repo.

The module/xx.clj file implements a command-line starter for reprocessing eXternal eXomes.

The module/wgs.clj file helps implements a command-line starter for reprocessing Whole GenomeS.

The module/sg.clj file implements Somatic Genomes support.

The module/all.clj file hosts some utilities shared across modules.

The metadata.clj file implements a tool to extract metadata from Cromwell that can be archived with the outputs generated by a workflow.

The dx.clj file implements miscellaneous pipeline debugging tools.

The once.clj file defines some initialization functions mostly supporting authentication.

The api/handlers.clj file defines the handler functions used by server.

The api/routes.clj file defines the routing strategy for server.

Each of the other source files implement an interface to one of the services WFL talks to, and are named accordingly.

File	Service
cromwell.clj	Cromwell workflow runner
datarepo.clj	DSP DataRepo
db.clj	On-prem and Cloud SQL databases
gcs.clj	Google Cloud Storage
jms.clj	Java Message Service queues
postgres.clj	Cloud SQL postgres databases
server.clj	the WFL server itself

Exomes in the Cloud Resources⚓︎

From Hybrid Selection in the Cloud V1

Clients
- Google Cloud Storage Client Library (Java)
- Google Cloud Client Library for Java
Diagrams
- Zamboni Overview
Sources
- /Users/tbl/Broad/zamboni/Client/src/scala/org/broadinstitute/zamboni/client/lightning/clp/Lightning.scala
- /Users/tbl/Broad/picard-private/src/java/edu/mit/broad/picard/lightning
- /Users/tbl/Broad/gppipeline-devtools/release_client
- /Users/tbl/Broad/gppipeline-devtools/starter_control
- /picard02:/seq/pipeline/gppipeline-devtools/current/defs/prod.defs

Welcome to WorkFlow Launcher⚓︎

Overview⚓︎

Quickstart⚓︎

Build⚓︎

Test⚓︎

Clojure Test⚓︎

Deploy⚓︎

Implementation⚓︎

Top-level files⚓︎

api Module⚓︎

Source code⚓︎

Exomes in the Cloud Resources⚓︎

`api` Module⚓︎