# Arrays module
WorkFlow Launcher (WFL) implements the `aou-arrays` module to support secure and efficient processing of the AllOfUs Arrays samples. This page documents the design principles and assumptions of the module and summarizes the general process of module development.
The `aou-arrays` module implements the arrays workload as a continuous workload: samples arrive as a continuous stream, and WFL makes no assumptions about how many samples a workload will contain or how the samples should be grouped together. Instead, it hands off the workload creation and starting process to its caller.
## API
Like the other modules, the `aou-arrays` module implements the following multimethod dispatchers:

- `start-workload!`
- `add-workload!`
It supports the following API endpoints:

| Verb | Endpoint | Description |
|------|----------|-------------|
| GET  | `/api/v1/workload` | List all workloads, optionally filtering by uuid or project |
| GET  | `/api/v1/workload/{uuid}/workflows` | List all workflows for a specified workload uuid |
| POST | `/api/v1/create` | Create a new workload |
| POST | `/api/v1/start` | Start a workload |
| POST | `/api/v1/stop` | Stop a running workload |
| POST | `/api/v1/exec` | Create and start (execute) a workload |
| POST | `/api/v1/append_to_aou` | Append one or more samples to an existing AOU workload, unless stopped |
Unlike the fixed workload types, where the caller only needs to create a workload with a series of sample inputs and then simply start it, the `aou-arrays` module requires the caller to manage the life cycle of a workload in multiple stages (sketched below):

- The caller creates a workload and specifies the pipeline type `AllOfUsArrays`. If everything goes well, the caller receives the information of the created workload, including its `uuid`.
- Once the workload information has been reviewed, the caller "starts" the newly created workload to tell WFL that it is ready to accept incoming samples. Without this "start" signal, WFL refuses to append any samples to the workload.
- The caller can then append new individual samples to the started workload; each sample with valid information is analyzed, processed, and submitted to Cromwell.
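To make this life cycle concrete, here is a minimal sketch of the sequence of calls against a local WFL instance. The payload values are the placeholders used elsewhere on this page, extracting the `uuid` with `jq` assumes the create response carries it at the top level, and `NOTIFICATIONS` stands for a JSON array of sample records that the caller supplies (see the `append_to_aou` example below for the expected fields):

```bash
# 1. Create an AllOfUsArrays workload and capture its uuid from the response.
UUID=$(curl -s -X POST 'http://localhost:8080/api/v1/create' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Content-Type: application/json' \
  -d '{
        "executor": "https://cromwell-gotc-auth.gotc-dev.broadinstitute.org",
        "output": "gs://broad-gotc-dev-wfl-ptc-test-outputs/aou-test-output/",
        "project": "Example Project",
        "pipeline": "AllOfUsArrays"
      }' | jq -r '.uuid')

# 2. Start the workload so that WFL will accept incoming samples for it.
curl -s -X POST 'http://localhost:8080/api/v1/start' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Content-Type: application/json' \
  -d "{\"uuid\": \"$UUID\"}"

# 3. Append samples as they arrive; this call can be repeated for as long as
#    the workload has not been stopped.
curl -s -X POST 'http://localhost:8080/api/v1/append_to_aou' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Content-Type: application/json' \
  -d "{\"uuid\": \"$UUID\", \"notifications\": $NOTIFICATIONS}"
```

Stopping the workload later via `/api/v1/stop` prevents any further appends.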
For more detail, here are some example requests to the above endpoints:
### GET /api/v1/workload

```bash
curl 'http://localhost:8080/api/v1/workload' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json'
```
### GET /api/v1/workload?uuid={uuid}

```bash
curl 'http://localhost:8080/api/v1/workload?uuid=00000000-0000-0000-0000-000000000000' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json'
```
### GET /api/v1/workload?project={project}

```bash
curl 'http://localhost:8080/api/v1/workload?project=(Test)%20WFL%20Local%20Testing' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json'
```
**Note:** `project` and `uuid` are optional query parameters to the `/api/v1/workload` endpoint; hitting the endpoint without them returns all workloads. However, they cannot be specified together.
### GET /api/v1/workload/{uuid}/workflows

```bash
curl 'http://localhost:8080/api/v1/workload/00000000-0000-0000-0000-000000000000/workflows' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token)
```
### POST /api/v1/create

```bash
curl -X POST 'http://localhost:8080/api/v1/create' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "executor": "https://cromwell-gotc-auth.gotc-dev.broadinstitute.org",
        "output": "gs://broad-gotc-dev-wfl-ptc-test-outputs/aou-test-output/",
        "project": "Example Project",
        "pipeline": "AllOfUsArrays"
      }'
```
### POST /api/v1/start

```bash
curl -X POST 'http://localhost:8080/api/v1/start' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d $'{ "uuid": "00000000-0000-0000-0000-000000000000" }'
```
### POST /api/v1/stop

Stops the workload from accepting new samples. See also `/api/v1/append_to_aou`.

```bash
curl -X POST 'http://localhost:8080/api/v1/stop' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d $'{ "uuid": "00000000-0000-0000-0000-000000000000" }'
```
### POST /api/v1/append_to_aou

```bash
curl -X POST 'http://localhost:8080/api/v1/append_to_aou' \
  -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d $'{
        "uuid": "00000000-0000-0000-0000-000000000000",
        "notifications": [
          {
            "zcall_thresholds_file": "foo",
            "sample_lsid": "foo",
            "bead_pool_manifest_file": "foo",
            "chip_well_barcode": "foo",
            "sample_alias": "foo",
            "green_idat_cloud_path": "foo",
            "red_idat_cloud_path": "foo",
            "cluster_file": "foo",
            "reported_gender": "foo",
            "gender_cluster_file": "foo",
            "params_file": "foo",
            "extended_chip_manifest_file": "foo",
            "analysis_version_number": 1
          },
          {
            "zcall_thresholds_file": "foo",
            "sample_lsid": "foo",
            "bead_pool_manifest_file": "foo",
            "chip_well_barcode": "foo",
            "sample_alias": "foo",
            "green_idat_cloud_path": "foo",
            "red_idat_cloud_path": "foo",
            "cluster_file": "foo",
            "reported_gender": "foo",
            "gender_cluster_file": "foo",
            "params_file": "foo",
            "extended_chip_manifest_file": "foo",
            "analysis_version_number": 5
          }
        ]
      }'
```
## Workload Model
WFL designed the following workload model to support the above API and workload submission mechanism.
Initially, the database has the following schema:
```
                            List of relations
 Schema |              Name              |   Type   |  Owner   |    Size    | Description
--------+--------------------------------+----------+----------+------------+-------------
 public | databasechangelog              | table    | foo      | 16 kB      |
 public | databasechangeloglock          | table    | foo      | 8192 bytes |
 public | workload                       | table    | foo      | 16 kB      |
 public | workload_id_seq                | sequence | foo      | 8192 bytes |
```
The `workload` table looks like:

```
 id | commit | created | creator | cromwell | finished | input | items | output | pipeline | project | release | started | uuid | version | wdl
----+--------+---------+---------+----------+----------+-------+-------+--------+----------+---------+---------+---------+------+---------+-----
(0 rows)
```
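The listings on this page are `psql` output. As a rough sketch, assuming access to WFL's PostgreSQL database (the `WFL_POSTGRES_URL` connection string is an assumption), they can be reproduced with:

```bash
# Connection string is an assumption; substitute the real WFL database URL.
psql "$WFL_POSTGRES_URL" -c '\d'                       # list relations
psql "$WFL_POSTGRES_URL" -c 'SELECT * FROM workload;'  # inspect the workload table
```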
Note that, unlike the fixed workload types, `input`, `output`, and `items` are not useful to an `aou-arrays` workload, since these fields vary from sample to sample. Any information the caller provides for these fields is stored only as a placeholder.
More importantly, even though `id` is the primary key here, `(pipeline, project, release)` works as the unique identifier for arrays workloads. For instance, if there is already a workload with the values `(AllOfUsArrays, gotc-dev, Arrays_v1.9)`, any further attempt to create a new workload with exactly the same values will return the information of this existing workload rather than create a new row.
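As an illustration, the following sketch issues the same create request twice, reusing the placeholder payload from the create example above and assuming the response carries the workload `uuid` at the top level; both iterations should print the same `uuid` because the second request resolves to the existing workload:

```bash
# Create "the same" workload twice; the second call should return the
# existing workload's information instead of inserting a new row.
for i in 1 2; do
  curl -s -X POST 'http://localhost:8080/api/v1/create' \
    -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
    -H 'Content-Type: application/json' \
    -d '{
          "executor": "https://cromwell-gotc-auth.gotc-dev.broadinstitute.org",
          "output": "gs://broad-gotc-dev-wfl-ptc-test-outputs/aou-test-output/",
          "project": "Example Project",
          "pipeline": "AllOfUsArrays"
        }' | jq -r '.uuid'
done
```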
Once the caller successfully creates a new workload, a new row is added to the above `workload` table and a new table is created accordingly:
```
                            List of relations
 Schema |              Name              |   Type   |  Owner   |    Size    | Description
--------+--------------------------------+----------+----------+------------+-------------
 public | allofusarrays_000000001        | table    | foo      | 16 kB      |
 public | allofusarrays_000000001_id_seq | sequence | foo      | 8192 bytes |
 public | databasechangelog              | table    | foo      | 16 kB      |
 public | databasechangeloglock          | table    | foo      | 8192 bytes |
 public | workload                       | table    | foo      | 16 kB      |
 public | workload_id_seq                | sequence | foo      | 8192 bytes |
```
The `allofusarrays_00000000X` table has the following fields:

```
 id | analysis_version_number | chip_well_barcode |  status   |            updated            |                 uuid
----+-------------------------+-------------------+-----------+-------------------------------+--------------------------------------
  1 |                       1 | 0000000000_R01C01 | Succeeded | 2020-07-21 00:00:00.241218-04 | 00000000-0000-0000-0000-000000000000
  2 |                       5 | 0000000000_R01C01 | Failed    | 2020-07-21 00:00:00.028976-04 | 00000000-0000-0000-0000-000000000000
```
`(analysis_version_number, chip_well_barcode)` works as the primary key; any new sample that collides with an existing primary key will be omitted.
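For example, the sketch below appends the same `(analysis_version_number, chip_well_barcode)` pair twice, reusing the placeholder workload `uuid` and field values from the examples above; per the primary key described here, the colliding second copy should be omitted rather than resubmitted:

```bash
# Append an identical sample twice; the colliding second copy is expected
# to be omitted by WFL rather than submitted to Cromwell again.
for i in 1 2; do
  curl -s -X POST 'http://localhost:8080/api/v1/append_to_aou' \
    -H 'Authorization: Bearer '$(gcloud auth print-access-token) \
    -H 'Content-Type: application/json' \
    -d '{
          "uuid": "00000000-0000-0000-0000-000000000000",
          "notifications": [
            {
              "zcall_thresholds_file": "foo",
              "sample_lsid": "foo",
              "bead_pool_manifest_file": "foo",
              "chip_well_barcode": "0000000000_R01C01",
              "sample_alias": "foo",
              "green_idat_cloud_path": "foo",
              "red_idat_cloud_path": "foo",
              "cluster_file": "foo",
              "reported_gender": "foo",
              "gender_cluster_file": "foo",
              "params_file": "foo",
              "extended_chip_manifest_file": "foo",
              "analysis_version_number": 1
            }
          ]
        }'
done
```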