Skip to content

GDCWholeGenomeSomaticSingleSample⚓︎

Inputs⚓︎

In addition to the standard workload request inputs:

  • executor : URL of the Cromwell service
  • output : GCS URL prefix for output files
  • pipeline : literally "GDCWholeGenomeSomaticSingleSample"
  • project : some tracking label you can choose

a GDCWholeGenomeSomaticSingleSample workload requires the following inputs for each workflow.

  • base_file_name
  • contamination_vcf_index
  • contamination_vcf
  • cram_ref_fasta_index
  • cram_ref_fasta
  • dbsnp_vcf_index
  • dbsnp_vcf
  • input_cram

Here is what those are.

base_file_name⚓︎

The leaf name of a sample input or output path without the . suffix. The base_file_name is usually the same as the sample name and differs in every workflow.

contamination_vcf_index and contamination_vcf⚓︎

These are GCS pathnames of the contamination detection data for the input samples. This commonly depends on the reference genome for the samples, and is shared across all the workflows.

cram_ref_fasta_index and cram_ref_fasta⚓︎

These are GCS pathnames of the reference FASTA to which the input CRAM is aligned. This FASTA is used to expand CRAMs to BAMs and again is generally shared across all the workflows.

dbsnp_vcf_index and dbsnp_vcf⚓︎

These are GCS pathnames of a VCF containing a database of known variants from the reference. As with the contamination and reference FASTA files, typically these are shared across all the workflows.

input_cram⚓︎

This is a GCS pathname to the input CRAM. It's last component will typically be the base_file_name value with ".cram" appended. The GDCWholeGenomeSomaticSingleSample.wdl workflow definition expects to find a base_file_name.cram.crai file for every base_file_name.cram file specified as an input_cram.

Usage⚓︎

GDCWholeGenomeSomaticSingleSample workload supports the following API endpoints:

Verb Endpoint Description
GET /api/v1/workload List all workloads, optionally filtering by uuid or project
GET /api/v1/workload/{uuid}/workflows List all workflows for a specified workload uuid
POST /api/v1/create Create a new workload
POST /api/v1/start Start a workload
POST /api/v1/stop Stop a running workload
POST /api/v1/exec Create and start (execute) a workload
Permissions in production

External Whole Genome Reprocessing in gotc-prod uses a set of execution projects, please refer to this page when you have questions about permissions.

Create Workload: /api/v1/create⚓︎

Create a WFL workload running in production.

curl --location --request POST \
https://gotc-prod-wfl.gotc-prod.broadinstitute.org/api/v1/create \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header 'Content-Type: application/json' \
--data-raw '
{
  "executor": "https://cromwell-gotc-auth.gotc-prod.broadinstitute.org",
  "output": "gs://broad-prod-somatic-genomes-output",
  "pipeline": "GDCWholeGenomeSomaticSingleSample",
  "project": "PO-1234",
  "items": [
    {
      "inputs": {
        "base_file_name": "27B-6",
        "contamination_vcf": "gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz",
        "contamination_vcf_index": "gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz.tbi",
        "cram_ref_fasta": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
        "cram_ref_fasta_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
        "dbsnp_vcf": "gs://gcp-public-data--broad-references/hg38/v0/gdc/dbsnp_144.hg38.vcf.gz",
        "dbsnp_vcf_index": "gs://gcp-public-data--broad-references/hg38/v0/gdc/dbsnp_144.hg38.vcf.gz.tbi",
        "input_cram": "gs://broad-gotc-prod-storage/pipeline/PO-1234/27B-6/v1/27B-6.cram"
      },
      "options": {
        "monitoring_script": "gs://broad-gotc-prod-storage/scripts/monitoring_script.sh"
      }
    }
  ]
}'
{
  "commit": "477bb195c40cc5f5afb81ca1b57e97c9cc18fa2c",
  "created": "2021-04-05T16:02:31Z",
  "creator": "tbl@broadinstitute.org",
  "executor": "https://cromwell-gotc-auth.gotc-prod.broadinstitute.org",
  "output": "gs://broad-prod-somatic-genomes-output",
  "pipeline": "GDCWholeGenomeSomaticSingleSample",
  "project": "PO-1234",
  "release": "GDCWholeGenomeSomaticSingleSample_v1.1.0",
  "started": "2021-04-05T16:02:32Z",
  "uuid": "efb00901-378e-4365-86e7-edd0fbdaaab2",
  "version": "0.7.0",
  "wdl": "pipelines/broad/dna_seq/somatic/single_sample/wgs/gdc_genome/GDCWholeGenomeSomaticSingleSample.wdl"
}

Note that the GDCWholeGenomeSomaticSingleSample pipeline supports Cromwell workflowOptions via the options map. See the reference page for more information.

Start Workload: /api/v1/start⚓︎

Start all the workflows in the workload.

curl --location --request POST \
https://gotc-prod-wfl.gotc-prod.broadinstitute.org/api/v1/start \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header 'Content-Type: application/json' \
--data-raw '{"uuid": "efb00901-378e-4365-86e7-edd0fbdaaab2"}'
{
  "commit": "477bb195c40cc5f5afb81ca1b57e97c9cc18fa2c",
  "created": "2021-04-05T16:02:31Z",
  "creator": "tbl@broadinstitute.org",
  "executor": "https://cromwell-gotc-auth.gotc-prod.broadinstitute.org",
  "output": "gs://broad-prod-somatic-genomes-output",
  "pipeline": "GDCWholeGenomeSomaticSingleSample",
  "project": "PO-1234",
  "release": "GDCWholeGenomeSomaticSingleSample_v1.1.0",
  "started": "2021-04-05T16:02:32Z",
  "uuid": "efb00901-378e-4365-86e7-edd0fbdaaab2",
  "version": "0.7.0",
  "wdl": "pipelines/broad/dna_seq/somatic/single_sample/wgs/gdc_genome/GDCWholeGenomeSomaticSingleSample.wdl"
}

Start Workload: /api/v1/start⚓︎

Included for compatibility with continuous workloads.

curl -X POST 'https://gotc-prod-wfl.gotc-prod.broadinstitute.org/api/v1/stop' \
     -X "Authorization: Bearer $(gcloud auth print-access-token)" \
     -X 'Content-Type: application/json' \
     -d '{ "uuid": "efb00901-378e-4365-86e7-edd0fbdaaab2" }'
{
  "commit": "477bb195c40cc5f5afb81ca1b57e97c9cc18fa2c",
  "created": "2021-04-05T16:02:31Z",
  "creator": "tbl@broadinstitute.org",
  "executor": "https://cromwell-gotc-auth.gotc-prod.broadinstitute.org",
  "output": "gs://broad-prod-somatic-genomes-output",
  "pipeline": "GDCWholeGenomeSomaticSingleSample",
  "project": "PO-1234",
  "release": "GDCWholeGenomeSomaticSingleSample_v1.1.0",
  "started": "2021-04-05T16:02:32Z",
  "stopped": "2021-04-05T16:02:33Z",
  "uuid": "efb00901-378e-4365-86e7-edd0fbdaaab2",
  "version": "0.7.0",
  "wdl": "pipelines/broad/dna_seq/somatic/single_sample/wgs/gdc_genome/GDCWholeGenomeSomaticSingleSample.wdl"
}

Exec Workload: /api/v1/exec⚓︎

Create a workload, then start every workflow in the workload.

Except for the different WFL URI, the request and response are the same as for Create Workload above.

curl --location --request POST \
https://gotc-prod-wfl.gotc-prod.broadinstitute.org/api/v1/exec \
... and so on ...

Query Workload: /api/v1/workload?uuid=<uuid>⚓︎

Query WFL for a workload by its UUID.

curl --location --request GET \
https://gotc-prod-wfl.gotc-prod.broadinstitute.org/api/v1/workload?uuid=efb00901-378e-4365-86e7-edd0fbdaaab2 \
--header 'Authorization: Bearer '$(gcloud auth print-access-token)

A successful response from /api/v1/workload is always an array of workload objects, but specifying a UUID returns only one.

[
 {
   "commit": "477bb195c40cc5f5afb81ca1b57e97c9cc18fa2c",
   "created": "2021-04-05T16:02:31Z",
   "creator": "tbl@broadinstitute.org",
   "executor": "https://cromwell-gotc-auth.gotc-prod.broadinstitute.org",
   "output": "gs://broad-prod-somatic-genomes-output",
   "pipeline": "GDCWholeGenomeSomaticSingleSample",
   "project": "PO-1234",
   "release": "GDCWholeGenomeSomaticSingleSample_v1.1.0",
   "started": "2021-04-05T16:02:32Z",
   "uuid": "efb00901-378e-4365-86e7-edd0fbdaaab2",
   "version": "0.7.0",
   "wdl": "pipelines/broad/dna_seq/somatic/single_sample/wgs/gdc_genome/GDCWholeGenomeSomaticSingleSample.wdl"
 }
]

Query Workload with project: /api/v1/workload?project=<project>⚓︎

Query WFL for all workloads with a specified project label.

curl --location --request GET \
/api/v1/workload?project=wgs-dev \
https://gotc-prod-wfl.gotc-prod.broadinstitute.org/api/v1/workload?project=PO-1234 \
--header 'Authorization: Bearer '$(gcloud auth print-access-token)

The response is the same as when specifying a UUID, except the array may contain multiple workload objects that share the same project value.

Note

A request to the /api/v1/workload endpoint without a project or uuid parameter returns all of the workloads that WFL knows about. That response might be large and take a while to process.

List workflows managed by the workload GET /api/v1/workload/{uuid}/workflows⚓︎

curl -X GET '/api/v1/workload/efb00901-378e-4365-86e7-edd0fbdaaab2/workflows' \
     -H 'Authorization: Bearer '$(gcloud auth print-access-token)

A successful response from /api/v1/workload/{uuid}/workload is always an array of Cromwell workflows with their statuses.

[{
      "status": "Submitted",
      "updated": "2021-04-05T16:02:32Z",
      "uuid": "8c1f586e-036b-4690-87c2-2af5d7e00450",
      "inputs": {
          "base_file_name": "27B-6",
          "contamination_vcf": "gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz",
          "contamination_vcf_index": "gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz.tbi",
          "cram_ref_fasta": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
          "cram_ref_fasta_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
          "dbsnp_vcf": "gs://gcp-public-data--broad-references/hg38/v0/gdc/dbsnp_144.hg38.vcf.gz",
          "dbsnp_vcf_index": "gs://gcp-public-data--broad-references/hg38/v0/gdc/dbsnp_144.hg38.vcf.gz.tbi",
          "input_cram": "gs://broad-gotc-prod-storage/pipeline/PO-1234/27B-6/v1/27B-6.cram"
      },
      "options": {
          "monitoring_script": "gs://broad-gotc-prod-storage/scripts/monitoring_script.sh"
      }
}]