DownloadFromWeb

Description
This WDL pipeline downloads directories from HTTP/FTP/SFTP servers in parallel and stores the results in the specified GCS directory. It is essentially a Cromwell/GCP reimagining of the Nextflow/AWS downloading pipeline from @alaincoletta (see: http://broad.io/aws_dl).

Inputs

Required

  • gcs_out_root_dir (String, required): GCS bucket to store the reads, variants, and metrics files
  • manifest (File, required): A file listing the SRA ID(s) to download, one per line
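
The two required inputs above are enough for a minimal run. As a rough sketch, the snippet below builds a Cromwell-style inputs file, assuming the usual `<workflow name>.<input name>` key convention. The bucket and manifest paths are hypothetical placeholders, and `build_required_inputs` is an illustrative helper, not part of the pipeline.

```python
import json


def build_required_inputs(gcs_out_root_dir, manifest):
    """Return an inputs dict holding only the two required workflow inputs.

    Keys follow the Cromwell convention "<workflow name>.<input name>".
    """
    workflow = "DownloadFromWeb"
    return {
        f"{workflow}.gcs_out_root_dir": gcs_out_root_dir,
        f"{workflow}.manifest": manifest,
    }


# Hypothetical paths, for illustration only.
inputs = build_required_inputs(
    gcs_out_root_dir="gs://example-bucket/downloads",
    manifest="gs://example-bucket/manifests/manifest.txt",
)

# Serialize to the JSON text that would be saved as the inputs file.
inputs_json = json.dumps(inputs, indent=2)
```

Any input left out of this file falls back to the default listed in this document.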

Optional

  • DownloadFiles.runtime_attr_override (RuntimeAttr?)

Defaults

  • num_simultaneous_downloads (Int, default=10): The number of files to fetch simultaneously.
  • prepend_dir_name (Boolean, default=true): If true, place the files in a subdirectory named after the basename of the FTP directory.
  • DownloadFiles.disk_size_gb (Int, default=100)
  • DownloadFiles.num_cpus (Int, default=4)
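
To make the two workflow-level defaults concrete, here is a minimal sketch of how they could behave. It is not the pipeline's actual implementation. `destination_prefix`, `download_all`, and the `fetch` callable are hypothetical names, and the exact subdirectory layout used by the real task is an assumption.

```python
import posixpath
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def destination_prefix(gcs_out_root_dir, source_url, prepend_dir_name=True):
    """Compute the GCS prefix that files from source_url would land under.

    Assumption: with prepend_dir_name=True, the basename of the remote
    directory is appended to the output root.
    """
    root = gcs_out_root_dir.rstrip("/")
    if not prepend_dir_name:
        return root
    # Basename of the remote directory, ignoring any trailing slash.
    dir_name = posixpath.basename(urlparse(source_url).path.rstrip("/"))
    return f"{root}/{dir_name}" if dir_name else root


def download_all(urls, fetch, num_simultaneous_downloads=10):
    """Call fetch(url) for every URL, with a bounded number in flight.

    fetch is a caller-supplied (hypothetical) function that performs one
    download. Results are returned in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=num_simultaneous_downloads) as pool:
        return list(pool.map(fetch, urls))
```

For example, under these assumptions `destination_prefix("gs://bucket/out/", "ftp://example.org/pub/run42/")` yields `gs://bucket/out/run42`, while passing `prepend_dir_name=False` yields `gs://bucket/out`.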

Outputs

  • DownloadFiles.out (Array[String])

Dot Diagram

(Diagram: DownloadFromWeb workflow graph)