DownloadFromFTP

Description
Downloads files from FTP in parallel and stores the results in the specified GCS directory. This pipeline is essentially a Cromwell/GCP reimagining of the Nextflow/AWS download pipeline by @alaincoletta (see: http://broad.io/aws_dl).

Inputs

Required

  • ftp_dirs (Array[String], required): The FTP directories to download.
  • gcs_out_root_dir (String, required): The GCS directory in which to store the downloaded files.
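
The two required inputs can be supplied to Cromwell as a JSON inputs file. The sketch below builds one in Python, assuming the usual Cromwell convention of prefixing each key with the workflow name. The FTP URL and bucket name are placeholders, not values from this pipeline.

```python
import json

# Placeholder values (assumptions): substitute a real FTP directory
# and a GCS bucket you have write access to.
inputs = {
    "DownloadFromFTP.ftp_dirs": ["ftp://ftp.example.org/pub/dataset_a"],
    "DownloadFromFTP.gcs_out_root_dir": "gs://example-bucket/downloads",
}

# Serialize to the text you would save as an inputs file.
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

Any input left out of the file falls back to the value listed under Defaults.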

Optional

  • ComputeDiskSize.runtime_attr_override (RuntimeAttr?)
  • DownloadFTPFile.runtime_attr_override (RuntimeAttr?)
  • GetFileManifest.runtime_attr_override (RuntimeAttr?)

Defaults

  • exclude (Array[String], default=[]): Simple substring patterns to exclude from the download.
  • num_simultaneous_downloads (Int, default=10): The number of files to fetch simultaneously.
  • prepend_dir_name (Boolean, default=true): If true, place the files in a subdirectory based on the basename of the FTP dir.
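
As a rough illustration of how exclude and prepend_dir_name are described above, the following Python sketch mimics both behaviors. It is not the workflow's actual code: the helper names is_excluded and destination_dir are hypothetical, as are the example paths, and the real pipeline may differ in detail (for example, in whether patterns are matched against the full path or only the file name).

```python
import posixpath
from typing import Sequence


def is_excluded(path: str, exclude: Sequence[str]) -> bool:
    """Return True if any pattern occurs in path as a plain substring."""
    return any(pattern in path for pattern in exclude)


def destination_dir(ftp_dir: str, gcs_out_root_dir: str,
                    prepend_dir_name: bool = True) -> str:
    """Return the GCS directory that files from ftp_dir would land in."""
    root = gcs_out_root_dir.rstrip("/")
    if not prepend_dir_name:
        return root
    # Basename of the FTP directory, ignoring any trailing slash.
    name = posixpath.basename(ftp_dir.rstrip("/"))
    return f"{root}/{name}"


# Hypothetical file listing and settings, for illustration only.
files = ["reads/sample1.bam", "reads/sample1.bam.md5", "README.txt"]
kept = [f for f in files if not is_excluded(f, [".md5"])]
print(kept)
print(destination_dir("ftp://ftp.example.org/pub/dataset_a/",
                      "gs://example-bucket/downloads"))
```

With the default exclude=[] nothing is filtered out, since no pattern can match.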

Outputs

  • GetFileManifest.manifest (File)
  • DownloadFTPFile.out (Array[String?])
  • ComputeDiskSize.max_size_bytes (Array[Float])

Dot Diagram

[Figure: DOT graph of the DownloadFromFTP workflow]