Utils

ChunkManifest

description
Chunk a manifest file into smaller files

Inputs

Required

  • manifest (File, required): The manifest file to chunk
  • manifest_lines_per_chunk (Int, required): The number of lines to include in each chunk

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Outputs

  • manifest_chunks (Array[File])
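
The chunking behaviour described above can be sketched in Python. This is only an illustration, not the task's actual command: the helper name chunk_lines and the example file names are hypothetical, and it assumes that chunks are consecutive and that the last chunk may be shorter than manifest_lines_per_chunk.

```python
from typing import List


def chunk_lines(lines: List[str], lines_per_chunk: int) -> List[List[str]]:
    """Split lines into consecutive chunks of at most lines_per_chunk lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    return [
        lines[i:i + lines_per_chunk]
        for i in range(0, len(lines), lines_per_chunk)
    ]


# Hypothetical manifest with five entries, split two lines at a time.
manifest = ["a.bam", "b.bam", "c.bam", "d.bam", "e.bam"]
chunks = chunk_lines(manifest, 2)
print(chunks)
```

Each inner list would correspond to one file in manifest_chunks.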

SortSam

description
Sort a BAM file by coordinate order

Inputs

Required

  • input_bam (File, required): The BAM file to sort
  • prefix (String, required): The basename for the output BAM file

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Outputs

  • output_bam (File)
  • output_bam_index (File)

MakeChrIntervalList

description
Make a Picard-style list of intervals for each chromosome in the reference genome

Inputs

Required

  • ref_dict (File, required): The reference dictionary

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Defaults

  • filter (Array[String], default=['random', 'chrUn', 'decoy', 'alt', 'HLA', 'EBV']): A list of strings to filter out of the reference dictionary

Outputs

  • chrs (Array[Array[String]])
  • interval_list (File)
  • contig_interval_strings (Array[String])
  • contig_interval_list_files (Array[File])
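
A rough Python sketch of the filtering step follows. It assumes that a contig is dropped whenever its name contains one of the filter strings, and that an interval string has the form contig:1-length; neither detail is confirmed by this page. The function names and the tiny reference dictionary are hypothetical.

```python
from typing import List, Tuple

DEFAULT_FILTER = ["random", "chrUn", "decoy", "alt", "HLA", "EBV"]


def parse_sequence_dict(text: str) -> List[Tuple[str, int]]:
    """Return (name, length) for every @SQ line of a sequence dictionary."""
    contigs = []
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Every field after the record type has the form TAG:value.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def filter_contigs(contigs, filters=DEFAULT_FILTER):
    """Assumption: drop a contig if its name contains any filter string."""
    return [(n, l) for n, l in contigs if not any(f in n for f in filters)]


# Hypothetical, heavily shortened reference dictionary.
ref_dict = "\n".join([
    "@HD\tVN:1.6",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr1_KI270706v1_random\tLN:200",
    "@SQ\tSN:chrUn_KI270302v1\tLN:300",
    "@SQ\tSN:chrEBV\tLN:400",
    "@SQ\tSN:chrM\tLN:500",
])
kept = filter_contigs(parse_sequence_dict(ref_dict))
# Assumed string format: one whole-contig interval per kept contig.
interval_strings = [f"{name}:1-{length}" for name, length in kept]
print(interval_strings)
```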

ExtractIntervalNamesFromIntervalOrBamFile

description
Pulls the contig names and regions out of an interval list or bed file.

Inputs

Required

  • interval_file (File, required): Interval list or bed file from which to extract contig names and regions.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Outputs

  • interval_info (Array[Array[String]])

MakeIntervalListFromSequenceDictionary

description
Make a Picard-style list of intervals that covers the given reference genome dictionary, with intervals no larger than the given size limit.

Inputs

Required

  • ref_dict (File, required): The reference dictionary

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Defaults

  • ignore_contigs (Array[String], default=['random', 'chrUn', 'decoy', 'alt', 'HLA', 'EBV']): A list of strings to filter out of the reference dictionary
  • max_interval_size (Int, default=10000)

Outputs

  • interval_list (File)
  • interval_info (Array[Array[String]])
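
The tiling implied by max_interval_size might look like the sketch below. It uses 1-based inclusive coordinates, as Picard interval lists do, and assumes that windows are laid end to end with a shorter final window. The function name tile_contig is hypothetical and the task itself may split contigs differently.

```python
from typing import List, Tuple


def tile_contig(name: str, length: int, max_size: int = 10000) -> List[Tuple[str, int, int]]:
    """Cover positions 1..length with intervals of at most max_size bases."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    intervals = []
    start = 1
    while start <= length:
        end = min(start + max_size - 1, length)  # inclusive end
        intervals.append((name, start, end))
        start = end + 1
    return intervals


# Hypothetical 25 kb contig tiled with the default size limit.
tiles = tile_contig("chr1", 25000)
print(tiles)
```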

CreateIntervalListFileFromIntervalInfo

description
Make a Picard-style interval list file from the given interval info.

Inputs

Required

  • contig (String, required): Contig for the interval.
  • end (String, required): End position for the interval.
  • start (String, required): Start position for the interval.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Outputs

  • interval_list (File)

CountBamRecords

description
Count the number of records in a bam file

Inputs

Required

  • bam (File, required; localization_optional: true): The bam file

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes

Outputs

  • samools_error (File?)
  • num_records (Int)

DownsampleSam

description
Downsample the given bam / sam file using Picard/GATK's DownsampleSam tool.
author
Jonn Smith
email
jonn@broadinstitute.org

Inputs

Required

  • bam (File, required): BAM file to be filtered.

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • extra_args (String, default=""): [Optional] Extra arguments to pass into DownsampleSam.
  • prefix (String, default="downsampled_reads"): [Optional] Prefix string to name the output file (Default: downsampled_reads).
  • probability (Float, default=0.01): [Optional] Probability that a read will be emitted (Default: 0.01).
  • random_seed (Int, default=1)
  • strategy (String, default="HighAccuracy"): [Optional] Strategy to use to downsample the given bam file (Default: HighAccuracy).

Outputs

  • output_bam (File)
  • output_bam_index (File)

Sum

description
Sum a list of integers.

Inputs

Required

  • ints (Array[Int], required): List of integers to be summed.

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • prefix (String, default="sum"): [Optional] Prefix string to name the output file (Default: sum).

Outputs

  • sum (Int)
  • sum_file (File)

Uniq

description
Find the unique elements in a list of strings.

Inputs

Required

  • strings (Array[String], required): List of strings to be filtered.

Optional

  • runtime_attr_override (RuntimeAttr?)

Outputs

  • unique_strings (Array[String])

Timestamp

description
Get the current timestamp.

Inputs

Required

  • dummy_dependencies (Array[String], required): List of dummy dependencies to force recomputation.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Outputs

  • timestamp (String)

ConvertReads

description
Convert reads from one format to another.

Inputs

Required

  • output_format (String, required): Output format.
  • reads (File, required): Reads to be converted.

Outputs

  • converted_reads (File)

BamToBed

description
Convert a BAM file to a bed file.

Inputs

Required

  • bam (File, required): BAM file to be converted.
  • prefix (String, required): Prefix for the output bed file.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Outputs

  • bed (File)

BamToFastq

description
Convert a BAM file to a fastq file.

Inputs

Required

  • bam (File, required): BAM file to be converted.
  • prefix (String, required): Prefix for the output fastq file.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Outputs

  • reads_fq (File)

MergeFastqs

description
Merge fastq files.

Inputs

Required

  • fastqs (Array[File], required): Fastq files to be merged.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Defaults

  • prefix (String, default="merged"): Prefix for the output fastq file.

Outputs

  • merged_fastq (File)

MergeBams

description
Merge several input BAMs into a single BAM.

Inputs

Required

  • bams (Array[File], required): Input array of BAMs to be merged.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Defaults

  • prefix (String, default="out"): Prefix for the output BAM.

Outputs

  • merged_bam (File)
  • merged_bai (File)

Index

description
samtools index a BAM file.

Inputs

Required

  • bam (File, required): BAM file to be indexed.

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Outputs

  • bai (File)

SubsetBam

description
Subset a BAM file to a specified locus.

Inputs

Required

  • bai (File, required): index for bam file
  • bam (File, required; localization_optional: true): bam to subset
  • locus (String, required): genomic locus to select

Optional

  • runtime_attr_override (RuntimeAttr?): Override the default runtime attributes.

Defaults

  • prefix (String, default="subset"): prefix for output bam and bai file names

Outputs

  • subset_bam (File)
  • subset_bai (File)

ResilientSubsetBam

description
For subsetting a high-coverage BAM stored in GCS, without localizing (more resilient to auth. expiration).

Inputs

Required

  • bai (File, required)
  • bam (File, required; localization_optional: true)
  • interval_id (String, required): an ID string for representing the intervals in the interval list file
  • interval_list_file (File, required): a Picard-style interval list file to subset reads with
  • prefix (String, required): prefix for output bam and bai file names

Optional

  • runtime_attr_override (RuntimeAttr?)

Outputs

  • subset_bam (File)
  • subset_bai (File)

Bamtools

description
Runs a given bamtools command on a bam file

Inputs

Required

  • args (String, required): arguments to pass to bamtools
  • bamfile (File, required): bam file to run bamtools on
  • cmd (String, required): bamtools command to run

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • prefix (String, default="out")

Outputs

  • bam (File)

DeduplicateBam

description
Utility to drop (occasionally occurring) duplicate records in the input BAM

Inputs

Required

  • aligned_bai (File, required): input BAM index file
  • aligned_bam (File, required): input BAM file

Optional

  • runtime_attr_override (RuntimeAttr?): override default runtime attributes

Defaults

  • same_name_as_input (Boolean, default=true): if true, output BAM will have the same name as input BAM, otherwise it will have the input basename with .dedup suffix

Outputs

  • corrected_bam (File)
  • corrected_bai (File)

Cat

description
Utility to concatenate a group of files into a single output file. If has_header is true, the header is kept once, as the first line of the output. If has_header is false, the files are concatenated as-is, with no header handling.

Inputs

Required

  • files (Array[File], required): text files to combine

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • has_header (Boolean, default=false): files have a redundant header
  • out (String, default="out.txt"): [default-valued] output filename

Outputs

  • combined (File)
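
One way to read the has_header option is sketched below in Python. It assumes that with has_header set, the first line of the first file is kept and the first line of every later file is dropped. The function name concatenate and the sample contents are hypothetical; this is not the task's actual script.

```python
from typing import List


def concatenate(contents: List[str], has_header: bool = False) -> str:
    """Join file contents, keeping a shared header line only once."""
    out = []
    for index, text in enumerate(contents):
        lines = text.splitlines()
        if has_header and index > 0:
            lines = lines[1:]  # drop the redundant header of later files
        out.extend(lines)
    return ("\n".join(out) + "\n") if out else ""


# Two hypothetical TSV files that share the header "id<TAB>value".
first = "id\tvalue\na\t1\n"
second = "id\tvalue\nb\t2\n"
combined = concatenate([first, second], has_header=True)
print(combined, end="")
```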

ComputeGenomeLength

description
Utility to compute the length of a genome from a FASTA file

Inputs

Required

  • fasta (File, required): FASTA file

Optional

  • runtime_attr_override (RuntimeAttr?)

Outputs

  • length (Float)
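
As a sketch of the computation, genome length can be taken as the total number of sequence characters across all FASTA records. The Python below assumes exactly that, and returns a float only to mirror the Float output type; the function name is hypothetical and the task may count differently (for example, it may treat N bases specially).

```python
def genome_length(fasta_text: str) -> float:
    """Sum the lengths of all sequence lines, skipping '>' header lines."""
    total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return float(total)


# Hypothetical two-record FASTA.
fasta = ">chr1\nACGT\nAC\n>chr2\nGG\n"
print(genome_length(fasta))
```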

ListFilesOfType

description
Utility to list files of a given type in a directory

Inputs

Required

  • gcs_dir (String, required): input directory
  • suffixes (Array[String], required): suffix(es) for files

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • recurse (Boolean, default=false): if true, recurse through subdirectories

Outputs

  • files (Array[String])
  • manifest (File)
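
The selection logic can be sketched over an in-memory list of paths. The Python below does not talk to GCS; it only illustrates the assumed semantics of suffixes and recurse (without recurse, only direct children of gcs_dir are returned). The function name and the bucket paths are hypothetical.

```python
from typing import List


def list_files_of_type(paths: List[str], gcs_dir: str,
                       suffixes: List[str], recurse: bool = False) -> List[str]:
    """Select paths under gcs_dir that end with one of the given suffixes."""
    prefix = gcs_dir.rstrip("/") + "/"
    selected = []
    for path in paths:
        if not path.startswith(prefix):
            continue
        relative = path[len(prefix):]
        if not recurse and "/" in relative:
            continue  # path lives in a subdirectory
        if path.endswith(tuple(suffixes)):
            selected.append(path)
    return selected


# Hypothetical bucket listing.
listing = [
    "gs://example-bucket/run/a.bam",
    "gs://example-bucket/run/a.bam.bai",
    "gs://example-bucket/run/sub/b.bam",
    "gs://example-bucket/other/c.bam",
]
print(list_files_of_type(listing, "gs://example-bucket/run", [".bam"]))
```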

StopWorkflow

description
Utility to stop a workflow

Inputs

Required

  • reason (String, required): reason for stopping

Outputs

None

InferSampleName

description
Infer the sample name encoded on the @RG line of the header section. Fails if multiple values are found, or if SM ~= unnamedsample.

Inputs

Required

  • bai (File, required)
  • bam (File, required; localization_optional: true): BAM file

Outputs

  • sample_name (String)
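
The failure conditions above can be illustrated with a small Python sketch that works on the header text. It assumes the SM comparison is case-insensitive and that a header with no SM value also fails; both are guesses. The function name and the example header are hypothetical.

```python
def infer_sample_name(header_text: str) -> str:
    """Return the single SM value found on the @RG lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                names.add(field[3:])
    if len(names) != 1:
        # Assumption: zero SM values is treated as an error as well.
        raise ValueError(f"expected exactly one sample name, found {sorted(names)}")
    name = next(iter(names))
    if name.lower() == "unnamedsample":
        raise ValueError("sample name is a placeholder")
    return name


# Hypothetical header with two read groups from the same sample.
header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tSM:NA12878\tPL:PACBIO",
    "@RG\tID:rg2\tSM:NA12878\tPL:PACBIO",
])
print(infer_sample_name(header))
```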

CheckOnSamplenames

description
Makes sure the provided sample names are all the same, i.e. there is no mixture of sample names

Inputs

Required

  • sample_names (Array[String], required): sample names

Outputs

None

ComputeAllowedLocalSSD

description
Compute the number of local SSDs allowed by Google

Inputs

Required

  • intended_gb (Int, required): intended number of GB

Outputs

  • numb_of_local_ssd (Int)
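
A possible shape of this calculation is sketched below. It relies on Google Cloud local SSDs being 375 GB each. The set of permitted disk counts is an assumption passed in as a parameter: it varies by machine type and should be checked against current Google Cloud documentation. The function name is hypothetical and the task's own rounding rule may differ.

```python
import math
from typing import Iterable

LOCAL_SSD_GB = 375  # size of a single Google Cloud local SSD


def allowed_local_ssd_count(
    intended_gb: int,
    allowed_counts: Iterable[int] = (1, 2, 3, 4, 5, 6, 7, 8, 16, 24),
) -> int:
    """Round the number of disks needed up to the next permitted count."""
    needed = max(1, math.ceil(intended_gb / LOCAL_SSD_GB))
    for count in sorted(allowed_counts):
        if count >= needed:
            return count
    raise ValueError(f"{intended_gb} GB exceeds the largest allowed configuration")


print(allowed_local_ssd_count(1000))
```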

RandomZoneSpewer

description
Spews a random GCP zone

Inputs

Required

  • num_of_zones (Int, required): number of zones to spew

Outputs

  • zones (String)

GetCurrentTimestampString

volatile
true
description
Get the current timestamp as a string

Inputs

Defaults

  • date_format (String, default="%Y%m%d_%H%M%S_%N"): The date format string to use. See the unix date command for more info.

Outputs

  • timestamp_string (String)
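
The default date_format ends in %N, which GNU date expands to nanoseconds; Python's strftime has no such directive. The sketch below builds an equivalent string by hand, assuming local time as date would use by default. The function name and the optional now_ns parameter are hypothetical.

```python
import time
from typing import Optional


def timestamp_string(now_ns: Optional[int] = None) -> str:
    """Format a time like `date +%Y%m%d_%H%M%S_%N` (nanoseconds appended)."""
    if now_ns is None:
        now_ns = time.time_ns()
    seconds, nanos = divmod(now_ns, 1_000_000_000)
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(seconds))
    return f"{stamp}_{nanos:09d}"


print(timestamp_string())
```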

GetRawReadGroup

description
Get the raw read group from a bam file (assumed to have 1 read group only)

Inputs

Required

  • gcs_bam_path (String, required): path to bam file in GCS

Optional

  • runtime_attr_override (RuntimeAttr?): override the runtime attributes

Outputs

  • rg (String)

GetReadsInBedFileRegions

description
Get the reads from the given bam path which overlap the regions in the given bed file.

Inputs

Required

  • gcs_bam_path (String, required): GCS URL to bam file from which to extract reads.
  • regions_bed (File, required): Bed file containing regions for which to extract reads.

Optional

  • runtime_attr_override (RuntimeAttr?): Runtime attributes override struct.

Defaults

  • prefix (String, default="reads"): [default-valued] prefix for output BAM

Outputs

  • bam (File)
  • bai (File)

MapToTsv

description
Convert a map to a tsv file

Inputs

Required

  • my_map (Map[String,Float], required): The map to convert
  • name_of_file (String, required): The name of the file to write to

Outputs

  • result (File)
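
A minimal Python sketch of the conversion, assuming one key and value per row, separated by a tab, with no header row. The function name and the example keys are hypothetical, and the real task may format floats differently.

```python
from typing import Dict


def map_to_tsv(my_map: Dict[str, float]) -> str:
    """Render a mapping as two-column TSV text, one pair per line."""
    return "".join(f"{key}\t{value}\n" for key, value in my_map.items())


# Hypothetical metrics map.
metrics = {"coverage": 30.5, "n50": 15000.0}
print(map_to_tsv(metrics), end="")
```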

CreateIGVSession

description
Create an IGV session given a list of IGV compatible file paths. Adapted / borrowed from https://github.com/broadinstitute/palantir-workflows/blob/mg_benchmark_compare/BenchmarkVCFs .

Inputs

Required

  • input_bams (Array[String], required)
  • input_vcfs (Array[String], required)
  • output_name (String, required)
  • reference_short_name (String, required)

Optional

  • runtime_attr_override (RuntimeAttr?)

Outputs

  • igv_session (File)

SplitContigToIntervals

author
Jonn Smith
notes
Splits the given contig into intervals of the given size.

Inputs

Required

  • contig (String, required)
  • prefix (String, required)
  • ref_dict (File, required)
  • ref_fasta (File, required)
  • ref_fasta_fai (File, required)

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • size (Int, default=200000)

Outputs

  • full_bed_file (File)
  • individual_bed_files (Array[File])
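
The splitting described in the notes can be sketched as follows. The Python below emits BED-style coordinates (0-based start, exclusive end) and assumes that the last interval is simply truncated at the contig end. The function name and the contig length are hypothetical.

```python
from typing import List, Tuple


def split_contig(contig: str, length: int, size: int = 200000) -> List[Tuple[str, int, int]]:
    """Split [0, length) into consecutive BED intervals of at most size bases."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [
        (contig, start, min(start + size, length))
        for start in range(0, length, size)
    ]


# Hypothetical 450 kb contig split with the default size.
for name, start, end in split_contig("chr1", 450000):
    print(f"{name}\t{start}\t{end}")
```

Under these assumptions, the printed rows together would correspond to full_bed_file, and each row on its own to one entry of individual_bed_files.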