TrioBinChildLongReads

Description
A workflow that performs trio binning of a child's long reads, given the parents' (short) reads. It is based on the TrioCanu publication:
De novo assembly of haplotype-resolved genomes with trio binning https://www.nature.com/articles/nbt.4277

We divide the workflow into two parts:
- part one: collect k-mer stats from the parental (short) reads
- part two: using the k-mer stats database from part one, classify the child long reads
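The two parts above can be illustrated with a toy, in-memory Python sketch. It is not the workflow's implementation, which runs Meryl counting, merging and subtraction tasks followed by a separate classification task. It only shows the idea under simplifying assumptions: canonical k-mers, an optional minimum-count filter standing in for removal of error k-mers, and read scores normalized by the size of each parent-specific k-mer set. All function names (kmer_set, parent_specific_kmers, classify, bin_reads) are hypothetical, and the exact scoring and tie-breaking rules of the real classification step may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_set(reads: Iterable[str], k: int, min_count: int = 1) -> Set[str]:
    """Count canonical k-mers over all reads; keep those seen at least min_count times."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return {km for km, c in counts.items() if c >= min_count}


def parent_specific_kmers(father_reads, mother_reads, k, min_count=1) -> Tuple[Set[str], Set[str]]:
    """Part one (hypothetical): k-mers found in one parent but not the other."""
    father = kmer_set(father_reads, k, min_count)
    mother = kmer_set(mother_reads, k, min_count)
    return father - mother, mother - father


def classify(read: str, k: int, father_only: Set[str], mother_only: Set[str]) -> str:
    """Part two (hypothetical): score a child read by its hits to each parent-specific set."""
    hits_f = hits_m = 0
    for i in range(len(read) - k + 1):
        km = canonical(read[i:i + k])
        if km in father_only:
            hits_f += 1
        elif km in mother_only:  # the two sets are disjoint by construction
            hits_m += 1
    # Normalize by set size so the parent with more specific k-mers is not favored.
    score_f = hits_f / len(father_only) if father_only else 0.0
    score_m = hits_m / len(mother_only) if mother_only else 0.0
    if score_f > score_m:
        return "father"
    if score_m > score_f:
        return "mother"
    return "unassigned"  # no hits, or a tie


def bin_reads(child_reads: Iterable[str], k: int,
              father_only: Set[str], mother_only: Set[str]) -> Dict[str, List[str]]:
    """Split child reads into three bins, one per workflow output."""
    bins: Dict[str, List[str]] = {"father": [], "mother": [], "unassigned": []}
    for read in child_reads:
        bins[classify(read, k, father_only, mother_only)].append(read)
    return bins
```

For example, with father reads ["AAAACCCC"], mother reads ["AAAAGGGG"] and k = 4, the parent-specific sets are {AAAC, AACC, ACCC} and {AAAG, AAGG, AGGG}. The child read "AAAACCC" (and its reverse complement "GGGTTTT") lands in the father bin, "AAAAGGG" lands in the mother bin, and "TTTTTT", which carries only the shared k-mer AAAA, stays unassigned. The three bins correspond to the reads_assigned_to_father, reads_assigned_to_mother and unassigned_reads outputs.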

Inputs

Required

  • child_long_reads_bucket (String, required): GCS bucket path holding FASTA/FASTQ files of the child's long reads
  • father_short_reads_bucket (String, required): GCS bucket path holding FASTA/FASTQ files of (short) reads of paternal origin
  • genome_size (String, required): an estimate of the species' genome size (affects the choice of k)
  • long_read_platform (String, required): long-read sequencing platform; currently must be one of [pacbio-raw, nanopore-raw]
  • mother_short_reads_bucket (String, required): GCS bucket path holding FASTA/FASTQ files of (short) reads of maternal origin
  • vm_local_monitoring_script (File, required): GCS file holding a resource-monitoring script that runs locally on the VM and collects usage information for a very specific purpose
  • workdir_name (String, required): name of working directory

Optional

  • kmerSize (Int?): [optional] forces a specific k value when collecting k-mer stats on the parents, instead of having it picked from genome_size
  • run_with_debug (Boolean?): [optional] whether to run in debug mode (uses significantly more disk space and produces more logs); defaults to false
  • AssignChildLongReads.runtime_attr_override (RuntimeAttr?)
  • CollectParentsKmerStats.MerylCount.runtime_attr_override (RuntimeAttr?)
  • CollectParentsKmerStats.MerylMergeAndSubtract.runtime_attr_override (RuntimeAttr?)
  • CollectParentsKmerStats.ParentalReadsRepartitionAndMerylConfigure.runtime_attr_override (RuntimeAttr?)
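How genome_size can drive the choice of k, and how kmerSize overrides it, can be sketched as follows. The formula used here, k = log4(G(1 - p) / p) for genome size G and tolerable random k-mer collision rate p, is the heuristic popularized by Merqury's best_k script; whether this workflow applies the same rule is an assumption. The genome-size string format with k/m/g suffixes (e.g. "3.1g") is also assumed, and parse_genome_size and pick_k are hypothetical helpers.

```python
import math
from typing import Optional

# Assumed canu-style size suffixes; the workflow's accepted format is not documented here.
_SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}


def parse_genome_size(text: str) -> float:
    """Parse a genome-size string such as '3.1g' or '4800000' into bases (hypothetical helper)."""
    s = text.strip().lower()
    if s and s[-1] in _SUFFIX:
        return float(s[:-1]) * _SUFFIX[s[-1]]
    return float(s)


def pick_k(genome_size: str, kmer_size: Optional[int] = None, p: float = 0.001) -> int:
    """Return the forced k if given, else ceil(log4(G * (1 - p) / p)) (assumed heuristic)."""
    if kmer_size is not None:
        # Mirrors the optional kmerSize input: an explicit value wins.
        return kmer_size
    g = parse_genome_size(genome_size)
    return math.ceil(math.log(g * (1 - p) / p, 4))
```

Under these assumptions, pick_k("3.1g") gives 21 for a human-sized genome (log4 of about 3.1e12 is roughly 20.75, rounded up), while pick_k("3.1g", kmer_size=19) simply returns the override.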

Defaults

  • child_read_assign_memoryG_est (Int, default=32): [default-valued] estimate of how many GB of memory to allocate to the child long-read classification step
  • child_read_assign_threads_est (Int, default=36): [default-valued] estimate of how many threads to allocate to the child long-read classification step
  • meryl_operations_threads_est (Int, default=8): [default-valued] estimate of how many threads to allocate to the k-mer stats collection step

Outputs

  • reads_assigned_to_father (File)
  • reads_assigned_to_mother (File)
  • unassigned_reads (File)

Dot Diagram

[Diagram: TrioBinChildLongReads workflow graph; image not included]