CollectParentsKmerStats

Description
A workflow that performs trio-binning of child long reads using parental short reads, based on the trio-canu publication (https://www.nature.com/articles/nbt.4277). This is the sub-workflow for part one: collecting k-mer stats from the parental short reads.

Inputs

Required

  • father_short_reads_bucket (String, required): GCS bucket path holding FASTA/FASTQ of (short) reads of paternal origin
  • genome_size (String, required): an estimate of the genome size of the species (affects k-value picking)
  • mother_short_reads_bucket (String, required): GCS bucket path holding FASTA/FASTQ of (short) reads of maternal origin
  • workdir_name (String, required): name of working directory

Optional

  • kmerSize (Int?): [optional] force a specific k-value when collecting k-mer stats on the parents
  • run_with_debug (Boolean?): [optional] whether to run in debug mode (uses significantly more disk space and produces more logs); defaults to false
  • MerylCount.runtime_attr_override (RuntimeAttr?)
  • MerylMergeAndSubtract.runtime_attr_override (RuntimeAttr?)
  • ParentalReadsRepartitionAndMerylConfigure.runtime_attr_override (RuntimeAttr?)

Defaults

  • meryl_operations_threads_est (Int, default=8): [default-valued] estimate of how many threads to allocate to the k-mer stats collection step
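
The inputs listed above are typically supplied to a WDL runner as a JSON file whose keys are prefixed with the workflow name. The following is a minimal sketch of how such a file could be assembled, not a definitive recipe: the helper `build_inputs`, the bucket paths, the working directory name, the genome size string "3.1g", and the k-value 21 are all illustrative assumptions and do not come from the workflow itself.

```python
import json

WORKFLOW = "CollectParentsKmerStats"

# Required inputs as listed in the "Required" section above.
REQUIRED = (
    "father_short_reads_bucket",
    "genome_size",
    "mother_short_reads_bucket",
    "workdir_name",
)


def build_inputs(values, workflow=WORKFLOW):
    """Hypothetical helper: check required keys and prefix each key with the workflow name."""
    missing = [key for key in REQUIRED if key not in values]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    return {f"{workflow}.{key}": value for key, value in values.items()}


example = build_inputs({
    # Placeholder bucket paths; replace with real GCS locations.
    "father_short_reads_bucket": "gs://example-bucket/father",
    "mother_short_reads_bucket": "gs://example-bucket/mother",
    # genome_size is a String; the "3.1g" format is an assumption.
    "genome_size": "3.1g",
    "workdir_name": "trio_binning_demo",
    # Optional: force a k-value (21 is illustrative only).
    "kmerSize": 21,
    # Default-valued input, shown here with its documented default.
    "meryl_operations_threads_est": 8,
})

inputs_json = json.dumps(example, indent=2, sort_keys=True)
print(inputs_json)
```

Omitting `kmerSize` from the dictionary leaves the choice of k to the workflow, which the description of `genome_size` indicates is influenced by the genome size estimate.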

Outputs

  • Father_haplotype_merylDB (Array[File])
  • Mother_haplotype_merylDB (Array[File])
  • Father_reads_statistics (File)
  • Mother_reads_statistics (File)

Dot Diagram

[Figure: dot diagram of the CollectParentsKmerStats workflow]