AssignChildLongReads

AssignChildLongReadsGivenParentalKmerStats

description
A workflow that performs trio-binning of child long reads given parental (short) reads. Based on the trio-canu publication 'De novo assembly of haplotype-resolved genomes with trio binning' (https://www.nature.com/articles/nbt.4277). This holds the sub-workflow for part two: given the k-mer stats database from part one, classify child long reads. The two parts are kept separate for two reasons: 1. we can test different k values when collecting parental k-mer stats; 2. we can collect parental k-mer stats once and classify all child reads (different siblings, technologies) separately.
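
To make the classification idea concrete, here is a minimal Python sketch of trio binning on toy sequences. It is an illustration only, not the code this workflow runs: the function names are hypothetical, k-mers are held in plain in-memory sets rather than Meryl databases, reverse complements are ignored, and reads are assigned by comparing raw hit counts, which may differ from the scoring used by the actual tool.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(father_reads, mother_reads, k):
    """Return (father_only, mother_only): k-mers seen in exactly one parent."""
    father = set().union(*(kmers(r, k) for r in father_reads))
    mother = set().union(*(kmers(r, k) for r in mother_reads))
    return father - mother, mother - father


def assign_read(read, father_only, mother_only, k):
    """Assign a child read to the parent whose specific k-mers it hits more often."""
    read_kmers = kmers(read, k)
    father_hits = len(read_kmers & father_only)
    mother_hits = len(read_kmers & mother_only)
    if father_hits > mother_hits:
        return "father"
    if mother_hits > father_hits:
        return "mother"
    # Ties (including reads with no parent-specific k-mers) stay unassigned.
    return "unassigned"


# Toy data (hypothetical): the parents share the k-mer AAAA and differ elsewhere.
K = 4
father_only, mother_only = parent_specific_kmers(["AAAACCCC"], ["AAAAGGGG"], K)

for child_read in ["TTAACCCCTT", "AAGGGG", "AAAATTTT"]:
    print(child_read, assign_read(child_read, father_only, mother_only, K))
```

The three result labels mirror the workflow's three outputs: reads assigned to the father, reads assigned to the mother, and unassigned reads.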

Inputs

Required

  • child_long_reads_bucket (String, required): GCS bucket path holding FASTA/FASTQ of child long reads
  • long_read_platform (String, required): platform of long read sequencing; currently only one of [pacbio-raw, nanopore-raw] is supported
  • meryl_db_files_father (Array[File], required): Meryl database files for paternal (short) reads
  • meryl_db_files_mother (Array[File], required): Meryl database files for maternal (short) reads
  • meryl_stats_father (File, required): single Meryl statistics file for paternal (short) reads
  • meryl_stats_mother (File, required): single Meryl statistics file for maternal (short) reads
  • vm_local_monitoring_script (File, required): GCS file holding a resource monitoring script that runs locally and collects info for a very specific purpose
  • workdir_name (String, required): name of working directory

Optional

  • run_with_debug (Boolean?): [optional] whether to run in debug mode (uses significantly more disk space and produces more logs); defaults to false
  • AssignChildLongReads.runtime_attr_override (RuntimeAttr?)

Defaults

  • child_read_assign_memoryG_est (Int, default=32): [default-valued] estimate of how many GB of memory to allocate to the child long-read classification step
  • child_read_assign_threads_est (Int, default=36): [default-valued] estimate of how many threads to allocate to the child long-read classification step
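
As a rough illustration of how the inputs listed above might be supplied, the following Python sketch assembles an inputs JSON document. Several things here are assumptions rather than facts from this page: that inputs are keyed as `<workflow name>.<input name>`, that the workflow name is `AssignChildLongReadsGivenParentalKmerStats`, and every `gs://example-bucket/...` path and the `workdir_name` value, which are placeholders. The helper names are hypothetical.

```python
import json

# Assumed workflow name used as the key prefix; adjust to match the actual WDL.
WORKFLOW = "AssignChildLongReadsGivenParentalKmerStats"

# Required inputs and supported platforms, as listed in this document.
REQUIRED = [
    "child_long_reads_bucket",
    "long_read_platform",
    "meryl_db_files_father",
    "meryl_db_files_mother",
    "meryl_stats_father",
    "meryl_stats_mother",
    "vm_local_monitoring_script",
    "workdir_name",
]
SUPPORTED_PLATFORMS = {"pacbio-raw", "nanopore-raw"}


def build_inputs(values, workflow=WORKFLOW):
    """Check the values, then prefix every input name with the workflow name."""
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    if values["long_read_platform"] not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform: {values['long_read_platform']}")
    return {f"{workflow}.{name}": value for name, value in values.items()}


# All paths below are placeholders, not real locations.
example_values = {
    "child_long_reads_bucket": "gs://example-bucket/child/long_reads",
    "long_read_platform": "pacbio-raw",
    "meryl_db_files_father": ["gs://example-bucket/father/db.part0"],
    "meryl_db_files_mother": ["gs://example-bucket/mother/db.part0"],
    "meryl_stats_father": "gs://example-bucket/father/stats.txt",
    "meryl_stats_mother": "gs://example-bucket/mother/stats.txt",
    "vm_local_monitoring_script": "gs://example-bucket/scripts/monitor.sh",
    "workdir_name": "trio_binning_workdir",
    # Default-valued inputs can be omitted; shown here with their documented defaults.
    "child_read_assign_memoryG_est": 32,
    "child_read_assign_threads_est": 36,
}

inputs = build_inputs(example_values)
print(json.dumps(inputs, indent=2))
```

Validating the platform string before submission catches an unsupported value early instead of partway through a run.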

Outputs

  • reads_assigned_to_father (File)
  • reads_assigned_to_mother (File)
  • unassigned_reads (File)

Dot Diagram

AssignChildLongReads