Postprocessing_Tasks

CreateCountMatrixFromAnnotatedBam

description
Creates a count matrix TSV file from the given annotated BAM file. The BAM file must contain tags that indicate the gene/transcript (XG), cell barcode (CB), and UMI (BX) of each read.
author
Jonn Smith
email
jonn@broadinstitute.org

Inputs

Required

  • annotated_transcriptome_bam (File, required): Annotated transcriptome BAM file. Must contain tags that indicate the gene/transcript (XG), cell barcode (CB), and UMI (BX) of each read.

Optional

  • runtime_attr_override (RuntimeAttr?)
  • tx_equivalence_class_assignments (File?): Optional file containing a list of equivalence classes for each transcript.

Defaults

  • prefix (String, default="umi_tools_group"): Prefix for the output file. The output file will be named <prefix>.tsv
  • umi_tag (String, default="ZU"): The BAM tag that contains the UMI.

Outputs

  • count_matrix (File)
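
To make the shape of such a count matrix concrete, the sketch below counts distinct UMIs per (gene/transcript, cell barcode) pair from tag values that have already been extracted from reads. It is a minimal illustration under stated assumptions, not the task's actual implementation: the counting rule (one count per distinct UMI), the helper names `build_count_matrix` and `write_count_matrix_tsv`, and the feature-by-cell TSV layout are all hypothetical.

```python
from collections import defaultdict
import csv
import io


def build_count_matrix(records):
    """Count distinct UMIs per (feature, cell barcode).

    `records` is an iterable of (feature, cell_barcode, umi) tuples, e.g. the
    gene/transcript, cell barcode, and UMI tag values of each read.
    Hypothetical helper; the real task may count differently.
    """
    umis = defaultdict(set)
    for feature, cell_barcode, umi in records:
        umis[(feature, cell_barcode)].add(umi)
    return {key: len(value) for key, value in umis.items()}


def write_count_matrix_tsv(counts, handle):
    """Write counts as a feature x cell TSV (this layout is an assumption)."""
    features = sorted({feature for feature, _ in counts})
    cells = sorted({cell for _, cell in counts})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature"] + cells)
    for feature in features:
        writer.writerow([feature] + [counts.get((feature, c), 0) for c in cells])


# Made-up tag values, for illustration only.
reads = [
    ("GENE_A", "AAAC", "UMI1"),
    ("GENE_A", "AAAC", "UMI1"),  # same UMI again, counted once
    ("GENE_A", "AAAC", "UMI2"),
    ("GENE_B", "TTTG", "UMI3"),
]
counts = build_count_matrix(reads)
buffer = io.StringIO()
write_count_matrix_tsv(counts, buffer)
print(buffer.getvalue())
```

Collapsing reads to a set of UMIs per pair keeps PCR duplicates from inflating the counts; whether the task applies exactly this rule is not stated here.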

CreateCountMatrixAnndataFromEquivalenceClasses

description
Creates a Python anndata object from the given count matrix TSV and equivalence classes. Expects the input to have been generated by CreateCountMatrixFromAnnotatedBam. The resulting anndata object can be read directly into scanpy for single-cell analysis.
author
Jonn Smith
email
jonn@broadinstitute.org

Inputs

Required

  • count_matrix_tsv (File, required): The TSV file containing the count matrix to convert to anndata.
  • gene_equivalence_class_assignments (File, required): The gene equivalence class assignments file to use for the anndata object.
  • gene_equivalence_class_definitions (File, required): The gene equivalence class definitions file to use for the anndata object.
  • genome_annotation_gtf_file (File, required): The GTF file containing the genome annotation.
  • tx_equivalence_class_assignments (File, required): The transcript equivalence class assignments file to use for the anndata object.
  • tx_equivalence_class_definitions (File, required): The transcript equivalence class definitions file to use for the anndata object.

Optional

  • gencode_reference_gtf_file (File?): The GENCODE reference GTF file.
  • overlap_interval_label (String?): The label to use for the overlap intervals.
  • overlap_intervals (File?): Overlapping interval file.
  • runtime_attr_override (RuntimeAttr?)

Defaults

  • force_anndata_gencode_overwrite (Boolean, default=false)
  • prefix (String, default="umi_tools_group"): The prefix to use for the output anndata object.

Outputs

  • transcript_gene_count_anndata_h5ad (File)
  • pickles (Array[File])

QuantifyGffComparison

description
Create equivalence classes and gene assignments from a set of gffcompare results.
author
Jonn Smith
email
jonn@broadinstitute.org

Inputs

Required

  • gencode_read_refmap (File, required): Refmap file (produced by gffcompare) comparing the genome reference gtf to input reads (in GFF format).
  • gencode_read_tmap (File, required): Tmap file (produced by gffcompare) comparing the genome reference gtf to input reads (in GFF format).
  • gencode_st2_refmap (File, required): Refmap file (produced by gffcompare) comparing the genome reference gtf to the stringtie2 discovered transcriptome.
  • gencode_st2_tmap (File, required): Tmap file (produced by gffcompare) comparing the genome reference gtf to the stringtie2 discovered transcriptome.
  • genome_gtf (File, required): Genome annotation GTF file (usually gencode).
  • st2_gencode_refmap (File, required): Refmap file (produced by gffcompare) comparing the stringtie2 discovered transcriptome to the genome reference gtf.
  • st2_gencode_tmap (File, required): Tmap file (produced by gffcompare) comparing the stringtie2 discovered transcriptome to the genome reference gtf.
  • st2_read_refmap (File, required): Refmap file (produced by gffcompare) comparing the stringtie2 discovered transcriptome to input reads (in GFF format).
  • st2_read_tmap (File, required): Tmap file (produced by gffcompare) comparing the stringtie2 discovered transcriptome to input reads (in GFF format).

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • prefix (String, default="reads_comparison"): Prefix for output file.

Outputs

  • gene_eq_class_labels_file (File)
  • gene_assignments_file (File)
  • tx_equivalence_class_labels_file (File)
  • tx_equivalence_class_file (File)
  • graph_gpickle (File)

CombineEqClassFiles

description
Combine equivalence classes and gene assignments from disjoint sets of reads produced by QuantifyGffComparison.
author
Jonn Smith
email
jonn@broadinstitute.org

Inputs

Required

  • equivalence_class_definitions (Array[File], required): TSV files containing transcript equivalence class definitions as produced by QuantifyGffComparison.tx_equivalence_class_labels_file.
  • equivalence_classes (Array[File], required): TSV files containing read -> transcript equivalence class assignments as produced by QuantifyGffComparison.tx_equivalence_class_file.
  • gene_assignment_files (Array[File], required): TSV files containing read -> gene equivalence class assignments as produced by QuantifyGffComparison.gene_assignments_file.
  • gene_eq_class_definitions (Array[File], required): TSV files containing equivalence class definitions for genes as produced by QuantifyGffComparison.gene_eq_class_labels_file.

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • prefix (String, default="combined"): Prefix for output file.

Outputs

  • combined_gene_eq_class_defs (File)
  • combined_gene_eq_class_assignments (File)
  • combined_tx_eq_class_defs (File)
  • combined_tx_eq_class_assignments (File)
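
Because the inputs come from disjoint sets of reads, combining them can be pictured as concatenating the per-shard TSV rows while dropping definitions that repeat across shards. The sketch below shows that idea only; it assumes nothing about the real column layout, treats each row as an opaque tuple, and the helper name `combine_tsv_rows` and the sample rows are hypothetical.

```python
import csv
import io


def combine_tsv_rows(handles):
    """Concatenate TSV rows from several inputs, dropping exact duplicates.

    Hypothetical helper: rows are compared as whole tuples and first-seen
    order is preserved. The actual task may merge files differently.
    """
    seen = set()
    combined = []
    for handle in handles:
        for row in csv.reader(handle, delimiter="\t"):
            key = tuple(row)
            if row and key not in seen:  # skip blank lines and repeats
                seen.add(key)
                combined.append(row)
    return combined


# Made-up read -> equivalence class rows from two disjoint shards.
shard_1 = io.StringIO("read_1\teq_class_0\nread_2\teq_class_1\n")
shard_2 = io.StringIO("read_3\teq_class_0\n")
assignments = combine_tsv_rows([shard_1, shard_2])
print(assignments)
```

Rows from disjoint read sets never collide, so the duplicate check only matters for definition files where the same class can appear in more than one shard.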

CopyEqClassInfoToTag

description
Copy each read's gene assignment into the given tag on that read.
author
Jonn Smith
email
jonn@broadinstitute.org

Inputs

Required

  • bam (File, required): BAM file containing reads to copy gene assignments into.
  • eq_class_file (File, required): TSV file containing read -> gene equivalence class assignments as produced by QuantifyGffComparison.gene_assignments_file.

Optional

  • runtime_attr_override (RuntimeAttr?)

Defaults

  • eq_class_tag (String, default="eq"): Tag to copy gene equivalence class into.
  • gene_tag (String, default="XG"): Tag to copy gene assignment into.
  • prefix (String, default="combined"): Prefix for output file.

Outputs

  • bam_out (File)
  • bai (File)
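
The tagging step can be illustrated on SAM text records, where optional fields follow the 11 mandatory columns in TAG:TYPE:VALUE form. The sketch below is an assumption-laden illustration, not the task's code: it works on plain SAM lines rather than a BAM file, the helpers `set_string_tag` and `copy_assignments` are hypothetical, and reads without an assignment are passed through unchanged.

```python
def set_string_tag(sam_line, tag, value):
    """Return the SAM record with optional field `tag` set to a string value."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    # Drop any existing copy of the tag so it is not duplicated.
    optional = [field for field in optional if not field.startswith(tag + ":")]
    optional.append(f"{tag}:Z:{value}")
    return "\t".join(mandatory + optional)


def copy_assignments(sam_lines, read_to_gene, gene_tag="XG"):
    """Write each read's gene assignment into `gene_tag` (hypothetical helper)."""
    tagged_lines = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):  # header lines are copied as-is
            tagged_lines.append(line)
            continue
        read_name = line.split("\t", 1)[0]
        if read_name in read_to_gene:
            line = set_string_tag(line, gene_tag, read_to_gene[read_name])
        tagged_lines.append(line)
    return tagged_lines


# Made-up records, for illustration only.
record_1 = "read_1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tCB:Z:AAAC"
record_2 = "read_2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tTTTT\tIIII"
tagged = copy_assignments(
    ["@HD\tVN:1.6", record_1, record_2],
    {"read_1": "GENE_A"},
)
print("\n".join(tagged))
```

Here `read_to_gene` stands in for the contents of eq_class_file after it has been loaded into a dictionary keyed by read name.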