
GatherBatchEvidence

Runs CNV callers (cn.MOPS, GATK-gCNV) and combines single-sample raw evidence into a batch.

The following diagram illustrates the workflows downstream of GatherBatchEvidence in the recommended invocation order.

Inputs

This workflow takes as input the read counts, BAF, PE, SD, SR, and per-caller VCF files produced by the GatherSampleEvidence workflow, and the contig ploidy and gCNV models from the TrainGCNV workflow. Its inputs are listed below.

batch

An identifier for the batch.

samples

Sets the list of sample IDs.

counts

Set to the GatherSampleEvidence.coverage_counts output.

Raw calls

The following inputs set the per-caller raw SV calls, and should be set if the caller was run in the GatherSampleEvidence workflow. You may set each of the following inputs to the linked output from the GatherSampleEvidence workflow.

PE_files

Set to the GatherSampleEvidence.pesr_disc output.

SR_files

Set to the GatherSampleEvidence.pesr_split output.

SD_files

Set to the GatherSampleEvidence.pesr_sd output.
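The per-sample inputs above (counts, PE_files, SR_files, SD_files) are arrays. The following Python sketch shows one way to assemble them into an inputs JSON, assuming that each array must follow the same order as samples and that inputs are keyed as WorkflowName.input_name, as Cromwell expects. The helper name build_inputs, the sample IDs, the batch name, and the gs:// paths are all hypothetical.

```python
import json


def build_inputs(batch, per_sample):
    """Assemble per-sample evidence files into GatherBatchEvidence inputs.

    `per_sample` maps a sample ID to a dict of that sample's
    GatherSampleEvidence outputs, keyed by output name.
    """
    # Fix one sample order and reuse it for every per-sample array
    # (assumption: the arrays are expected to be parallel to `samples`).
    samples = sorted(per_sample)

    # (GatherBatchEvidence input name, GatherSampleEvidence output name)
    mapping = [
        ("counts", "coverage_counts"),
        ("PE_files", "pesr_disc"),
        ("SR_files", "pesr_split"),
        ("SD_files", "pesr_sd"),
    ]

    inputs = {
        "GatherBatchEvidence.batch": batch,
        "GatherBatchEvidence.samples": samples,
    }
    for input_name, output_name in mapping:
        missing = [s for s in samples if output_name not in per_sample[s]]
        if missing:
            raise ValueError(f"{output_name} missing for samples: {missing}")
        inputs[f"GatherBatchEvidence.{input_name}"] = [
            per_sample[s][output_name] for s in samples
        ]
    return inputs


# Hypothetical sample IDs and file paths, for illustration only.
example = {
    sample: {
        "coverage_counts": f"gs://my-bucket/{sample}.counts.tsv.gz",
        "pesr_disc": f"gs://my-bucket/{sample}.pe.txt.gz",
        "pesr_split": f"gs://my-bucket/{sample}.sr.txt.gz",
        "pesr_sd": f"gs://my-bucket/{sample}.sd.txt.gz",
    }
    for sample in ["sample_B", "sample_A"]
}

inputs = build_inputs("batch_01", example)
print(json.dumps(inputs, indent=2))
```

Deriving every array from a single sorted sample list keeps the files aligned with their sample IDs even when the per-sample records arrive in a different order.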

matrix_qc_distance

You may refer to this file for an example value.

min_svsize

Sets the minimum size of SVs to include.

ped_file

A pedigree file describing the familial relationships between the samples in the cohort. Please refer to this section for details.

run_matrix_qc

Enables or disables running optional QC tasks.

gcnv_qs_cutoff

You may refer to this file for an example value.

cn.MOPS files

The workflow needs the following cn.MOPS files.

  • cnmops_chrom_file and cnmops_allo_file: FASTA index files (.fai) for the non-sex chromosomes (autosomes) and for chromosomes X and Y (allosomes), respectively. The file format is explained on this page.

    You may use the following files for these fields:

    "cnmops_chrom_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/autosome.fai"
    "cnmops_allo_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/allosome.fai"
  • cnmops_exclude_list: You may use this file for this field.
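To illustrate what the two index files contain, the following Python sketch splits the text of a FASTA index into autosome and allosome records. It relies on the standard .fai layout of five tab-separated columns (name, length, offset, line bases, line width) and assumes hg38-style contig names (chr1 to chr22, chrX, chrY). The function name split_fai and the toy index values are hypothetical and not taken from the real reference files.

```python
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}
ALLOSOMES = {"chrX", "chrY"}


def split_fai(fai_text):
    """Split FASTA index text into (autosome_lines, allosome_lines).

    Contigs that are neither autosomes nor allosomes (for example chrM)
    are skipped.
    """
    autosome_lines, allosome_lines = [], []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # A .fai record has 5 columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got: {line!r}")
        name = fields[0]
        if name in AUTOSOMES:
            autosome_lines.append(line)
        elif name in ALLOSOMES:
            allosome_lines.append(line)
    return autosome_lines, allosome_lines


# Toy index with made-up lengths and offsets, for illustration only.
toy_fai = "\n".join(
    "\t".join(record)
    for record in [
        ("chr1", "1000", "6", "60", "61"),
        ("chr2", "900", "1030", "60", "61"),
        ("chrX", "500", "1952", "60", "61"),
        ("chrY", "200", "2467", "60", "61"),
        ("chrM", "100", "2677", "60", "61"),
    ]
)

autosomes, allosomes = split_fai(toy_fai)
print([line.split("\t")[0] for line in autosomes])
print([line.split("\t")[0] for line in allosomes])
```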

GATK-gCNV inputs

The following inputs are configured based on the outputs generated in the TrainGCNV workflow.

The workflow also allows setting a few optional gCNV arguments. The arguments and their default values are listed below, and each argument is documented on this page and this page.

Docker images

The workflow needs the following Docker images, the latest versions of which are in this file.

  • cnmops_docker;
  • condense_counts_docker;
  • linux_docker;
  • sv_base_docker;
  • sv_base_mini_docker;
  • sv_pipeline_docker;
  • sv_pipeline_qc_docker;
  • gcnv_gatk_docker;
  • gatk_docker.
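Because the workflow needs all of the images listed above, it can be handy to check an inputs dictionary for missing entries before submitting. The following Python sketch does that, assuming inputs are keyed as GatherBatchEvidence.input_name. The helper name missing_docker_inputs and the image names in the example are hypothetical.

```python
REQUIRED_DOCKER_INPUTS = [
    "cnmops_docker",
    "condense_counts_docker",
    "linux_docker",
    "sv_base_docker",
    "sv_base_mini_docker",
    "sv_pipeline_docker",
    "sv_pipeline_qc_docker",
    "gcnv_gatk_docker",
    "gatk_docker",
]


def missing_docker_inputs(inputs, workflow="GatherBatchEvidence"):
    """Return the required Docker inputs that are absent or empty in `inputs`."""
    return [
        name
        for name in REQUIRED_DOCKER_INPUTS
        if not inputs.get(f"{workflow}.{name}")
    ]


# Hypothetical image names, for illustration only.
partial_inputs = {
    "GatherBatchEvidence.cnmops_docker": "example.registry/cnmops:some-tag",
    "GatherBatchEvidence.linux_docker": "example.registry/linux:some-tag",
    "GatherBatchEvidence.gatk_docker": "",  # empty values count as missing
}

print(missing_docker_inputs(partial_inputs))
```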

Static inputs

You may refer to this reference file for values of the following inputs.

  • primary_contigs_fai;
  • cytoband;
  • ref_dict;
  • mei_bed;
  • genome_file;
  • sd_locs_vcf.

Optional Inputs

The following are a few of the workflow's optional inputs, each with an example value.

  • "allosomal_contigs": [["chrX", "chrY"]]
  • "ploidy_sample_psi_scale": 0.001

Outputs

  • Combined read count matrix, SR, PE, and BAF files
  • Standardized call VCFs
  • Depth-only (DEL/DUP) calls
  • Per-sample median coverage estimates
  • (Optional) Evidence QC plots