GatherBatchEvidence
Runs CNV callers (cn.MOPS, GATK-gCNV) and combines single-sample raw evidence into a batch.
The following diagram illustrates the downstream workflows of the GatherBatchEvidence workflow in the recommended invocation order. You may refer to this diagram for the overall recommended invocation order.
Inputs
This workflow takes as input the read counts, BAF, PE, SD, SR, and per-caller VCF files produced in the GatherSampleEvidence workflow, and contig ploidy and gCNV models from the TrainGCNV workflow. The following is the list of the inputs the GatherBatchEvidence workflow takes.
batch
An identifier for the batch.
samples
Sets the list of sample IDs.
counts
Set to the GatherSampleEvidence.coverage_counts output.
Raw calls
The following inputs set the per-caller raw SV calls. Set each one only if the corresponding caller was run in the GatherSampleEvidence workflow. You may set each of the following inputs to the linked output from the GatherSampleEvidence workflow.
- manta_vcfs: GatherSampleEvidence.manta_vcf
- melt_vcfs: GatherSampleEvidence.melt_vcf
- scramble_vcfs: GatherSampleEvidence.scramble_vcf
- wham_vcfs: GatherSampleEvidence.wham_vcf
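As a rough illustration of the mapping above, the following sketch collects per-sample GatherSampleEvidence outputs into the per-caller batch inputs and leaves out any caller that was not run. The `sample_outputs` structure and the `build_raw_call_inputs` helper are hypothetical and exist only for this example; they are not part of GATK-SV.

```python
# Mapping from each GatherBatchEvidence input to the GatherSampleEvidence
# output it is set from, as listed above.
CALLER_INPUTS = {
    "manta_vcfs": "manta_vcf",
    "melt_vcfs": "melt_vcf",
    "scramble_vcfs": "scramble_vcf",
    "wham_vcfs": "wham_vcf",
}


def build_raw_call_inputs(sample_outputs):
    """Collect per-sample VCF paths into per-caller lists.

    sample_outputs is a hypothetical list of dicts, one per sample, keyed by
    GatherSampleEvidence output name. A caller is included only when every
    sample has an output for it, i.e. the caller was run for the whole batch.
    """
    inputs = {}
    for batch_input, sample_output in CALLER_INPUTS.items():
        vcfs = [outputs.get(sample_output) for outputs in sample_outputs]
        if vcfs and all(vcfs):
            inputs[batch_input] = vcfs
    return inputs
```

Callers with no outputs are simply absent from the returned dictionary, so the corresponding workflow inputs stay unset.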
PE_files
Set to the GatherSampleEvidence.pesr_disc output.
SR_files
Set to the GatherSampleEvidence.pesr_split output.
SD_files
Set to the GatherSampleEvidence.pesr_sd output.
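Because counts, PE_files, SR_files, and SD_files are each gathered from per-sample outputs, a quick consistency check before submission can catch a missing file. The sketch below assumes that each of these arrays is expected to have one entry per sample, in the same order as samples; that assumption and the `check_per_sample_inputs` helper are illustrative, not something this page states.

```python
def check_per_sample_inputs(
    inputs, per_sample_keys=("counts", "PE_files", "SR_files", "SD_files")
):
    """Return {input_name: length} for arrays whose length differs from samples.

    inputs is a plain dict of workflow inputs keyed by input name.
    Keys that are absent from inputs are ignored.
    """
    expected = len(inputs["samples"])
    mismatched = {}
    for key in per_sample_keys:
        if key in inputs and len(inputs[key]) != expected:
            mismatched[key] = len(inputs[key])
    return mismatched
```

An empty result means every per-sample array that was provided has the same length as samples.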
matrix_qc_distance
You may refer to this file for an example value.
min_svsize
Sets the minimum size of SVs to include.
ped_file
A pedigree file describing the familial relationships between the samples in the cohort. Please refer to this section for details.
run_matrix_qc
Enables or disables running optional QC tasks.
gcnv_qs_cutoff
You may refer to this file for an example value.
cn.MOPS files
The workflow needs the following cn.MOPS files.
- cnmops_chrom_file and cnmops_allo_file: FASTA index files (.fai) for non-sex chromosomes (autosomes) and chromosomes X and Y (allosomes), respectively. The file format is explained on this page. You may use the following files for these fields:
"cnmops_chrom_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/autosome.fai"
"cnmops_allo_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/allosome.fai"
- cnmops_exclude_list: You may use this file for this field.
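For readers unfamiliar with the .fai format, each line is tab-separated and begins with the contig name followed by its length in bases. The sketch below reads such text and separates autosomes from allosomes, mirroring the split between the two files above. The function names and the contig lengths used in the usage note are made up for illustration.

```python
def read_fai_contigs(fai_text):
    """Return a list of (name, length) tuples from FASTA index (.fai) text."""
    contigs = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The first two tab-separated columns are contig name and length.
        name, length = line.split("\t")[:2]
        contigs.append((name, int(length)))
    return contigs


def split_autosomes_allosomes(contigs, allosomes=("chrX", "chrY")):
    """Split contigs into (autosomes, allosomes) by contig name."""
    autosomal = [c for c in contigs if c[0] not in allosomes]
    allosomal = [c for c in contigs if c[0] in allosomes]
    return autosomal, allosomal
```

For example, given text with one `chr1` record and one `chrX` record, `split_autosomes_allosomes(read_fai_contigs(text))` places `chr1` in the first list and `chrX` in the second.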
GATK-gCNV inputs
The following inputs are configured based on the outputs generated in the TrainGCNV workflow.
- contig_ploidy_model_tar: TrainGCNV.cohort_contig_ploidy_model_tar
- gcnv_model_tars: TrainGCNV.cohort_gcnv_model_tars
The workflow also allows setting a few optional gCNV arguments. The arguments and their default values are listed here, and each argument is documented on this page and this page.
Docker images
The workflow needs the following Docker images, the latest versions of which are in this file.
- cnmops_docker
- condense_counts_docker
- linux_docker
- sv_base_docker
- sv_base_mini_docker
- sv_pipeline_docker
- sv_pipeline_qc_docker
- gcnv_gatk_docker
- gatk_docker
Static inputs
You may refer to this reference file for values of the following inputs.
- primary_contigs_fai
- cytoband
- ref_dict
- mei_bed
- genome_file
- sd_locs_vcf
Optional Inputs
The following are a few of the workflow's optional inputs, with example values.
"allosomal_contigs": [["chrX", "chrY"]]
"ploidy_sample_psi_scale": 0.001
Outputs
- Combined read count matrix, SR, PE, and BAF files
- Standardized call VCFs
- Depth-only (DEL/DUP) calls
- Per-sample median coverage estimates
- (Optional) Evidence QC plots