StripyWorkflow

WDL source code

StripyWorkflow is an optional standalone workflow that runs STRipy on a single sample to genotype a curated set of known pathogenic short tandem repeat (STR) expansions. It is intended for targeted follow-up rather than genome-wide STR discovery.

In joint-calling workflows, this module is typically run after EvidenceQC and sample QC once the cohort PED file has been finalized. The resulting single-sample STRipy VCFs can then be merged in ClusterBatch and appended to the final cohort VCF in AnnotateVcf.

The following diagram illustrates the recommended invocation order:

note

This workflow is optional. Run it only for samples where targeted analysis of known pathogenic STR expansions is desired.

Inputs

`bam_or_cram_file`

Sample alignment file in BAM or CRAM format.

Optional `bam_or_cram_index`

Index for the input BAM or CRAM. If omitted, the workflow expects the index to be in the same location as the input file, named with the standard `.bai` or `.crai` extension.
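The fallback for a missing index can be sketched as follows. This is a minimal illustration rather than the workflow's actual logic: the helper name `default_index_path` is hypothetical, and it assumes the index name is formed by appending `.bai` or `.crai` to the full alignment file name (for example `sample.bam.bai`), which is one common convention.

```python
def default_index_path(alignment_path: str) -> str:
    """Guess the index path that sits next to a BAM or CRAM file.

    Assumption: the index is named by appending the index extension to the
    full alignment file name (sample.bam -> sample.bam.bai).
    """
    lowered = alignment_path.lower()
    if lowered.endswith(".bam"):
        return alignment_path + ".bai"
    if lowered.endswith(".cram"):
        return alignment_path + ".crai"
    raise ValueError(f"Not a BAM or CRAM file: {alignment_path}")
```

If your index files follow a different naming scheme (for example `sample.bai`), passing `bam_or_cram_index` explicitly avoids relying on any such guess.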

`ped_file`

PED file used to look up the sample sex for STRipy. See PED file format.

`reference_fasta`, `reference_fasta_fai`

Reference FASTA and FASTA index matching the aligned sample.

`sample_name`

Sample identifier. This must match the sample ID in the PED file.
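To check ahead of time that `sample_name` is present in the PED file, a lookup along these lines may help. It is a sketch under stated assumptions, not the workflow's own code: it assumes the standard six-column, whitespace-delimited PED layout (family ID, individual ID, father, mother, sex, phenotype) with sex coded as 1 for male and 2 for female, and the function name `lookup_sex` is hypothetical. How the workflow passes the result to STRipy is not shown here.

```python
SEX_CODES = {"1": "male", "2": "female"}  # any other code is treated as unknown


def lookup_sex(ped_lines, sample_name):
    """Return 'male', 'female', or 'unknown' for sample_name.

    ped_lines is any iterable of text lines, such as an open file handle.
    Raises KeyError if the sample ID is not found in column 2.
    """
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"Expected 6 PED columns, got {len(fields)}: {line}")
        if fields[1] == sample_name:
            return SEX_CODES.get(fields[4], "unknown")
    raise KeyError(f"Sample {sample_name!r} not found in PED file")
```

A `KeyError` here signals the mismatch between `sample_name` and the PED sample ID that the requirement above is meant to prevent.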

Optional `genome_build`

Reference build name passed to STRipy. Default: hg38.

Optional `locus`

Comma-separated list of loci to analyze. By default, the workflow runs STRipy on its built-in panel of known pathogenic repeat-expansion loci.

Optional `custom_catalog`

Custom STRipy catalog file. Use this to add loci to the default pathogenic panel or to override loci already in it.

Optional `analysis`

STRipy analysis mode. Default: standard.

Optional `config`

Base STRipy configuration file.

Optional `verbose`

Enable verbose STRipy logging. Default: false.
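As an illustration of how these inputs fit together, the snippet below assembles an inputs JSON of the kind WDL runners such as Cromwell accept, where each key is prefixed with the workflow name. It is a sketch only: it assumes the workflow is named `StripyWorkflow` in the WDL, all file paths and the sample ID are hypothetical placeholders, and the loci shown are example values. The helper `build_inputs` is not part of the workflow.

```python
import json

WORKFLOW_NAME = "StripyWorkflow"  # assumed to match the workflow name in the WDL


def build_inputs(**values):
    """Prefix each input with the workflow name and drop unset optionals."""
    return {
        f"{WORKFLOW_NAME}.{key}": value
        for key, value in values.items()
        if value is not None
    }


inputs = build_inputs(
    bam_or_cram_file="gs://my-bucket/NA12878.cram",        # hypothetical path
    bam_or_cram_index=None,                                 # omitted: index expected beside the CRAM
    ped_file="gs://my-bucket/cohort.ped",                   # hypothetical path
    reference_fasta="gs://my-bucket/ref/hg38.fasta",        # hypothetical path
    reference_fasta_fai="gs://my-bucket/ref/hg38.fasta.fai",
    sample_name="NA12878",                                  # must match the PED sample ID
    genome_build="hg38",
    locus="HTT,FMR1",                                       # example comma-separated loci
    verbose=False,
)

print(json.dumps(inputs, indent=2, sort_keys=True))
```

Leaving an optional input as `None` drops it from the JSON, so the workflow falls back to its documented default.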

Outputs

`stripy_json`

Per-sample STRipy JSON output.

`stripy_tsv`

Per-sample STRipy tabular summary.

`stripy_html`

Per-sample STRipy HTML report.

Optional `stripy_vcf`

Single-sample STRipy VCF for downstream merging in ClusterBatch and optional inclusion in the final cohort VCF.
