P. falciparum Short Read Whole Genome Workspace

This is the workspace for short read whole genome variant discovery and analysis in Plasmodium falciparum. This workspace can call variants on single samples, joint call cohorts of samples, and perform various tertiary analyses (e.g. drug resistance screening and rapid diagnostic test evasion screening).

While the current focus of this workspace is P. falciparum, the processing steps here are generalized and can be adapted to other Plasmodium species.

Variant Calling Pipeline

This workspace includes workflows for calling variants on single samples and for joint calling across cohorts of samples.

The main variant calling pipeline has the following high-level structure:

[Figure: LRMA SP Malaria Variant Calling]

Data

Datasets

The following datasets are currently in this workspace:

- PF7
- The MalariaGEN crosses
- 2022 data collected in Senegal
- 2019 data collected in Senegal

Data Structure

The data processing is broken down into three levels (similar to other LRMA projects) in the following Terra data tables:

* Sample (flowcell data)
* Sample Set (sample data / single-sample calling)
* Sample Set Set (cohort data for joint calling)

Sample / Flowcell data consists of reads from a single flowcell. The sample from which these reads were derived may or may not be represented in other flowcells.

Sample Set data consists of all data from a specific sample. This may include data from multiple flowcells that belong to the same "participant" (i.e. same strain / clone).

Sample Set Set / Cohort data consists of data from multiple samples.
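
The three levels described above can be pictured as a simple nesting: flowcells roll up into samples, and samples roll up into cohorts. The following is only a minimal sketch of that grouping in Python. It does not use the Terra API, and every class name, field name, and identifier in it (FlowcellSample, SampleSet, SampleSetSet, "FC001", "strain_A", "example_cohort") is hypothetical and chosen for illustration, not taken from the workspace's actual tables.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowcellSample:
    """Sample level: reads from a single flowcell (hypothetical fields)."""
    flowcell_id: str
    participant: str  # strain / clone the reads belong to


@dataclass
class SampleSet:
    """Sample Set level: all flowcell data for one participant."""
    participant: str
    flowcells: list = field(default_factory=list)


@dataclass
class SampleSetSet:
    """Sample Set Set level: a cohort of samples for joint calling."""
    cohort_name: str
    sample_sets: list = field(default_factory=list)


def build_sample_sets(flowcells):
    """Group flowcell-level entries by participant, one SampleSet each."""
    grouped = defaultdict(list)
    for fc in flowcells:
        grouped[fc.participant].append(fc)
    # Sort by participant so the result order is deterministic.
    return [SampleSet(p, fcs) for p, fcs in sorted(grouped.items())]


def build_cohort(cohort_name, sample_sets):
    """Collect several SampleSets into one cohort-level entry."""
    return SampleSetSet(cohort_name, list(sample_sets))


# Illustrative data: strain_A was sequenced on two flowcells, strain_B on one.
flowcells = [
    FlowcellSample("FC001", "strain_A"),
    FlowcellSample("FC002", "strain_A"),
    FlowcellSample("FC003", "strain_B"),
]
sample_sets = build_sample_sets(flowcells)
cohort = build_cohort("example_cohort", sample_sets)
```

In this sketch, single-sample calling would operate on each SampleSet (pooling its flowcells), and joint calling would operate on the SampleSetSet as a whole.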