BVAS Sampler

class bvas.BVASSampler(Y, Gamma, S=5, tau=100.0, nu_eff_multiplier=1.0, explore=10.0, xi_target=0.2, genotype_matrix=None)

Bases: MCMCSampler

MCMC Sampler for Bayesian Viral Allele Selection (BVAS). Combines a Gaussian diffusion-based likelihood with Bayesian Variable Selection. Most users will not use this class directly and will instead use BVASSelector.

Note that computations will be done using the device and dtype of the provided torch.Tensors, i.e. Y and Gamma. To run computations on a GPU, make sure that both tensors are on the GPU. We recommend doing all computations in 64-bit precision, i.e. Y.dtype == Gamma.dtype == torch.float64.

The inputs Y and Gamma are defined in terms of region-specific allele frequencies \({\mathbf x}_r(t)\) and region-specific effective population sizes \(\nu_r\) as follows.

\[ \begin{align}\begin{aligned}&{\mathbf y}_r(t) = {\mathbf x}_r(t + 1) - {\mathbf x}_r(t)\\&\bar{\mathbf{Y}}^\nu \equiv \sum_r \nu_r \sum_t {\mathbf y}_r(t)\\&\Lambda_{r,ab}(t) = x_{r,ab}(t) - x_{r,a}(t) x_{r,b}(t)\\&\bar{\mathbf{\Lambda}}^\nu \equiv \sum_r \nu_r \sum_t {\mathbf \Lambda}_r(t)\end{aligned}\end{align} \]

where \(x_{r,ab}(t)\) denote pairwise allele frequencies in region \(r\).
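
The definitions above can be sketched numerically as follows. This is a minimal NumPy illustration, not the package's own preprocessing code: the helper name `weighted_inputs`, the array layouts, and the toy data are all assumptions made for this example. It also assumes that the sum over \(t\) for \({\mathbf \Lambda}_r(t)\) runs over the same time steps that have an increment \({\mathbf y}_r(t)\), which the text above does not state explicitly.

```python
import numpy as np


def weighted_inputs(x_by_region, x_pair_by_region, nu_by_region):
    """Accumulate the nu-weighted sums of increments and second moments.

    Hypothetical helper for illustration only.
    x_by_region[r]      : array (T, A), allele frequencies x_r(t)
    x_pair_by_region[r] : array (T, A, A), pairwise frequencies x_{r,ab}(t)
    nu_by_region[r]     : float, effective population size nu_r
    """
    A = x_by_region[0].shape[1]
    Y = np.zeros(A)
    Lam = np.zeros((A, A))
    for x, x_pair, nu in zip(x_by_region, x_pair_by_region, nu_by_region):
        # y_r(t) = x_r(t + 1) - x_r(t), shape (T - 1, A)
        y = x[1:] - x[:-1]
        # Lambda_{r,ab}(t) = x_{r,ab}(t) - x_{r,a}(t) x_{r,b}(t), shape (T, A, A)
        lam = x_pair - x[:, :, None] * x[:, None, :]
        Y += nu * y.sum(axis=0)
        # Assumption: sum Lambda over the T - 1 time steps that have an increment.
        Lam += nu * lam[:-1].sum(axis=0)
    return Y, Lam


# Toy data for a single region: T time points, N sampled genomes, A binary alleles.
rng = np.random.default_rng(0)
T, N, A = 4, 50, 3
genotypes = rng.integers(0, 2, size=(T, N, A)).astype(float)
x = genotypes.mean(axis=1)  # allele frequencies, shape (T, A)
x_pair = np.einsum("tna,tnb->tab", genotypes, genotypes) / N  # shape (T, A, A)

Y, Lam = weighted_inputs([x], [x_pair], [10.0])
```

The resulting arrays would still need to be converted to torch.Tensor objects (for example with `torch.from_numpy`), ideally in 64-bit precision as recommended above, before being passed to the sampler.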

Parameters:
  • Y (torch.Tensor) – A vector of shape (A,) that encodes integrated allele frequency increments for each allele, where A is the number of alleles.

  • Gamma (torch.Tensor) – A matrix of shape (A, A) that encodes information about second moments of allele frequencies.

  • S – Controls the expected number of alleles to include in the model a priori. Defaults to 5.0. To specify allele-level prior inclusion probabilities, provide an A-dimensional torch.Tensor of the form (h_1, …, h_A). If a tuple of positive floats (alpha, beta) is provided, the a priori inclusion probability is a latent variable governed by the corresponding Beta prior, so that the sparsity level is inferred from the data. Note that for a given choice of alpha and beta the expected number of alleles to include in the model a priori is given by \(\frac{\alpha}{\alpha + \beta} \times A\). We caution that this approach may be a poor choice for very noisy genomic surveillance data. Also note that the mean number of covariates in the posterior can vary significantly from prior expectations, since the posterior is in effect a compromise between the prior and the observed data.

  • tau (float) – Controls the precision of the coefficients in the prior. Defaults to 100.0.

  • nu_eff_multiplier (float) – Additional factor by which to multiply the effective population size, i.e. on top of whatever was done when computing Y and Gamma. Defaults to 1.0.

  • explore (float) – This hyperparameter controls how greedy the MCMC algorithm is. Defaults to 10.0. For expert users only.

  • xi_target (float) – This hyperparameter controls how often \(h\) MCMC updates are made if \(h\) is a latent variable. Defaults to 0.2. For expert users only.

  • genotype_matrix (torch.Tensor) – A matrix of shape (num_variants, A) that encodes the genotype of various viral variants. If included, the sampler will compute variant-level growth rates during inference for the variants encoded by genotype_matrix. Defaults to None.
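
As a quick check when choosing the (alpha, beta) form of S, the prior expected number of included alleles can be computed directly from the formula \(\frac{\alpha}{\alpha + \beta} \times A\) given above. The helper below is a hypothetical convenience function written for this example; it is not part of the bvas API.

```python
def expected_inclusions(alpha: float, beta: float, num_alleles: int) -> float:
    """Prior expected number of included alleles under a Beta(alpha, beta) prior.

    Hypothetical helper; implements alpha / (alpha + beta) * A from the S docs.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha / (alpha + beta) * num_alleles


# With alpha=1, beta=3 and A=20 alleles, the prior mean inclusion
# probability is 0.25, so 5 alleles are expected a priori.
prior_count = expected_inclusions(1.0, 3.0, 20)
print(prior_count)  # 5.0
```

As noted for S, the posterior mean number of included alleles can differ substantially from this prior expectation.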