Generalized Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference

The 25th International Conference on Artificial Intelligence and Statistics

 Vidhi Lalchand, Aditya Ravuri, Neil D. Lawrence

AISTATS 2022

Bayesian GPLVM 

  • GPs can be used in unsupervised settings by learning a non-linear, probabilistic mapping from the latent space \( X \) to the data space \( Y \).
  • We assume the inputs \( X \) are latent (unobserved).

 

Given: High dimensional training data \( Y \equiv \{\bm{y}_{n}\}_{n=1}^{N},  Y \in \mathbb{R}^{N \times D}\)

Learn: Low dimensional latent space \( X \equiv \{\bm{x}_{n}\}_{n=1}^{N}, X \in \mathbb{R}^{N \times Q}\)

Titsias & Lawrence (2010), Lawrence (2005)

Primer:

[Schematic: D independent Gaussian processes map the low-dimensional latent space to the high-dimensional (N x D) data space]

Primer: Bayesian GPLVM

\textbf{F} \equiv \{ \bm{f}_{d} \}_{d=1}^{D}, \qquad \textbf{U} \equiv \{ \bm{u}_{d} \}_{d=1}^{D}

p(\textbf{U}) = \prod_{d=1}^{D}p(\bm{u}_{d}|Z), \qquad p(\bm{u}_{d}|Z) = \mathcal{N}(\bm{0}, K_{mm})

p(\bm{f}_{d}|\bm{u}_{d}, X) = \mathcal{N}(K_{nm}K_{mm}^{-1}\bm{u}_{d},\, Q_{nn}), \quad \text{where } Q_{nn} = K_{nn} - K_{nm}K_{mm}^{-1}K_{mn}

Generative Model
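To make the generative model concrete, below is a minimal NumPy sketch that samples latents \( \bm{x}_{n} \sim \mathcal{N}(\bm{0}, I_{Q}) \), draws D independent GP functions over them with a shared RBF kernel, and adds Gaussian observation noise. The function names (rbf_kernel, sample_gplvm) and the hyperparameter values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sample_gplvm(N=100, Q=2, D=12, noise_var=0.01, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, Q))               # x_n ~ N(0, I_Q): latent inputs
    Knn = rbf_kernel(X, X) + 1e-6 * np.eye(N)     # kernel on the latents (with jitter)
    L = np.linalg.cholesky(Knn)
    F = L @ rng.standard_normal((N, D))           # D independent GP draws, f_d ~ N(0, K_nn)
    Y = F + np.sqrt(noise_var) * rng.standard_normal((N, D))  # y_{n,d} = f_d(x_n) + noise
    return X, F, Y

X, F, Y = sample_gplvm()                          # Y is the N x D observed data matrix
```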

Scalable Bayesian GPLVM 

Stochastic variational inference for GP regression was introduced in Hensman et al. (2013). In this work, we use the SVI bound in the latent-variable setting with unknown \(X\) and multi-output \(Y\).

ELBO:

\begin{aligned} \log p(\textbf{Y}) \geq \mathcal{L}_{1:D} = \sum_{d=1}^{D}\mathcal{L}_{d} = \sum_{d=1}^{D}\sum_{n=1}^{N} \mathbb{E}_{q(f, \textbf{X}, \textbf{U})}[\log p(y_{n,d}|\bm{f}_{d}, \bm{x}_{n})] - \sum_{d}\textrm{KL}(q(\bm{u}_{d})||p(\bm{u}_{d})) - \sum_{n}\textrm{KL}(q(\bm{x}_{n})||p(\bm{x}_{n})) \end{aligned}

\begin{aligned} =\sum_{d=1}^{D}\sum_{n=1}^{N}\int q_{\phi}(\bm{x}_{n})\underbrace{\int q_{\lambda}(\bm{u}_{d})\int p(\bm{f}_{d}|\bm{u}_{d}, \bm{x}_{n}) \log\mathcal{N}(y_{n,d};\bm{f}_{d}(\bm{x}_{n}), \sigma^{2}_{y})\, d\bm{f}_{d}(\bm{x}_{n})\,d\bm{u}_{d}\,d\bm{x}_{n} - \sum_{d}\textrm{KL}(q(\bm{u}_{d})||p(\bm{u}_{d}))}_{\textrm{SVGP regression bound}} - \sum_{n}\textrm{KL}(q(\bm{x}_{n})||p(\bm{x}_{n})) \end{aligned}

Final factorised form:

\begin{aligned} \mathcal{L}_{1:D} &= \sum_{n,d}\log \mathcal{N}(y_{n,d}|\underbrace{\langle k^{T}_{n} \rangle _{q(\bm{x}_{n})}}_{\Psi^{(n,\cdot)}_{1}}K_{mm}^{-1}\bm{m}_{d}, \sigma^{2}_{y}) -\dfrac{1}{2\sigma^{2}_{y}} \textrm{Tr}(\underbrace{\langle K_{nn}\rangle_{q(\bm{x}_{n})}}_{\psi_{0}} - K_{mm}^{-1}\underbrace{\langle K_{mn}K_{nm} \rangle_{q(\bm{x}_{n})}}_{\Psi_{2}}) - \dfrac{1}{2\sigma^{2}_{y}} \textrm{Tr}(S_{d}K_{mm}^{-1}\underbrace{\langle K_{mn}K_{nm}\rangle_{q(\bm{x}_{n})}}_{\Psi_{2}}K_{mm}^{-1}) \\ &- \sum_{n}\textrm{KL}(q(\bm{x}_{n})||p(\bm{x}_{n})) - \sum_{d}\textrm{KL}(q(\bm{u}_{d})||p(\bm{u}_{d})) \end{aligned}
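The \( \psi_{0}, \Psi_{1}, \Psi_{2} \) quantities in the final factorised form are expectations of the kernel under \( q(\bm{x}_{n}) \); for an ARD-RBF kernel they are available in closed form (Titsias & Lawrence, 2010). The NumPy sketch below is one possible implementation of those closed forms; the function name psi_statistics and its argument conventions are assumptions, not the authors' code.

```python
import numpy as np

def psi_statistics(mu, s, Z, lengthscale, variance):
    """psi_0 (N,), Psi_1 (N, M), Psi_2 (N, M, M) for an ARD-RBF kernel
    under q(x_n) = N(mu_n, diag(s_n)); closed forms of Titsias & Lawrence (2010)."""
    N, Q = mu.shape
    M = Z.shape[0]
    ell2 = (np.ones(Q) * lengthscale) ** 2                    # squared lengthscales, (Q,)

    psi0 = np.full(N, variance)                               # <k(x_n, x_n)> = sigma_f^2

    # Psi_1[n, m] = sigma_f^2 prod_q exp(-(mu_nq - z_mq)^2 / (2 (ell_q^2 + s_nq))) / sqrt(1 + s_nq/ell_q^2)
    d1 = mu[:, None, :] - Z[None, :, :]                       # (N, M, Q)
    denom1 = ell2[None, None, :] + s[:, None, :]              # (N, 1, Q)
    log_psi1 = -0.5 * (d1 ** 2) / denom1 - 0.5 * np.log1p(s[:, None, :] / ell2)
    Psi1 = variance * np.exp(log_psi1.sum(-1))                # (N, M)

    # Psi_2[n, m, m'] = sigma_f^4 prod_q exp(-(z_mq - z_m'q)^2/(4 ell_q^2)
    #                   - (mu_nq - zbar_q)^2/(ell_q^2 + 2 s_nq)) / sqrt(1 + 2 s_nq/ell_q^2)
    dz = Z[:, None, :] - Z[None, :, :]                        # (M, M, Q)
    zbar = 0.5 * (Z[:, None, :] + Z[None, :, :])              # (M, M, Q)
    dmu = mu[:, None, None, :] - zbar[None, :, :, :]          # (N, M, M, Q)
    denom2 = ell2[None, None, None, :] + 2.0 * s[:, None, None, :]
    log_psi2 = (-0.25 * (dz ** 2) / ell2)[None, :, :, :] \
               - (dmu ** 2) / denom2 \
               - 0.5 * np.log1p(2.0 * s / ell2)[:, None, None, :]
    Psi2 = variance ** 2 * np.exp(log_psi2.sum(-1))           # (N, M, M)
    return psi0, Psi1, Psi2

# e.g. psi0, Psi1, Psi2 = psi_statistics(mu, s, Z, lengthscale=np.ones(Q), variance=1.0)
```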

  • Non-Gaussian likelihoods
  • Flexible variational families
  • Amortised inference (sketched below)
  • Interdomain inducing variables
  • Missing data problems

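For amortised inference, the per-point variational parameters of \( q(\bm{x}_{n}) \) can be replaced by the output of an encoder network shared across data points. A minimal PyTorch sketch, assuming a small MLP encoder (the class name Encoder and its architecture are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps each observation y_n to the mean and variance of q(x_n)."""
    def __init__(self, data_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(data_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * latent_dim))

    def forward(self, y):                        # y: (batch, D)
        out = self.net(y)
        mu, log_var = out.chunk(2, dim=-1)       # variational mean and log-variance
        return mu, log_var.exp()                 # each (batch, Q)
```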

Algorithm
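One SVI training step proceeds by sampling a minibatch, drawing reparameterised samples from \( q(\bm{x}_{n}) \), evaluating the (rescaled) bound, and taking an Adam step on all variational parameters, inducing inputs and kernel hyperparameters. The self-contained PyTorch sketch below shows a single-sample Monte Carlo version with a Gaussian likelihood, restricting \( q(\bm{u}_{d}) \) to an isotropic covariance for brevity; all names, initialisations and settings are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

torch.manual_seed(0)
N, D, Q, M, B = 200, 5, 2, 16, 32           # data points, output dims, latent dims, inducing points, batch
Y = torch.randn(N, D)                        # stand-in for the observed data matrix

# Variational parameters: q(x_n) = N(mu_n, diag(s_n)), q(u_d) = N(m_d, s_u^2 I)
mu    = (0.1 * torch.randn(N, Q)).requires_grad_(True)
log_s = torch.zeros(N, Q, requires_grad=True)
Z     = torch.randn(M, Q, requires_grad=True)            # inducing inputs
m_u   = torch.zeros(M, D, requires_grad=True)            # inducing means, one column per output d
log_su, log_noise = torch.zeros((), requires_grad=True), torch.zeros((), requires_grad=True)
log_ls, log_sf    = torch.zeros(Q, requires_grad=True), torch.zeros((), requires_grad=True)

def rbf(A, B_):
    """ARD-RBF kernel between two sets of points."""
    d = (A.unsqueeze(-2) - B_.unsqueeze(-3)) / log_ls.exp()
    return log_sf.exp() ** 2 * torch.exp(-0.5 * (d ** 2).sum(-1))

opt = torch.optim.Adam([mu, log_s, Z, m_u, log_su, log_noise, log_ls, log_sf], lr=1e-2)

for step in range(2000):
    idx = torch.randint(0, N, (B,))
    yb, mub, sb = Y[idx], mu[idx], log_s[idx].exp()

    # 1. Reparameterised sample from q(x_n) (one Monte Carlo sample per point)
    xb = mub + sb.sqrt() * torch.randn_like(mub)

    # 2. SVGP predictive q(f_d(x_n)) = N(a_n^T m_d, k_nn - a_n^T k_mn + s_u^2 ||a_n||^2), a_n = K_mm^{-1} k_mn
    Kmm = rbf(Z, Z) + 1e-5 * torch.eye(M)
    Kmn = rbf(Z, xb)                                      # (M, B)
    A   = torch.linalg.solve(Kmm, Kmn)                    # (M, B)
    f_mean = A.T @ m_u                                    # (B, D)
    f_var  = (log_sf.exp() ** 2 - (Kmn * A).sum(0)
              + log_su.exp() ** 2 * (A ** 2).sum(0)).unsqueeze(-1)   # (B, 1), broadcast over D

    # 3. Expected log-likelihood (closed form for a Gaussian likelihood)
    noise = log_noise.exp()
    ell = (-0.5 * torch.log(2 * math.pi * noise) - 0.5 * ((yb - f_mean) ** 2 + f_var) / noise).sum()

    # 4. KL(q(u_d)||p(u_d)) and KL(q(x_n)||p(x_n)) with a standard-normal prior on x_n
    Kinv_m = torch.linalg.solve(Kmm, m_u)                 # (M, D)
    kl_u = 0.5 * (D * (log_su.exp() ** 2 * torch.linalg.inv(Kmm).trace()
                       + torch.logdet(Kmm) - M - M * torch.log(log_su.exp() ** 2))
                  + (m_u * Kinv_m).sum())
    kl_x = 0.5 * (sb + mub ** 2 - 1 - sb.log()).sum()

    # 5. Minibatch ELBO (likelihood and latent KL rescaled by N/B), then an Adam step
    loss = -((N / B) * (ell - kl_x) - kl_u)
    opt.zero_grad(); loss.backward(); opt.step()
```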

Oilflow (12d)

            Point           MAP             Bayesian-SVI       AEB-SVI
RMSE        0.341 (0.008)   0.569 (0.092)   0.0925 (0.025)     0.067 (0.0016)
NLPD        4.104 (3.223)   8.16 (1.224)    -11.3105 (0.243)   -11.392 (0.147)

Robust to Missing Data: MNIST Reconstruction

[Figure: MNIST reconstructions with 30% and 60% of pixels missing]

Robust to Missing Data: Motion Capture 
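Robustness to missing data follows from the bound factorising over \( (n, d) \): unobserved entries simply drop out of the expected log-likelihood sum. A minimal PyTorch sketch of such a masked Gaussian term, where the binary mask convention (1 = observed, 0 = missing) is an assumption:

```python
import math
import torch

def masked_gaussian_ell(y, f_mean, f_var, noise, mask):
    """Sum of E_q[log N(y_{n,d} | f_d(x_n), noise)] over observed (n, d) entries only."""
    per_entry = -0.5 * torch.log(2 * math.pi * noise) - 0.5 * ((y - f_mean) ** 2 + f_var) / noise
    return (per_entry * mask).sum()
```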

Summary

  • We present a scalable, generalised GPLVM which leverages SVI for inference and is compatible with non-Gaussian likelihoods, flexible variational distributions and massively missing data.
  • The model is versatile enough to be combined with different priors in the latent space; this may be important for better disentanglement of latent representations, and future work will focus on such insights.

Thank you!
