Generalized Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference

The 25th International Conference on Artificial Intelligence and Statistics

 Vidhi Lalchand, Aditya Ravuri, Neil D. Lawrence

AISTATS 2022

Bayesian GPLVM 

  • GPs can be used in unsupervised settings by learning a non-linear, probabilistic mapping from the latent space \( X \) to the data space \( Y \).
  • We assume the inputs \( X \) are latent (unobserved).

 

Given: High dimensional training data \( Y \equiv \{\bm{y}_{n}\}_{n=1}^{N},  Y \in \mathbb{R}^{N \times D}\)

Learn: Low dimensional latent space \( X \equiv \{\bm{x}_{n}\}_{n=1}^{N}, X \in \mathbb{R}^{N \times Q}\)

Titsias & Lawrence (2010), Lawrence (2005)

Primer:

[Schematic: D independent Gaussian processes map the low-dimensional latent space to the high-dimensional (N x D) data space]

Primer: Bayesian GPLVM

\textbf{F} \equiv \{ \bm{f}_{d} \}_{d=1}^{D}, \qquad \textbf{U} \equiv \{ \bm{u}_{d} \}_{d=1}^{D}

p(\textbf{U}) = \prod_{d=1}^{D}p(\bm{u}_{d}|Z), \qquad p(\bm{u}_{d}|Z) = \mathcal{N}(\bm{0}, K_{mm})

p(\bm{f}_{d}|\bm{u}_{d}, X) = \mathcal{N}(K_{nm}K_{mm}^{-1}\bm{u}_{d},\, Q_{nn}), \quad \text{where } Q_{nn} = K_{nn} - K_{nm}K_{mm}^{-1}K_{mn}

Generative Model
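To make the generative model concrete, below is a minimal NumPy sketch that samples latents \( \bm{x}_{n} \sim \mathcal{N}(\bm{0}, I_{Q}) \), draws D independent GP functions over them with a shared RBF kernel, and adds Gaussian observation noise. The function names (rbf_kernel, sample_gplvm) and the hyperparameter values are illustrative assumptions, not the paper's code.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sample_gplvm(N=100, Q=2, D=12, noise_var=0.01, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, Q))               # x_n ~ N(0, I_Q): latent inputs
    Knn = rbf_kernel(X, X) + 1e-6 * np.eye(N)     # kernel on the latents (with jitter)
    L = np.linalg.cholesky(Knn)
    F = L @ rng.standard_normal((N, D))           # D independent GP draws, f_d ~ N(0, K_nn)
    Y = F + np.sqrt(noise_var) * rng.standard_normal((N, D))  # y_{n,d} = f_d(x_n) + noise
    return X, F, Y

X, F, Y = sample_gplvm()                          # Y is the N x D observed data matrix
```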

Scalable Bayesian GPLVM 

Stochastic variational inference for GP regression was introduced in Hensman et al. (2013). In this work, we use the SVI bound in the latent-variable setting with unknown \(X\) and multi-output \(Y\).

ELBO:

\begin{aligned} \log p(\textbf{Y}) \geq \mathcal{L}_{1:D} = \sum_{d=1}^{D}\mathcal{L}_{d} = \sum_{d=1}^{D}\sum_{n=1}^{N} \mathbb{E}_{q(f, \textbf{X}, \textbf{U})}[\log p(y_{n,d}|\bm{f}_{d}, \bm{x}_{n})] - \sum_{d}\textrm{KL}(q(\bm{u}_{d})||p(\bm{u}_{d})) - \sum_{n}\textrm{KL}(q(\bm{x}_{n})||p(\bm{x}_{n})) \end{aligned}

\begin{aligned} =\sum_{d=1}^{D}\sum_{n=1}^{N}\int q_{\phi}(\bm{x}_{n})\underbrace{\int q_{\lambda}(\bm{u}_{d})\int p(\bm{f}_{d}|\bm{u}_{d}, \bm{x}_{n}) \log\mathcal{N}(y_{n,d};\bm{f}_{d}(\bm{x}_{n}), \sigma^{2}_{y})\, d\bm{f}_{d}(\bm{x}_{n})\,d\bm{u}_{d}\,d\bm{x}_{n} - \sum_{d}\textrm{KL}(q(\bm{u}_{d})||p(\bm{u}_{d}))}_{\textrm{SVGP regression bound}} - \sum_{n}\textrm{KL}(q(\bm{x}_{n})||p(\bm{x}_{n})) \end{aligned}

Final factorised form:

\begin{aligned} \mathcal{L}_{1:D} &= \sum_{n,d}\log \mathcal{N}(y_{n,d}|\underbrace{\langle k^{T}_{n} \rangle _{q(\bm{x}_{n})}}_{\Psi^{(n,\cdot)}_{1}}K_{mm}^{-1}\bm{m}_{d}, \sigma^{2}_{y}) -\dfrac{1}{2\sigma^{2}_{y}} \textrm{Tr}(\underbrace{\langle K_{nn}\rangle_{q(\bm{x}_{n})}}_{\psi_{0}} - K_{mm}^{-1}\underbrace{\langle K_{mn}K_{nm} \rangle_{q(\bm{x}_{n})}}_{\Psi_{2}}) - \dfrac{1}{2\sigma^{2}_{y}} \textrm{Tr}(S_{d}K_{mm}^{-1}\underbrace{\langle K_{mn}K_{nm}\rangle_{q(\bm{x}_{n})}}_{\Psi_{2}}K_{mm}^{-1}) \\ &- \sum_{n}\textrm{KL}(q(\bm{x}_{n})||p(\bm{x}_{n})) - \sum_{d}\textrm{KL}(q(\bm{u}_{d})||p(\bm{u}_{d})) \end{aligned}
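The \( \psi_{0}, \Psi_{1}, \Psi_{2} \) quantities in the final factorised form are expectations of the kernel under \( q(\bm{x}_{n}) \); for an ARD-RBF kernel they are available in closed form (Titsias & Lawrence, 2010). The NumPy sketch below is one possible implementation of those closed forms; the function name psi_statistics and its argument conventions are assumptions, not the authors' code.

```python
import numpy as np

def psi_statistics(mu, s, Z, lengthscale, variance):
    """psi_0 (N,), Psi_1 (N, M), Psi_2 (N, M, M) for an ARD-RBF kernel
    under q(x_n) = N(mu_n, diag(s_n)); closed forms of Titsias & Lawrence (2010)."""
    N, Q = mu.shape
    M = Z.shape[0]
    ell2 = (np.ones(Q) * lengthscale) ** 2                    # squared lengthscales, (Q,)

    psi0 = np.full(N, variance)                               # <k(x_n, x_n)> = sigma_f^2

    # Psi_1[n, m] = sigma_f^2 prod_q exp(-(mu_nq - z_mq)^2 / (2 (ell_q^2 + s_nq))) / sqrt(1 + s_nq/ell_q^2)
    d1 = mu[:, None, :] - Z[None, :, :]                       # (N, M, Q)
    denom1 = ell2[None, None, :] + s[:, None, :]              # (N, 1, Q)
    log_psi1 = -0.5 * (d1 ** 2) / denom1 - 0.5 * np.log1p(s[:, None, :] / ell2)
    Psi1 = variance * np.exp(log_psi1.sum(-1))                # (N, M)

    # Psi_2[n, m, m'] = sigma_f^4 prod_q exp(-(z_mq - z_m'q)^2/(4 ell_q^2)
    #                   - (mu_nq - zbar_q)^2/(ell_q^2 + 2 s_nq)) / sqrt(1 + 2 s_nq/ell_q^2)
    dz = Z[:, None, :] - Z[None, :, :]                        # (M, M, Q)
    zbar = 0.5 * (Z[:, None, :] + Z[None, :, :])              # (M, M, Q)
    dmu = mu[:, None, None, :] - zbar[None, :, :, :]          # (N, M, M, Q)
    denom2 = ell2[None, None, None, :] + 2.0 * s[:, None, None, :]
    log_psi2 = (-0.25 * (dz ** 2) / ell2)[None, :, :, :] \
               - (dmu ** 2) / denom2 \
               - 0.5 * np.log1p(2.0 * s / ell2)[:, None, None, :]
    Psi2 = variance ** 2 * np.exp(log_psi2.sum(-1))           # (N, M, M)
    return psi0, Psi1, Psi2

# e.g. psi0, Psi1, Psi2 = psi_statistics(mu, s, Z, lengthscale=np.ones(Q), variance=1.0)
```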

  • Non-Gaussian likelihoods
  • Flexible variational families
  • Amortised inference (sketched below)
  • Interdomain inducing variables
  • Missing data problems

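For amortised inference, the per-point variational parameters of \( q(\bm{x}_{n}) \) can be replaced by the output of an encoder network shared across data points. A minimal PyTorch sketch, assuming a small MLP encoder (the class name Encoder and its architecture are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps each observation y_n to the mean and variance of q(x_n)."""
    def __init__(self, data_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(data_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * latent_dim))

    def forward(self, y):                        # y: (batch, D)
        out = self.net(y)
        mu, log_var = out.chunk(2, dim=-1)       # variational mean and log-variance
        return mu, log_var.exp()                 # each (batch, Q)
```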

Algorithm
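One SVI training step proceeds by sampling a minibatch, drawing reparameterised samples from \( q(\bm{x}_{n}) \), evaluating the (rescaled) bound, and taking an Adam step on all variational parameters, inducing inputs and kernel hyperparameters. The self-contained PyTorch sketch below shows a single-sample Monte Carlo version with a Gaussian likelihood, restricting \( q(\bm{u}_{d}) \) to an isotropic covariance for brevity; all names, initialisations and settings are illustrative assumptions, not the authors' implementation.

```python
import math
import torch

torch.manual_seed(0)
N, D, Q, M, B = 200, 5, 2, 16, 32           # data points, output dims, latent dims, inducing points, batch
Y = torch.randn(N, D)                        # stand-in for the observed data matrix

# Variational parameters: q(x_n) = N(mu_n, diag(s_n)), q(u_d) = N(m_d, s_u^2 I)
mu    = (0.1 * torch.randn(N, Q)).requires_grad_(True)
log_s = torch.zeros(N, Q, requires_grad=True)
Z     = torch.randn(M, Q, requires_grad=True)            # inducing inputs
m_u   = torch.zeros(M, D, requires_grad=True)            # inducing means, one column per output d
log_su, log_noise = torch.zeros((), requires_grad=True), torch.zeros((), requires_grad=True)
log_ls, log_sf    = torch.zeros(Q, requires_grad=True), torch.zeros((), requires_grad=True)

def rbf(A, B_):
    """ARD-RBF kernel between two sets of points."""
    d = (A.unsqueeze(-2) - B_.unsqueeze(-3)) / log_ls.exp()
    return log_sf.exp() ** 2 * torch.exp(-0.5 * (d ** 2).sum(-1))

opt = torch.optim.Adam([mu, log_s, Z, m_u, log_su, log_noise, log_ls, log_sf], lr=1e-2)

for step in range(2000):
    idx = torch.randint(0, N, (B,))
    yb, mub, sb = Y[idx], mu[idx], log_s[idx].exp()

    # 1. Reparameterised sample from q(x_n) (one Monte Carlo sample per point)
    xb = mub + sb.sqrt() * torch.randn_like(mub)

    # 2. SVGP predictive q(f_d(x_n)) = N(a_n^T m_d, k_nn - a_n^T k_mn + s_u^2 ||a_n||^2), a_n = K_mm^{-1} k_mn
    Kmm = rbf(Z, Z) + 1e-5 * torch.eye(M)
    Kmn = rbf(Z, xb)                                      # (M, B)
    A   = torch.linalg.solve(Kmm, Kmn)                    # (M, B)
    f_mean = A.T @ m_u                                    # (B, D)
    f_var  = (log_sf.exp() ** 2 - (Kmn * A).sum(0)
              + log_su.exp() ** 2 * (A ** 2).sum(0)).unsqueeze(-1)   # (B, 1), broadcast over D

    # 3. Expected log-likelihood (closed form for a Gaussian likelihood)
    noise = log_noise.exp()
    ell = (-0.5 * torch.log(2 * math.pi * noise) - 0.5 * ((yb - f_mean) ** 2 + f_var) / noise).sum()

    # 4. KL(q(u_d)||p(u_d)) and KL(q(x_n)||p(x_n)) with a standard-normal prior on x_n
    Kinv_m = torch.linalg.solve(Kmm, m_u)                 # (M, D)
    kl_u = 0.5 * (D * (log_su.exp() ** 2 * torch.linalg.inv(Kmm).trace()
                       + torch.logdet(Kmm) - M - M * torch.log(log_su.exp() ** 2))
                  + (m_u * Kinv_m).sum())
    kl_x = 0.5 * (sb + mub ** 2 - 1 - sb.log()).sum()

    # 5. Minibatch ELBO (likelihood and latent KL rescaled by N/B), then an Adam step
    loss = -((N / B) * (ell - kl_x) - kl_u)
    opt.zero_grad(); loss.backward(); opt.step()
```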

Oilflow (12d)

            Point           MAP             Bayesian-SVI       AEB-SVI
RMSE        0.341 (0.008)   0.569 (0.092)   0.0925 (0.025)     0.067 (0.0016)
NLPD        4.104 (3.223)   8.16 (1.224)    -11.3105 (0.243)   -11.392 (0.147)

Robust to Missing Data: MNIST Reconstruction

[Figure: MNIST reconstructions with 30% and 60% of pixels missing]

Robust to Missing Data: Motion Capture 
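Robustness to missing data follows from the bound factorising over \( (n, d) \): unobserved entries simply drop out of the expected log-likelihood sum. A minimal PyTorch sketch of such a masked Gaussian term, where the binary mask convention (1 = observed, 0 = missing) is an assumption:

```python
import math
import torch

def masked_gaussian_ell(y, f_mean, f_var, noise, mask):
    """Sum of E_q[log N(y_{n,d} | f_d(x_n), noise)] over observed (n, d) entries only."""
    per_entry = -0.5 * torch.log(2 * math.pi * noise) - 0.5 * ((y - f_mean) ** 2 + f_var) / noise
    return (per_entry * mask).sum()
```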

Summary

  • We present a scalable, generalised GPLVM which leverages SVI for inference and is compatible with non-Gaussian likelihoods, flexible variational distributions and massively missing data.
  • The model is versatile enough to be combined with different priors in the latent space; this may be important for better disentanglement of latent representations, and future work will focus on such insights.

Thank you!
