Aim: To develop unsupervised, probabilistic machine learning methodology for analysing scRNA-seq data for the Human Cell Atlas (HCA)

Lawrence Group 

Vidhi

Aditya

Neil 

Schematic: Current model

- Scales to large datasets (<100K cells).

- Learns biologically relevant and meaningful latent variables.

- Provides enhanced visualisation by projecting the intermediate "learnt" latent spaces with traditional techniques (such as UMAP and t-SNE).

- Benchmarked against current SOTA methods (e.g. the scVI toolkit) on the innate immunity and COVID datasets (~100K cells).
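
The visualisation step in the list above can be sketched as follows. This is a minimal illustration, not the group's pipeline: it assumes the learnt latent variables are available as an (n_cells, n_latent_dims) array, uses scikit-learn's `TSNE`, and substitutes synthetic placeholder latents for real model output. The helper name `project_latents` is hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_latents(latents, seed=0):
    """Project an (n_cells, n_latent_dims) array to 2-D with t-SNE."""
    latents = np.asarray(latents, dtype=float)
    # t-SNE requires perplexity < n_cells; cap it for small inputs.
    perplexity = min(30.0, (latents.shape[0] - 1) / 3.0)
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(latents)


# Placeholder latents (an assumption, not real data): two synthetic
# "cell populations" in a 10-dimensional latent space.
rng = np.random.default_rng(0)
latents = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 10)),
    rng.normal(loc=5.0, scale=1.0, size=(100, 10)),
])

embedding = project_latents(latents)  # shape (200, 2), one row per cell
```

The resulting 2-D coordinates can be coloured by cell type or batch label in a scatter plot. A UMAP projection would follow the same pattern with the `umap-learn` package in place of `TSNE`.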

Current Results: Benchmarking on the innate immunity dataset

Reproduction (ours, 9x faster)

Current Results: COVID dataset

We find latent variables (LVs) that capture cell-type-specific gene expression signatures (e.g. platelets in LV1 and LV2; separation between the B cell and NK/T cell lineages in LV3)

(ours)

Strong association of LV1 and LV2 with megakaryocyte (MK) genes (associated with COVID)
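
One simple way to quantify an association between a latent variable and a set of genes is the per-gene Pearson correlation. The sketch below is illustrative only and is not necessarily the analysis behind the figure: the function name `latent_gene_correlation`, the synthetic data, and the planted signal in hypothetical genes 0 to 4 are all assumptions.

```python
import numpy as np


def latent_gene_correlation(latents, expression):
    """Pearson correlation between every latent variable and every gene.

    latents:    (n_cells, n_latents) array
    expression: (n_cells, n_genes) array, e.g. log-normalised counts
    returns:    (n_latents, n_genes) correlation matrix
    Note: a constant column would give a zero norm and NaN correlations.
    """
    z = latents - latents.mean(axis=0)
    x = expression - expression.mean(axis=0)
    z = z / np.linalg.norm(z, axis=0)
    x = x / np.linalg.norm(x, axis=0)
    return z.T @ x


# Synthetic stand-in data (an assumption, not the COVID dataset).
rng = np.random.default_rng(0)
n_cells, n_genes = 500, 20
latents = rng.normal(size=(n_cells, 3))
expression = rng.normal(size=(n_cells, n_genes))
# Plant a signal: hypothetical genes 0-4 depend on the first latent variable.
expression[:, :5] += 3.0 * latents[:, [0]]

corr = latent_gene_correlation(latents, expression)
# Genes ranked by absolute correlation with the first latent variable.
top_genes_lv1 = np.argsort(-np.abs(corr[0]))[:5]
```

On real data, the rows of `corr` for LV1 and LV2 could be inspected for enrichment of known marker genes.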


Publications & preprints

[1] Modelling Technical and Biological Effects in single-cell RNA-seq data with Scalable Gaussian Process Latent Variable Models (GPLVMs). Vidhi Lalchand*, Aditya Ravuri*, Emma Dann*, Natsuhiko Kumasaka, Dinithi Sumanaweera, Rik G.H. Lindeboom, Shaista Madad, Sarah A. Teichmann and Neil D. Lawrence. Preprint (to be submitted to MLCB 2022).

[2] A Unifying Probabilistic Perspective on Graph Latent Variable Models. Aditya Ravuri and Neil D. Lawrence. Preprint (to be submitted to AISTATS 2023).

[3] Generalised Gaussian Process Latent Variable Models with Stochastic Variational Inference. Vidhi Lalchand, Aditya Ravuri and Neil D. Lawrence. AISTATS 2022.

HCA Wellcome ext

By Vidhi Lalchand
