From Zero to Generative
Learning Generative Modelling from scratch

Carolina Cuesta-Lazaro
IAIFI Fellow, MIT
Art: "The art of painting" by Johannes Vermeer



About Myself
Cuenca, Spain
Heidelberg, Germany
Tokyo, Japan
Durham, England
Boston, US


Medical Imaging
Epidemiology: Agent Based simulations
OBSERVED
SIMULATED
Cosmology
Simulations
HPC
Science question
Statistics ML


Natural Language


Carolina Cuesta-Lazaro IAIFI/MIT - From Zero to Generative
Artificial General Intelligence?
BEFORE
AFTER



https://parti.research.google

A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!



Scaling laws and emergent abilities
"Scaling Laws for Neural Language Models" Kaplan et al




"Sparks of Artificial General Intelligence: Early experiments with GPT-4" Bubeck et al


Produce Javascript code that creates a random graphical image that looks like a painting of Kandinsky

Draw a unicorn in TikZ







Today's Plan
1. Recap of the Machine Learning building blocks
2. Learning to classify
BREAK
3. Tutorial: Build your first classifier
4. Introduction to Generative Models
5. Tutorial: Build your first generative model (if time permits)

BREAK
The building blocks: 1. Data
Cosmic Cartography (Pointclouds)
MNIST (Images)
Wikipedia (Text)

1024x1024
The curse of dimensionality

Inductive biases!
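One way to get a feel for the curse of dimensionality is to ask how much of a cube is filled by the ball inscribed in it as the dimension grows. The sketch below is only an illustration chosen here (the function name is hypothetical, not from the slides); it uses the standard formula for the volume of a d-dimensional unit ball.

```python
import math

def ball_to_cube_volume_ratio(d: int) -> float:
    """Fraction of the cube [-1, 1]^d occupied by the inscribed unit ball."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube = 2.0 ** d
    return ball / cube

# The ratio collapses towards zero as the dimension grows:
# almost all of the volume sits in the "corners" of the cube.
ratios = {d: ball_to_cube_volume_ratio(d) for d in (2, 3, 10, 20)}
for d, ratio in ratios.items():
    print(f"d = {d:2d}  ball/cube = {ratio:.3e}")

# A single-channel 1024x1024 image already lives in this many dimensions.
n_pixels = 1024 * 1024
print("dimensions of a 1024x1024 image:", n_pixels)
```

With so little of the space covered by any "typical" region, filling it with samples is hopeless, which is why inductive biases matter.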
The building blocks: 2. Architectures

Multilayer Perceptron
Image Credit: CS231n Convolutional Neural Networks for Visual Recognition

Pixel 1
Pixel 2
Pixel N

Convolutional Neural Networks
Inductive bias: Translation Invariance
Data Representation: Images

Image Credit: Irhum Shafkat "Intuitively Understanding Convolutions for Deep Learning"
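A minimal sketch of the translation inductive bias, assuming a 1D signal with circular (wrap-around) boundaries, which is a simplification chosen here. Shifting the input and then convolving gives the same result as convolving and then shifting (equivariance); a global pooling step on top, here a max, makes the output invariant to the shift.

```python
def circular_conv1d(signal, kernel):
    """Slide the kernel over the signal, wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

def shift(signal, k):
    """Translate the signal k positions to the right (circularly)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

signal = [0, 1, 3, 2, 0, 0, 0, 0]   # toy "image" row
kernel = [1, -2, 1]                 # toy filter

conv_then_shift = shift(circular_conv1d(signal, kernel), 3)
shift_then_conv = circular_conv1d(shift(signal, 3), kernel)

print(conv_then_shift == shift_then_conv)                       # equivariance
print(max(circular_conv1d(signal, kernel)) == max(shift_then_conv))  # invariance after pooling
```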
Inductive bias: Permutation Invariance
Data Representation: Sets, Pointclouds
Deep sets
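A minimal sketch of the Deep Sets idea: apply the same function to every element, pool with a sum, then apply a second function to the pooled vector. In a real model both functions are learned networks; the fixed `phi` and `rho` below are hypothetical stand-ins used only to show that the output does not depend on the order of the points.

```python
import math
import random

def phi(point):
    """Per-element features (stand-in for a learned network)."""
    x, y = point
    return (x, y, x * x + y * y)

def rho(pooled):
    """Readout on the pooled features (stand-in for a learned network)."""
    return pooled[0] + pooled[1] + math.sqrt(pooled[2])

def deep_set(points):
    # Sum-pool each feature over the set; the sum ignores element order.
    pooled = [math.fsum(feature) for feature in zip(*(phi(p) for p in points))]
    return rho(pooled)

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
shuffled = points[:]
rng.shuffle(shuffled)

print(deep_set(points), deep_set(shuffled))
```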




Transformers might be the unifying architecture!


Text
Images
The building blocks: 3. Loss function

Image Credit: "Visualizing the loss landscape of neural networks" Hao Li et al
The building blocks: 4. The Optimizer

Image Credit: "Complete guide to Adam optimization" Hao Li et al
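As an illustration of what an optimizer does, here is a sketch of the Adam update rule applied to a one-parameter quadratic loss. The loss, the starting point, and the function name `adam_minimize` are assumptions made for this example; the hyperparameter defaults are the commonly used ones, with a larger learning rate so the toy problem converges quickly.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, n_steps=500):
    """Minimize a scalar function of one parameter with the Adam update."""
    m, v = 0.0, 0.0  # running averages of the gradient and its square
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy loss (theta - 3)^2 with gradient 2 * (theta - 3); the minimum is at 3.
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta_star)
```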
Tutorial 1: Learning to classify

How do we output a probability?

Pixel 1
Pixel 2
Pixel N

p Class 1
p Class 2
p Class 10
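One common answer, assumed here since the slide does not name it, is the softmax: exponentiate the raw network outputs and normalize them so that they are positive and sum to one. The logits below are hypothetical values for a 10-class problem.

```python
import math

def softmax(logits):
    """Map raw scores to a probability distribution over classes."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1] + [0.0] * 7  # hypothetical outputs for 10 classes
probs = softmax(logits)

print([round(p, 3) for p in probs])
print("sum:", sum(probs))
```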
Loss function: Cross entropy
How different are two probability distributions?
$\mathcal{L} = -\sum_i y_i \log p_i$
$p_i$: Model Prediction
$y_i = 1$ if True class is $i$, $0$ otherwise
[Figure: Predicted probability per class compared with the True class, for Truth: Class = 0]
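A minimal sketch of the cross entropy between a one-hot target and a predicted distribution, written as minus the sum over classes of target times log prediction. The three example predictions are made up for illustration, with the true class set to 0 as on the slide.

```python
import math

def one_hot(true_class, n_classes):
    """Target distribution: 1 for the true class, 0 otherwise."""
    return [1.0 if i == true_class else 0.0 for i in range(n_classes)]

def cross_entropy(target, predicted):
    """-sum_i target_i * log(predicted_i); zero-target terms contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

target = one_hot(0, 3)  # Truth: Class = 0

loss_confident = cross_entropy(target, [0.90, 0.05, 0.05])  # right and confident
loss_unsure = cross_entropy(target, [0.40, 0.30, 0.30])     # right but unsure
loss_wrong = cross_entropy(target, [0.10, 0.45, 0.45])      # confidently wrong

print(loss_confident, loss_unsure, loss_wrong)
```

The loss grows as the probability assigned to the true class shrinks, and it only vanishes when the model puts all its probability on the true class.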

JAX Models
# Architecture
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        # Linear
        x = nn.Dense(features=64)(x)
        # Non-linearity
        x = nn.silu(x)
        # Linear
        x = nn.Dense(features=64)(x)
        # Non-linearity
        x = nn.silu(x)
        # Linear
        x = nn.Dense(features=2)(x)
        return x

model = MLP()

# Parameters
example_input = jnp.ones((1, 4))
params = model.init(jax.random.PRNGKey(0), example_input)

# Call
y = model.apply(params, example_input)


A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon
Image credit: DALL·E 3
1024x1024

Generation vs Discrimination

Generative Models
Data
A PDF that we can optimize
Maximize the likelihood of the data
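The simplest instance of this recipe is a 1D Gaussian whose mean and standard deviation are the parameters to optimize. The sketch below uses synthetic data (an assumption for illustration) and the closed-form maximum-likelihood estimates, then checks that moving the parameters away lowers the likelihood and draws novel samples from the fitted model.

```python
import math
import random

def gaussian_log_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def mean_log_likelihood(data, mu, sigma):
    return sum(gaussian_log_pdf(x, mu, sigma) for x in data) / len(data)

rng = random.Random(0)
data = [rng.gauss(1.5, 0.7) for _ in range(2000)]  # synthetic "training samples"

# For a Gaussian the maximum-likelihood parameters are known in closed form.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

best = mean_log_likelihood(data, mu_hat, sigma_hat)
shifted = mean_log_likelihood(data, mu_hat + 0.3, sigma_hat)
widened = mean_log_likelihood(data, mu_hat, 1.5 * sigma_hat)
print(best, shifted, widened)

# Once trained, the model can generate novel samples.
new_samples = [rng.gauss(mu_hat, sigma_hat) for _ in range(5)]
print(new_samples)
```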
Maximize the likelihood of the training samples
Model


Training Samples









Generative Models 101
Trained Model

Evaluate probabilities


Low Probability
High Probability

Generate Novel Samples


Normalizing flows
Change of variables
A variable is sampled from a Gaussian distribution with mean 0 and variance 1 and passed through an invertible transformation. How is the result distributed?
Base distribution → Invertible transformation → Target distribution
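For an invertible map x = f(z) in one dimension, the change-of-variables formula reads p_x(x) = p_z(f^{-1}(x)) |d f^{-1}/dx|. The symbols and the choice f(z) = exp(z) are assumptions made for this sketch, not taken from the slides. The code evaluates the transformed density and checks it against a probability that is known exactly: P(x <= 2) = P(z <= log 2).

```python
import math

def base_pdf(z):
    """Standard normal density (the base distribution)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def inverse(x):
    """Inverse of the assumed transformation x = exp(z)."""
    return math.log(x)

def inverse_derivative(x):
    return 1.0 / x

def target_pdf(x):
    """Change of variables: base density at f^{-1}(x) times |d f^{-1} / dx|."""
    return base_pdf(inverse(x)) * abs(inverse_derivative(x))

def integrate(f, a, b, n=20000):
    """Midpoint rule, which never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

prob_below_2 = integrate(target_pdf, 0.0, 2.0)
exact = 0.5 * (1 + math.erf(math.log(2.0) / math.sqrt(2)))  # Gaussian CDF at log 2
print(prob_below_2, exact)
```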


Box-Muller transform
Normalizing flows in 1934
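A sketch of the Box-Muller transform viewed as a flow: an invertible map that turns two uniform random numbers into two independent standard normal ones. The inverse is included to make the invertibility explicit; the sample size and seed are arbitrary choices.

```python
import math
import random

def box_muller(u1, u2):
    """Map two uniforms in (0, 1] x [0, 1) to two standard normal variables."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def box_muller_inverse(z0, z1):
    """Recover the uniforms from the Gaussian pair."""
    u1 = math.exp(-0.5 * (z0 * z0 + z1 * z1))
    u2 = (math.atan2(z1, z0) / (2.0 * math.pi)) % 1.0
    return u1, u2

rng = random.Random(0)
samples = []
for _ in range(5000):
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    samples.extend(box_muller(u1, u2))

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)

roundtrip = box_muller_inverse(*box_muller(0.3, 0.6))
print(roundtrip)
```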

(Image Credit: Phillip Lippe)
z: Latent variables

Invertible functions aren't that common!
Splines

Issues with NFs: Lack of flexibility
- Invertible functions
- Tractable Jacobians
Continuous time
But ODE solutions are always invertible!
Flow ODE



Chen et al. (2018), Grathwohl et al. (2018)

Generate
Evaluate Probability
Need to solve this expensive integral at each step during training!
Very slow
Can we avoid it?
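To make the cost concrete, here is a toy continuous flow in one dimension. It assumes a hand-picked linear velocity field v(x, t) = A x instead of a neural network, and a simple Euler solver; both are choices made for this sketch. Generating a sample means integrating dx/dt = v, and evaluating its probability means integrating the divergence of v along the same trajectory, d log p / dt = -dv/dx.

```python
import math

A = 0.8  # hypothetical constant; a real model would use a neural network for v

def velocity(x, t):
    return A * x

def divergence(x, t):
    return A  # dv/dx in one dimension

def standard_normal_log_pdf(x):
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x

def flow(x0, n_steps=1000):
    """Euler-integrate the sample and its log-density from t = 0 to t = 1."""
    x, log_p = x0, standard_normal_log_pdf(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        log_p -= divergence(x, t) * dt  # the integral that makes training slow
        x += velocity(x, t) * dt
    return x, log_p

x1, log_p1 = flow(0.5)

# For this linear field the answer is known: x1 = x0 * exp(A), log p drops by A.
print(x1, 0.5 * math.exp(A))
print(log_p1, standard_normal_log_pdf(0.5) - A)
```

Every likelihood evaluation needs the whole loop, and with a neural network each step requires a network evaluation plus its divergence.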
Flow matching
Regress the velocity field directly!
But we need to know u. If we know u, then why learn another one?


Image Credit: "An Introduction to flow matching" Tor Fjelde et al.
Conditional Flow matching
Learn a conditional vector field (known at training time)
Approximate it with an unconditional one
The gradients of the losses are the same!
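A minimal sketch of the idea in one dimension, under several assumptions not stated on the slide: a straight-line path x_t = (1 - t) x0 + t x1 between a standard normal x0 and a Gaussian target x1, whose conditional velocity is simply x1 - x0, and a linear model fitted by least squares at a single time t. Regressing on the conditional velocity recovers the unconditional (marginal) velocity field, which for Gaussians is known in closed form and is used here as a check.

```python
import random

rng = random.Random(0)
M, S, T = 2.0, 0.5, 0.5  # hypothetical target mean, target std, and time slice

xs, us = [], []
for _ in range(20000):
    x0 = rng.gauss(0.0, 1.0)           # base sample
    x1 = rng.gauss(M, S)               # target ("data") sample
    xs.append((1 - T) * x0 + T * x1)   # point on the conditional path at time T
    us.append(x1 - x0)                 # conditional velocity, known at training time

# Least-squares fit u ~ slope * x + intercept (the "network" is a line here).
n = len(xs)
mean_x, mean_u = sum(xs) / n, sum(us) / n
cov_xu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, us)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
slope = cov_xu / var_x
intercept = mean_u - slope * mean_x

# Marginal velocity E[x1 - x0 | x_T = x], analytic for jointly Gaussian variables.
var_t = (1 - T) ** 2 + (T * S) ** 2
exact_slope = (T * S ** 2 - (1 - T)) / var_t
exact_intercept = M - exact_slope * T * M

print(slope, exact_slope)
print(intercept, exact_intercept)
```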

Tutorial 2

Gaussian
MNIST
Large Language Models
Pre-trained on next word prediction
Students at MIT are ...
OVER-CAFFEINATED
NERDS
SMART
ATHLETIC
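As a toy picture of next word prediction, the sketch below estimates the probability of the next word from counts in a tiny made-up corpus. A large language model replaces the count table with a neural network that conditions on the whole preceding text; the corpus and the function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical three-sentence corpus, already split into words.
corpus = (
    "students at mit are smart . "
    "students at mit are nerds . "
    "students at mit are smart"
).split()

# Count how often each word follows each previous word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """Maximum-likelihood estimate of p(next word | previous word)."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

probs = next_word_probs("are")
print(probs)
```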

https://www.astralcodexten.com/p/janus-simulators
How do we encode "helpful" in the loss function?
RLHF: Reinforcement Learning from Human Feedback
Step 1: Human teaches desired output
Prompt: Explain RLHF
Desired output: After training the model...
Step 2: Human scores outputs + teaches Reward model to score
Prompt: Explain RLHF
Output: it is the method by which ...
Output: Explain means to tell someone...
Step 3: Tune the Language Model to produce high rewards!
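A toy illustration of Step 3 only, not the algorithm used in practice: the "language model" is a softmax over three candidate outputs, the reward-model scores are made-up numbers, and the logits follow the exact gradient of the expected reward. Practical systems typically use more elaborate methods (for example PPO with a penalty for drifting away from the pre-trained model).

```python
import math

def softmax(logits):
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(probs, rewards):
    return sum(p * r for p, r in zip(probs, rewards))

rewards = [0.1, 0.9, 0.4]  # hypothetical reward-model scores per candidate output
logits = [0.0, 0.0, 0.0]   # the untuned model has no preference
learning_rate = 1.0

initial_probs = softmax(logits)
for _ in range(200):
    probs = softmax(logits)
    baseline = expected_reward(probs, rewards)
    # d E[reward] / d logit_k = p_k * (r_k - E[reward])
    logits = [
        logit + learning_rate * p * (r - baseline)
        for logit, p, r in zip(logits, probs, rewards)
    ]
final_probs = softmax(logits)

print(initial_probs, final_probs)
```

After tuning, most of the probability mass sits on the output the reward model scores highest.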

BEFORE RLHF

AFTER RLHF
References
- Books by Kevin P. Murphy
  - Machine Learning: A Probabilistic Perspective
  - Probabilistic Machine Learning: Advanced Topics
- ML4Astro workshop https://ml4astro.github.io/icml2023/
- ProbAI summer school https://github.com/probabilisticai/probai-2023
- IAIFI Summer school
- Blogposts
cuestalz@mit.edu


From zero to generative - SummerSchool
By carol cuesta