["Genie 2: A large-scale foundation model" Parker-Holder et al]
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high-dimensional inference tractable
1024x1024xTime
https://parti.research.google
A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!
"Scaling Laws for Neural Language Models" Kaplan et al
"Sparks of Artificial General Intelligence: Early experiments with GPT-4" Bubeck et al
Produce Javascript code that creates a random graphical image that looks like a painting of Kandinsky
Draw a unicorn in TikZ
1. Machine Learning building blocks
2. Tutorial: Build your first classifier
BREAK
3. Introduction to Generative Models
4. Tutorial: Build your first generative model
1024x1024
Inductive biases!
Image Credit: CS231n Convolutional Neural Networks for Visual Recognition
Inputs: Pixel 1, Pixel 2, …, Pixel N
Image Credit: Irhum Shafkat "Intuitively Understanding Convolutions for Deep Learning"
The Unifying architecture?
Input token
QUERY: What is X looking for?
KEY: What token X contains
VALUE: What token X will provide
"The dog chased the cat because it was playful."
But we deliberately break permutation invariance!
Positional Encodings
"Dog bites man"
"Man bites dog"
Image Credit: "Visualizing the loss landscape of neural networks" Hao Li et al
Image Credit: "Complete guide to Adam optimization" Hao Li et al
Inputs: Pixel 1, Pixel 2, …, Pixel N → Outputs: p(Class 1), p(Class 2), …, p(Class 10)
How different are two probability distributions?
Cross-entropy loss:
$\mathcal{L} = -\sum_{i} y_i \log \hat{p}_i$
where $\hat{p}_i$ is the model's predicted probability for class $i$, and $y_i = 1$ if the true class is $i$, $0$ otherwise.
Truth: Class = 0, so the loss reduces to $-\log \hat{p}_0$: the negative log of the predicted probability of the true class.
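A minimal check in JAX (hypothetical logits; jax.nn.log_softmax turns raw scores into log-probabilities):

import jax
import jax.numpy as jnp

def cross_entropy(logits, true_class):
    # -log of the predicted probability assigned to the true class
    log_probs = jax.nn.log_softmax(logits)
    return -log_probs[true_class]

logits = jnp.array([2.0, -1.0, 0.5])   # model scores for 3 classes
print(cross_entropy(logits, 0))        # small: class 0 scored highest
print(cross_entropy(logits, 1))        # large: class 1 scored lowest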
import flax.linen as nn

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        # Linear
        x = nn.Dense(features=64)(x)
        # Non-linearity
        x = nn.silu(x)
        # Linear
        x = nn.Dense(features=64)(x)
        # Non-linearity
        x = nn.silu(x)
        # Linear
        x = nn.Dense(features=2)(x)
        return x

model = MLP()
import jax
import jax.numpy as jnp

example_input = jnp.ones((1, 4))
params = model.init(jax.random.PRNGKey(0), example_input)
y = model.apply(params, example_input)
Architecture: model = MLP()
Parameters: params = model.init(...)
Call: y = model.apply(params, ...)
Data: example_input
A probability density function (PDF) that we can optimize
Maximize the likelihood of the training samples
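In symbols (standard maximum-likelihood notation, assumed here):

$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(x_i)$

where $p_{\theta}$ is the parametric model and $\{x_i\}_{i=1}^{N}$ are the training samples.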
Parametric Model
Training Samples
Trained Model
Evaluate probabilities
Low Probability
High Probability
Generate Novel Samples
Simulator
Generative Model
Generative Model
Simulator
GANs
VAEs
Normalizing Flows
Diffusion Models
[Image Credit: https://lilianweng.github.io/posts/2018-10-13-flow-models/]
Base
Data
How is the bridge constrained?
Normalizing flows: Reverse = forward inverse
Diffusion: Forward = Gaussian noising
Flow matching: Forward = interpolant
Is $p(x_0)$ restricted?
Diffusion: $p(x_0)$ is Gaussian
Normalizing flows: $p(x_0)$ can be evaluated
Is the bridge stochastic (SDE) or deterministic (ODE)?
Diffusion: stochastic (SDE)
Normalizing flows: deterministic (ODE)
$z \sim \mathcal{N}(0, 1)$: sampled from a Gaussian distribution with mean 0 and variance 1
How is the transformed variable $x = f(z)$ distributed?
Base distribution
Target distribution
Invertible transformation
Normalizing flows in 1934
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Bijective
Sample
Evaluate probabilities
Probability mass conserved locally
Image Credit: "Understanding Deep Learning" Simon J.D. Prince
Splines
Issues with NFs: lack of flexibility
Neural Network
Sample
Evaluate probabilities
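Putting the two operations together, a minimal one-layer flow sketch in JAX (a single invertible affine map with hand-set parameters; real flows stack many learned layers such as coupling or spline transforms):

import jax
import jax.numpy as jnp

def forward(z, scale, shift):
    # Invertible map x = f(z)
    return z * scale + shift

def log_prob(x, scale, shift):
    # Change of variables: invert, evaluate the base, add the log-det term
    z = (x - shift) / scale
    log_pz = -0.5 * jnp.sum(z**2 + jnp.log(2 * jnp.pi), axis=-1)  # N(0, I)
    return log_pz - jnp.sum(jnp.log(jnp.abs(scale)))

scale, shift = jnp.array([2.0, 0.5]), jnp.array([1.0, -1.0])
z = jax.random.normal(jax.random.PRNGKey(0), (3, 2))  # sample the base
x = forward(z, scale, shift)                          # samples from the model
print(log_prob(x, scale, shift))                      # exact log-densities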
Forward Model
Observable
Dark matter
Dark energy
Inflation
Predict
Infer
Parameters
Inverse mapping
Fault line stress
Plate velocity
Normalizing flow
Continuity Equation
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Chen et al. (2018), Grathwohl et al. (2018)
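In standard form, the continuity equation ties the evolving density $p_t$ to the velocity field $v_t$:

$\frac{\partial p_t(x)}{\partial t} + \nabla \cdot \big(p_t(x)\, v_t(x)\big) = 0$

and along a trajectory of the ODE $\dot{x} = v_t(x)$, the log-density evolves as $\frac{d \log p_t(x(t))}{dt} = -\nabla \cdot v_t(x(t))$ (Chen et al. 2018).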
Generate
Evaluate Probability
Loss requires solving an ODE!
Diffusion, Flow matching, Interpolants... All ways to avoid this at training time
Assume a conditional vector field (known at training time)
The loss that we can compute
The gradients of the losses are the same!
["Flow Matching for Generative Modeling" Lipman et al]
["Stochastic Interpolants: A Unifying framework for Flows and Diffusions" Albergo et al]
Intractable
Continuity equation
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Sample
Evaluate probabilities
Reverse diffusion: Denoise previous step
Forward diffusion: Add Gaussian noise (fixed)
Prompt: A person half Yoda half Gandalf
Denoising = Regression
Fixed base distribution: Gaussian
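A sketch of one training step, to make "denoising = regression" concrete (DDPM-style noise prediction; model, its parameters, and the alpha_bars noise schedule are placeholders, not the tutorial's code):

import jax
import jax.numpy as jnp

def diffusion_loss(params, model, x0, alpha_bars, key):
    # Forward diffusion (fixed): x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    t_key, eps_key = jax.random.split(key)
    t = jax.random.randint(t_key, (x0.shape[0],), 0, len(alpha_bars))
    eps = jax.random.normal(eps_key, x0.shape)
    abar = alpha_bars[t][:, None]              # assumes x0 is (batch, dim)
    x_t = jnp.sqrt(abar) * x0 + jnp.sqrt(1.0 - abar) * eps
    # Reverse model: plain regression onto the noise that was added
    eps_pred = model.apply(params, x_t, t)
    return jnp.mean((eps_pred - eps) ** 2)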
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Base Distribution
Target Distribution
Simulated Galaxy 3d Map
Prompt: A person half Yoda half Gandalf
Gaussian
MNIST
Students at MIT are
...
OVER-CAFFEINATED
NERDS
SMART
ATHLETIC
https://www.astralcodexten.com/p/janus-simulators
How do we encode "helpful" in the loss function?
Step 1
Human teaches desired output
Explain RLHF
After training the model...
Step 2
Human scores outputs
+ teaches Reward model to score
it is the method by which ...
Explain means to tell someone...
Explain RLHF
Step 3
Tune the Language Model to produce high rewards!
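Step 3 is usually written as KL-regularized reward maximization (standard RLHF objective, e.g. optimized with PPO; notation assumed here):

$\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\big)$

where $r_\phi$ is the reward model from Step 2 and $\pi_{\mathrm{ref}}$ is the language model before tuning; the KL term keeps the tuned model close to the original.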
BEFORE RLHF
AFTER RLHF
Examples: code execution, game playing, instruction following, ...
[Image Credit: AgentBench https://arxiv.org/abs/2308.03688]
cuestalz@mit.edu