Lesson 23 • Advanced

Generative Models: Autoencoders, VAEs & GANs

By the end of this lesson you'll understand how machines create brand-new data, and you'll have written a working text generator that samples from a distribution it learned itself.

What You'll Learn in This Lesson

✓ Tell discriminative models apart from generative ones
✓ Name the 4 families: autoregressive, VAE, GAN, diffusion
✓ Explain latent space and why interpolation is smooth
✓ Sample new data from a learned probability distribution
✓ Build a Markov-chain text generator in plain Python
✓ Read evaluation metrics like FID and judge sample quality

Before you start: It helps to have met Training Stability and to be comfortable with basic Python (loops, dictionaries, and the random module). No maths library is needed: every example here runs in plain Python.

🎨 Real-World Analogy: The Artist Who Learns a Style

Picture an artist who studies hundreds of Van Gogh paintings. They never copy a single one — instead they absorb the style: the swirling skies, the thick brushstrokes, the colour choices. Afterwards they can paint a brand-new scene that has never existed, yet unmistakably looks like a Van Gogh.

That is exactly what a generative model does. During training it watches a mountain of real examples and learns the underlying data distribution — the statistical "style" of the data. During generation it samples from that learned distribution to produce something new that fits the style but isn't a copy.

Keep this picture in mind: learn a style → create new work. Everything below is just different machinery for doing those two steps.

1. Discriminative vs Generative

Most models you've met so far are discriminative: you hand them data and they predict a label. A spam filter learns the line between "spam" and "not spam". In maths, it estimates P(label | data) — the probability of a label given the input.

A generative model is more ambitious. Instead of drawing a boundary, it learns what the data itself looks like — P(data). Because it understands the whole shape of the data, it can sample brand-new examples: a new sentence, a new face, a new song.
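To make the two quantities concrete, here is a minimal sketch that estimates both from the same toy data by counting. The six one-word "emails" and the helper names p_label_given_data and p_data are invented for illustration; a real model would learn these quantities rather than count them directly.

```python
from collections import Counter

# Hypothetical toy dataset: each "email" is a single word plus its label.
emails = [
    ("win", "spam"), ("win", "spam"), ("win", "ham"),
    ("hello", "ham"), ("hello", "ham"), ("prize", "spam"),
]

def p_label_given_data(pairs, word, label):
    """Discriminative view: P(label | data), estimated by counting.

    Assumes `word` appears at least once in `pairs`.
    """
    labels_for_word = [lab for w, lab in pairs if w == word]
    return labels_for_word.count(label) / len(labels_for_word)

def p_data(pairs):
    """Generative view: P(data), how common each email is overall."""
    counts = Counter(w for w, _ in pairs)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(p_label_given_data(emails, "win", "spam"))  # fraction of "win" emails that are spam
print(p_data(emails))                             # a distribution you could sample from
```

The first function can only answer questions about an email you already have. The second describes the emails themselves, which is what makes sampling a new one possible.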

Discriminative asks:

"Is this email spam?" — it sorts existing things into bins.

Generative asks:

"What does a typical email look like?" — it can write a new one.

2. Sampling From a Learned Distribution

At its heart, generation = sampling. A generative model is really just a probability distribution you can draw from. Run the worked example below: it samples colours whose probabilities were "learned", then checks that the samples echo those probabilities.

Read every comment, then hit run. Notice the output counts roughly match the learned weights — that's the model reproducing the style of its data.

Worked Example: Sample a Distribution

Draw outcomes whose frequencies match learned probabilities

Python

import random

# A generative model's core skill: SAMPLE from a probability distribution.
# Here the "distribution" is the colour mix of marbles in a learned bag.
# Generating = reaching in and pulling one out at random.

random.seed(7)  # makes the run repeatable so you can compare outputs

# A learned distribution: each colour has a weight (how common it is)
distribution = {
    "red":    0.5,   # 50% of the probability mass
    "blue":   0.3,   # 30%
    "green":  0.2,   # 20%
}

def sample(d
...
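As a cross-check on the idea, here is a minimal sketch that draws from the same colour distribution with the standard library's random.choices and compares observed frequencies with the learned weights. It is not the lesson's own sample function; the helper name sample_many and the sample size of 10,000 are assumptions chosen for illustration.

```python
import random
from collections import Counter

# The same "learned" distribution as in the worked example.
distribution = {"red": 0.5, "blue": 0.3, "green": 0.2}

def sample_many(dist, n, rng):
    """Draw n outcomes whose long-run frequencies follow the weights in dist."""
    outcomes = list(dist)
    weights = list(dist.values())
    return rng.choices(outcomes, weights=weights, k=n)

rng = random.Random(7)  # a private, seeded generator keeps the run repeatable
draws = sample_many(distribution, 10_000, rng)
counts = Counter(draws)

for colour, weight in distribution.items():
    observed = counts[colour] / len(draws)
    print(f"{colour:<6} learned={weight:.2f} observed={observed:.3f}")
```

With this many draws the observed fractions should sit close to 0.5, 0.3 and 0.2. With only a handful of draws they can wander a long way, which is worth trying by lowering n.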

3. Autoregressive Models: One Token at a Time

An autoregressive model generates a sequence by predicting the next item from everything before it, then feeding that prediction back in. GPT writes a sentence one token at a time this way. The simplest version is a Markov chain: it only looks at the current word to guess the next.

The worked example below learns which words follow which from a four-line corpus, then generates new sentences by repeatedly sampling. It is a complete generative model in under 30 lines — no maths library required.

Worked Example: Markov Text Generator

Learn word-to-word transitions, then generate fresh sentences

Python

import random

# A tiny GENERATIVE language model: a Markov chain.
# It learns "which word tends to follow which" from a corpus,
# then GENERATES new sentences by sampling that learned distribution.

random.seed(1)

corpus = """the cat sat on the mat
the dog sat on the log
the cat ran to the dog
the dog ran to the mat"""

# 1) LEARN: build a table of word -> list of words seen next
words = corpus.split()
chain = {}
for current, nxt in zip(words, words[1:]):
    chain.setdefault(current, []).appe
...
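A minimal sketch of the same idea, not necessarily the lesson's exact listing: it rebuilds the transition table from the same corpus, turns the follower lists into probabilities so you can see what was "learned", and then generates by repeated sampling. The helper names transition_probs and generate, the rng parameter, and the early stop on a dead-end word are assumptions.

```python
import random
from collections import Counter

corpus = """the cat sat on the mat
the dog sat on the log
the cat ran to the dog
the dog ran to the mat"""

# LEARN: word -> list of words seen next (duplicates are kept on purpose,
# so a frequent follower is proportionally more likely to be picked).
words = corpus.split()
chain = {}
for current, nxt in zip(words, words[1:]):
    chain.setdefault(current, []).append(nxt)

def transition_probs(chain):
    """Turn follower lists into P(next word | current word)."""
    probs = {}
    for word, followers in chain.items():
        counts = Counter(followers)
        probs[word] = {nxt: c / len(followers) for nxt, c in counts.items()}
    return probs

def generate(chain, start, length, rng):
    """GENERATE: sample one word at a time, feeding each word back in."""
    word = start
    out = [word]
    for _ in range(length - 1):
        options = chain.get(word)
        if not options:  # dead end: this word was never followed by anything
            break
        word = rng.choice(options)
        out.append(word)
    return " ".join(out)

probs = transition_probs(chain)
print(probs["the"])  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.375, 'log': 0.125}

sentence = generate(chain, "the", 8, random.Random(1))
print(sentence)
```

Because corpus.split() ignores line breaks, the last word of one line is treated as being followed by the first word of the next. That is why "mat" and "log" both lead to "the" in this table.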

4. The Four Families of Generative Models

Almost every modern generator belongs to one of four families. They differ in how they learn the distribution and sample from it.

📝 Autoregressive (GPT, PixelCNN, Markov chains)

Generate one element at a time, conditioning on what came before. Great for text; slower for long sequences because each step depends on the last.

🔄 VAEs — Variational Autoencoders

Squeeze data into a smooth latent space and decode it back. Stable to train and easy to interpolate, but outputs tend to be blurry.

⚔️ GANs — Generative Adversarial Networks

A generator forges data while a discriminator tries to catch fakes; they improve by competing. Produces sharp, realistic images but is notoriously tricky to train.

🌫️ Diffusion (DALL·E 2, Stable Diffusion)

Start from pure noise and denoise step by step until an image appears. Often the highest image quality today; generation is slower because it takes many denoising steps.

5. Latent Space and Interpolation

Latent space is a compressed map of everything the model can create. Each point on the map is a recipe for one output; nearby points produce similar outputs. In a well-organised space, directions carry meaning too: a face generator might have one direction that means "add glasses" and another that means "make older".

Because the space is continuous, you can interpolate — walk in a straight line from the point for a cat to the point for a dog and decode every step along the way. The outputs morph smoothly from cat to dog. That smoothness is exactly why VAEs are loved for exploration, even when their images are a little soft.

Key insight: For latent-variable models such as VAEs and GANs, sampling means picking a point in latent space and decoding it. Good latent spaces are organised so that "similar meaning" lives in "nearby coordinates".
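The straight-line walk between two latent points can be sketched in a few lines of plain Python. The two-dimensional vectors z_cat and z_dog are made-up coordinates, and there is no decoder here; in a real model each point on the path would be passed to the trained decoder to produce an image.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation: t=0 gives z_a, t=1 gives z_b."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolate(z_a, z_b, steps):
    """Evenly spaced points on the straight line from z_a to z_b (steps >= 2)."""
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]

# Hypothetical latent coordinates for two outputs.
z_cat = [0.0, 1.0]
z_dog = [1.0, 0.0]

path = interpolate(z_cat, z_dog, 5)
for z in path:
    print(z)
# [0.0, 1.0]
# [0.25, 0.75]
# [0.5, 0.5]
# [0.75, 0.25]
# [1.0, 0.0]
```

Each step moves every coordinate by the same small amount, so if nearby points decode to similar outputs, consecutive frames differ only slightly. That is the source of the smooth morph.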

6. Evaluating Generative Models

Evaluation is genuinely hard: there is no single "correct" output to compare against. You instead balance two things — quality (do samples look real?) and diversity (do they cover the whole range of real data, or only a few favourites?).

The most common image metric is FID (Fréchet Inception Distance). It compares the mean and covariance of generated images with those of real images in the feature space of a pretrained Inception network; lower is better. Because it compares the spread of the features as well as their average, FID rewards models that are both realistic and varied, which is why a model that produces one perfect image over and over still scores badly.
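To see why a collapsed model is penalised, here is a deliberately simplified one-dimensional sketch of the Fréchet distance. Real FID fits multivariate Gaussians to Inception features and uses full covariance matrices; this toy version fits a single mean and standard deviation to made-up feature values, and the function name frechet_1d is an assumption.

```python
from statistics import mean, pstdev

def frechet_1d(real, generated):
    """Fréchet distance between two 1-D Gaussians fitted to the samples.

    In one dimension the formula reduces to
        (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2
    so both a shifted mean and a mismatched spread raise the score.
    """
    mu_r, mu_g = mean(real), mean(generated)
    sd_r, sd_g = pstdev(real), pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2

# Hypothetical 1-D "features" standing in for image statistics.
real      = [1.0, 2.0, 3.0, 4.0, 5.0]
varied    = [1.5, 2.5, 3.0, 3.5, 4.5]   # right centre, similar spread
collapsed = [3.0, 3.0, 3.0, 3.0, 3.0]   # right centre, no spread at all

print("varied   :", frechet_1d(real, varied))
print("collapsed:", frechet_1d(real, collapsed))
```

Both generated sets have the correct mean, yet the collapsed one scores much worse because its spread is zero. That is the one-dimensional version of "one perfect image over and over".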

For text, you'll see perplexity and human preference scores; for everything, a human eye remains the final judge.
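Perplexity can be sketched in plain Python as well. It is the exponential of the average negative log-probability the model assigned to each token that actually occurred, so lower means the model was less "surprised". The function name perplexity and the example probabilities are assumptions for illustration.

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens.

    token_probs: the probability the model gave to each token that actually
    came next. Every value must be greater than 0.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# A model that always spreads its bets evenly over 4 words is as
# "confused" as a fair 4-way guess, so its perplexity is about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# A model that is certain and correct every time has perplexity 1.
print(perplexity([1.0, 1.0, 1.0]))
```

A useful reading: a perplexity of k means the model is, on average, about as uncertain as if it were choosing uniformly among k words at every step.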

🎯 Your Turn 1: Weighted Sampling

Fill in the blanks so the sampler draws from a weather distribution. The skeleton is done — you only supply the weights and the comparison.

Your Turn: Weighted Sampling

Complete the distribution and the sampling comparison

Python

import random

# 🎯 YOUR TURN — fill in the blanks marked with ___

random.seed(3)

# 1) A learned distribution must add up to 1.0 (100%).
#    Fill in the missing weight so sunny + rainy + cloudy = 1.0
weather = {
    "sunny":  0.6,
    "rainy":  0.1,
    "cloudy": ___,   # 👉 replace ___ so the three weights sum to 1.0
}

def sample(dist):
    r = random.random()
    cumulative = 0.0
    for outcome, weight in dist.items():
        cumulative += weight
        if r < ___:          # 👉 compare
...

🎯 Your Turn 2: Finish the Generator

The model has already learned its transition table. Finish the generation loop so it samples one new word at a time, like an autoregressive model.

Your Turn: Finish the Generator

Sample the next token and append it to the output

Python

import random

# 🎯 YOUR TURN — fill in the blanks marked with ___

random.seed(2)

corpus = "i love ai and i love code and i build ai"
words = corpus.split()

# LEARN: word -> list of words seen next (already done for you)
chain = {}
for current, nxt in zip(words, words[1:]):
    chain.setdefault(current, []).append(nxt)

# GENERATE: sample the next word repeatedly
def generate(chain, start, length):
    word = start
    out = [word]
    for _ in range(length - 1):
        options = chain.get(w
...

🧗 Mini-Challenge: Your Own Tiny Language Model

Support is faded now — you get an outline only. Build a Markov generator from scratch over your own corpus.

1. Write a short multi-word corpus string (a few repeated words help).
2. Split it into words and build the word → next words table.
3. Generate a sentence by sampling the next word until you hit a length limit.

Mini-Challenge: Tiny Language Model

Learn a corpus and generate from it — no scaffolding

Python

import random

# 🧗 MINI-CHALLENGE: Build a tiny language model from scratch
# 1. Make a corpus string with repeated words (so transitions vary)
# 2. Split into words and build a chain: {word: [words that followed]}
# 3. Write generate(start, length): sample the next word in a loop
# 4. Print one generated sentence
#
# ✅ Expected: a short sentence whose word-to-word jumps all appear
#    somewhere in your corpus (it should "sound like" your text).

random.seed(0)

# your code here

Common Errors and Failure Modes

❌ Mode collapse (GANs)

The generator finds one output that always fools the discriminator and produces only that — endless near-identical faces. Diversity drops to almost nothing.

✅ Fix: add minibatch discrimination, use a Wasserstein loss (WGAN-GP), or unroll the discriminator so the generator can't exploit a single weakness.

❌ Blurry outputs (VAEs)

VAEs minimise average reconstruction error, so when several outputs are plausible they hedge by averaging them — and the average of sharp images is blurry.

✅ Fix: use a perceptual loss, a VAE-GAN hybrid, or a discrete latent (VQ-VAE) to commit to specific outputs instead of averaging.
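The averaging effect can be checked with arithmetic on two tiny one-dimensional "images". This is a toy illustration of why minimising squared error favours the average, not a VAE; the pixel values and helper names are invented.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pixelwise_mean(images):
    """Average the images pixel by pixel."""
    return [sum(pixels) / len(images) for pixels in zip(*images)]

def expected_error(prediction, targets):
    """Average MSE when each target is equally likely to be the truth."""
    return sum(mse(prediction, t) for t in targets) / len(targets)

# Two equally plausible sharp outputs: the same edge in two positions.
edge_left  = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
edge_right = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
targets = [edge_left, edge_right]

blurry = pixelwise_mean(targets)
print(blurry)  # [0.0, 0.5, 0.5, 1.0, 1.0, 1.0]

print("commit to a sharp edge:", expected_error(edge_left, targets))
print("hedge with the average:", expected_error(blurry, targets))
```

The hedged prediction has the lower expected error even though its grey 0.5 pixels match neither sharp image. A loss based on average reconstruction error therefore rewards the blur.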

❌ Training instability

Adversarial losses oscillate or diverge — the generator and discriminator chase each other instead of converging, and the loss curve never settles.

✅ Fix: give the generator and discriminator separate learning rates (the two-timescale update rule), add spectral normalisation, and keep the discriminator from becoming too strong too fast.

❌ "It looks fine but the number says it's bad" — evaluation difficulty

There's no single correct output, so loss going down doesn't guarantee good samples. A model can score a low loss while still producing repetitive or unrealistic results.

✅ Fix: report FID (lower is better) and a diversity check, then back it up with human evaluation — never trust one metric alone.
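One very simple diversity check, offered as an assumption rather than a standard metric, is the fraction of generated samples that are unique. The function name distinct_ratio and the example sentences are hypothetical, and the check only catches exact repeats, so treat it as a first alarm rather than a full evaluation.

```python
def distinct_ratio(samples):
    """Fraction of samples that are unique: 1.0 = all different, near 0 = collapsed."""
    if not samples:
        raise ValueError("need at least one sample")
    return len(set(samples)) / len(samples)

# Hypothetical outputs from two text generators.
collapsed = ["the cat sat on the mat"] * 10
varied = [
    "the cat sat on the mat",
    "the dog ran to the log",
    "the cat ran to the dog",
    "the dog sat on the mat",
]

print(distinct_ratio(collapsed))  # 0.1
print(distinct_ratio(varied))     # 1.0
```

A falling ratio during training is a hint that the model is drifting towards a few favourite outputs, even if its loss still looks healthy.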

📋 Quick Reference: The Four Families

Family	How it generates	Strength	Weakness
Autoregressive	One token at a time	Excellent for text	Slow for long sequences
VAE	Decode a latent point	Smooth, stable training	Blurry outputs
GAN	Generator vs discriminator	Sharp, realistic	Unstable, mode collapse
Diffusion	Denoise from noise	Best quality today	Slow, many steps

❓ Frequently Asked Questions

Q: What is the difference between a discriminative and a generative model?

A: A discriminative model learns the boundary between classes — given an input, it predicts a label (P(label | data)). A generative model learns what the data itself looks like (P(data)), so it can sample brand-new examples. Classifiers are discriminative; image generators, VAEs, GANs, and diffusion models are generative.

Q: What are the four main families of generative models?

A: Autoregressive models (generate one token at a time, like GPT and Markov chains), Variational Autoencoders or VAEs (learn a smooth latent space), GANs (a generator and a discriminator compete), and diffusion models (start from noise and gradually denoise). Diffusion currently leads for image quality.

Q: What is latent space?

A: Latent space is a compressed, lower-dimensional set of coordinates where each point represents a possible output. Nearby points produce similar results, so you can interpolate: walk smoothly from a cat to a dog by moving between their latent coordinates. For latent-variable models such as VAEs and GANs, sampling means picking a point in latent space and decoding it.

Q: Why are VAE outputs often blurry while GAN outputs are sharp?

A: VAEs are trained to minimise average reconstruction error, so when several plausible outputs exist they hedge by averaging them — and the average of many sharp images is a blurry one. GANs are trained by a discriminator that rejects anything unrealistic, which pushes the generator toward sharp, specific images instead of safe averages.

Q: What is FID and how do you evaluate a generative model?

A: FID (Fréchet Inception Distance) compares the statistics of generated images to real ones in the feature space of a pretrained Inception network; lower is better. Evaluation is hard because there is no single right answer: you weigh sample quality (do outputs look real?) against diversity (do they cover the whole data distribution?), often combining FID with human judgement.

🎉 Lesson Complete!

You can now tell discriminative from generative models, name the four families, explain latent space and interpolation, sample from a learned distribution, and read evaluation metrics like FID. Best of all, you built a working autoregressive text generator in plain Python.

🚀 Up next: Diffusion Models, the noise-to-image technique behind DALL·E 2 and Stable Diffusion, and today's quality leader for images.
