
    Lesson 23 • Advanced

    Generative Models: Autoencoders, VAEs & GANs

    By the end of this lesson you'll understand how machines create brand-new data — and you'll have written a working text generator that samples a distribution it learned itself.

    What You'll Learn in This Lesson

    • Tell discriminative models apart from generative ones
    • Name the 4 families: autoregressive, VAE, GAN, diffusion
    • Explain latent space and why interpolation is smooth
    • Sample new data from a learned probability distribution
    • Build a Markov-chain text generator in plain Python
    • Read evaluation metrics like FID and judge sample quality

    🎨 Real-World Analogy: The Artist Who Learns a Style

    Picture an artist who studies hundreds of Van Gogh paintings. They never copy a single one — instead they absorb the style: the swirling skies, the thick brushstrokes, the colour choices. Afterwards they can paint a brand-new scene that has never existed, yet unmistakably looks like a Van Gogh.

    That is exactly what a generative model does. During training it watches a mountain of real examples and learns the underlying data distribution — the statistical "style" of the data. During generation it samples from that learned distribution to produce something new that fits the style but isn't a copy.

    Keep this picture in mind: learn a style → create new work. Everything below is just different machinery for doing those two steps.

    1. Discriminative vs Generative

    Most models you've met so far are discriminative: you hand them data and they predict a label. A spam filter learns the line between "spam" and "not spam". In maths, it estimates P(label | data) — the probability of a label given the input.

    A generative model is more ambitious. Instead of drawing a boundary, it learns what the data itself looks like — P(data). Because it understands the whole shape of the data, it can sample brand-new examples: a new sentence, a new face, a new song.
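    To make the two quantities concrete, here is a minimal counting sketch. The six-email dataset and the function names are invented for illustration; a real spam filter would use far richer features than a single keyword.

```python
from collections import Counter

# Toy dataset (made up for illustration): one keyword per email plus its label.
emails = [
    ("free", "spam"), ("free", "spam"), ("free", "ham"),
    ("meeting", "ham"), ("meeting", "ham"), ("winner", "spam"),
]

def p_label_given_word(label, word):
    """Discriminative view: estimate P(label | data) by counting."""
    labels_for_word = [lab for w, lab in emails if w == word]
    return labels_for_word.count(label) / len(labels_for_word)

def p_word():
    """Generative view: estimate P(data), the distribution over the data itself."""
    counts = Counter(w for w, _ in emails)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(p_label_given_word("spam", "free"))  # 2 of the 3 "free" emails are spam
print(p_word())                            # how common each keyword is overall
```

    The first function can only answer questions about labels. The second describes the data itself, so it is the one you could sample new keywords from.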

    Discriminative asks:

    "Is this email spam?" — it sorts existing things into bins.

    Generative asks:

    "What does a typical email look like?" — it can write a new one.

    2. Sampling From a Learned Distribution

    At its heart, generation = sampling. A generative model is really just a probability distribution you can draw from. Run the worked example below: it samples colours whose probabilities were "learned", then checks that the samples echo those probabilities.

    Read every comment, then hit run. Notice the output counts roughly match the learned weights — that's the model reproducing the style of its data.

    Worked Example: Sample a Distribution

    Draw outcomes whose frequencies match learned probabilities

    Python
    import random
    
    # A generative model's core skill: SAMPLE from a probability distribution.
    # Here the "distribution" is the colour mix of marbles in a learned bag.
    # Generating = reaching in and pulling one out at random.
    
    random.seed(7)  # makes the run repeatable so you can compare outputs
    
    # A learned distribution: each colour has a weight (how common it is)
    distribution = {
        "red":    0.5,   # 50% of the probability mass
        "blue":   0.3,   # 30%
        "green":  0.2,   # 20%
    }
    
    def sample(d
    ...
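    As a separate, self-contained illustration of the same idea, the sketch below draws from the colour weights using the standard library's random.choices and then compares sampled frequencies with the learned weights. It is not the lesson's own sample function; the helper name draw and the sample size of 10,000 are assumptions made for this example.

```python
import random
from collections import Counter

random.seed(7)  # repeatable run

# Same colour weights as the learned distribution in the lesson.
distribution = {"red": 0.5, "blue": 0.3, "green": 0.2}

def draw(dist, n):
    """Draw n outcomes, each chosen with probability proportional to its weight."""
    outcomes = list(dist)
    weights = list(dist.values())
    return random.choices(outcomes, weights=weights, k=n)

samples = draw(distribution, 10_000)
counts = Counter(samples)
frequencies = {colour: counts[colour] / len(samples) for colour in distribution}

# The sampled frequencies should land close to the learned weights.
for colour, weight in distribution.items():
    print(f"{colour:>5}: learned {weight:.2f}, sampled {frequencies[colour]:.3f}")
```

    With more samples the frequencies drift closer to the weights; with only a handful they can be far off, which is worth trying by lowering n.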

    3. Autoregressive Models: One Token at a Time

    An autoregressive model generates a sequence by predicting the next item from everything before it, then feeding that prediction back in. GPT writes a sentence one token at a time this way. The simplest version is a Markov chain: it only looks at the current word to guess the next.

    The worked example below learns which words follow which from a four-line corpus, then generates new sentences by repeatedly sampling. It is a complete generative model in under 30 lines — no maths library required.

    Worked Example: Markov Text Generator

    Learn word-to-word transitions, then generate fresh sentences

    Python
    import random
    
    # A tiny GENERATIVE language model: a Markov chain.
    # It learns "which word tends to follow which" from a corpus,
    # then GENERATES new sentences by sampling that learned distribution.
    
    random.seed(1)
    
    corpus = """the cat sat on the mat
    the dog sat on the log
    the cat ran to the dog
    the dog ran to the mat"""
    
    # 1) LEARN: build a table of word -> list of words seen next
    words = corpus.split()
    chain = {}
    for current, nxt in zip(words, words[1:]):
        chain.setdefault(current, []).appe
    ...
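    To see what the learned table amounts to as a probability distribution, the sketch below counts followers over the same four-line corpus and normalises the counts. The names follow_counts and next_word_probs are made up for this example, and the generation loop is deliberately left out.

```python
from collections import Counter

corpus = """the cat sat on the mat
the dog sat on the log
the cat ran to the dog
the dog ran to the mat"""

# split() ignores line breaks, so the last word of one line
# is also counted as being followed by the first word of the next.
words = corpus.split()

# Count how often each word follows each other word.
follow_counts = {}
for current, nxt in zip(words, words[1:]):
    follow_counts.setdefault(current, Counter())[nxt] += 1

def next_word_probs(word):
    """Turn the follower counts for one word into probabilities."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {nxt: c / total for nxt, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.375, 'log': 0.125}
print(next_word_probs("sat"))  # {'on': 1.0}
```

    Storing every follower in a list and picking one at random, as the lesson's chain does, samples from exactly these probabilities: a word that followed three times out of eight appears three times in the list.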

    4. The Four Families of Generative Models

    Almost every modern generator belongs to one of four families. They differ in how they learn the distribution and sample from it.

    📝 Autoregressive (GPT, PixelCNN, Markov chains)

    Generate one element at a time, conditioning on what came before. Great for text, but slow for long sequences because each step has to wait for the one before it.

    🔄 VAEs — Variational Autoencoders

    Squeeze data into a smooth latent space and decode it back. Stable to train and easy to interpolate, but outputs tend to be blurry.

    ⚔️ GANs — Generative Adversarial Networks

    A generator forges data while a discriminator tries to catch fakes; they improve by competing. Produces sharp, realistic images but is notoriously tricky to train.

    🌫️ Diffusion (DALL·E 2, Stable Diffusion)

    Start from pure noise and denoise step by step until an image appears. Currently among the highest-quality approaches for images; generation is slower because it takes many denoising steps.

    5. Latent Space and Interpolation

    Latent space is a compressed map of everything the model can create. Each point on the map is a recipe for one output; nearby points produce similar outputs. In a face generator, one direction in latent space might mean "add glasses" and another "make older": moving a point along such a direction applies the same kind of change to whichever face you started from.

    Because the space is continuous, you can interpolate — walk in a straight line from the point for a cat to the point for a dog and decode every step along the way. The outputs morph smoothly from cat to dog. That smoothness is exactly why VAEs are loved for exploration, even when their images are a little soft.
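    The straight-line walk can be sketched in a few lines. The two 3-dimensional latent codes below are hypothetical placeholders; in a real model they would come from an encoder, and each interpolated point would be passed to the decoder to produce an image.

```python
def lerp(a, b, t):
    """Linear interpolation between vectors a and b: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Hypothetical latent codes, invented for illustration.
z_cat = [0.0, 2.0, -1.0]
z_dog = [4.0, 0.0, 1.0]

steps = 5
path = [lerp(z_cat, z_dog, i / (steps - 1)) for i in range(steps)]

for z in path:
    print(z)  # each point would be handed to the model's decoder
```

    The midpoint of the path is [2.0, 1.0, 0.0], halfway between the two codes in every coordinate. Linear interpolation is the simplest choice; it is an assumption of this sketch, not the only way to move through latent space.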

    6. Evaluating Generative Models

    Evaluation is genuinely hard: there is no single "correct" output to compare against. You instead balance two things — quality (do samples look real?) and diversity (do they cover the whole range of real data, or only a few favourites?).

    The most common image metric is FID (Fréchet Inception Distance). It compares the mean and covariance of generated images with those of real images in the feature space of a pretrained Inception network; lower is better. FID rewards models that are both realistic and varied, which is why a model that produces one perfect image over and over still scores badly.
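    A one-dimensional toy version shows why a collapsed model is penalised. For two 1-D Gaussians the Fréchet distance reduces to (difference of means)² + (difference of standard deviations)². The sketch below applies that to made-up numbers standing in for image features; real FID works on high-dimensional Inception features with full covariance matrices, so treat this only as an illustration of the idea.

```python
import statistics

def frechet_1d(real, generated):
    """Frechet distance between 1-D Gaussians fitted to each sample set:
    (mean difference)^2 + (standard deviation difference)^2."""
    mu_r, mu_g = statistics.fmean(real), statistics.fmean(generated)
    sd_r, sd_g = statistics.pstdev(real), statistics.pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2

# Made-up 1-D "features" for illustration.
real      = [1.0, 2.0, 3.0, 4.0, 5.0]
varied    = [1.5, 2.5, 3.0, 3.5, 4.5]   # plausible and diverse
collapsed = [3.0, 3.0, 3.0, 3.0, 3.0]   # one plausible value, repeated

print(frechet_1d(real, varied))     # small: means match, spreads are close
print(frechet_1d(real, collapsed))  # larger: the mean matches but the spread is zero
```

    The collapsed samples sit exactly on the real mean, so each one is "realistic", yet the missing spread alone pushes the distance up.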

    For text, you'll see perplexity and human preference scores; for everything, a human eye remains the final judge.
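    Perplexity can be computed from the probabilities a model assigned to the tokens that actually occurred: it is the exponential of the average negative log-probability. The probability lists below are invented for illustration.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability given to each actual token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities two models gave to the same four tokens.
confident = [0.5, 0.5, 0.5, 0.5]
unsure    = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident))  # about 2: like choosing between 2 options each step
print(perplexity(unsure))     # about 10: like choosing between 10 options each step
```

    Lower is better: a perplexity of k means the model is, on average, as uncertain as if it were picking uniformly among k tokens.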

    🎯 Your Turn 1: Weighted Sampling

    Fill in the blanks so the sampler draws from a weather distribution. The skeleton is done; you only supply the missing weight and the comparison.

    Your Turn: Weighted Sampling

    Complete the distribution and the sampling comparison

    Python
    import random
    
    # 🎯 YOUR TURN — fill in the blanks marked with ___
    
    random.seed(3)
    
    # 1) A learned distribution must add up to 1.0 (100%).
    #    Fill in the missing weight so sunny + rainy + cloudy = 1.0
    weather = {
        "sunny":  0.6,
        "rainy":  0.1,
        "cloudy": ___,   # 👉 replace ___ so the three weights sum to 1.0
    }
    
    def sample(dist):
        r = random.random()
        cumulative = 0.0
        for outcome, weight in dist.items():
            cumulative += weight
            if r < ___:          # 👉 compare
    ...

    🎯 Your Turn 2: Finish the Generator

    The model has already learned its transition table. Finish the generation loop so it samples one new word at a time, like an autoregressive model.

    Your Turn: Finish the Generator

    Sample the next token and append it to the output

    Python
    import random
    
    # 🎯 YOUR TURN — fill in the blanks marked with ___
    
    random.seed(2)
    
    corpus = "i love ai and i love code and i build ai"
    words = corpus.split()
    
    # LEARN: word -> list of words seen next (already done for you)
    chain = {}
    for current, nxt in zip(words, words[1:]):
        chain.setdefault(current, []).append(nxt)
    
    # GENERATE: sample the next word repeatedly
    def generate(chain, start, length):
        word = start
        out = [word]
        for _ in range(length - 1):
            options = chain.get(w
    ...

    🧗 Mini-Challenge: Your Own Tiny Language Model

    The scaffolding is gone now: you get an outline only. Build a Markov generator from scratch over your own corpus.

    1. Write a short multi-word corpus string (a few repeated words help).
    2. Split it into words and build the word → next words table.
    3. Generate a sentence by sampling the next word until you hit a length limit.

    Mini-Challenge: Tiny Language Model

    Learn a corpus and generate from it — no scaffolding

    Python
    import random
    
    # 🧗 MINI-CHALLENGE: Build a tiny language model from scratch
    # 1. Make a corpus string with repeated words (so transitions vary)
    # 2. Split into words and build a chain: {word: [words that followed]}
    # 3. Write generate(start, length): sample the next word in a loop
    # 4. Print one generated sentence
    #
    # ✅ Expected: a short sentence whose word-to-word jumps all appear
    #    somewhere in your corpus (it should "sound like" your text).
    
    random.seed(0)
    
    # your code here

    Common Errors and Failure Modes

    ❌ Mode collapse (GANs)

    The generator finds one output that always fools the discriminator and produces only that — endless near-identical faces. Diversity drops to almost nothing.

    ✅ Fix: add minibatch discrimination, use a Wasserstein loss (WGAN-GP), or unroll the discriminator so the generator can't exploit a single weakness.

    ❌ Blurry outputs (VAEs)

    VAEs minimise average reconstruction error, so when several outputs are plausible they hedge by averaging them — and the average of sharp images is blurry.

    ✅ Fix: use a perceptual loss, a VAE-GAN hybrid, or a discrete latent (VQ-VAE) to commit to specific outputs instead of averaging.
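    The averaging effect is easy to verify with numbers. The sketch below uses two made-up 4-pixel "images" that are equally plausible targets and compares the expected squared error of committing to one sharp image against predicting their average.

```python
def mse(pred, target):
    """Mean squared error between two equal-length pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Two equally plausible sharp targets (0 = black, 1 = white), invented for illustration.
image_a = [0.0, 0.0, 1.0, 1.0]
image_b = [1.0, 1.0, 0.0, 0.0]

def expected_error(pred):
    """Average reconstruction error when either target is equally likely."""
    return (mse(pred, image_a) + mse(pred, image_b)) / 2

average = [(a + b) / 2 for a, b in zip(image_a, image_b)]  # uniform grey

print(average)                  # [0.5, 0.5, 0.5, 0.5]
print(expected_error(average))  # 0.25
print(expected_error(image_a))  # 0.5
```

    The grey average wins on expected error even though it matches neither target, which is the pressure toward blur that the fixes above try to remove.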

    ❌ Training instability

    Adversarial losses oscillate or diverge — the generator and discriminator chase each other instead of converging, and the loss curve never settles.

    ✅ Fix: give the generator and discriminator separate learning rates (the two-timescale update rule), add spectral normalisation, and keep the discriminator from becoming too strong too fast.

    ❌ "The number looks fine but the samples are bad": evaluation difficulty

    There's no single correct output, so loss going down doesn't guarantee good samples. A model can score a low loss while still producing repetitive or unrealistic results.

    ✅ Fix: report FID (lower is better) and a diversity check, then back it up with human evaluation — never trust one metric alone.
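    One very simple diversity check is the fraction of generated samples that are distinct. This is an assumed, deliberately crude measure chosen for illustration; it only works for discrete outputs such as words or sentences, and the sample lists are made up.

```python
def distinct_ratio(samples):
    """Fraction of samples that are unique: 1.0 means all different."""
    return len(set(samples)) / len(samples)

# Hypothetical generator outputs.
healthy   = ["cat", "dog", "bird", "fish", "cat", "horse"]
collapsed = ["cat", "cat", "cat", "cat", "cat", "cat"]

print(distinct_ratio(healthy))    # 5 distinct out of 6
print(distinct_ratio(collapsed))  # 1 distinct out of 6
```

    A ratio that falls sharply during training is a cheap early warning of mode collapse, to be confirmed with a proper metric and a human look at the samples.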

    📋 Quick Reference: The Four Families

    Family          How it generates             Strength                  Weakness
    Autoregressive  One token at a time          Excellent for text        Slow for long sequences
    VAE             Decode a latent point        Smooth, stable training   Blurry outputs
    GAN             Generator vs discriminator   Sharp, realistic          Unstable, mode collapse
    Diffusion       Denoise from noise           Best quality today        Slow, many steps

    ❓ Frequently Asked Questions

    Q: What is the difference between a discriminative and a generative model?

    A: A discriminative model learns the boundary between classes — given an input, it predicts a label (P(label | data)). A generative model learns what the data itself looks like (P(data)), so it can sample brand-new examples. Classifiers are discriminative; image generators, VAEs, GANs, and diffusion models are generative.

    Q: What are the four main families of generative models?

    A: Autoregressive models (generate one token at a time, like GPT and Markov chains), Variational Autoencoders or VAEs (learn a smooth latent space), GANs (a generator and a discriminator compete), and diffusion models (start from noise and gradually denoise). Diffusion currently leads for image quality.

    Q: What is latent space?

    A: Latent space is a compressed, lower-dimensional set of coordinates where each point represents a possible output. Nearby points produce similar results, so you can interpolate: walk smoothly from a cat to a dog by moving between their latent coordinates. For latent-variable models such as VAEs and GANs, sampling means picking a point in latent space and decoding it; autoregressive models instead sample one token at a time.

    Q: Why are VAE outputs often blurry while GAN outputs are sharp?

    A: VAEs are trained to minimise average reconstruction error, so when several plausible outputs exist they hedge by averaging them — and the average of many sharp images is a blurry one. GANs are trained by a discriminator that rejects anything unrealistic, which pushes the generator toward sharp, specific images instead of safe averages.

    Q: What is FID and how do you evaluate a generative model?

    A: FID (Fréchet Inception Distance) compares the mean and covariance of generated images with those of real ones in the feature space of a pretrained Inception network; lower is better. Evaluation is hard because there is no single right answer: you weigh sample quality (do outputs look real?) against diversity (do they cover the whole data distribution?), often combining FID with human judgement.

    🎉 Lesson Complete!

    You can now tell discriminative from generative models, name the four families, explain latent space and interpolation, sample from a learned distribution, and read evaluation metrics like FID. Best of all, you built a working autoregressive text generator in plain Python.

    🚀 Up next: Diffusion Models, the noise-to-image technique behind DALL·E 2 and Stable Diffusion, and today's quality leader.
