Lesson 29 • Advanced

    Q-Learning & Deep Q-Networks

    Implement Q-Learning from scratch, then scale it with neural networks — the algorithm that taught AI to play Atari games at superhuman level.

    ✅ What You'll Learn

    • Q-tables: storing action values for every state-action pair
    • The Q-learning update rule (off-policy temporal difference)
    • Deep Q-Networks: replacing tables with neural networks
    • Experience replay and target networks for stability

    🧮 From Tables to Neural Networks

    🎯 Real-World Analogy: Q-Learning with a table is like a tourist with a notebook — they write down "if I'm at the Eiffel Tower and go east, I'll find a great café (reward: 9/10)." This works for a small city, but for a country with millions of locations, you need a GPS that generalises — that's DQN. The neural network learns patterns ("restaurants near landmarks tend to be good") instead of memorising every location.

    Q-Learning (1989) stores a value for every state-action pair in a table. DQN (2013) replaced this table with a neural network, enabling RL to work on problems with millions of states — like raw pixel inputs from Atari games.
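The update rule that fills the table is Q(s,a) ← Q(s,a) + α[r + γ max Q(s',a') − Q(s,a)]. A minimal NumPy sketch of just that rule (the α and γ values and the example transition are illustrative, not tied to the demo below):

```python
import numpy as np

# Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma = 0.1, 0.9          # learning rate, discount factor (illustrative)
Q = np.zeros((16, 4))            # 16 states x 4 actions, as in the grid world below

def q_update(Q, s, a, r, s_next, done):
    # Bootstrap from the best next action -- unless the episode just ended
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# One hypothetical transition: state 0, action 3 ("Right"), reward 1.0, next state 1
Q = q_update(Q, s=0, a=3, r=1.0, s_next=1, done=False)
```

Note that the max over next actions is what makes Q-learning off-policy: it learns about the greedy policy even while the agent explores.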

    Try It: Q-Learning

    Train an agent to navigate a 4×4 grid world and find the optimal path

    Try it Yourself »
    Python
    import numpy as np
    
    # Q-Learning: Learn action values without a model of the environment
    # Q(s,a) = expected total reward for taking action a in state s
    
    np.random.seed(42)
    
    # Grid World: 4x4 grid, goal at (3,3), trap at (1,2)
    grid_size = 4
    n_states = grid_size * grid_size
    n_actions = 4  # Up, Down, Left, Right
    actions = ["Up", "Down", "Left", "Right"]
    
    # Initialize Q-table
    Q = np.zeros((n_states, n_actions))
    
    # Rewards
    goal_state = 15  # (3,3)
    trap_state = 6   # (1,2)
    
    def get_reward(state):
      
    ...
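The demo above shows only the setup. One self-contained way the full training loop could look, with an epsilon-greedy exploration strategy (the reward values and hyperparameters here are illustrative choices, not the demo's exact settings):

```python
import numpy as np

np.random.seed(42)
grid_size, n_actions = 4, 4
n_states = grid_size * grid_size
goal_state, trap_state = 15, 6   # goal at (3,3), trap at (1,2)
Q = np.zeros((n_states, n_actions))

def get_reward(state):
    if state == goal_state: return 10.0
    if state == trap_state: return -10.0
    return -0.1                  # small step penalty encourages short paths

def step(state, action):
    row, col = divmod(state, grid_size)
    # 0=Up, 1=Down, 2=Left, 3=Right; moves off the grid leave the state unchanged
    if   action == 0: row = max(row - 1, 0)
    elif action == 1: row = min(row + 1, grid_size - 1)
    elif action == 2: col = max(col - 1, 0)
    else:             col = min(col + 1, grid_size - 1)
    next_state = row * grid_size + col
    done = next_state in (goal_state, trap_state)
    return next_state, get_reward(next_state), done

alpha, gamma, epsilon = 0.1, 0.9, 0.1
for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the current Q-table, sometimes explore
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) * (not done)
                                     - Q[state, action])
        state = next_state

# Greedy rollout from the start state after training
state, path, done = 0, [0], False
while not done and len(path) < 20:
    state, _, done = step(state, int(np.argmax(Q[state])))
    path.append(state)
print("Greedy path:", path)
```

After training, following argmax over the Q-table from state 0 should reach the goal while steering around the trap.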

    Try It: Deep Q-Network

    See how experience replay and neural networks scale Q-learning

    Try it Yourself »
    Python
    import numpy as np
    
    # Deep Q-Network (DQN): Q-Learning with Neural Networks
    # When the state space is too large for a table, use a neural net
    
    np.random.seed(42)
    
    class SimpleNN:
        """Tiny neural network to approximate Q-values"""
        def __init__(self, input_dim, hidden_dim, output_dim):
            self.W1 = np.random.randn(input_dim, hidden_dim) * 0.3
            self.b1 = np.zeros(hidden_dim)
            self.W2 = np.random.randn(hidden_dim, output_dim) * 0.3
            self.b2 = np.zeros(output_dim)
       
    ...
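The heart of the "experience replay" part is just a buffer of past transitions that gets sampled at random. A minimal sketch (the capacity and batch size are illustrative):

```python
import random
from collections import deque

# Experience replay: store transitions, then train on random minibatches
# so that consecutive, highly correlated steps don't destabilise learning.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)  # uniform, without replacement
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

# Fill with dummy transitions, then draw one minibatch
buf = ReplayBuffer()
for t in range(100):
    buf.push(t, t % 4, 0.0, t + 1, False)
states, actions, rewards, next_states, dones = buf.sample(32)
```

In a full DQN loop, every environment step pushes one transition and every training step samples one minibatch to compute TD targets.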

    ⚠️ Common Mistake: Not using a target network. Without it, Q-value estimates chase a moving target and training oscillates wildly. Update the target network every 1000-10000 steps by copying the main network's weights.
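The periodic "hard update" is a straight weight copy. A sketch with plain NumPy arrays standing in for network parameters (the layer names and sizes are hypothetical):

```python
import numpy as np

# Target-network sync: every sync_every steps, copy the online network's
# weights into a frozen copy that is used only to compute TD targets.
sync_every = 1000  # illustrative; common choices fall in the 1000-10000 range

def sync_target(online_params, target_params):
    for name, w in online_params.items():
        target_params[name] = w.copy()   # hard copy, not a shared reference

online = {"W1": np.random.randn(4, 8), "W2": np.random.randn(8, 4)}
target = {k: v.copy() for k, v in online.items()}

online["W1"] += 0.01      # ...gradient steps drift the online net away...
sync_target(online, target)  # ...until the periodic hard update realigns them
```

An alternative used by DDPG-style methods is a "soft" update that blends a small fraction of the online weights into the target every step.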

    💡 Pro Tip: DQN has been superseded by Double DQN (fixes overestimation), Dueling DQN (separates state value from action advantage), and Rainbow (combines all improvements). For continuous action spaces, use PPO or SAC instead — DQN only works for discrete actions.
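The Double DQN fix is one line: let the online network pick the next action but let the target network evaluate it. A sketch with hypothetical Q-values to show how the two targets differ:

```python
import numpy as np

gamma = 0.99
q_online_next = np.array([1.2, 3.5, 0.7])  # hypothetical Q(s', .) from the online net
q_target_next = np.array([1.0, 2.0, 2.5])  # hypothetical Q(s', .) from the target net
reward, done = 1.0, False

# Vanilla DQN: r + gamma * max_a Q_target(s', a)
# The max both selects and evaluates, so noise gets over-counted upward.
dqn_target = reward + gamma * np.max(q_target_next) * (not done)

# Double DQN: r + gamma * Q_target(s', argmax_a Q_online(s', a))
# Selection and evaluation use different networks, reducing overestimation.
best_action = int(np.argmax(q_online_next))
ddqn_target = reward + gamma * q_target_next[best_action] * (not done)

print(dqn_target, ddqn_target)  # the Double DQN target is the smaller of the two here
```

Here vanilla DQN takes the target net's own maximum (2.5), while Double DQN defers to the online net's choice (action 1) and evaluates it at 2.0, producing a more conservative target.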

    📋 Quick Reference

    Method      | State Space     | Action Space | Key Feature
    Q-Table     | Small, discrete | Discrete     | Simple, exact
    DQN         | Large (pixels)  | Discrete     | Neural net approximation
    Double DQN  | Large           | Discrete     | Fixes overestimation
    Dueling DQN | Large           | Discrete     | Separates V(s) and A(s,a)
    Rainbow     | Large           | Discrete     | All improvements combined

    🎉 Lesson Complete!

    You've mastered Q-Learning and DQN! Next, learn Policy Gradient methods — the algorithms behind PPO and ChatGPT's RLHF training.
