
    Lesson 8 • Intermediate

    Deep Learning Fundamentals

    By the end of this lesson you'll be able to explain how a deep network turns numbers into predictions, how it learns from its mistakes with backpropagation and gradient descent, and how to build one in Keras without getting lost.

    What You'll Learn in This Lesson

    • What makes a network "deep" — stacking many hidden layers
    • Run a forward pass by hand through a tiny 2-layer network
    • The intuition behind backpropagation and gradient descent
    • How loss functions measure "how wrong" a model is
    • What epochs, batches, and the learning rate actually control
    • Build a small Keras model and fight overfitting with dropout

    🎯 Real-World Analogy: A Factory Assembly Line

    Picture a factory assembly line. Raw materials enter at one end; each station does one small job and passes the result on. By the last station, scattered parts have become a finished car. A deep network works the same way — each layer is a station that refines the data a little before passing it along.

    The first layers detect simple things (edges in an image, fragments of a word). Middle layers combine those into shapes or phrases. The final layer makes the call: "that's a golden retriever" or "this review is positive." Deep just means many of these stations stacked in a row — and more stations means the network can learn more abstract ideas.

    1. What Makes a Network "Deep"?

    A layer is a row of neurons. A network with one or two layers is called shallow; a network with many layers is deep. The data flows forward: the output of one layer becomes the input to the next. This step — running data through every layer to get a prediction — is called a forward pass.

    Two breakthroughs made deep networks practical. The ReLU activation (which keeps positive numbers and zeroes out negatives) greatly reduced the vanishing-gradient problem, where gradients shrink towards zero as they travel back through a deep stack. GPUs made it fast enough to train millions of weights. Below, you'll do a forward pass entirely by hand so the layers stop being a mystery.

    Read every comment in this worked example, then run it. Each hidden neuron is just a weighted sum of the inputs, plus a bias, passed through an activation.

    Worked Example: A 2-Layer Forward Pass (Plain Python)

    Run data through two layers by hand and watch a prediction appear

    Python
    # A tiny "deep" network by hand: 2 inputs -> 2 hidden neurons -> 1 output
    # No frameworks. Just lists and arithmetic so you can SEE what a layer does.
    
    import math
    
    def relu(x):
        return x if x > 0 else 0.0          # ReLU: keep positives, zero the rest
    
    def sigmoid(x):
        return 1 / (1 + math.exp(-x))       # squashes any number into 0..1
    
    # --- The input the network "sees" ---
    inputs = [0.5, 0.9]                      # 2 features (e.g. brightness, size)
    
    # --- Layer 1: 2 inputs -> 2 hidden 
    ...
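
    As a separate, self-contained sketch of the same idea, the code below runs a full forward pass through a 2 -> 2 -> 1 network. The helper name `dense` and every weight and bias value are hypothetical, chosen only for illustration; they are not the values from the listing above.

```python
import math

def relu(x):
    return x if x > 0 else 0.0

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One layer: each neuron = activation(weighted sum of inputs + bias)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias
        for value, weight in zip(inputs, neuron_weights):
            total += value * weight
        outputs.append(activation(total))
    return outputs

inputs = [0.5, 0.9]

# Hypothetical parameters (assumptions for this sketch only).
hidden_weights = [[0.4, -0.6], [0.3, 0.8]]   # one row of weights per hidden neuron
hidden_biases = [0.1, -0.2]
output_weights = [[1.2, -0.7]]               # one output neuron, two hidden inputs
output_biases = [0.05]

hidden = dense(inputs, hidden_weights, hidden_biases, relu)
prediction = dense(hidden, output_weights, output_biases, sigmoid)[0]

print("hidden:", [round(h, 3) for h in hidden])
print("prediction:", round(prediction, 3))
```

    With these numbers the first hidden neuron's weighted sum is negative, so ReLU zeroes it, and only the second hidden neuron influences the final prediction.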

    🎯 Your Turn: Finish a Single Neuron

    Fill in the blanks to complete one neuron's weighted sum and activation

    Python
    # 🎯 YOUR TURN — finish this single neuron's forward pass.
    # Fill in the blanks marked ___ . One neuron = weighted sum + bias + activation.
    
    import math
    
    def relu(x):
        return x if x > 0 else 0.0
    
    inputs  = [1.0, 2.0, 3.0]          # 3 features
    weights = [0.5, -0.2, 0.1]         # one weight per feature
    bias    = 0.4
    
    total = bias
    for i in range(3):
        # 👉 add each input multiplied by its matching weight
        total += inputs[i] * ___        # 👉 replace ___ with weights[i]
    
    # 👉 pass the weig
    ...

    2. How Networks Learn: Backprop & Gradient Descent

    A fresh network guesses badly. Learning is the process of nudging every weight so the guesses get better. There are two parts:

    • Backpropagation answers "which weights are to blame, and in which direction?" It sends the error backwards through the layers (using the chain rule from calculus) to compute a gradient — the slope of the loss — for every weight.
    • Gradient descent uses those gradients to take a step. Each weight moves a little in the direction that lowers the loss. Repeat thousands of times and the network gets good.
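
    To make the two parts concrete, here is a minimal sketch on a single sigmoid neuron with a squared-error loss. The input, target, starting weight, and the helper `loss_for` are all hypothetical. It computes the gradient for one weight with the chain rule, checks it against a numerical estimate, and then takes one gradient-descent step.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss_for(w, b, x, target):
    """Forward pass of one sigmoid neuron followed by a squared-error loss."""
    prediction = sigmoid(w * x + b)
    return (prediction - target) ** 2

# Hypothetical values, chosen only for illustration.
x, target = 1.5, 1.0
w, b = 0.2, -0.1

# Forward pass, keeping the intermediate values the backward pass needs.
z = w * x + b
prediction = sigmoid(z)

# Backward pass: multiply the local slopes together (chain rule).
dloss_dpred = 2 * (prediction - target)      # slope of the loss w.r.t. the prediction
dpred_dz = prediction * (1 - prediction)     # slope of sigmoid at z
dz_dw = x                                    # slope of z = w*x + b w.r.t. w
grad_w = dloss_dpred * dpred_dz * dz_dw

# Sanity check: nudge w slightly in both directions and measure the loss change.
eps = 1e-6
numeric_grad_w = (loss_for(w + eps, b, x, target)
                  - loss_for(w - eps, b, x, target)) / (2 * eps)

# One gradient-descent step on w.
learning_rate = 0.1
new_w = w - learning_rate * grad_w

print("chain-rule gradient:", round(grad_w, 6))
print("numeric gradient:   ", round(numeric_grad_w, 6))
print("loss before:", round(loss_for(w, b, x, target), 6))
print("loss after: ", round(loss_for(new_w, b, x, target), 6))
```

    The two gradients should agree closely, and the loss after the step should be slightly lower than before. A framework does the same bookkeeping for every weight in every layer.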

    The clearest way to feel gradient descent is on a single number. Below, you minimise f(x) = (x - 3)². You already know the answer is x = 3 — watch the algorithm discover it by always stepping downhill, opposite the gradient.

    Worked Example: One Function, Many Gradient-Descent Steps

    Watch x walk downhill toward the minimum, step by step

    Python
    # Gradient descent: the rule that lets a network LEARN.
    # Goal: find the x that makes f(x) = (x - 3)**2 as small as possible.
    # The minimum is obviously x = 3 — watch the algorithm walk towards it.
    
    def f(x):
        return (x - 3) ** 2                  # the "loss" we want to minimise
    
    def gradient(x):
        return 2 * (x - 3)                   # slope of f at x (calculus gives this)
    
    x = 0.0                                  # a bad starting guess
    learning_rate = 0.1                      # how big ea
    ...

    🎯 Your Turn: Take One Step Downhill

    Apply the gradient-descent update rule for a single step

    Python
    # 🎯 YOUR TURN — take ONE gradient-descent step by hand.
    # Same function as before: f(x) = (x - 3)**2, gradient = 2*(x - 3).
    
    def gradient(x):
        return 2 * (x - 3)
    
    x = 5.0                  # current guess (too high — the minimum is at 3)
    learning_rate = 0.1
    
    g = gradient(x)          # slope at x = 5  ->  4.0
    # 👉 move x DOWNHILL: subtract learning_rate * gradient
    new_x = x - ___ * g      # 👉 replace ___ with learning_rate
    
    print("gradient:", g)
    print("new x:", round(new_x, 3))
    
    # ✅ Expected 
    ...

    3. Loss Functions: Measuring "How Wrong"

    Before a network can improve, it needs a single number that says how bad its current guesses are. That number is the loss. Gradient descent's whole job is to make this number smaller.

    Different tasks use different loss functions:

    • Mean Squared Error (MSE) — for predicting numbers (regression). It squares the gap between prediction and truth, so big mistakes hurt a lot.
    • Cross-entropy — for classification (cat vs dog, spam vs not). It heavily punishes being confidently wrong.

    Python
    # Loss = one number that says "how wrong are we?"
    predictions = [2.8, 5.3, 6.5]
    targets     = [3.0, 5.0, 7.0]
    
    # Mean Squared Error: average of the squared gaps
    errors  = [(p - t) for p, t in zip(predictions, targets)]
    squared = [e * e for e in errors]
    mse     = sum(squared) / len(squared)
    
    print("errors:", [round(e, 2) for e in errors])  # [-0.2, 0.3, -0.5]
    print("MSE:", round(mse, 4))                       # 0.1267
    # Lower MSE = better predictions. Training drives this toward 0.
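
    Cross-entropy can be sketched the same way. The snippet below uses the binary form for a single example whose true label is 1; the function name `binary_cross_entropy` and the probabilities are hypothetical. It shows how the loss grows as the model becomes more confidently wrong.

```python
import math

def binary_cross_entropy(target, predicted_prob):
    """Loss for one example; target is 0 or 1, predicted_prob is strictly between 0 and 1."""
    return -(target * math.log(predicted_prob)
             + (1 - target) * math.log(1 - predicted_prob))

target = 1   # the true class
for predicted_prob in [0.9, 0.6, 0.1, 0.01]:
    loss = binary_cross_entropy(target, predicted_prob)
    print(f"p = {predicted_prob:<4}  loss = {loss:.3f}")
```

    A prediction of 0.9 costs about 0.105, while a prediction of 0.01 for the same example costs about 4.605. This sketch assumes the probability is never exactly 0 or 1; real implementations clip the value to avoid taking the log of zero.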

    4. Epochs, Batches, and the Learning Rate

    These three dials control how training runs. They trip up beginners, so here they are in plain English:

    Epoch

    One full pass over ALL your training data. 10 epochs = the network sees every example 10 times.

    Batch

    A small chunk (e.g. 32 samples) used for one weight update. Smaller batches = more frequent, noisier updates.

    Learning rate

    How big each weight step is. Too high overshoots; too low crawls. A common starting point is 0.001, which is also the default for the Adam optimiser in Keras.
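
    The three dials can be seen together in a minimal training loop. This sketch fits a one-weight model y = w * x to hypothetical data that follows y = 2x. The data, batch size, epoch count, and learning rate are assumptions picked to keep the output easy to read, and the batches are not shuffled so the run is repeatable (real training usually shuffles each epoch).

```python
# Hypothetical toy data following y = 2x; the model is y = w * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

w = 0.0                  # a bad starting weight
learning_rate = 0.01     # step size for each update
batch_size = 4           # examples used per update
epochs = 3               # full passes over the data
updates = 0

for epoch in range(epochs):                      # one epoch = one full pass
    for start in range(0, len(xs), batch_size):  # one batch = one weight update
        batch_x = xs[start:start + batch_size]
        batch_y = ys[start:start + batch_size]

        # Gradient of the batch's mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x
                   for x, y in zip(batch_x, batch_y)) / len(batch_x)

        w -= learning_rate * grad
        updates += 1
    print(f"epoch {epoch + 1}: w = {w:.4f}")

print("total updates:", updates)
```

    Eight examples in batches of four give two updates per epoch, so three epochs produce six updates in total, and w ends up close to 2.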

    5. Frameworks & Overfitting (TensorFlow/Keras, PyTorch)

    You'd never hand-code backprop for a real model. Frameworks do the calculus and GPU work for you. The two most popular are TensorFlow/Keras (a model is a short list of layers, which makes it the gentlest start) and PyTorch (flexible and Pythonic, favoured in research). Both are built on the same ideas you just learned.

    A big risk with deep models is overfitting: the network memorises the training data and then flops on new data. Regularisation fights this. The most common trick is dropout — during training it randomly switches off a fraction of neurons, forcing the network to spread its knowledge out instead of relying on a few memorised paths.
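
    What dropout does to a layer's outputs can be sketched in plain Python. The function below is an illustrative stand-in, not a framework's implementation. It follows the common "inverted dropout" convention, where the surviving values are scaled up by 1 / (1 - rate) during training so the layer's expected output stays the same, and nothing is dropped at inference time. The activations and the seed are hypothetical.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random subset and rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)               # inference: values pass through unchanged
    keep_prob = 1.0 - rate
    output = []
    for value in activations:
        if rng.random() < keep_prob:
            output.append(value / keep_prob)   # kept, scaled up to compensate
        else:
            output.append(0.0)                 # dropped for this training step
    return output

rng = random.Random(0)                         # fixed seed so runs are repeatable
hidden = [0.5, 1.2, 0.0, 0.8, 2.0, 0.3]        # hypothetical layer outputs

print("training: ", dropout(hidden, rate=0.5, training=True, rng=rng))
print("inference:", dropout(hidden, rate=0.5, training=False, rng=rng))
```

    Because a different random subset is dropped on every training step, no single neuron can be relied on, which is what pushes the network to spread its knowledge out.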

    The Keras model below uses the same layers, ReLU, sigmoid, Adam optimiser, and binary cross-entropy loss from this lesson — plus one Dropout layer. Run it where TensorFlow is installed; here the expected summary is shown so you can read it as a reference.

    Worked Example: A Small Keras Model (with Dropout)

    The same concepts in a real framework — read the expected summary

    Python
    # A real deep network in Keras — the SAME ideas, just fewer lines.
    # (Run this where TensorFlow is installed; here it is a worked reference.)
    
    import tensorflow as tf
    from tensorflow import keras
    
    # Build: 2 inputs -> 16 hidden (ReLU) -> 8 hidden (ReLU) -> 1 output (sigmoid)
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(2,)),
        keras.layers.Dropout(0.2),               # regularisation — fights overfitting
        keras.layers.Dense(8, activation="relu"),
       
    ...

    6. Common Errors (And How to Fix Them)

    These four problems trip up almost every beginner. Here's how to spot and fix them:

    🔥 Learning rate too high

    The loss jumps around, balloons to a giant number, or becomes nan. The steps are so big that gradient descent overshoots the minimum every time.

    ✅ Fix: lower the learning rate (try 0.001, then divide by 10 if it still diverges).
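
    The effect is easy to reproduce on the lesson's f(x) = (x - 3)² function. In this sketch the helper `run` and the three learning rates are illustrative choices: one too high, one reasonable, one too low.

```python
def gradient(x):
    return 2 * (x - 3)               # slope of f(x) = (x - 3)**2

def run(learning_rate, steps=10, start=0.0):
    """Take `steps` gradient-descent steps from `start` and return the final x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x

for learning_rate in [1.1, 0.1, 0.001]:
    final_x = run(learning_rate)
    print(f"lr = {learning_rate:<5}  x after 10 steps = {final_x:.3f}")
```

    With 1.1 every step overshoots the minimum and lands further away than it started, so x moves away from 3. With 0.1 it closes in on 3, and with 0.001 it has barely moved from the starting point after ten steps.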

    📏 No input normalisation

    One feature ranges 0–1 and another ranges 0–100,000. Training is unstable or painfully slow because the large feature dominates every gradient.

    ✅ Fix: scale features to a similar range (e.g. subtract the mean and divide by the standard deviation) before training.
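
    A minimal sketch of that fix, using only the standard library. The feature names and values are hypothetical, and the helper `standardise` assumes the column is not constant (a standard deviation of zero would cause a division by zero).

```python
import statistics

def standardise(values):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)       # population standard deviation
    return [(v - mean) / std for v in values]

# Hypothetical feature columns on very different scales.
ratio = [0.1, 0.4, 0.5, 0.9]
income = [20_000.0, 45_000.0, 60_000.0, 100_000.0]

for name, column in [("ratio", ratio), ("income", income)]:
    scaled = standardise(column)
    print(name, [round(v, 2) for v in scaled])
```

    After scaling, both columns sit in roughly the same range, so neither dominates the gradient. In practice the mean and standard deviation are computed on the training data only and then reused for new data.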

    🧠 Overfitting with no regularisation

    Training accuracy hits 99% but new-data accuracy is poor. The model memorised the training set instead of the pattern.

    ✅ Fix: add Dropout, get more data, or stop training earlier (early stopping).
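
    Early stopping can be sketched as a rule applied to the validation loss after each epoch. The function name, the `patience` value, and the loss numbers below are hypothetical. Keras offers a similar idea through its EarlyStopping callback, but this sketch does not depend on it.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both counted from 1.

    Stops once the validation loss has failed to improve for `patience`
    consecutive epochs; otherwise runs through every recorded epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: improving, then creeping up as overfitting starts.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print("stop at epoch:", stop_epoch)   # 6
print("best epoch:", best_epoch)      # 4
```

    Training halts at epoch 6, two epochs after the best validation loss at epoch 4, and the weights from the best epoch are the ones worth keeping.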

    📉 Too little data

    With only a handful of examples, a deep network has nothing to generalise from and overfits instantly — it can't tell signal from noise.

    ✅ Fix: gather more data, use data augmentation, or pick a smaller/simpler model that needs fewer examples.

    📋 Quick Reference

    Term                  | What It Means                                           | Typical Choice
    Forward pass          | Run data through every layer to get a prediction        |
    Backpropagation       | Send error backwards to compute each weight's gradient  | Automatic in frameworks
    Loss (regression)     | Measures error on number predictions                    | MSE
    Loss (classification) | Measures error on category predictions                  | Cross-entropy
    Learning rate         | Step size of each weight update                         | 0.001 (Adam)
    Epoch / Batch         | One full data pass / one update chunk                   | 10–100 / 32
    Activation (hidden)   | Adds non-linearity between layers                       | ReLU
    Regularisation        | Fights overfitting                                      | Dropout (0.2–0.5)

    ❓ Frequently Asked Questions

    Q: What makes a neural network "deep"?

    A: Depth means stacking many hidden layers between the input and output. Each layer transforms the previous layer's output, so early layers learn simple patterns (edges, word fragments) and later layers combine them into complex concepts (faces, meaning). One or two layers is "shallow"; many layers is "deep".

    Q: What is backpropagation in plain English?

    A: Backpropagation is how the network figures out which weights to blame for its mistakes. After a forward pass produces an answer, the error is sent backwards layer by layer using the chain rule from calculus, giving each weight a gradient. Gradient descent then nudges every weight in the direction that lowers the loss.

    Q: What's the difference between an epoch, a batch, and the learning rate?

    A: An epoch is one full pass over all your training data. A batch is a small chunk of that data the network updates from at a time (e.g. 32 samples). The learning rate controls how big each weight update is — too high and training diverges, too low and it crawls.

    Q: Should I use TensorFlow/Keras or PyTorch?

    A: Both are excellent and teach the same concepts. Keras (on top of TensorFlow) is the gentlest start: a model is a short list of layers. PyTorch is favoured in research for its flexible, Pythonic feel. Pick one, learn the ideas, and the other becomes easy.

    Q: What is overfitting and how does dropout help?

    A: Overfitting is when a model memorises the training data instead of learning the general pattern, so it does well on data it has seen but poorly on new data. Dropout randomly switches off a fraction of neurons during training, forcing the network to spread its learning out rather than rely on a few memorised paths.

    🎯 Mini-Challenge: Gradient Descent in a Loop

    You've taken one step by hand and watched a worked loop. Now write the loop yourself from the outline below — no filled-in logic this time, just the plan.

    Mini-Challenge

    Minimise f(x) = (x - 10)² with your own gradient-descent loop

    Python
    # 🎯 MINI-CHALLENGE: run gradient descent in a loop.
    # Minimise f(x) = (x - 10)**2 . The answer should march toward x = 10.
    #
    # 1. Define gradient(x) that returns 2 * (x - 10)
    # 2. Start x at 0.0 and set learning_rate to 0.1
    # 3. Loop 20 times: each pass, do  x = x - learning_rate * gradient(x)
    # 4. After the loop, print the final x rounded to 2 places
    #
    # ✅ Expected (roughly): final x is close to 9.88 after 20 steps
    #    (more steps -> even closer to 10)
    
    # your code here

    🎉 Lesson 8 complete: you understand deep learning's engine!

    You can run a forward pass by hand, explain backpropagation and gradient descent, choose the right loss function, set epochs/batches/learning rate sensibly, build a small Keras model, and fight overfitting with dropout. These ideas sit at the core of almost every deep learning system.

    🚀 Up next: Natural Language Processing — teaching computers to understand and generate human text.
