Lesson 8 • Intermediate
Deep Learning Fundamentals
Master backpropagation, loss functions, and optimisers — the engine that powers all deep learning.
✅ What You'll Learn
- Backpropagation through a 3-layer network
- Loss functions: MSE, MAE, cross-entropy
- Optimisers: SGD, Momentum, Adam
- How deep learning differs from shallow ML
🔗 What Makes Deep Learning "Deep"?
🎯 Real-World Analogy: Traditional ML is like a camera with one lens — it captures the scene in one shot. Deep learning is like a camera with many stacked lenses, each refining the image. The first lens detects edges, the next detects shapes, the next detects objects, and the final lens identifies "that's a golden retriever." More layers = more abstraction.
"Deep" simply means many hidden layers. Two breakthroughs made deep networks practical: (1) the ReLU activation largely solved the vanishing gradient problem, and (2) GPUs made training millions of parameters fast enough to be feasible.
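The vanishing-gradient point can be sketched numerically: backpropagation multiplies one derivative per layer, sigmoid's derivative is at most 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is exactly 1 for positive inputs. A minimal illustration (the layer counts below are arbitrary choices):

```python
import numpy as np

# Backprop multiplies one derivative factor per layer.
# sigmoid'(x) = s(1-s) peaks at 0.25, so chains of sigmoids shrink fast;
# ReLU'(x) = 1 for x > 0, so the chain keeps its magnitude.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = 0.0  # sigmoid' is largest at 0 — the best case for sigmoid
sig_grad = sigmoid(x) * (1 - sigmoid(x))  # 0.25

for n in [1, 5, 10, 20]:
    print(f"{n:>2} layers: sigmoid chain ≈ {sig_grad ** n:.2e}, ReLU chain = {1.0 ** n:.1f}")
```

Even in sigmoid's best case, twenty layers shrink the gradient by roughly twelve orders of magnitude, which is why early deep networks failed to train.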
Try It: Backpropagation
Train a 3-layer network and watch error flow backwards
import numpy as np
# Backpropagation: How deep networks learn
# Error flows backwards through the network, updating every weight
def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))
def sigmoid_deriv(s):
    return s * (1 - s)  # derivative expressed in terms of the sigmoid output
# 3-layer network: 2 → 3 → 2 → 1
np.random.seed(42)
# Training data: predict if sum > 1
X = np.array([[0.1, 0.9], [0.8, 0.4], [0.3, 0.2], [0.9, 0.8],
              [0.2, 0.5], [0.7, 0.6], [0.1, 0.1], [0.6, 0.9]])
y = np.array([[1], [1], [0], [1], [0], [1], [0], [1]])
# Initialise weights
W1 = np.random.randn(2, 3) * 0.5
W2 = np.random.randn(3, 2) * 0.5
W3 = np.random.randn(2, 1) * 0.5
lr = 1.0
for epoch in range(2001):
    a1 = sigmoid(X @ W1)                    # forward pass, layer by layer
    a2 = sigmoid(a1 @ W2)
    out = sigmoid(a2 @ W3)
    d3 = (out - y) * sigmoid_deriv(out)     # error at the output...
    d2 = (d3 @ W3.T) * sigmoid_deriv(a2)    # ...flows backwards
    d1 = (d2 @ W2.T) * sigmoid_deriv(a1)    # through every layer
    W3 -= lr * (a2.T @ d3)                  # update every weight
    W2 -= lr * (a1.T @ d2)
    W1 -= lr * (X.T @ d1)
    if epoch % 500 == 0:
        print(f"Epoch {epoch:4d}: loss = {np.mean((out - y) ** 2):.4f}")
Try It: Loss Functions
Compare MSE and cross-entropy — see how confidence affects loss
import numpy as np
# Loss Functions: How we measure "how wrong" a model is
# Different tasks need different loss functions
# === MSE and MAE (Regression) ===
actual_reg = np.array([3.0, 5.0, 7.0, 9.0])
pred_reg = np.array([2.8, 5.3, 6.5, 9.2])
mse = np.mean((actual_reg - pred_reg) ** 2)
mae = np.mean(np.abs(actual_reg - pred_reg))
print("=== Regression Losses ===")
print(f"MSE: {mse:.4f} (penalises big errors)")
print(f"MAE: {mae:.4f} (treats all errors linearly)")
# === Binary Cross-Entropy (Classification) ===
def binary_ce(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
actual_cls = np.array([1, 0, 1, 1])
confident = np.array([0.95, 0.05, 0.90, 0.85])  # right and confident
unsure = np.array([0.60, 0.40, 0.55, 0.60])     # right but unsure
print("=== Classification Losses ===")
print(f"Cross-entropy (confident): {binary_ce(actual_cls, confident):.4f}")
print(f"Cross-entropy (unsure):    {binary_ce(actual_cls, unsure):.4f}")
Try It: Optimiser Comparison
Race SGD, Momentum, and Adam to find the minimum
import numpy as np
# Optimisers: Different ways to update weights
# Gradient descent isn't the only option!
np.random.seed(42)
# Simple 1D optimisation: find the minimum of f(x) = (x - 3)^2
def f(x): return (x - 3) ** 2
def grad_f(x): return 2 * (x - 3)
x_sgd = x_momentum = x_adam = 0.0
v_momentum = 0.0
m_adam = v_adam = 0.0
lr = 0.1
beta1, beta2 = 0.9, 0.999
eps = 1e-8
print("=== Optimiser Comparison (finding x where (x-3)² = 0) ===")
print(f"{'Step':>4} {'SGD':>8} {'Momentum':>10} {'Adam':>8}")
for t in range(1, 21):
    # SGD: step straight down the gradient
    x_sgd -= lr * grad_f(x_sgd)
    # Momentum: build up velocity in a consistent direction
    v_momentum = beta1 * v_momentum + grad_f(x_momentum)
    x_momentum -= lr * v_momentum
    # Adam: adaptive step size from bias-corrected moment estimates
    g = grad_f(x_adam)
    m_adam = beta1 * m_adam + (1 - beta1) * g
    v_adam = beta2 * v_adam + (1 - beta2) * g ** 2
    m_hat = m_adam / (1 - beta1 ** t)
    v_hat = v_adam / (1 - beta2 ** t)
    x_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)
    if t % 5 == 0:
        print(f"{t:>4} {x_sgd:>8.4f} {x_momentum:>10.4f} {x_adam:>8.4f}")
📋 Quick Reference
| Component | Options | Default Choice |
|---|---|---|
| Loss (regression) | MSE, MAE, Huber | MSE |
| Loss (classification) | Cross-entropy | Binary/Categorical CE |
| Optimiser | SGD, Adam, AdamW | Adam (lr=0.001) |
| Activation (hidden) | ReLU, GELU, SiLU | ReLU |
| Activation (output) | Sigmoid, Softmax | Task-dependent |
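The "Task-dependent" entry in the output-activation row can be sketched concretely: sigmoid squashes a single logit into a probability for binary classification, while softmax turns a vector of logits into a probability distribution over classes. A minimal illustration (the logit values below are made up):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# Binary output: one logit -> probability of the positive class
print(f"sigmoid(1.5) = {sigmoid(1.5):.3f}")

# Multi-class output: logits -> probabilities that sum to 1
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print("softmax:", np.round(probs, 3), "sum =", probs.sum())
```

Pair sigmoid with binary cross-entropy and softmax with categorical cross-entropy, matching the loss rows above.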
🎉 Lesson Complete!
You understand how deep learning works under the hood! Next, learn Natural Language Processing — teaching computers to understand text.