Lesson 10 • Intermediate
Computer Vision Basics
Teach computers to see — understand how images become numbers and how CNNs detect objects.
✅ What You'll Learn
- How images are represented as numerical arrays
- Convolution: how filters detect edges and patterns
- CNN architecture: Conv → Pool → Dense → Output
- Building a complete CNN pipeline for image classification
👁️ How Computers "See"
🎯 Real-World Analogy: When you see a dog, your brain processes the image in stages: first you detect edges (outlines), then shapes (ears, nose), then textures (fur), then the whole object (dog!). CNNs work in a broadly similar way: early layers detect simple features such as edges, and deeper layers combine them into increasingly complex features.
For a computer, an image is just a grid of numbers. A 224×224 colour photo is 224 × 224 × 3 = 150,528 numbers. The challenge is turning those numbers into the answer: "that's a golden retriever."
⚠️ Common Mistake: Feeding raw pixels into a regular (fully connected) neural network. A 224×224 colour image needs 150,528 input neurons, so a single hidden layer of 1,000 units would already require about 150 million weights. CNNs use weight sharing through convolution to drastically reduce the number of parameters.
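To make the parameter savings concrete, here is a minimal sketch that counts the values in a 224×224 RGB image and compares the parameter count of a fully connected layer against a convolutional layer. The layer sizes (1,000 dense units, 32 filters of 3×3) are assumptions chosen for illustration, not values from the lesson.

```python
import numpy as np

# A 224x224 RGB image: height x width x 3 colour channels.
image = np.zeros((224, 224, 3), dtype=np.uint8)
num_inputs = image.size  # 224 * 224 * 3

# Hypothetical dense layer: every input connects to each of 1,000 units.
dense_units = 1000  # assumed for illustration
dense_params = num_inputs * dense_units + dense_units  # weights + biases

# Hypothetical conv layer: 32 filters of 3x3 spanning all 3 channels.
# The same filter weights are reused at every image position (weight sharing).
num_filters, kernel = 32, 3  # assumed for illustration
conv_params = kernel * kernel * image.shape[2] * num_filters + num_filters

print(f"Input values:       {num_inputs:,}")    # 150,528
print(f"Dense layer params: {dense_params:,}")  # 150,529,000
print(f"Conv layer params:  {conv_params:,}")   # 896
```

The convolutional layer's parameter count depends only on the filter size, the number of input channels, and the number of filters, not on the image resolution.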
Try It: Images as Numbers
See how grayscale and RGB images are stored as arrays
import numpy as np
# Images are just numbers! Each pixel has a value.
# Grayscale: one number per pixel (0=black, 255=white)
grayscale_img = np.array([
    [0, 0, 0, 0, 0],
    [0, 255, 255, 255, 0],
    [0, 255, 0, 255, 0],
    [0, 255, 255, 255, 0],
    [0, 0, 0, 0, 0],
])
print("=== Grayscale Image (5×5 hollow square) ===")
for row in grayscale_img:
    line = ""
    for pixel in row:
        if pixel > 200:
            line += "██"
        elif pixel > 100:
            line += "▓▓"
...
Try It: Convolution & Edge Detection
Apply edge detection filters to an image matrix
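As a self-contained sketch of the sliding-window operation, the code below applies a 3×3 edge filter to a 5×5 image, assuming stride 1 and no padding (a "valid" convolution). The helper name `convolve2d_valid` is hypothetical. Like most CNN libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation; for this symmetric filter the result is the same.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch under the filter element-wise and sum the result.
            patch = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(patch * kernel)
    return output

# 5x5 image containing a hollow square of ones.
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=float)

# 3x3 edge filter: strong response where a pixel differs from its neighbours.
edge_filter = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
], dtype=float)

feature_map = convolve2d_valid(image, edge_filter)
print(feature_map.shape)  # (3, 3), because 5 - 3 + 1 = 3
print(feature_map)
# [[ 6.  4.  6.]
#  [ 4. -8.  4.]
#  [ 6.  4.  6.]]
```

The positive values mark pixels on the square's outline, while the negative centre value marks the dark pixel surrounded by bright neighbours.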
import numpy as np
# Convolution: How CNNs detect features in images
# A small filter slides across the image, detecting patterns
# 5×5 grayscale image
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=float)
# Edge detection filter (3×3)
edge_filter = np.array([
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
], dtype=float)
# Horizontal edge filter
horiz_filter = np.array([
    [-1, -1, -1],
    [ 0,
...
Try It: CNN Architecture
Walk through a complete CNN pipeline from input to classification
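The shape and parameter arithmetic for a whole pipeline can be sketched in a few lines. Only the 28×28×1 input and the first layer (32 filters of 3×3) come from the lesson; the pooling size, the second convolution (64 filters), the 128-unit dense layer, and the 10 output classes are assumptions for illustration. The helpers `conv_output` and `conv_params` are hypothetical, and the parameter counts here include one bias per filter or unit.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size, channels = 28, 1  # 28x28 grayscale input
summary = []

# Conv 1: 32 filters of 3x3 (from the lesson's example)
params = conv_params(3, channels, 32)
size, channels = conv_output(size, 3), 32
summary.append(("Conv 3x3 x32", (size, size, channels), params))

# Pool 1: 2x2 max pooling (assumed); halves each spatial dimension
size = size // 2
summary.append(("MaxPool 2x2", (size, size, channels), 0))

# Conv 2: 64 filters of 3x3 (assumed)
params = conv_params(3, channels, 64)
size, channels = conv_output(size, 3), 64
summary.append(("Conv 3x3 x64", (size, size, channels), params))

# Pool 2: 2x2 max pooling (assumed)
size = size // 2
summary.append(("MaxPool 2x2", (size, size, channels), 0))

# Flatten: the feature maps become one long vector
flat = size * size * channels
summary.append(("Flatten", (flat,), 0))

# Dense: 128 units (assumed), then 10 output classes (assumed)
summary.append(("Dense 128", (128,), flat * 128 + 128))
summary.append(("Output 10", (10,), 128 * 10 + 10))

total_params = sum(p for _, _, p in summary)
for name, shape, p in summary:
    print(f"{name:<14} output={str(shape):<14} params={p:,}")
print(f"Total parameters: {total_params:,}")
```

Under these assumptions most of the parameters sit in the first dense layer, which is why pooling before flattening matters.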
import numpy as np
# CNN Architecture: How a complete CNN processes an image
# Input → Conv → Pool → Conv → Pool → Flatten → Dense → Output
# Simulate a CNN pipeline
print("=== CNN Pipeline Simulation ===")
print()
# Input: 28×28 grayscale image (like MNIST digit)
input_shape = (28, 28, 1)
print(f"1. Input: {input_shape}")
print(f" Total values: {np.prod(input_shape)}")
# Conv Layer 1: 32 filters of 3×3
# Output: 26×26×32 (28-3+1=26)
conv1_shape = (26, 26, 32)
conv1_params = 3 * 3 * 1 * 32
...
📋 Quick Reference
| Layer | What It Does | Output Effect |
|---|---|---|
| Conv2D | Detects features with filters | Feature maps |
| MaxPool | Reduces spatial dimensions | Smaller, more abstract |
| ReLU | Adds non-linearity | Zeros out negatives |
| Flatten | Feature maps (H×W×C) → 1D vector | Ready for Dense |
| Dense | Combines features for classification | Class scores (logits) |
| Softmax | Normalise to probabilities | Sum = 1.0 |
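Several of the layers in the table are simple enough to sketch directly in NumPy. The example below is a minimal illustration on a made-up 4×4 feature map; the function names are hypothetical, and the Dense layer is skipped, so the flattened values are used as stand-in class scores for softmax.

```python
import numpy as np

def relu(x):
    """Zero out negative values."""
    return np.maximum(0, x)

def max_pool_2x2(x):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # drop an odd trailing row/column if present
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    """Turn scores into probabilities that sum to 1 (shifted for numerical stability)."""
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

# Made-up feature map, as if produced by a Conv2D layer.
feature_map = np.array([
    [ 1., -2.,  3.,  0.],
    [-1.,  5., -3.,  2.],
    [ 0., -4.,  2.,  2.],
    [ 6., -1., -2.,  1.],
])

activated = relu(feature_map)     # negatives become 0
pooled = max_pool_2x2(activated)  # 4x4 -> 2x2
flat = pooled.flatten()           # 2x2 -> vector of 4 values
probs = softmax(flat)             # stand-in scores -> probabilities

print(pooled)
# [[5. 3.]
#  [6. 2.]]
print(probs.round(3), probs.sum())
```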
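💡 Pro Tip: Don't train CNNs from scratch for most tasks. Use transfer learning: take a pre-trained model like ResNet-50 (commonly pre-trained on ImageNet, about 1.28 million labelled images across 1,000 classes) and fine-tune it on your data. When your task resembles the pre-training data, this can often reach high accuracy with just a few hundred images per class.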
🎉 Lesson Complete!
You now understand computer vision fundamentals! Next, explore advanced neural network techniques like regularisation and batch normalisation.