Lesson 33 • Advanced
Semantic Segmentation
Label every pixel in an image — learn U-Net's encoder-decoder architecture, skip connections, and segmentation metrics like mIoU and Dice.
✅ What You'll Learn
- U-Net: encoder-decoder with skip connections
- Semantic vs instance vs panoptic segmentation
- Metrics: mIoU, Dice score, pixel accuracy
- DeepLab and dilated convolutions
🎨 Colouring Every Pixel
🎯 Real-World Analogy: Object detection draws rectangles around objects. Semantic segmentation is like colouring every pixel with a different colour for each class — imagine a colouring book where "sky" is blue, "road" is grey, and "car" is red, but you have to colour every single pixel correctly. Self-driving cars need this level of detail to know exactly where the road ends and the sidewalk begins.
Segmentation is essential for autonomous driving, medical image analysis (tumour boundaries), satellite imagery, and video editing (background removal). U-Net (2015) remains the most influential architecture, especially in medical imaging where labelled data is scarce.
Try It: U-Net Architecture
See how the encoder-decoder structure preserves spatial details
import numpy as np
# U-Net: The Encoder-Decoder Architecture for Segmentation
# Contracts to capture context, expands to precise localisation
np.random.seed(42)
def simulate_encoder(image, levels=4):
    """Encoder: progressively downsample and increase channels"""
    features = []
    x = image
    print("ENCODER (downsampling path):")
    for i in range(levels):
        h, w = x.shape[0] // 2, x.shape[1] // 2
        channels = 64 * (2 ** i)
        x = np.random.randn(h, w)  # Simulated feature map after conv + pooling
        features.append(x)         # Saved for the matching skip connection
        print(f"  Level {i}: {h}x{w} spatial, {channels} channels")
    return features
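The encoder above only covers the contracting half. A matching decoder sketch, in the same shapes-only style, shows where the skip connections come in: each upsampled map is fused with the encoder feature map of the same resolution. The name `simulate_decoder` and the mean-based "fusion" are illustrative stand-ins, not real U-Net layers.

```python
import numpy as np

np.random.seed(42)

def simulate_decoder(features):
    """Decoder: upsample, then fuse with the matching encoder feature map
    (the skip connection) at each level. Shapes only -- no real convs."""
    x = features[-1]  # deepest (smallest) feature map
    print("DECODER (upsampling path):")
    for skip in reversed(features[:-1]):
        h, w = skip.shape
        # Nearest-neighbour upsample by 2, cropped to the skip's resolution
        upsampled = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)[:h, :w]
        fused = np.stack([upsampled, skip])  # concat along a channel axis
        x = fused.mean(axis=0)               # stand-in for the fusing convs
        print(f"  Upsampled to {h}x{w}, fused with skip connection")
    return x

# Simulated encoder outputs at 32x32, 16x16, 8x8 resolution
features = [np.random.randn(2 ** k, 2 ** k) for k in (5, 4, 3)]
out = simulate_decoder(features)
print("Final map:", out.shape)  # back at the finest skip resolution
```

The skip connections are why U-Net recovers sharp boundaries: fine spatial detail lost to downsampling is re-injected on the way back up.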
Try It: Segmentation Metrics
Calculate IoU, Dice, and pixel accuracy per class
import numpy as np
# Segmentation Metrics: IoU, Dice, Pixel Accuracy
# Evaluate how well each pixel is classified
np.random.seed(42)
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the label"""
    return np.mean(pred == target)

def iou_per_class(pred, target, n_classes):
    """Intersection over Union per class"""
    ious = []
    for c in range(n_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        intersection = np.sum(pred_c & target_c)
        union = np.sum(pred_c | target_c)
        ious.append(intersection / union if union > 0 else float('nan'))
    return ious
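The Dice score mentioned above is closely related to IoU (Dice = 2·IoU / (1 + IoU)) and can be computed in the same per-class style. The function name `dice_per_class` and the toy masks are my own for illustration:

```python
import numpy as np

def dice_per_class(pred, target, n_classes):
    """Dice = 2|A∩B| / (|A| + |B|) per class."""
    scores = []
    for c in range(n_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        intersection = np.sum(pred_c & target_c)
        total = pred_c.sum() + target_c.sum()
        scores.append(2 * intersection / total if total > 0 else float('nan'))
    return scores

# Tiny 2x3 masks with two classes (0 = background, 1 = object)
pred   = np.array([[0, 0, 1], [0, 1, 1]])
target = np.array([[0, 1, 1], [0, 1, 1]])
print(dice_per_class(pred, target, 2))  # one score per class
```

Because Dice weights the intersection twice, it is more forgiving of small boundary errors than IoU, which is one reason it dominates in medical imaging.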
⚠️ Common Mistake: Using pixel accuracy as your primary metric. If 80% of your image is background, a model that predicts "background everywhere" gets 80% accuracy but is useless. Always use mIoU, which treats all classes equally regardless of their pixel count.
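That pitfall is easy to demonstrate: a "background everywhere" predictor on a mostly-background image gets excellent pixel accuracy but a poor mIoU. The toy masks below are my own illustration.

```python
import numpy as np

target = np.zeros((10, 10), dtype=int)
target[4:6, 4:6] = 1          # a 4-pixel object: 96% of pixels are background
pred = np.zeros_like(target)  # degenerate "background everywhere" predictor

pixel_acc = np.mean(pred == target)  # looks great despite predicting nothing
ious = []
for c in (0, 1):
    inter = np.sum((pred == c) & (target == c))
    union = np.sum((pred == c) | (target == c))
    ious.append(inter / union if union > 0 else 0.0)
miou = np.mean(ious)  # the missed object class drags the mean down

print(f"pixel accuracy = {pixel_acc:.2f}, mIoU = {miou:.2f}")
# prints: pixel accuracy = 0.96, mIoU = 0.48
```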
💡 Pro Tip: For medical imaging, start with U-Net + Dice loss (not cross-entropy). Dice loss handles class imbalance naturally. For general segmentation, use DeepLabV3+ with an EfficientNet backbone — it's the best accuracy/speed tradeoff. SAM (Segment Anything) can segment any object with zero-shot prompts.
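The Dice loss recommended in the tip can be sketched in a few lines of NumPy. This is a minimal soft (differentiable) binary version; real training code would use the PyTorch or TensorFlow equivalent, and the `smooth` term is a common stabiliser I have added to avoid division by zero:

```python
import numpy as np

def dice_loss(probs, target, smooth=1.0):
    """Soft Dice loss for binary segmentation.
    probs:  predicted foreground probabilities in [0, 1]
    target: binary ground-truth mask of the same shape
    """
    intersection = np.sum(probs * target)
    denom = np.sum(probs) + np.sum(target)
    dice = (2 * intersection + smooth) / (denom + smooth)
    return 1.0 - dice

target = np.array([[0., 1.], [1., 1.]])
print(dice_loss(target, target))        # perfect prediction -> loss 0.0
print(dice_loss(1.0 - target, target))  # inverted prediction -> high loss
```

Because the loss is computed from overlap ratios rather than per-pixel averages, rare foreground classes contribute as much as large backgrounds, which is exactly what cross-entropy fails to do under heavy class imbalance.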
📋 Quick Reference
| Architecture | Key Feature | Best For |
|---|---|---|
| U-Net | Skip connections | Medical, small datasets |
| DeepLabV3+ | Atrous (dilated) conv | General segmentation |
| Mask R-CNN | Instance masks | Instance segmentation |
| SegFormer | Transformer-based | High accuracy |
| SAM | Zero-shot prompts | Universal segmentation |
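The atrous (dilated) convolution behind DeepLab can be illustrated directly: spacing the kernel taps apart widens the receptive field without adding any weights. A 1-D NumPy sketch (the function name is mine, and real models use the framework's built-in `dilation` parameter):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """1-D dilated convolution, valid padding: kernel taps are spaced
    `dilation` samples apart, so the receptive field grows for free."""
    k = len(kernel)
    span = dilation * (k - 1) + 1  # receptive field of one output sample
    out_len = len(x) - span + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, kernel, dilation=1))  # receptive field 3
print(dilated_conv1d(x, kernel, dilation=2))  # receptive field 5, same 3 weights
```

Stacking layers with growing dilation rates (1, 2, 4, ...) lets DeepLab see large context at full resolution, instead of downsampling and losing the spatial detail segmentation needs.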
🎉 Lesson Complete!
You can now label every pixel in an image! Next, learn how to process audio data for speech recognition.