Lesson 32 • Advanced
Object Detection: YOLO, SSD & Faster R-CNN
Detect and localise objects in images — learn IoU, Non-Maximum Suppression, and how YOLO processes entire images in one pass.
✅ What You'll Learn
- IoU: measuring bounding box overlap
- Non-Maximum Suppression to remove duplicate detections
- YOLO: grid-based single-shot detection
- 1-stage vs 2-stage detectors and when to use each
🎯 Finding Objects in Images
🎯 Real-World Analogy: Classification is like asking "Is there a cat in this photo?" Object detection is like asking "Where are ALL the cats, dogs, and cars in this photo, and draw a box around each one." It's the difference between a yes/no question and a treasure hunt with a marker pen.
Object detection adds localisation to classification. The model must predict: (1) what objects are present, (2) where they are (bounding boxes), and (3) how confident it is. Modern detectors like YOLOv8 can process 100+ frames per second — fast enough for real-time video.
Try It: IoU & NMS
Calculate bounding box overlap and remove duplicate detections
import numpy as np

# Intersection over Union (IoU): The Core Detection Metric
# Measures how well a predicted box overlaps with the ground truth
def compute_iou(box1, box2):
    """
    Compute IoU between two bounding boxes.
    Each box: [x1, y1, x2, y2] (top-left and bottom-right corners)
    """
    # Intersection coordinates
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    # Intersection area (zero if the boxes don't overlap at all)
    intersection = max(0, x2 - x1) * max(0, y2 - y1)

    # Union area = both box areas minus the double-counted overlap
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    return intersection / union if union > 0 else 0.0

# Example: two partially overlapping boxes
print(compute_iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.143
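The second half of this demo is Non-Maximum Suppression: keep the highest-confidence box, then discard any remaining box whose IoU with it exceeds a threshold (0.5, matching the Quick Reference table). A minimal greedy sketch with its own small IoU helper so it runs standalone; the boxes and scores are an illustrative toy example:

```python
import numpy as np

def iou(a, b):
    """IoU between two boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the best box, drop what overlaps it."""
    order = np.argsort(scores)[::-1]  # indices, highest confidence first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # Survivors are boxes that do NOT overlap the best one too much
        order = [i for i in order[1:]
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Three detections of the same object, plus one separate object
boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [11, 9, 49, 51], [100, 100, 140, 140]]
scores = [0.9, 0.75, 0.6, 0.8]
print(nms(boxes, scores))  # [0, 3] — boxes 1 and 2 are suppressed by box 0
```

Note the greedy loop is O(n²) in the worst case; production detectors run a vectorised version of the same idea (e.g. torchvision's `ops.nms`).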
Try It: YOLO Architecture
See how YOLO divides images into grid cells for detection
import numpy as np

# YOLO: You Only Look Once — Real-Time Object Detection
# Process the ENTIRE image in one forward pass
np.random.seed(42)

def simulate_yolo_grid(image_size, grid_size, n_boxes, n_classes):
    """Simulate YOLO's grid-based detection output (original YOLOv1 layout)"""
    cell_size = image_size // grid_size
    print("=== YOLO Grid Architecture ===")
    print()
    print(f"Image: {image_size}x{image_size}")
    print(f"Grid: {grid_size}x{grid_size} = {grid_size**2} cells")
    print(f"Each cell covers {cell_size}x{cell_size} pixels")
    # Each cell predicts n_boxes boxes of [x, y, w, h, confidence],
    # plus one set of class probabilities shared by the whole cell
    preds_per_cell = n_boxes * 5 + n_classes
    print(f"Each cell predicts {n_boxes} boxes x 5 values + {n_classes} classes = {preds_per_cell} numbers")
    output = np.random.rand(grid_size, grid_size, preds_per_cell)
    print(f"Output tensor: {output.shape} -> {output.size} predictions per image")
    return output

simulate_yolo_grid(image_size=448, grid_size=7, n_boxes=2, n_classes=20)
⚠️ Common Mistake: Training object detectors without enough anchor box variety. If your objects are very tall/thin or very wide, the default anchors won't match well. Use k-means clustering on your dataset's bounding boxes to generate custom anchors.
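The k-means recipe above can be sketched in plain NumPy. One detail worth knowing: anchor clustering typically uses 1 − IoU as the distance, comparing only widths and heights as if every box shared the same centre, rather than Euclidean distance. The toy dataset and function names below are illustrative, and the area-spread initialisation is one simple choice among several:

```python
import numpy as np

def wh_iou(wh, anchors):
    """IoU between (w, h) pairs, assuming boxes share the same centre."""
    inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
             np.minimum(wh[:, None, 1], anchors[None, :, 1]))
    union = (wh[:, None, 0] * wh[:, None, 1] +
             anchors[None, :, 0] * anchors[None, :, 1] - inter)
    return inter / union

def kmeans_anchors(wh, k, iters=100):
    """Cluster box (width, height) pairs using 1 - IoU as the distance."""
    # Initialise anchors spread evenly across the range of box areas
    by_area = np.argsort(wh[:, 0] * wh[:, 1])
    anchors = wh[by_area[np.linspace(0, len(wh) - 1, k).astype(int)]].astype(float)
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most
        assign = wh_iou(wh, anchors).argmax(axis=1)
        # Move each anchor to the median (w, h) of its cluster
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = np.median(wh[assign == j], axis=0)
    return anchors[anchors[:, 0].argsort()]  # sorted by width for readability

# Toy dataset: mostly tall/thin boxes plus a few wide ones
wh = np.array([[20, 60], [22, 58], [18, 64], [21, 59], [80, 30], [85, 28]])
print(kmeans_anchors(wh, k=2))  # one tall/thin anchor, one wide anchor
```

Run on your real training labels, the resulting (w, h) pairs replace the detector's default anchors.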
💡 Pro Tip: For quick prototyping, use Ultralytics YOLOv8 — it's 5 lines of Python to train on custom data. For production, consider ONNX export for 2-3× faster inference. If accuracy matters more than speed, use Faster R-CNN with a ResNet-101 backbone.
📋 Quick Reference
| Concept | What It Does | Threshold |
|---|---|---|
| IoU | Measures box overlap | ≥0.5 for match |
| NMS | Removes duplicate boxes | IoU > 0.5 suppressed |
| Confidence | Object presence score | ≥0.25 typically |
| mAP | Mean Average Precision | Higher = better |
| FPS | Frames per second | ≥30 for real-time |
🎉 Lesson Complete!
You can now detect and localise objects in images! Next, learn semantic segmentation — labelling every single pixel.