Lesson 32 • Advanced

Object Detection: YOLO, SSD & Faster R-CNN

Go from "is there a cat?" to "draw a box around every cat, dog, and car." You'll measure box overlap with IoU, clean up duplicates with NMS, and know which detector to reach for.

What You'll Learn in This Lesson

✓ How detection = classification + localisation (bounding boxes)
✓ How to compute IoU (Intersection over Union) in plain Python
✓ What anchor boxes are and why detectors use them
✓ One-stage (YOLO, SSD) vs two-stage (Faster R-CNN) trade-offs
✓ How Non-Max Suppression removes duplicate boxes
✓ How mAP scores and compares detectors

Before you start: You should be comfortable with image CV pipelines and classification (a model that labels a whole image). Detection builds directly on that idea.

🎯 Real-World Analogy: Spotting and Boxing Objects in a Photo

Imagine handing a friend a busy holiday photo and a marker pen. Classification is asking "is there a dog in this photo?" — one yes/no answer. Object detection is asking your friend to go further: "draw a box around every dog, every person, and every car, and write what each one is." That is the whole job — find the objects and box them.

Now imagine your friend gets over-excited and scribbles five boxes around the same dog. You'd keep the neatest box and cross out the rest. That clean-up step is exactly what Non-Max Suppression does, and the way you decide two boxes are "the same dog" is by measuring their overlap: that's IoU.

1. Detection = Classification + Localisation

A classifier outputs one label for the whole image: "cat". A detector must output a list of objects, and for each one it predicts three things:

Class — what it is ("cat", "car", "person").
Bounding box — where it is, as four numbers.
Confidence — how sure the model is (0 to 1).

A bounding box is the rectangle drawn around an object. The most common format is the two corners, [x1, y1, x2, y2]: the top-left corner (x1, y1) and the bottom-right corner (x2, y2), all in pixels. (YOLO often uses a centre format [cx, cy, w, h] instead: same rectangle, different numbers.)

# One detection from a model, in corner format:
detection = {
    "class": "cat",
    "box":   [50, 50, 200, 200],   # x1, y1, x2, y2 in pixels
    "conf":  0.93,                 # 93% confident
}
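Because the two box formats describe the same rectangle, converting between them is simple arithmetic. The sketch below is a minimal illustration; the helper names corners_to_centre and centre_to_corners are made up for this example and do not come from any library. YOLO label files usually also divide the values by the image width and height, and that normalisation step is left out here.

```python
# Hypothetical helpers for illustration only (not part of any library).

def corners_to_centre(box):
    """Convert [x1, y1, x2, y2] to [cx, cy, w, h]."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    return [x1 + w / 2, y1 + h / 2, w, h]

def centre_to_corners(box):
    """Convert [cx, cy, w, h] back to [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

corner_box = [50, 50, 200, 200]
centre_box = corners_to_centre(corner_box)
print(centre_box)                      # [125.0, 125.0, 150, 150]
print(centre_to_corners(centre_box))   # [50.0, 50.0, 200.0, 200.0]
```

Mixing the two formats up is an easy mistake to make, so it helps to convert once at the boundary of your code and use corner format everywhere else.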

2. IoU — Measuring Box Overlap

How do you score a predicted box against the true box? You use IoU (Intersection over Union): the area where the two boxes overlap, divided by the total area they cover together.

IoU in one line:

IoU = overlap_area / (area_a + area_b - overlap_area)

IoU = 0 — the boxes don't touch at all.
IoU = 1 — the boxes are identical.
IoU ≥ 0.5 — the usual cut-off for "correct detection".

Here is the full, commented version. Read it, then run it — the output is at the bottom.

# Intersection over Union (IoU) — the core detection metric
# IoU = overlap area / combined area. Ranges 0 (no overlap) to 1 (perfect).
# Each box is [x1, y1, x2, y2] = top-left corner and bottom-right corner.

def compute_iou(box_a, box_b):
    # Coordinates of the overlapping rectangle
    x1 = max(box_a[0], box_b[0])      # rightmost of the two left edges
    y1 = max(box_a[1], box_b[1])      # lower of the two top edges
    x2 = min(box_a[2], box_b[2])      # leftmost of the two right edges
    y2 = min(box_a[3], box_b[3])      # upper of the two bottom edges

    # Width/height of overlap (0 if the boxes do not touch)
    overlap_w = max(0, x2 - x1)
    overlap_h = max(0, y2 - y1)
    intersection = overlap_w * overlap_h          # shared area

    # Area of each box, then the union (avoid double-counting overlap)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0

ground_truth = [50, 50, 200, 200]     # the real object's box
prediction   = [60, 60, 210, 210]     # what the model guessed

iou = compute_iou(ground_truth, prediction)
print("IoU:", round(iou, 3))
print("Match (IoU >= 0.5)?", iou >= 0.5)

# Expected output:
# IoU: 0.772
# Match (IoU >= 0.5)? True

The two max(0, ...) calls are important: if the boxes don't overlap, x2 - x1 or y2 - y1 goes negative. Clamping each to 0 keeps the intersection at 0. Without the clamp, two negative values would multiply into a fake positive area.
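To see the clamp at work, here is a small sketch that computes the overlap area with and without it for two boxes that sit diagonally apart. The function overlap_area and its clamp flag are illustrative names invented for this demonstration.

```python
# Illustrative demo: what goes wrong if the overlap is NOT clamped to 0.

def overlap_area(box_a, box_b, clamp=True):
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    w, h = x2 - x1, y2 - y1          # both negative when boxes are far apart
    if clamp:
        w, h = max(0, w), max(0, h)  # negative width/height means "no overlap"
    return w * h

top_left     = [0, 0, 10, 10]
bottom_right = [20, 20, 30, 30]      # no shared pixels at all

print(overlap_area(top_left, bottom_right, clamp=False))  # 100 (wrong)
print(overlap_area(top_left, bottom_right, clamp=True))   # 0   (correct)
```

The unclamped version reports an overlap of 100 for boxes that never touch, because -10 times -10 is positive.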

🎯 Your Turn: Finish the IoU Function

Fill in the blanks marked ___ to complete the IoU calculation

Python

# 🎯 YOUR TURN: finish the IoU calculation
# Fill in each ___ . Each box is [x1, y1, x2, y2].

def compute_iou(box_a, box_b):
    # Overlap rectangle: inner edges of the two boxes
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # 👉 overlap width and height (clamp negatives to 0 with max(0, ...))
    overlap_w = max(0, ___)          # 👉 use x2 - x1
    overlap_h = max(0, ___)          # 👉 use y2 - y1
    i
...

3. Anchor Boxes — Pre-Set Shapes to Refine

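Predicting box coordinates from scratch is hard. Instead, many detectors start from anchor boxes: a fixed set of reference rectangles in a range of shapes and sizes (a tall one for people, a wide one for cars, a square one, and so on). The model doesn't invent a box from nothing: it starts from the best-matching anchor and predicts small offsets that nudge its size and position to fit the object. (Some newer detectors, including recent YOLO versions, are anchor-free and predict boxes directly, but anchors are still worth understanding.)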

Think of anchors as templates printed on tracing paper. The model slides the best-matching template over the object and tweaks it slightly, which is far easier than drawing freehand.
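The "nudge" can be made concrete. The sketch below uses one common parameterisation, the style used for box regression in Faster R-CNN-type detectors: the centre shifts by a fraction of the anchor's size, and the width and height scale by an exponential. The function name decode_box and the example numbers are assumptions for illustration; real detectors differ in the details.

```python
import math

def decode_box(anchor, offsets):
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1              # anchor width and height
    acx, acy = ax1 + aw / 2, ay1 + ah / 2      # anchor centre

    dx, dy, dw, dh = offsets
    cx = acx + dx * aw                         # shift centre by a fraction of the size
    cy = acy + dy * ah
    w = aw * math.exp(dw)                      # scale size; exp keeps it positive
    h = ah * math.exp(dh)
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

tall_anchor = [100, 100, 200, 300]             # 100 wide, 200 tall

# Zero offsets leave the anchor unchanged.
print(decode_box(tall_anchor, (0.0, 0.0, 0.0, 0.0)))
# [100.0, 100.0, 200.0, 300.0]

# A small nudge: right by 10% of the width, up by 5% of the height.
nudged = decode_box(tall_anchor, (0.1, -0.05, 0.0, 0.0))
print([round(v, 1) for v in nudged])
# [110.0, 90.0, 210.0, 290.0]
```

Predicting small offsets relative to a sensible starting shape is a much easier regression target than predicting raw pixel coordinates.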

Anchors are matched to ground-truth boxes by IoU during training, so IoU shows up here too. Choosing anchor shapes that match your data (often via k-means on your boxes) makes training much easier — see Common Errors below.
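As a rough illustration of anchor matching, the sketch below compares shapes only: it lines each anchor up with the ground-truth box at the same corner and measures IoU on width and height, which is the same trick the k-means approach to choosing anchors relies on. The anchor shapes, the box values, and the names shape_iou and best_anchor are all hypothetical; real detectors match anchors at every grid location.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two (width, height) shapes aligned at the same corner."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union if union > 0 else 0.0

# Hypothetical anchor shapes as (width, height) in pixels
anchor_shapes = {
    "tall":   (60, 160),
    "wide":   (180, 80),
    "square": (100, 100),
}

def best_anchor(gt_box, shapes):
    """Return the name of the anchor whose shape best fits a [x1, y1, x2, y2] box."""
    gt_wh = (gt_box[2] - gt_box[0], gt_box[3] - gt_box[1])
    return max(shapes, key=lambda name: shape_iou(gt_wh, shapes[name]))

person = [210, 40, 270, 220]    # 60 wide, 180 tall
car    = [20, 300, 220, 390]    # 200 wide, 90 tall

print(best_anchor(person, anchor_shapes))   # tall
print(best_anchor(car, anchor_shapes))      # wide
```

If no anchor scores well against a typical box in your dataset, that is a sign the anchor shapes need retuning.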

4. One-Stage vs Two-Stage Detectors

There are two broad families of detector, and they trade speed against accuracy:

One-stage — YOLO, SSD

Predict every box and class in a single pass over the image. "You Only Look Once." Very fast (real-time video, mobile, edge devices), slightly weaker on tiny or crowded objects.

Two-stage — Faster R-CNN

First propose regions that might hold an object, then classify and refine each one. Slower, but typically more accurate, especially on small and overlapping objects.

A good default: reach for YOLO when you need speed or real-time, and Faster R-CNN when accuracy on hard images matters more than frame rate.

🌍 Worked Example: Real Detection with YOLO (Ultralytics)

In practice you rarely write IoU and NMS by hand; a library does it. This is read-only (it needs pip install ultralytics and an image), but it shows how little code real detection takes. Ultralytics runs NMS for you, so the boxes come back already de-duplicated.

# Real detection with Ultralytics YOLO (run locally: pip install ultralytics)
# YOLO = "You Only Look Once" — one forward pass over the whole image.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # tiny pretrained model (COCO, 80 classes)
results = model("street.jpg", conf=0.25)   # conf = confidence threshold

# Ultralytics applies NMS for you, so you get clean, de-duplicated boxes.
for box in results[0].boxes:
    cls_id = int(box.cls[0])
    label  = model.names[cls_id]      # e.g. "person", "car"
    conf   = float(box.conf[0])
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # corner coordinates in pixels
    print(f"{label:8} {conf:.2f}  box=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")

# Expected output (depends on the image):
# person   0.91  box=(34,58,121,402)
# car      0.88  box=(220,180,540,360)
# dog      0.76  box=(410,260,505,395)
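The conf argument above is a confidence threshold: detections scoring below it are dropped before you ever see them. The plain-Python sketch below imitates that filtering step on made-up scores so you can see the effect of moving the threshold. The list raw_detections and the function filter_by_confidence are illustrative, not Ultralytics API.

```python
# Made-up (label, confidence) pairs standing in for raw model output.
raw_detections = [
    ("person", 0.91),
    ("car",    0.88),
    ("dog",    0.76),
    ("bench",  0.18),
    ("car",    0.07),
]

def filter_by_confidence(dets, conf_threshold=0.25):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in dets if d[1] >= conf_threshold]

print(len(filter_by_confidence(raw_detections, 0.25)))  # 3
print(len(filter_by_confidence(raw_detections, 0.10)))  # 4
print(len(filter_by_confidence(raw_detections, 0.80)))  # 2
```

A lower threshold finds more objects but lets in more false alarms; a higher one is stricter but misses uncertain objects.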

5. Non-Max Suppression (NMS) — Removing Duplicates

A raw detector fires dozens of overlapping boxes around each object. Non-Maximum Suppression (NMS) tidies them up with a simple rule: sort boxes by confidence, keep the most confident one, then throw away every other box that overlaps it too much (IoU above a threshold, usually 0.5). Repeat until none are left.

Here is NMS written out in plain Python, reusing the IoU function. Run it and watch four boxes become two.

# Non-Maximum Suppression (NMS): keep the best box, drop overlapping duplicates.
# A detector often fires several boxes for ONE object. NMS cleans that up.

def compute_iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Each detection: (box, confidence, class)
detections = [
    ([48, 48, 202, 202], 0.95, "cat"),
    ([50, 52, 198, 200], 0.87, "cat"),   # overlaps the cat above
    ([52, 46, 205, 198], 0.72, "cat"),   # also the same cat
    ([300, 100, 400, 250], 0.91, "dog"),
]

def nms(dets, iou_threshold=0.5):
    # 1) Sort by confidence, highest first
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf, cls in dets:
        # 2) Keep this box only if it does not overlap a kept box of the same class
        duplicate = any(
            cls == k_cls and compute_iou(box, k_box) > iou_threshold
            for k_box, k_conf, k_cls in kept
        )
        if not duplicate:
            kept.append((box, conf, cls))
    return kept

print("Before NMS:", len(detections), "boxes")
result = nms(detections)
print("After NMS: ", len(result), "boxes")
for box, conf, cls in result:
    print(f"  {cls}: conf={conf:.2f} box={box}")

# Expected output:
# Before NMS: 4 boxes
# After NMS:  2 boxes
#   cat: conf=0.95 box=[48, 48, 202, 202]
#   dog: conf=0.91 box=[300, 100, 400, 250]
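The NMS threshold is a trade-off, and a short experiment shows why. In the sketch below, two different people stand side by side, so their boxes genuinely overlap. The boxes and scores are invented for illustration; the functions repeat the plain-Python versions from this lesson so the block runs on its own.

```python
def compute_iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def nms(dets, iou_threshold=0.5):
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf, cls in dets:
        duplicate = any(
            cls == k_cls and compute_iou(box, k_box) > iou_threshold
            for k_box, k_conf, k_cls in kept
        )
        if not duplicate:
            kept.append((box, conf, cls))
    return kept

# Two DIFFERENT people standing close together (made-up boxes).
people = [
    ([100, 100, 200, 300], 0.90, "person"),
    ([140, 100, 240, 300], 0.85, "person"),
]

print(round(compute_iou(people[0][0], people[1][0]), 3))   # 0.429
print(len(nms(people, iou_threshold=0.5)))   # 2 -> both people kept
print(len(nms(people, iou_threshold=0.3)))   # 1 -> second person wrongly removed
```

A threshold that is too low deletes real neighbours in crowded scenes; one that is too high leaves duplicates behind.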

🎯 Your Turn: Finish the NMS Keep/Drop Test

Fill in the blanks to decide whether a box is a duplicate

Python

# 🎯 YOUR TURN: finish the NMS "keep or drop" decision
# Keep a box only if it does NOT overlap an already-kept box too much.

def compute_iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter
...

6. mAP — Scoring and Comparing Detectors

One number summarises a detector's quality: mAP (mean Average Precision). For each class you rank the predictions by confidence, trace how precision changes as recall grows, and take the area under that precision-recall curve as the class's Average Precision. mAP is the average of those values across all classes. A box counts as correct only if its IoU with the truth clears a threshold.

mAP50: uses a single IoU threshold of 0.50 (more forgiving).
mAP50-95: averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05; the headline COCO score. Higher means tighter boxes.
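To make Average Precision less abstract, here is a simplified sketch for a single IoU threshold. It assumes each prediction has already been marked as a hit or a miss (for example with IoU >= 0.5), and it uses simple all-point interpolation of the precision-recall curve. COCO's official evaluation samples the curve at 101 recall points and repeats the process at several IoU thresholds, so treat this as an illustration rather than a replacement for a real evaluator. The function name and the toy data are made up.

```python
def average_precision(scored_hits, num_ground_truth):
    """scored_hits: list of (confidence, is_true_positive) for ONE class."""
    ranked = sorted(scored_hits, key=lambda p: p[0], reverse=True)

    true_positives = 0
    precisions, recalls = [], []
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        true_positives += int(is_hit)
        precisions.append(true_positives / rank)             # correct so far / predicted so far
        recalls.append(true_positives / num_ground_truth)    # correct so far / all real objects

    # Smooth the curve: precision at a point = best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap

# Toy data: 2 real cats, 2 real dogs.
cat_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
dog_ap = average_precision([(0.95, True), (0.6, True)], num_ground_truth=2)

print("cat AP:", round(cat_ap, 3))                  # cat AP: 0.833
print("dog AP:", round(dog_ap, 3))                  # dog AP: 1.0
print("mAP:   ", round((cat_ap + dog_ap) / 2, 3))   # mAP:    0.917
```

The one false alarm ranked above a real cat is what pulls the cat AP below 1.0: mistakes made with high confidence cost the most.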

This is read-only (it needs the library and a dataset), but it shows how you'd measure mAP in practice:

# Evaluating a detector: mAP (mean Average Precision) with Ultralytics.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
metrics = model.val(data="coco128.yaml")   # run validation on a labelled set

print(f"mAP50:    {metrics.box.map50:.3f}")   # IoU threshold 0.50 only
print(f"mAP50-95: {metrics.box.map:.3f}")      # averaged over IoU 0.50..0.95

# Expected output (approximate; your run prints three decimals):
# mAP50:    0.61
# mAP50-95: 0.45

# Rule of thumb when picking a detector:
#   1-stage (YOLO, SSD)   -> fast, great for real-time / edge / video
#   2-stage (Faster R-CNN)-> slower, stronger on small + crowded objects
# mAP50-95 is the headline number on the COCO leaderboard: higher = tighter boxes.

Common Errors (And How to Fix Them)

❌ Wrong IoU threshold

Setting the match threshold too low counts sloppy boxes as correct; too high rejects boxes that are actually fine.

✅ Fix:

# Use 0.5 as the standard "is this a correct detection" cut-off.
match = iou >= 0.5        # COCO AP50; report mAP50-95 for the full picture
# Don't reuse the SAME number for the NMS overlap test without thinking —
# they're different jobs (matching vs. de-duplicating).

❌ Forgetting NMS entirely

Raw model output has many overlapping boxes. Without NMS you draw five boxes on one object.

✅ Fix:

# Always run NMS on raw outputs. Libraries do it for you:
results = model("img.jpg", iou=0.5, conf=0.25)   # iou = NMS overlap threshold
# Writing your own loop? Sort by confidence, then suppress high-IoU overlaps.

❌ Class imbalance

If 95% of your training boxes are "car" and 5% are "bicycle", the model learns to ignore bicycles and your per-class mAP for the rare class collapses — even though overall accuracy looks fine.

✅ Fix:

# Gather more examples of rare classes, oversample them, or weight the loss.
# Always read PER-CLASS mAP, not just the overall average, to catch this.
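One simple way to weight the loss is to give each class a weight inversely proportional to how often it appears. The sketch below uses the common "balanced" heuristic, total / (number of classes x class count), on made-up label counts. How you pass such weights into training depends on your framework, so that part is left out.

```python
from collections import Counter

# Made-up training labels: one entry per annotated box.
box_labels = ["car"] * 95 + ["bicycle"] * 5

counts = Counter(box_labels)
total = sum(counts.values())
num_classes = len(counts)

# Rare classes get large weights, common classes get small ones.
class_weights = {
    cls: total / (num_classes * count)
    for cls, count in counts.items()
}

for cls, weight in class_weights.items():
    print(f"{cls:8} count={counts[cls]:3d} weight={weight:.3f}")

# car      count= 95 weight=0.526
# bicycle  count=  5 weight=10.000
```

With these weights, a mistake on a bicycle costs roughly nineteen times as much as a mistake on a car, which pushes the model to stop ignoring the rare class.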

❌ Small objects get missed

Tiny objects (distant faces, far-off signs) shrink to a few pixels after downsampling and vanish, or no anchor shape fits them.

✅ Fix:

# Train and infer at a higher resolution, e.g.:
results = model("img.jpg", imgsz=1280)   # bigger input keeps small objects alive
# For anchor-based detectors, add anchor sizes (or feature levels) suited to small boxes; tile large images.
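Tiling means cutting a large image into overlapping crops, running the detector on each crop, and shifting the resulting boxes back into full-image coordinates. The sketch below shows only the coordinate bookkeeping; the tile size, the overlap, and the names make_tiles and to_full_image are assumptions chosen for illustration. After merging boxes from all tiles you would normally run NMS once more, because objects in the overlap regions can be detected twice.

```python
def make_tiles(width, height, tile=640, overlap=64):
    """Return (left, top, right, bottom) crops that cover the whole image."""
    step = tile - overlap
    tiles = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            right = min(left + tile, width)      # clip the last tile to the image edge
            bottom = min(top + tile, height)
            tiles.append((left, top, right, bottom))
    return tiles

def to_full_image(box, tile):
    """Shift a box found inside a tile back into full-image coordinates."""
    left, top = tile[0], tile[1]
    return [box[0] + left, box[1] + top, box[2] + left, box[3] + top]

tiles = make_tiles(1280, 640)
print(tiles)
# [(0, 0, 640, 640), (576, 0, 1216, 640), (1152, 0, 1280, 640)]

# A box detected at (10, 20)-(50, 60) inside the second tile:
print(to_full_image([10, 20, 50, 60], tiles[1]))
# [586, 20, 626, 60]
```

The overlap matters: without it, an object sitting on a tile boundary would be cut in half in both crops and might be missed in each.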

📋 Quick Reference

Concept	What It Does	Key Number
Bounding box	Locates an object	`[x1, y1, x2, y2]`
IoU	Measures box overlap	≥ 0.5 = match
Anchor box	Pre-set shape to refine	matched by IoU
NMS	Removes duplicate boxes	IoU > 0.5 suppressed
One-stage	YOLO / SSD — fast	real-time FPS
Two-stage	Faster R-CNN — accurate	higher mAP
mAP50-95	Overall detector quality	higher = better

❓ Frequently Asked Questions

Q: What is the difference between image classification and object detection?

A: Classification answers 'what is in this image?' with a single label for the whole picture. Object detection answers 'what objects are here AND where is each one?' by drawing a bounding box around every object and labelling it. Detection is classification plus localisation.

Q: What is IoU (Intersection over Union)?

A: IoU measures how much a predicted box overlaps the true box. It is the area where the two boxes overlap divided by the total area they cover together. It ranges from 0 (no overlap) to 1 (perfect overlap). A prediction usually counts as correct when IoU is at least 0.5.

Q: Why do I need Non-Maximum Suppression (NMS)?

A: A detector outputs many overlapping boxes for the same object. NMS keeps the box with the highest confidence and removes every other box that overlaps it too much (high IoU). Without NMS you get five boxes drawn around one cat instead of one.

Q: What is the difference between one-stage and two-stage detectors?

A: One-stage detectors (YOLO, SSD) predict boxes and classes in a single pass over the image — fast, great for real-time video. Two-stage detectors (Faster R-CNN) first propose regions that might contain objects, then classify each one — slower but usually more accurate on small or crowded objects.

Q: What does mAP mean when comparing detectors?

A: mAP (mean Average Precision) is the standard score for detectors. For each class you measure precision across recall levels to get its Average Precision, then average across all classes. COCO reports mAP averaged over IoU thresholds from 0.5 to 0.95, so a higher mAP means better and tighter boxes.

🎯 Mini-Challenge: Count the Hits

Time to fly solo. Using IoU, count how many predicted boxes actually hit the ground-truth object. The starter block gives you the brief and the data — write the logic yourself.

Mini-Challenge

Count predictions whose IoU with the ground truth is at least 0.5

Python

# 🎯 MINI-CHALLENGE: count how many predictions "hit" the ground truth
#
# You are given one ground-truth box and a list of predicted boxes.
# 1. Write (or reuse) a compute_iou(box_a, box_b) function.
# 2. A prediction is a HIT when its IoU with the ground truth is >= 0.5.
# 3. Print how many predictions hit, e.g. "Hits: 2 / 4".
#
# Starter data:
# ground_truth = [50, 50, 150, 150]
# predictions  = [[52, 48, 150, 152], [60, 60, 160, 160],
#                 [0, 0, 40, 40], [200, 200, 260, 260]]
#
...

🎉

Lesson complete — you can detect and localise objects!

You can describe a detection as class + box + confidence, compute IoU by hand, run NMS to remove duplicates, choose between one-stage and two-stage detectors, and read an mAP score. These are the building blocks behind every modern detector.

🚀 Up next: Semantic Segmentation — go beyond boxes and label every single pixel in the image.
