Lesson 5 • Beginner
Classification Basics
Predict categories — spam or not spam, cat or dog, pass or fail — using logistic regression and k-NN.
✅ What You'll Learn
- Logistic regression and the sigmoid function
- k-Nearest Neighbours (k-NN) classifier
- Confusion matrix, precision, recall, and F1 score
- When to prioritise precision vs recall
🏷️ Classification vs Regression
🎯 Real-World Analogy: Regression is like a thermometer — it gives you a continuous number (72.3°F). Classification is like a traffic light — it gives you a category (red, yellow, or green). Both are prediction, but the output type is different.
📈 Regression
Predicts numbers: house price, temperature, stock price
🏷️ Classification
Predicts categories: spam/not spam, cat/dog, disease/healthy
Try It: Logistic Regression
Use sigmoid to predict pass/fail from study and sleep hours
import numpy as np
# Logistic Regression: Classification, not regression!
# Predicts PROBABILITY of belonging to a class (0 to 1)
# The sigmoid function: squashes any number to 0-1
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Show how sigmoid works
print("=== Sigmoid Function ===")
inputs = [-5, -2, -1, 0, 1, 2, 5]
for x in inputs:
    prob = sigmoid(x)
    bar = "█" * int(prob * 20)
    print(f"  sigmoid({x:+.0f}) = {prob:.3f} {bar}")
print()
# Simple classification example
# Features: [hours_studied, hours_slept]
# Hand-picked weights for illustration (normally learned from data)
w = np.array([1.2, 0.8])
b = -9.0
students = [(8, 7), (2, 4), (5, 6)]
print("=== Pass/Fail Predictions ===")
for study, sleep in students:
    prob = sigmoid(w[0] * study + w[1] * sleep + b)
    label = "PASS" if prob >= 0.5 else "FAIL"
    print(f"  study={study}h sleep={sleep}h -> P(pass)={prob:.3f} -> {label}")
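In practice, logistic regression doesn't use fixed weights: it learns them from data by gradient descent on the log loss. A minimal training sketch, with made-up pass/fail data and arbitrary hyperparameters (learning rate and iteration count are just illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data: [hours_studied, hours_slept] -> pass (1) / fail (0)
X = np.array([[8, 7], [7, 8], [6, 7], [2, 4], [3, 5], [1, 6]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)

w = np.zeros(2)
b = 0.0
lr = 0.1                       # learning rate (arbitrary choice)
for _ in range(2000):          # gradient descent on the log loss
    p = sigmoid(X @ w + b)     # current predicted probabilities
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("Learned weights:", w, "bias:", b)
```

After training, the learned weights separate the toy data: high study hours push the sigmoid toward 1, low hours toward 0.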
Try It: k-Nearest Neighbours
Classify mystery fruits by finding their closest neighbours
import numpy as np
# k-Nearest Neighbours (k-NN)
# Simplest classifier: "Tell me your neighbours, I'll tell you who you are"
# Dataset: fruit classification by weight and colour score
# Features: [weight_g, colour_score(1-10)]
fruits_X = np.array([
    [150, 7], [170, 8], [140, 6], [160, 7],   # Apples
    [200, 3], [220, 2], [180, 4], [210, 3],   # Bananas
    [50, 9],  [60, 8],  [45, 10], [55, 9],    # Cherries
])
fruits_y = ['Apple', 'Apple', 'Apple', 'Apple',
            'Banana', 'Banana', 'Banana', 'Banana',
            'Cherry', 'Cherry', 'Cherry', 'Cherry']
# Mystery fruit: 155 g, colour score 7
mystery = np.array([155, 7])
# Euclidean distance from the mystery fruit to every known fruit
distances = np.linalg.norm(fruits_X - mystery, axis=1)
# Vote among the k = 3 nearest neighbours
k = 3
votes = [fruits_y[i] for i in np.argsort(distances)[:k]]
print(f"Mystery fruit -> {max(set(votes), key=votes.count)} (votes: {votes})")
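One caveat with k-NN: raw Euclidean distance is dominated by whichever feature has the largest range, and here weight in grams (45-220) dwarfs a 1-10 colour score. A common fix is to rescale features before measuring distance. A min-max scaling sketch, reusing the fruit values from the dataset above (the 155 g / colour 7 query is an illustrative choice):

```python
import numpy as np

fruits_X = np.array([
    [150, 7], [170, 8], [140, 6], [160, 7],   # Apples (rows 0-3)
    [200, 3], [220, 2], [180, 4], [210, 3],   # Bananas (rows 4-7)
    [50, 9],  [60, 8],  [45, 10], [55, 9],    # Cherries (rows 8-11)
], dtype=float)

# Min-max scale each column to [0, 1] so both features count equally
mins = fruits_X.min(axis=0)
maxs = fruits_X.max(axis=0)
scaled = (fruits_X - mins) / (maxs - mins)

# Scale the query point with the SAME mins/maxs as the training data
mystery = (np.array([155.0, 7.0]) - mins) / (maxs - mins)
distances = np.linalg.norm(scaled - mystery, axis=1)
print("3 nearest (row indices):", np.argsort(distances)[:3])
```

Scaling the query with the training data's minima and maxima (not its own) is the key detail: query and dataset must live in the same units.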
Try It: Classification Metrics
Build a confusion matrix and calculate precision, recall, and F1
import numpy as np
# Classification Metrics: Accuracy isn't always enough!
# Confusion matrix example: Email spam detector
# actual: 0=not_spam, 1=spam
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
# Build confusion matrix
TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
print(f"TP={TP}  FP={FP}  FN={FN}  TN={TN}")
# Metrics derived from the confusion matrix
accuracy = (TP + TN) / len(actual)
precision = TP / (TP + FP)  # of emails flagged as spam, how many really were?
recall = TP / (TP + FN)     # of actual spam, how much did we catch?
f1 = 2 * precision * recall / (precision + recall)
print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  "
      f"Recall: {recall:.2f}  F1: {f1:.2f}")
📋 Quick Reference
| Algorithm | Type | Best For |
|---|---|---|
| Logistic Regression | Linear | Binary classification, interpretable |
| k-NN | Instance-based | Small datasets, simple patterns |
| Precision | Metric | When false positives are costly |
| Recall | Metric | When false negatives are costly |
| F1 Score | Metric | Balanced precision + recall |
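The precision and recall rows in the table are two ends of one dial: the classifier's decision threshold. Raising the threshold above 0.5 trades recall for precision, and lowering it does the opposite. A sketch with made-up spam probabilities (not from the lesson's dataset):

```python
# Made-up spam probabilities from some classifier, with the true labels
probs  = [0.95, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.15, 0.05]
actual = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0]

for threshold in (0.3, 0.5, 0.7):
    predicted = [1 if p >= threshold else 0 for p in probs]
    TP = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    FP = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    FN = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    precision = TP / (TP + FP) if TP + FP else 0.0
    recall = TP / (TP + FN) if TP + FN else 0.0
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

As the threshold rises, fewer emails get flagged, so precision improves while recall drops. This is why a spam filter (false positives costly) and a disease screen (false negatives costly) would pick different thresholds.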
🎉 Lesson Complete!
You can now classify data! Next, learn Decision Trees — the most interpretable ML algorithm.