Lesson 5 • Beginner

Classification Basics

By the end you'll predict yes/no categories with logistic regression — turning a score into a probability with the sigmoid, choosing a threshold, and judging the result with precision and recall instead of plain accuracy.

What You'll Learn in This Lesson

✓ Tell classification apart from regression and pick the right one
✓ Use the sigmoid function to turn a score into a 0–1 probability
✓ Turn a probability into a class with a decision boundary and threshold
✓ Build a confusion matrix from predictions (TP, TN, FP, FN)
✓ Compute accuracy, precision, recall and F1, and know when each matters
✓ Run the same job the professional way with scikit-learn's LogisticRegression

Before you start: Make sure you've completed Lesson 4: Linear Regression — classification reuses the same "weighted sum of features" idea, then adds a probability layer on top.

🌍 Real-World Analogy: Thermometer vs Traffic Light

A thermometer gives you a continuous number — 72.3°F, 72.4°F, and every value in between. That's regression: the answer is a quantity on a sliding scale.

A traffic light gives you one of a few fixed states — red, amber, or green. That's classification: the answer is a label from a small set of choices.

Logistic regression is the bridge between them. Internally it computes a number (like the thermometer), but then it asks a yes/no question — "is this number high enough?" — to pick a label (like the traffic light). The sigmoid is how it converts the number into a confidence between 0 and 1.

1. Classification vs Regression

Both are supervised learning — you train on examples that already have the right answer. The only difference is the kind of answer you want back.

📈 Regression — predicts a number

House price, tomorrow's temperature, expected sales. The output is continuous: any value on a scale.

🏷️ Classification — predicts a label

Spam / not spam, cat / dog, pass / fail, disease / healthy. The output is one category from a fixed set.

This lesson covers binary classification — exactly two classes, which you label 1 (the "positive" class you care about, e.g. spam) and 0 (the negative class). Everything you learn extends naturally to more classes later.
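Real datasets often arrive with text labels rather than 1s and 0s. As a minimal sketch (the label strings and the variable names here are made up for illustration), you could turn them into the 1/0 form like this:

```python
# Hypothetical raw labels as they might appear in a dataset.
raw_labels = ["spam", "ham", "ham", "spam", "ham"]

# Choose which label is the "positive" class you care about.
positive_class = "spam"

# 1 for the positive class, 0 for everything else.
y = [1 if label == positive_class else 0 for label in raw_labels]

print(y)   # [1, 0, 0, 1, 0]
```

Which class you call "positive" is your choice, but it matters later: precision and recall are both measured with respect to it.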

2. Logistic Regression and the Sigmoid

Plain linear regression can output any number, from minus a million to plus a million. But a probability has to live between 0 and 1. The sigmoid function fixes that: it takes any number z and squashes it into the range between 0 and 1 along a smooth S-shaped curve.

sigmoid(z) = 1 / (1 + e^(-z))

z very negative  ->  output near 0   (confident it's class 0)
z = 0            ->  output is 0.5    (totally on the fence)
z very positive  ->  output near 1   (confident it's class 1)

In logistic regression, z is the same weighted sum as linear regression — w1*x1 + w2*x2 + ... + bias — and the sigmoid turns that score into a probability. Run the worked example and watch the bar grow from empty to full as z climbs.

Key insight: the model never outputs "spam" directly. It outputs a probability of spam. You decide where to draw the line that turns that probability into a label.

Worked Example: The Sigmoid Function

See how sigmoid squashes any number into a 0–1 probability

Python

import math

# The sigmoid function squashes ANY number into the range 0 to 1.
# That 0-to-1 output is read as a PROBABILITY of the positive class.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))   # math.exp(-z) is e to the power of -z

# Feed it a range of scores and watch the output curve from 0 up to 1.
for z in [-6, -2, -1, 0, 1, 2, 6]:
    p = sigmoid(z)
    bar = "#" * int(p * 20)              # a little bar chart of the probability
    print(f"sigmoid({z:+d}) = {p:.3f}  {bar}")

# Key 
...
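The sigmoid in the worked example calls math.exp(-z) directly. That is fine for the scores used in this lesson, but Python's math.exp raises OverflowError once its argument passes roughly 709, so an extremely negative z would crash it. One common workaround, sketched here under the name stable_sigmoid (a name chosen for this illustration), is to pick whichever form of the formula keeps the exponent at or below zero:

```python
import math

def stable_sigmoid(z):
    """Sigmoid that avoids calling math.exp on a large positive number."""
    if z >= 0:
        # -z is <= 0 here, so math.exp(-z) is at most 1 and cannot overflow.
        return 1 / (1 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z); z is < 0 here.
    ez = math.exp(z)
    return ez / (1 + ez)

for z in [-1000, -2, 0, 2, 1000]:
    print(f"stable_sigmoid({z:+d}) = {stable_sigmoid(z):.3f}")

# Output:
# stable_sigmoid(-1000) = 0.000
# stable_sigmoid(-2) = 0.119
# stable_sigmoid(+0) = 0.500
# stable_sigmoid(+2) = 0.881
# stable_sigmoid(+1000) = 1.000
```

Both branches compute the same mathematical function; only the arithmetic route differs.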

3. Decision Boundary and Threshold

A probability like 0.73 isn't a label yet. To get a label you pick a threshold — a cut-off. The default is 0.5: if the probability is at or above it, predict class 1; otherwise class 0.

The set of feature values where the probability is exactly the threshold is the decision boundary. For logistic regression with a 0.5 threshold, that's wherever the score z = 0 — because sigmoid(0) = 0.5. On one side of the line the model says 1, on the other it says 0.
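To make the boundary concrete, here is a small sketch that assumes a two-feature model with illustrative weights 0.8 and 0.5 and bias -4.0 (pretend values, not learned ones). Setting z = 0 and solving for the second feature gives the line itself; the helper names score and boundary_sleep are made up for this example.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Assumed, illustrative parameters for features [study_hours, sleep_hours].
weights = [0.8, 0.5]
bias = -4.0

def score(features):
    """The raw score z: weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def boundary_sleep(study_hours):
    """Solve 0.8*study + 0.5*sleep - 4.0 = 0 for sleep."""
    return (-bias - weights[0] * study_hours) / weights[1]

# Points exactly on the boundary: z = 0, so the probability is 0.5.
for study in [1, 2, 3, 4]:
    print(f"study={study}h -> boundary at sleep={boundary_sleep(study):.1f}h")

# Points on either side of the line.
for student in [[2, 4], [3, 5]]:
    z = score(student)
    p = sigmoid(z)
    print(f"{student}: z={z:.1f}  prob={p:.3f}  class={1 if p >= 0.5 else 0}")

# Output:
# study=1h -> boundary at sleep=6.4h
# study=2h -> boundary at sleep=4.8h
# study=3h -> boundary at sleep=3.2h
# study=4h -> boundary at sleep=1.6h
# [2, 4]: z=-0.4  prob=0.401  class=0
# [3, 5]: z=0.9  prob=0.711  class=1
```

Any student whose sleep hours sit above the boundary value for their study hours gets a positive z and is classed as 1; anyone below it is classed as 0.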

The threshold is a dial you control. Lower it (say to 0.3) and the model flags more positives — it catches more real ones but also raises more false alarms. Raise it (say to 0.7) and it flags fewer — fewer false alarms but more misses. You'll use that dial deliberately in the mini-challenge.
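Here is a quick sketch of the dial in action. The probabilities are made up for illustration and classify is a hypothetical helper; the point is only how the number of flagged cases changes as the threshold moves.

```python
# Made-up probabilities, as if they came out of a sigmoid.
probs = [0.05, 0.22, 0.35, 0.48, 0.61, 0.77, 0.93]

def classify(probs, threshold):
    """Predict 1 when the probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

for threshold in [0.3, 0.5, 0.7]:
    preds = classify(probs, threshold)
    print(f"threshold {threshold}: {preds} -> {sum(preds)} flagged")

# Output:
# threshold 0.3: [0, 0, 1, 1, 1, 1, 1] -> 5 flagged
# threshold 0.5: [0, 0, 0, 0, 1, 1, 1] -> 3 flagged
# threshold 0.7: [0, 0, 0, 0, 0, 1, 1] -> 2 flagged
```

The probabilities never change; only the cut-off does. Whether the extra flags at 0.3 are real catches or false alarms depends on the true labels, which is what the metrics later in the lesson measure.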

The worked example below does the full pipeline by hand: weighted sum → sigmoid → threshold. Read each row and confirm the prediction matches the probability.

Worked Example: Logistic Regression by Hand

Weighted sum → sigmoid → threshold, row by row

Python

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Logistic regression = (weighted sum of features + bias) -> sigmoid -> threshold.
# Example: will a student pass? Features = [study_hours, sleep_hours].
data = [
    ([2, 4], 1), ([3, 5], 1), ([4, 6], 1), ([5, 7], 1),   # actually passed (1)
    ([1, 3], 0), ([2, 3], 0), ([1, 4], 0), ([3, 3], 0),   # actually failed (0)
]
weights = [0.8, 0.5]   # how much each feature counts (pretend these were learned)
bias = -4.0            # shi
...

🎯 Your Turn 1: Build the Sigmoid and Threshold

Finish two blanks: complete the sigmoid formula, then compare each probability against the threshold to pick a class. The expected output is in the comments so you can self-check.

Your Turn: Sigmoid + Threshold

Fill in the blanks to classify raw scores at threshold 0.5

Python

import math

# YOUR TURN 1 - fill in the blanks marked with ___

# 1) Finish the sigmoid function so it returns 1 / (1 + e^-z)
def sigmoid(z):
    return 1 / (1 + ___)        # hint: math.exp(-z)

# 2) A model gives these raw scores. Turn each into a probability,
#    then predict class 1 if the probability is at or above the threshold.
scores = [-3.0, -0.5, 0.0, 1.2, 4.0]
threshold = 0.5

for z in scores:
    prob = sigmoid(z)
    pred = 1 if prob >= ___ else 0   # hint: compare prob to the thr
...

4. Accuracy, Precision, Recall and F1

To judge a classifier you compare its predictions to the truth and sort every result into four buckets — the confusion matrix:

TP (true positive): it was 1, you predicted 1 ✓
TN (true negative): it was 0, you predicted 0 ✓
FP (false positive): it was 0, you predicted 1 ✗ — a false alarm
FN (false negative): it was 1, you predicted 0 ✗ — a miss
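The four buckets can be counted with one pass over the data. This is a minimal sketch on a small made-up set of labels; confusion_counts is a hypothetical helper name, and it assumes every label is either 0 or 1.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1          # it was 1, predicted 1
        elif a == 0 and p == 0:
            tn += 1          # it was 0, predicted 0
        elif a == 0 and p == 1:
            fp += 1          # false alarm
        else:
            fn += 1          # a == 1 and p == 0: a miss
    return tp, tn, fp, fn

# Made-up example data.
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")   # TP=2 TN=2 FP=1 FN=1
```

The four counts always add up to the number of rows, which is a handy sanity check.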

From those four counts come the four metrics you'll use constantly:

accuracy  = (TP + TN) / everything     # how often you were right overall
precision = TP / (TP + FP)             # of your "yes" calls, how many were real
recall    = TP / (TP + FN)             # of the real positives, how many you caught
f1        = 2 * precision * recall / (precision + recall)   # balance of the two
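The formulas translate almost line for line into Python. The sketch below uses made-up counts and two hypothetical helpers, safe_div and metrics. Returning 0.0 when a denominator is zero is an assumption made here for convenience; strictly speaking the metric is undefined in that case.

```python
def safe_div(numerator, denominator):
    """Divide, but return 0.0 instead of crashing when the denominator is 0."""
    return numerator / denominator if denominator else 0.0

def metrics(tp, tn, fp, fn):
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return accuracy, precision, recall, f1

# Made-up counts: 100 cases, 18 of them truly positive.
accuracy, precision, recall, f1 = metrics(tp=8, tn=80, fp=2, fn=10)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# Output:
# accuracy=0.880 precision=0.800 recall=0.444 f1=0.571
```

Notice how an accuracy of 0.88 sits alongside a recall of only 0.444: more than half the real positives were missed.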

Accuracy alone can lie. If 99% of emails are clean, a model that calls everything "clean" is 99% accurate yet catches zero spam. Precision and recall expose that failure; accuracy hides it. You'll compute precision and recall by hand in Your Turn 2.

Spam filter

Favour PRECISION — a blocked real email is worse than a missed spam.

Disease screen

Favour RECALL — missing a sick patient is far worse than a false alarm.

Both matter

Use F1 — the harmonic mean punishes ignoring either one.
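To see why the harmonic mean "punishes" a lopsided model, compare it with a plain average. The numbers below are made up, and f1_from is a hypothetical helper written for this comparison.

```python
def f1_from(precision, recall):
    """F1 = harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Three made-up models that all have the same plain average of 0.55.
for precision, recall in [(0.55, 0.55), (1.0, 0.1), (0.1, 1.0)]:
    plain_mean = (precision + recall) / 2
    f1 = f1_from(precision, recall)
    print(f"P={precision:.2f} R={recall:.2f}  plain mean={plain_mean:.2f}  F1={f1:.2f}")

# Output:
# P=0.55 R=0.55  plain mean=0.55  F1=0.55
# P=1.00 R=0.10  plain mean=0.55  F1=0.18
# P=0.10 R=1.00  plain mean=0.55  F1=0.18
```

A plain average cannot tell the three models apart. F1 stays high only when precision and recall are both reasonably high.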

🎯 Your Turn 2: Compute Precision and Recall

The confusion counts are already worked out for you in the loop. Fill in the two metric denominators using the formulas above, then run it and match the expected output.

Your Turn: Precision and Recall from Counts

Fill in the blanks to compute precision and recall

Python

# YOUR TURN 2 - fill in the blanks marked with ___
# A spam filter was tested. Compare its predictions to the truth and
# count the four confusion-matrix outcomes, then compute the metrics.
# 1 = spam (the "positive" class), 0 = not spam.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

TP = TN = FP = FN = 0
for a, p in zip(actual, predicted):
    if a == 1 and p == 1:
        TP += 1                 # true positive: spam, called spam
    elif a == 0 and p =
...

🛠️ The Professional Way: scikit-learn

You've now built every piece by hand, so the library version will make complete sense. In real projects you use scikit-learn (sklearn): LogisticRegression learns the weights and bias for you with .fit(), and ready-made functions compute the metrics — so you don't re-implement and re-test the maths yourself.

The example below needs scikit-learn installed, so it's shown as a read-along with the expected output in the comments. Run it locally after pip install scikit-learn to see it live.

# The professional version of everything above, using scikit-learn.
# (Run this locally with: pip install scikit-learn)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Features: [study_hours, sleep_hours].  Labels: 1 = pass, 0 = fail.
X = [[2, 4], [3, 5], [4, 6], [5, 7], [1, 3], [2, 3], [1, 4], [3, 3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# 1) Create the model and FIT it (this learns the weights and bias for you).
model = LogisticRegression()
model.fit(X, y)

# 2) Predict classes, and predict probabilities of the positive class.
preds = model.predict(X)
probs = model.predict_proba(X)[:, 1]   # column 1 = probability of class 1

print("predicted classes:", list(preds))
print("probability of pass:", [round(p, 2) for p in probs])

# 3) Score it with ready-made metric functions instead of counting by hand.
print("accuracy :", accuracy_score(y, preds))
print("precision:", precision_score(y, preds))
print("recall   :", recall_score(y, preds))
print("f1       :", f1_score(y, preds))

# Expected output:
# predicted classes: [1, 1, 1, 1, 0, 0, 0, 0]
# probability of pass: [0.74, 0.86, 0.93, 0.97, 0.08, 0.13, 0.1, 0.27]
# accuracy : 1.0
# precision: 1.0
# recall   : 1.0
# f1       : 1.0
#
# (Exact probabilities vary slightly by scikit-learn version.)

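Notice the shape: model.fit(X, y) learns, model.predict(X) gives labels, model.predict_proba(X) gives probabilities, and the metric functions take (truth, predictions). These are the same four metrics you just coded by hand. Because this read-along scores the model on the very rows it was trained on, treat the perfect scores as a check that the pipeline runs, not as a measure of how it would do on new students.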
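For a two-class problem, model.predict effectively applies the 0.5 cut-off for you. If you want a different threshold, one common approach is to compare the class-1 column of predict_proba against your own cut-off. This is a sketch, assuming scikit-learn is installed and reusing the same toy data; the 0.3 threshold is just an example value.

```python
from sklearn.linear_model import LogisticRegression

# Same toy data: [study_hours, sleep_hours] -> 1 = pass, 0 = fail.
X = [[2, 4], [3, 5], [4, 6], [5, 7], [1, 3], [2, 3], [1, 4], [3, 3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression()
model.fit(X, y)

# Column 1 of predict_proba is the probability of class 1.
probs = model.predict_proba(X)[:, 1]

# Apply two different thresholds to the same probabilities.
default_preds = (probs >= 0.5).astype(int)
low_preds = (probs >= 0.3).astype(int)

print("threshold 0.5:", list(default_preds))
print("threshold 0.3:", list(low_preds))
```

Lowering the threshold can only add positives, never remove them, so the 0.3 list flags at least as many rows as the 0.5 list.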

5. Common Errors (And How to Fix Them)

These four mistakes trip up almost everyone learning classification. Spot them early.

❌ Trusting accuracy on imbalanced data

When one class is rare, accuracy looks great while the model is useless:

# 990 clean emails, 10 spam. Model labels EVERYTHING clean.
accuracy = 990 / 1000   # 0.99 -> looks amazing!
recall   = 0 / 10       # 0.0  -> caught zero spam

✅ Fix: report precision and recall (or F1) on the positive class, not accuracy alone.
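You can reproduce that trap in a few lines of plain Python. This sketch builds the 990-clean, 10-spam data set described above, has a "model" predict 0 for everything, and counts the results. Reporting precision as None when there are no positive predictions is a choice made for this example.

```python
# 10 spam (1) and 990 clean (0) emails; the "model" calls everything clean.
actual = [1] * 10 + [0] * 990
predicted = [0] * 1000

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
# No positive predictions at all, so precision is 0/0: undefined.
precision = tp / (tp + fp) if (tp + fp) else None

print("accuracy  =", accuracy)    # accuracy  = 0.99
print("recall    =", recall)      # recall    = 0.0
print("precision =", precision)   # precision = None
```

A precision you cannot even compute is itself a warning sign: the model never predicted the positive class once.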

❌ Using the wrong threshold

Leaving the threshold at 0.5 is a mistake when the classes are imbalanced or when one kind of error costs far more than the other. A cancer screen at 0.5 may miss real cases:

prob = 0.42
pred = 1 if prob >= 0.5 else 0   # 0 -> a sick patient sent home!

✅ Fix: lower the threshold (e.g. 0.3) to raise recall when misses are costly; raise it to favour precision.
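The trade-off is easy to measure once you have true labels. The sketch below uses made-up probabilities and labels for a hypothetical screener (including a sick patient scored at 0.42) and a helper named precision_recall written for this example.

```python
# Made-up screener output with the true labels (1 = sick, 0 = healthy).
probs  = [0.10, 0.25, 0.35, 0.42, 0.45, 0.65, 0.80, 0.90]
actual = [0,    0,    1,    1,    0,    1,    1,    1]

def precision_recall(actual, probs, threshold):
    """Classify at the given threshold, then compute precision and recall."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, preds) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for threshold in [0.3, 0.5, 0.7]:
    precision, recall = precision_recall(actual, probs, threshold)
    print(f"threshold {threshold}: precision {precision:.2f}, recall {recall:.2f}")

# Output:
# threshold 0.3: precision 0.83, recall 1.00
# threshold 0.5: precision 1.00, recall 0.60
# threshold 0.7: precision 1.00, recall 0.40
```

On this toy data, dropping the threshold to 0.3 catches every sick patient at the cost of one false alarm, while 0.5 sends two sick patients home.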

❌ Data leakage

Measuring performance on the same rows you trained on (or letting test data influence training in any way) gives an overly optimistic score, because the model has already seen the answers:

model.fit(X, y)
accuracy_score(y, model.predict(X))   # 1.0, but only on rows the model already saw

✅ Fix: split your data — train on one part, evaluate on a held-out test set the model never saw (train_test_split).
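Here is a minimal sketch of that split, assuming scikit-learn is installed and reusing the lesson's toy data. The test_size and random_state values are arbitrary choices for illustration, and eight rows is far too few for a meaningful score; the point is the shape of the workflow.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data: [study_hours, sleep_hours] -> 1 = pass, 0 = fail.
X = [[2, 4], [3, 5], [4, 6], [5, 7], [1, 3], [2, 3], [1, 4], [3, 3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold back 25% of the rows; stratify keeps both classes in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit ONLY on the training rows.
model = LogisticRegression()
model.fit(X_train, y_train)

# Score on rows the model has never seen.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

print("rows used for training:", len(X_train))
print("rows held out for test:", len(X_test))
print("train accuracy:", train_acc)
print("test accuracy :", test_acc)
```

The test accuracy is the number to report. If it is much lower than the training accuracy, the model is not generalising as well as the training score suggests.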

❌ Confusing precision and recall

Swapping the denominators. The fix is to anchor on what each one asks:

precision = TP / (TP + FP)   # of my "yes" calls, how many were right?
recall    = TP / (TP + FN)   # of the real positives, how many did I find?

✅ Fix: precision is about your predictions (add FP); recall is about the actual positives (add FN).

📋 Quick Reference

Term	What it is	Formula / example
Sigmoid	Score → probability (0–1)	`1 / (1 + e^-z)`
Threshold	Probability cut-off for a label	`pred = 1 if p >= 0.5`
Decision boundary	Where probability = threshold	`z = 0` at threshold 0.5
Accuracy	Overall share correct	`(TP + TN) / all`
Precision	Of predicted 1s, share correct	`TP / (TP + FP)`
Recall	Of actual 1s, share caught	`TP / (TP + FN)`
F1 score	Balance of precision & recall	`2PR / (P + R)`
scikit-learn	Library that trains the model and computes the metrics	`LogisticRegression().fit(X, y)`

❓ Frequently Asked Questions

Q: What is the difference between classification and regression?

A: Both are supervised learning, but the output type differs. Regression predicts a continuous number (a house price, a temperature). Classification predicts a discrete category (spam or not spam, pass or fail). If the answer is a label, it is classification.

Q: What does the sigmoid function do in logistic regression?

A: The sigmoid takes the model's raw score (a weighted sum of the features plus a bias) and squashes it into the range 0 to 1, which you read as the probability of the positive class. sigmoid(0) is exactly 0.5, large positive scores approach 1, and large negative scores approach 0.

Q: What is the decision boundary and the threshold?

A: The threshold is the probability cut-off you use to turn a probability into a class, usually 0.5. The decision boundary is the line (or surface) of feature values where the predicted probability equals that threshold. Lowering the threshold flags more positives (higher recall, lower precision); raising it flags fewer (higher precision, lower recall).

Q: Why is accuracy a bad metric for imbalanced data?

A: If 99% of emails are not spam, a model that labels everything 'not spam' scores 99% accuracy while catching zero spam. Accuracy hides this failure. Precision and recall (or F1) measure how well you handle the rare positive class, which is usually the one you care about.

Q: What is the difference between precision and recall?

A: Precision = TP / (TP + FP): of the items you predicted positive, how many really were. Recall = TP / (TP + FN): of the items that really were positive, how many you caught. A spam filter wants high precision (do not block good email); a disease screen wants high recall (do not miss a sick patient). F1 is their harmonic mean when both matter.

Q: Do I need scikit-learn to do classification?

A: No. You can implement sigmoid, thresholding, and the metrics in plain Python, which is the best way to understand them. In real projects you use scikit-learn's LogisticRegression and its accuracy_score, precision_score, recall_score, and f1_score so you do not re-implement and re-test the maths yourself.

🎯 Mini-Challenge: A Disease Screener

Now write it from a blank slate — only a comment outline is given. Build a tiny screener that deliberately uses a low threshold of 0.3 to favour recall, because in screening a miss is worse than a false alarm. The expected output is in the comments so you can check yourself.

Mini-Challenge: Screener with a Low Threshold

Implement sigmoid and classify at threshold 0.3 to maximise recall

Python

import math

# MINI-CHALLENGE: a tiny disease screener (plain Python, no libraries)
#
# 1. Write sigmoid(z) -> 1 / (1 + e^-z)
# 2. For each [risk_score] below, compute prob = sigmoid(risk_score)
# 3. Use a LOW threshold of 0.3 (in screening you would rather over-flag
#    than miss a sick patient -> favour RECALL)
# 4. Predict 1 (refer for testing) if prob >= 0.3, else 0
# 5. Print risk_score, prob (3 dp) and the prediction for each
#
# risk_scores = [-2.0, -1.0, -0.4, 0.5, 2.0]
#
# Expected out
...

🎉

Lesson 5 complete — you can classify!

You can tell classification from regression, turn a score into a probability with the sigmoid, draw a decision boundary at a threshold you choose, and judge a model with accuracy, precision, recall and F1 — by hand and with scikit-learn's LogisticRegression.

🚀 Up next: Decision Trees — the most interpretable classifier, splitting data into simple yes/no questions.
