Lesson 48 • Advanced
Ethical AI, Bias Mitigation & Privacy ⚖️
By the end of this lesson you'll be able to measure whether a model treats groups fairly, explain its decisions, protect people's privacy, and write the audit code that catches bias before it ships.
What You'll Learn in This Lesson
- ✓ Identify the points where bias enters an AI system
- ✓ Measure fairness with demographic parity and equalised odds
- ✓ Apply the 4/5ths rule and disparate-impact ratio yourself
- ✓ Explain a single prediction the way SHAP and LIME do
- ✓ Protect privacy with PII redaction and differential privacy
- ✓ Name who is accountable and how to mitigate misuse risk
Real-World Analogy: a fair referee
Think of an AI model as a referee in a football match. A fair referee applies the same rules to both teams: a foul is a foul whoever commits it, and a goal counts no matter which side scores. The crowd can see every decision, so the referee can be questioned and overruled.
A biased referee blows the whistle more often on one team — maybe without even realising it, because that's how they were trained. AI models do exactly this: they absorb bias from historical data and apply it at massive scale, silently. Everything in this lesson is a tool for being a fair referee: measure the calls per team (fairness metrics), show your reasoning (explainability), respect the players (privacy), and let yourself be reviewed (accountability).
1. Where Does Bias Enter?
Bias is a systematic error that pushes a model's outputs in one direction for one group. It rarely comes from one villain — it leaks in at several points, and your job is to know all of them.
- Training data: if past hiring favoured one group, the model learns "favour that group" as a rule.
- Sampling: if a group is barely present in the data, the model never learns to serve it well.
- Labels: the "correct answers" themselves may encode human prejudice (who was approved before).
- Proxy variables: a feature like ZIP code or first name can secretly stand in for race or gender.
- Objective: optimising only for accuracy lets the model ignore small groups entirely.
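Several of these entry points (skewed history, thin sampling, prejudiced labels) can be checked before any model exists. The sketch below is one minimal way to do that, assuming a hypothetical list of `(group, historical_label)` rows; the data and the helper name `audit_training_data` are made up for illustration.

```python
from collections import Counter

# Hypothetical training rows: (group, historical_label). Illustrative data only.
rows = [("A", 1)] * 60 + [("A", 0)] * 20 + [("B", 1)] * 5 + [("B", 0)] * 15

def audit_training_data(rows):
    """Return {group: (share_of_dataset, positive_label_rate)}."""
    totals = Counter(group for group, _ in rows)
    positives = Counter(group for group, label in rows if label == 1)
    n = len(rows)
    return {g: (totals[g] / n, positives[g] / totals[g]) for g in totals}

report = audit_training_data(rows)
for group, (share, base_rate) in sorted(report.items()):
    print(f"{group}: {share:.0%} of rows, {base_rate:.0%} labelled positive")
```

Here group B makes up only a fifth of the rows and carries a much lower positive-label rate, so both the sampling and the label questions from the list deserve a closer look before training.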
Real systems have caused real harm by missing these:
| Case | What Happened | Where bias entered |
|---|---|---|
| Amazon hiring | Résumé screener penalised women | Biased historical labels |
| COMPAS | Recidivism model harsher on Black defendants | Proxy variables + labels |
| Healthcare algorithm | Used cost as a proxy for need | Bad objective / proxy |
| Facial recognition | Error rates of up to about 35% for darker-skinned women vs under 1% for lighter-skinned men | Unbalanced sampling |
2. Measuring Fairness: Parity, the 4/5ths Rule & Equalised Odds
You can't fix what you don't measure. The simplest fairness metric is demographic parity: the selection rate (the share of a group that gets a positive outcome) should be roughly equal across groups.
Regulators turn that into a number with the 4/5ths rule: divide the lowest group's selection rate by the highest to get the disparate-impact ratio. If it falls below 0.8, that is treated as adverse impact you must investigate.
Parity alone can be misleading, so also check equalised odds: among people who actually deserved a positive outcome, the model should approve the same share in every group (equal true-positive rates), and among people who did not, it should wrongly approve the same share in every group (equal false-positive rates). The worked example below computes selection rates per group and the disparate-impact ratio from a tiny dataset. Read every comment, then run it.
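Equalised odds needs the true outcome as well as the model's decision, so it cannot be read off selection rates alone. A minimal sketch of the check, assuming hypothetical rows of `(group, actually_qualified, model_approved)`; the data and the helper name `rates` are illustrative:

```python
# Hypothetical rows: (group, actually_qualified, model_approved). Illustrative only.
rows = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

def rates(rows, group):
    """Return (true_positive_rate, false_positive_rate) for one group."""
    qualified = [pred for g, actual, pred in rows if g == group and actual == 1]
    unqualified = [pred for g, actual, pred in rows if g == group and actual == 0]
    tpr = sum(qualified) / len(qualified)      # approved among the qualified
    fpr = sum(unqualified) / len(unqualified)  # approved among the unqualified
    return tpr, fpr

tpr_a, fpr_a = rates(rows, "A")
tpr_b, fpr_b = rates(rows, "B")
tpr_gap = abs(tpr_a - tpr_b)
fpr_gap = abs(fpr_a - fpr_b)
print(f"TPR  A={tpr_a:.2f}  B={tpr_b:.2f}  gap={tpr_gap:.2f}")
print(f"FPR  A={fpr_a:.2f}  B={fpr_b:.2f}  gap={fpr_gap:.2f}")
```

Equalised odds holds when both gaps are close to zero. How close is "close enough" is a policy decision, not something this sketch decides.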
Worked Example: Selection Rates & Disparate Impact
Compute per-group selection rates and flag a ratio below 0.8
# ============================================
# WORKED EXAMPLE: where does bias hide?
# Pure Python — no libraries needed.
# ============================================
# A loan model already made decisions on 12 applicants.
# Each row: (group, approved?) approved = 1, denied = 0
applicants = [
("A", 1), ("A", 1), ("A", 1), ("A", 0),
("A", 1), ("A", 1), # Group A: 5 of 6 approved
("B", 1), ("B", 0), ("B", 0), ("B", 0),
("B", 1), ("B", 0),
...min(rate) / max(rate). Anything under 0.8 means one group is selected at less than four-fifths the rate of the best-served group, which is your signal to dig in.
🎯 Your Turn: Audit a Hiring Model
Fill in the blanks to compute selection rates and the disparate-impact ratio
# 🎯 YOUR TURN — measure fairness on a hiring model.
# Fill in every ___ then run it.
# Each row: (group, hired?) hired = 1, not hired = 0
people = [
("women", 0), ("women", 1), ("women", 0), ("women", 0), ("women", 1),
("men", 1), ("men", 1), ("men", 1), ("men", 0), ("men", 1),
]
def selection_rate(rows, group):
    members = [r for r in rows if r[0] == group]
    hired = sum(h for _, h in members)
    # 👉 return the fraction hired (hired divided by number of members)
...
3. Transparency & Explainability (SHAP / LIME)
If a model denies someone a loan, they deserve to know why. Explainability tools answer that by attributing one prediction to the features that drove it: "+0.30 from income, −0.90 from late payments". That turns a black box into a sentence a human can challenge.
SHAP uses Shapley values from cooperative game theory to split a prediction among its features, so the attributions always add up to the gap between the prediction and a baseline. LIME fits a simple, interpretable approximation to the model's behaviour around a single prediction. The worked example shows the core idea for a linear model, where every feature's contribution is its value times its weight, and points to the real libraries.
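To make the game-theory idea concrete, the sketch below computes exact Shapley values by brute force: it tries every order in which features could be "switched on" from a baseline and averages each feature's marginal effect. This is not how the `shap` library computes its values (it uses faster approximations); the model, the baseline, and the feature names here are hypothetical.

```python
from itertools import permutations

def model(x):
    # Hypothetical scoring model with one interaction term (illustrative only).
    return 0.5 * x["income"] - 1.0 * x["late_payments"] + 0.25 * x["income"] * x["has_job"]

baseline = {"income": 0, "late_payments": 0, "has_job": 0}   # assumed reference applicant
applicant = {"income": 4, "late_payments": 2, "has_job": 1}

def shapley_values(model, applicant, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    features = list(applicant)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = applicant[f]        # switch this feature to its real value
            score = model(current)
            totals[f] += score - previous    # marginal effect in this ordering
            previous = score
    return {f: totals[f] / len(orderings) for f in features}

values = shapley_values(model, applicant, baseline)
for feature, value in values.items():
    print(f"{feature:>14}: {value:+.2f}")
print(f"sum of attributions: {sum(values.values()):+.2f}")
print(f"prediction - baseline: {model(applicant) - model(baseline):+.2f}")
```

The interaction between `income` and `has_job` is shared equally between the two features, and the attributions sum to the difference between the applicant's score and the baseline score. Enumerating every ordering only works for a handful of features, which is why real tools approximate.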
Worked Example: Explaining One Prediction
Attribute a score to each feature, the way SHAP and LIME do
# ============================================
# WORKED EXAMPLE: explainability (the SHAP / LIME idea)
# Why did the model deny THIS person? Attribute the score
# to each feature. Pure Python — the real tools just do this
# rigorously across the whole model.
# ============================================
# A simple, fully transparent scoring model.
# Each feature has a weight; score = sum(feature * weight).
weights = {"income": 0.6, "age": -0.2, "late_payments": -0.9}
applicant = {"income": 4,
...
4. Privacy: PII & Differential Privacy
PII (personally identifiable information) is anything that pins data to a real person — name, email, exact address. The first rule is simple: strip direct identifiers before the data reaches the model.
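Direct identifiers also hide inside free text, not just in named columns. The following is a rough sketch of pattern-based redaction using Python's `re` module; the two patterns are deliberately simple assumptions for illustration and will miss many real-world formats.

```python
import re

# Deliberately simple patterns, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

note = "Contact Jordan at jlee@mail.com or +1 (555) 010-2345 about the loan."
print(redact(note))
# Contact Jordan at [EMAIL] or [PHONE] about the loan.
```

Notice that the name "Jordan" survives: patterns catch structured identifiers, while names and addresses usually need a dedicated detection step.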
But redaction is not enough — clever attackers can re-identify people from "anonymous" patterns. Differential privacy fixes this by adding calibrated random noise so that adding or removing any single person barely changes the result. The knob is the privacy budget epsilon (ε): lower ε means more privacy and less accuracy. The worked example redacts PII and applies the differential-privacy idea to a salary average.
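The ε trade-off can be seen in a few lines. This sketch applies the Laplace mechanism to a bounded mean, under two stated assumptions: the dataset size is public, and "neighbouring" datasets differ by replacing one record, so the mean's sensitivity is `(upper - lower) / n`. The salary figures, bounds, and function names are hypothetical, and floating-point noise like this is not safe for production use; a vetted differential-privacy library is the better choice there.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Clipped mean plus Laplace noise with scale = sensitivity / epsilon."""
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)  # most one replaced record can move the mean
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

salaries = [52000, 61000, 48000, 75000, 58000]  # hypothetical data
true_mean = sum(salaries) / len(salaries)
rng = random.Random(0)

def average_error(epsilon, trials=200):
    """Mean absolute gap between the private answer and the true mean."""
    errors = [abs(private_mean(salaries, 20000, 100000, epsilon, rng) - true_mean)
              for _ in range(trials)]
    return sum(errors) / trials

error_strict = average_error(epsilon=0.1)   # strong privacy
error_loose = average_error(epsilon=10.0)   # weak privacy
print(f"epsilon=0.1  -> average error {error_strict:,.0f}")
print(f"epsilon=10.0 -> average error {error_loose:,.0f}")
```

With only five records the strict setting drowns the answer in noise. Larger datasets shrink the sensitivity, which is why differential privacy works best on aggregates over many people.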
Worked Example: PII Redaction & Differential Privacy
Redact identifiers and add calibrated noise so no individual is recoverable
# ============================================
# WORKED EXAMPLE: privacy — PII and differential privacy
# Pure Python. The real tools (below) automate the maths.
# ============================================
import random
random.seed(0)
# Step 1: never train on raw PII (personally identifiable info).
record = {"name": "Jordan Lee", "email": "jlee@mail.com", "salary": 52000}
# Redact direct identifiers BEFORE the data reaches the model.
safe = {k: v for k, v in record.items() if k not in ("name
...
🎯 Your Turn: Catch a Proxy Variable
Removing a protected column isn't enough — test whether a proxy leaks it
# 🎯 YOUR TURN — catch a proxy variable.
# We removed "gender", but a proxy may still leak it.
# Fill in every ___ then run it.
applicants = [
# (club_member, gender, hired?) "club_member" is the suspect proxy
(1, "M", 1), (1, "M", 1), (1, "F", 1), (1, "F", 0),
(0, "M", 0), (0, "M", 1), (0, "F", 0), (0, "F", 0),
]
# A proxy is dangerous when it strongly predicts a protected attribute.
# Check: what share of club members are male?
members = [a for a in applicants if a[0] == 1]
# 👉
...
5. Accountability & Harmful-Misuse Risk
Accountability means a specific, named human owns the model's outcomes — not "the algorithm". When a decision is wrong, there must be a person to appeal to, a log of what the model did, and a clear path to override it. "The model said so" is never an acceptable answer.
Concretely, an accountable system has:
- An owner: a named person or team responsible for harms, not just accuracy.
- An audit trail: logged inputs, outputs, and model versions so you can reconstruct any decision.
- A human in the loop: someone who can review and reverse high-stakes outcomes.
- A model card: a short document stating the model's intended use, limits, and known risks.
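The four items above map fairly directly onto data structures. The sketch below is one possible shape, not a standard: every field name, the `override` helper, and the example values (`loan-scorer-1.3.0`, `credit-risk-team`, `reviewer-17`) are hypothetical.

```python
import json
from datetime import datetime, timezone

def audit_record(model_version, owner, inputs, decision):
    """One audit-trail entry: enough to reconstruct who decided what, and with which model."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "owner": owner,
        "inputs": inputs,            # assumed to be PII-redacted already
        "decision": decision,
        "overridden_by": None,
    }

def override(record, reviewer, new_decision):
    """Human-in-the-loop reversal: keep the original decision and record who changed it."""
    return {**record, "original_decision": record["decision"],
            "decision": new_decision, "overridden_by": reviewer}

# A model card reduced to its bare bones (illustrative content).
model_card = {
    "intended_use": "Pre-screening consumer loan applications for human review",
    "limits": "Not validated for business loans",
    "known_risks": ["Lower approval rate observed for group B in the last audit"],
}

record = audit_record("loan-scorer-1.3.0", "credit-risk-team",
                      {"income": 4, "late_payments": 2}, "deny")
reviewed = override(record, "reviewer-17", "approve")
log_line = json.dumps(reviewed, sort_keys=True)   # one JSON object per line in the log
print(log_line)
```

Writing the override as a new record instead of editing the old one keeps the trail append-only, so the original decision stays visible after a human reverses it.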
You also have to think about misuse — harm caused by using the model outside its intended purpose. A face-matcher built for unlocking phones could be repurposed for mass surveillance; a text generator could mass-produce scams or disinformation. Mitigation means restricting access, rate-limiting, watermarking outputs, refusing dangerous requests, and red-teaming the system before release.
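Of these mitigations, rate-limiting is the easiest to show in code. A minimal sliding-window sketch follows; the class name, the limits, and the injectable `clock` parameter (used here so the demo does not have to wait in real time) are assumptions for illustration.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most max_calls per user within any window_seconds span."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.calls = defaultdict(deque)   # user_id -> timestamps of recent allowed calls

    def allow(self, user_id):
        now = self.clock()
        recent = self.calls[user_id]
        while recent and now - recent[0] >= self.window:
            recent.popleft()              # forget calls that fell out of the window
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True

fake_time = [0.0]                          # controllable clock for the demo
limiter = RateLimiter(max_calls=3, window_seconds=60, clock=lambda: fake_time[0])

burst = [limiter.allow("user-1") for _ in range(4)]
print(burst)                               # [True, True, True, False]

fake_time[0] = 60.0                        # a minute later the window has cleared
after_wait = limiter.allow("user-1")
print(after_wait)                          # True
```

A limit like this slows bulk abuse such as mass-producing scam text, but it does not judge the content of a request, so it complements refusal policies and red-teaming instead of replacing them.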
6. Common Mistakes (And How to Fix Them)
❌ Training on biased data and trusting it
Feeding in years of skewed historical decisions teaches the model to copy them.
✅ Fix: audit the data first — check per-group base rates and rebalance or reweight before training.
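One way to reweight is the "reweighing" scheme from the fairness literature: give each `(group, label)` combination the weight it would need for group and label to look statistically independent. The sketch below assumes hypothetical `(group, label)` rows and is only one of several possible rebalancing approaches.

```python
from collections import Counter

# Hypothetical historical decisions: (group, label). Illustrative data only.
rows = [("A", 1)] * 6 + [("A", 0)] * 2 + [("B", 1)] * 1 + [("B", 0)] * 3

def reweighing_weights(rows):
    """weight(g, y) = P(g) * P(y) / P(g, y), estimated from counts."""
    n = len(rows)
    group_counts = Counter(g for g, _ in rows)
    label_counts = Counter(y for _, y in rows)
    pair_counts = Counter(rows)
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * pair_counts[(g, y)])
        for (g, y) in pair_counts
    }

def weighted_positive_rate(rows, weights, group):
    """Share of positive labels in a group after applying the weights."""
    total = sum(weights[(g, y)] for g, y in rows if g == group)
    positive = sum(weights[(g, y)] for g, y in rows if g == group and y == 1)
    return positive / total

weights = reweighing_weights(rows)
rate_a = weighted_positive_rate(rows, weights, "A")
rate_b = weighted_positive_rate(rows, weights, "B")
print(f"weighted positive rate  A={rate_a:.3f}  B={rate_b:.3f}")
```

Before weighting, group A's positive rate is 75% and group B's is 25%. After weighting, both equal the overall rate, so a learner that honours sample weights no longer sees group membership as predictive of the label.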
❌ Optimising only for accuracy
A 95%-accurate model can still fail a minority group completely, because accuracy averages over everyone.
✅ Fix: track per-group metrics (selection rate, true-positive rate) alongside overall accuracy.
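A small constructed example shows how the average hides the failure. The rows below are hypothetical `(group, actual, predicted)` triples, arranged so that overall accuracy is 95% while every qualified person in group B is rejected.

```python
# Hypothetical rows: (group, actual, predicted). Illustrative data only.
rows = (
    [("A", 1, 1)] * 45 + [("A", 0, 0)] * 45    # group A: every prediction correct
    + [("B", 1, 0)] * 5 + [("B", 0, 0)] * 5    # group B: every qualified person missed
)

def accuracy(rows):
    return sum(actual == pred for _, actual, pred in rows) / len(rows)

def true_positive_rate(rows, group):
    preds = [pred for g, actual, pred in rows if g == group and actual == 1]
    return sum(preds) / len(preds)

overall = accuracy(rows)
tpr_a = true_positive_rate(rows, "A")
tpr_b = true_positive_rate(rows, "B")
print(f"overall accuracy: {overall:.0%}")          # 95%
print(f"true-positive rate A: {tpr_a:.0%}")        # 100%
print(f"true-positive rate B: {tpr_b:.0%}")        # 0%
```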
❌ Shipping with no audit
"It passed validation" is not the same as "it's fair and safe in the real world".
✅ Fix: run a fairness audit (the 4/5ths rule), write a model card, and re-audit on a schedule after launch.
❌ Dropping a protected column and assuming you're done
Removing "gender" does nothing if a proxy variable (name, ZIP code, club membership) still encodes it.
✅ Fix: test remaining features for correlation with the protected attribute, then drop or decorrelate the proxies too.
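A quick first-pass proxy test is the correlation between each remaining feature and the protected attribute. The sketch below uses a hand-written Pearson correlation on hypothetical 0/1 columns; the feature names and the 0.3 flagging threshold are assumptions, and a low linear correlation does not rule out a proxy built from several features combined.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical columns, one entry per applicant (illustrative data only).
protected = [1, 1, 1, 0, 0, 0, 0, 1]            # the attribute you removed from training
features = {
    "postcode_flag": [1, 1, 1, 1, 0, 0, 0, 0],  # suspect proxy
    "applied_online": [1, 0, 1, 0, 1, 0, 1, 0],
}

THRESHOLD = 0.3  # assumed cut-off for "worth investigating"
correlations = {name: pearson(col, protected) for name, col in features.items()}
for name, r in correlations.items():
    flag = "possible proxy" if abs(r) >= THRESHOLD else "ok"
    print(f"{name:>15}: r={r:+.2f}  {flag}")
```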
📋 Quick Reference — Ethical AI
| Concept | What it means |
|---|---|
| Demographic parity | Equal selection rates across groups |
| Equalised odds | Equal true-positive (and false-positive) rates across groups |
| Disparate impact / 4/5ths rule | min(rate) / max(rate); below 0.8 = adverse impact |
| Proxy variable | A feature that secretly encodes a protected attribute |
| SHAP / LIME | Attribute one prediction to its features (explainability) |
| PII | Data that identifies a real person; redact before training |
| Differential privacy (ε) | Add noise so no individual is recoverable; lower ε = more privacy |
| Accountability | A named owner, audit trail, and human override |
❓ Frequently Asked Questions
Q: What is the difference between bias and fairness in AI?
A: Bias is a systematic error that pushes a model's outputs in one direction, for example consistently scoring one group lower. Fairness is the goal: making sure those errors do not fall harder on some groups than others. You measure fairness with metrics like demographic parity (equal selection rates) and equalised odds (equal true-positive and false-positive rates across groups).
Q: What is the four-fifths (80%) rule?
A: It is a rule of thumb from the US EEOC's Uniform Guidelines for spotting adverse impact. Divide the selection rate of the lowest-selected group by that of the highest. If that disparate-impact ratio is below 0.8, regulators generally treat it as evidence of adverse impact that you must justify or fix. It is a screening test, not proof of discrimination on its own.
Q: What is a proxy variable and why is it dangerous?
A: A proxy is a feature that secretly stands in for a protected attribute. ZIP code can encode race; first name can encode gender. Dropping the protected column does not help if proxies remain — the model just learns the same bias through the back door. You have to test for it, not assume removal worked.
Q: What do SHAP and LIME do?
A: Both are explainability tools that answer 'why did the model decide this?'. They attribute a single prediction to the features that pushed it up or down (for example, '+0.30 from income, -0.15 from age'). SHAP is grounded in game theory (Shapley values), so its attributions add up to the difference between the prediction and a baseline; LIME fits a quick local approximation around one prediction. They turn a black box into something you can audit and challenge.
Q: What is differential privacy in one sentence?
A: It adds carefully calibrated random noise to results so that adding or removing any single person's data barely changes the output, which means an observer cannot reliably tell whether any one individual was in the dataset. The privacy budget epsilon (ε) controls the trade-off: lower ε means more privacy and less accuracy.
Q: Why is optimising only for accuracy a problem?
A: A model can hit 95% accuracy while being deeply unfair, because accuracy averages over everyone and hides what happens to small groups. If 90% of your data is Group A, the model can ignore Group B entirely and still look great. Always track per-group metrics alongside overall accuracy.
🎯 Mini-Challenge: Build a Fairness Auditor
Time to fly solo. Using only what you've learned, write a small program that audits a model's decisions for two regions and flags adverse impact. The starter block has the brief and the expected output — the logic is up to you.
Mini-Challenge: Fairness Auditor
Write the selection-rate and disparate-impact logic yourself
# 🎯 MINI-CHALLENGE: build a fairness auditor
#
# You are given decisions from a model. Audit it for fairness.
#
# decisions = [
# ("north", 1), ("north", 1), ("north", 0), ("north", 1),
# ("south", 0), ("south", 0), ("south", 1), ("south", 0),
# ]
#
# 1. Write selection_rate(rows, group) -> approvals / total for that group
# 2. Compute the rate for "north" and "south"
# 3. Compute the disparate-impact ratio = lower rate / higher rate
# 4. Print each rate as a percentage and the ratio to
...
🎉 Lesson Complete!
You can now spot where bias enters a system, measure it with demographic parity, equalised odds, and the disparate-impact ratio, explain a single prediction the SHAP/LIME way, protect privacy with PII redaction and differential privacy, and name who's accountable when things go wrong. That is the toolkit of a fair referee.
🚀 Up next: the Final Project — put everything from this course together into a complete, responsible end-to-end ML system.