Lesson 48 • Advanced

Ethical AI, Bias Mitigation & Privacy ⚖️

By the end of this lesson you'll be able to measure whether a model treats groups fairly, explain its decisions, protect people's privacy, and write the audit code that catches bias before it ships.

What You'll Learn in This Lesson

✓ Identify the points where bias enters an AI system
✓ Measure fairness with demographic parity and equalised odds
✓ Apply the 4/5ths rule and disparate-impact ratio yourself
✓ Explain a single prediction the way SHAP and LIME do
✓ Protect privacy with PII redaction and differential privacy
✓ Name who is accountable and how to mitigate misuse risk

Before you start: You should be comfortable training and evaluating a model, as covered in Lesson 47: AutoML & NAS. Ethics is not a bolt-on; it is something you check at every stage of that pipeline.

Real-World Analogy: a fair referee

Think of an AI model as a referee in a football match. A fair referee applies the same rules to both teams: a foul is a foul whoever commits it, and a goal counts no matter which side scores. The crowd can see every decision, so the referee can be questioned and overruled.

A biased referee blows the whistle more often on one team — maybe without even realising it, because that's how they were trained. AI models do exactly this: they absorb bias from historical data and apply it at massive scale, silently. Everything in this lesson is a tool for being a fair referee: measure the calls per team (fairness metrics), show your reasoning (explainability), respect the players (privacy), and let yourself be reviewed (accountability).

1. Where Does Bias Enter?

Bias is a systematic error that pushes a model's outputs in one direction for one group. It rarely comes from one villain — it leaks in at several points, and your job is to know all of them.

Training data: if past hiring favoured one group, the model learns "favour that group" as a rule.
Sampling: if a group is barely present in the data, the model never learns to serve it well.
Labels: the "correct answers" themselves may encode human prejudice (who was approved before).
Proxy variables: a feature like ZIP code or first name can secretly stand in for race or gender.
Objective: optimising only for accuracy lets the model ignore small groups entirely.

Real systems have caused real harm by missing these:

Case	What Happened	Where bias entered
Amazon hiring	Résumé screener penalised women	Biased historical labels
COMPAS	Recidivism model harsher on Black defendants	Proxy variables + labels
Healthcare algorithm	Used cost as a proxy for need	Bad objective / proxy
Facial recognition	Error rates of up to about 35% for darker-skinned women vs under 1% for lighter-skinned men	Unbalanced sampling
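
Two of the entry points listed above, sampling and labels, can be checked before any model is trained. The sketch below is a minimal illustration on made-up rows (the data and the function name `audit_training_data` are assumptions, not part of the lesson's own code): it reports how much of the data each group makes up and how often each group carries a positive label.

```python
from collections import Counter

# Hypothetical training rows: (group, label), where label 1 = positive outcome.
rows = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("A", 1), ("A", 0), ("A", 1), ("A", 1),
    ("B", 0), ("B", 1),
]

def audit_training_data(rows):
    """Return {group: (share_of_rows, positive_label_rate)}."""
    counts = Counter(group for group, _ in rows)
    positives = Counter(group for group, label in rows if label == 1)
    total = len(rows)
    return {
        group: (counts[group] / total, positives[group] / counts[group])
        for group in counts
    }

report = audit_training_data(rows)
for group, (share, base_rate) in sorted(report.items()):
    print(f"{group}: {share:.0%} of rows, {base_rate:.0%} labelled positive")
```

A small share points to a sampling problem; a large gap in positive-label rates is a prompt to ask whether the labels reflect real differences or past prejudice.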

2. Measuring Fairness — Parity, the 4/5ths Rule & Equalised Odds

You can't fix what you don't measure. The simplest fairness metric is demographic parity: the selection rate (the share of a group that gets a positive outcome) should be roughly equal across groups.

Regulators turn that into a number with the 4/5ths rule: divide the lowest group's selection rate by the highest to get the disparate-impact ratio. If it falls below 0.8, that is treated as adverse impact you must investigate.

Parity alone can be misleading, so also check equalised odds: the model's true-positive rate (the share of people who deserved a positive outcome and got one) and its false-positive rate (the share who did not deserve one but got it anyway) should each be the same for every group. The worked example below computes selection rates per group and the disparate-impact ratio from a tiny dataset. Read every comment, then run it.

Worked Example: Selection Rates & Disparate Impact

Compute per-group selection rates and flag a ratio below 0.8

Python

# ============================================
# WORKED EXAMPLE: where does bias hide?
# Pure Python — no libraries needed.
# ============================================

# A loan model already made decisions on 12 applicants.
# Each row: (group, approved?)  approved = 1, denied = 0
applicants = [
    ("A", 1), ("A", 1), ("A", 1), ("A", 0),
    ("A", 1), ("A", 1),                        # Group A: 5 of 6 approved
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
    ("B", 1), ("B", 0),               
...

Key insight: the disparate-impact ratio is just min(rate) / max(rate). Anything under 0.8 means one group is selected at less than four-fifths the rate of the best-served group — your signal to dig in.
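
Selection rates only need the model's decisions. Equalised odds also needs to know who actually deserved a positive outcome. The sketch below assumes such ground-truth labels are available and uses made-up rows; the function name `rates` is illustrative only. It computes the true-positive and false-positive rate for each group and the gap between groups.

```python
# Hypothetical rows: (group, actually_qualified, model_approved), all 0 or 1.
rows = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

def rates(rows, group):
    """Return (true_positive_rate, false_positive_rate) for one group."""
    qualified = [pred for g, actual, pred in rows if g == group and actual == 1]
    unqualified = [pred for g, actual, pred in rows if g == group and actual == 0]
    tpr = sum(qualified) / len(qualified)      # approved among the qualified
    fpr = sum(unqualified) / len(unqualified)  # approved among the unqualified
    return tpr, fpr

tpr_a, fpr_a = rates(rows, "A")
tpr_b, fpr_b = rates(rows, "B")
tpr_gap = abs(tpr_a - tpr_b)
fpr_gap = abs(fpr_a - fpr_b)

print(f"A: TPR={tpr_a:.2f} FPR={fpr_a:.2f}")
print(f"B: TPR={tpr_b:.2f} FPR={fpr_b:.2f}")
print(f"gaps: TPR={tpr_gap:.2f} FPR={fpr_gap:.2f}")
```

Equalised odds holds when both gaps are close to zero. How close is close enough is a policy choice; this sketch does not pick a threshold.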

🎯 Your Turn: Audit a Hiring Model

Fill in the blanks to compute selection rates and the disparate-impact ratio

Python

# 🎯 YOUR TURN — measure fairness on a hiring model.
# Fill in every ___ then run it.

# Each row: (group, hired?)   hired = 1, not hired = 0
people = [
    ("women", 0), ("women", 1), ("women", 0), ("women", 0), ("women", 1),
    ("men",   1), ("men",   1), ("men",   1), ("men",   0), ("men",   1),
]

def selection_rate(rows, group):
    members = [r for r in rows if r[0] == group]
    hired   = sum(h for _, h in members)
    # 👉 return the fraction hired (hired divided by number of members)
 
...

3. Transparency & Explainability (SHAP / LIME)

If a model denies someone a loan, they deserve to know why. Explainability tools answer that by attributing one prediction to the features that drove it: "+0.30 from income, −0.90 from late payments". That turns a black box into a sentence a human can challenge.

SHAP uses game theory to split a prediction fairly among its features; its attributions always add up to the gap between this prediction and the model's average output. LIME fits a quick, simple approximation of the model around a single prediction. The worked example shows the core idea for a linear model, where every feature's contribution is its value times its weight (SHAP measures that value relative to the dataset average), and points to the real libraries.

Worked Example: Explaining One Prediction

Attribute a score to each feature, the way SHAP and LIME do

Python

# ============================================
# WORKED EXAMPLE: explainability (the SHAP / LIME idea)
# Why did the model deny THIS person? Attribute the score
# to each feature. Pure Python — the real tools just do this
# rigorously across the whole model.
# ============================================

# A simple, fully transparent scoring model.
# Each feature has a weight; score = sum(feature * weight).
weights = {"income": 0.6, "age": -0.2, "late_payments": -0.9}

applicant = {"income": 4,
...
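
For a linear scoring model like the one above, a SHAP-style attribution can be written by hand. Under the usual feature-independence assumption, each feature's contribution is its weight times the gap between the applicant's value and the dataset average. The sketch below reuses the lesson's weights; the applicant `person` and the averages in `average` are made-up values for illustration, and no real SHAP or LIME library call is shown.

```python
# Weights taken from the lesson's transparent scoring model.
weights = {"income": 0.6, "age": -0.2, "late_payments": -0.9}

# Hypothetical applicant and hypothetical dataset averages (assumptions).
person = {"income": 3.0, "age": 3.0, "late_payments": 2.0}
average = {"income": 5.0, "age": 4.0, "late_payments": 1.0}

def score(features):
    """Linear score: sum of weight * value over all features."""
    return sum(weights[name] * features[name] for name in weights)

def attributions(features, baseline):
    """Per-feature contribution relative to the baseline applicant."""
    return {
        name: weights[name] * (features[name] - baseline[name])
        for name in weights
    }

contrib = attributions(person, average)
for name, value in sorted(contrib.items(), key=lambda item: item[1]):
    print(f"{name:>14}: {value:+.2f}")

gap = score(person) - score(average)
print(f"sum of contributions = {sum(contrib.values()):+.2f}, score gap = {gap:+.2f}")
```

The contributions add up to the difference between this applicant's score and the average applicant's score, which is the property that makes the explanation auditable.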

4. Privacy — PII & Differential Privacy

PII (personally identifiable information) is anything that pins data to a real person — name, email, exact address. The first rule is simple: strip direct identifiers before the data reaches the model.
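
Direct identifiers often hide inside free-text fields as well as in their own columns. The sketch below is one way to handle both, assuming a hypothetical `note` field and a deliberately simplified email pattern; real redaction needs broader patterns (phone numbers, addresses) and review.

```python
import re

# Simplified pattern for illustration; real email formats vary more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical list of direct-identifier columns; extend it per dataset.
DIRECT_IDENTIFIERS = {"name", "email"}

def redact_record(record):
    """Drop identifier columns and mask emails inside remaining text fields."""
    safe = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # never pass this column on
        if isinstance(value, str):
            value = EMAIL.sub("[EMAIL]", value)
        safe[key] = value
    return safe

record = {
    "name": "Jordan Lee",
    "email": "jlee@mail.com",
    "salary": 52000,
    "note": "Contact jlee@mail.com about the raise.",  # hypothetical free-text field
}
print(redact_record(record))
# {'salary': 52000, 'note': 'Contact [EMAIL] about the raise.'}
```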

But redaction is not enough: attackers can re-identify people from "anonymous" patterns by linking them with other data. Differential privacy addresses this by adding calibrated random noise so that adding or removing any single person barely changes the result. The knob is the privacy budget epsilon (ε): lower ε means more privacy and less accuracy. The worked example redacts PII and applies the differential-privacy idea to a salary average.

Worked Example: PII Redaction & Differential Privacy

Redact identifiers and add calibrated noise so no single person's data stands out

Python

# ============================================
# WORKED EXAMPLE: privacy — PII and differential privacy
# Pure Python. The real tools (below) automate the maths.
# ============================================
import random
random.seed(0)

# Step 1: never train on raw PII (personally identifiable info).
record = {"name": "Jordan Lee", "email": "jlee@mail.com", "salary": 52000}
# Redact direct identifiers BEFORE the data reaches the model.
safe = {k: v for k, v in record.items() if k not in ("name
...

Trade-off: there is no free privacy. Stronger privacy (lower ε) always means noisier, less accurate answers. Picking ε is a deliberate policy decision, not a default.
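
The trade-off can be seen numerically with the Laplace mechanism, a standard way to make a numeric answer differentially private. The sketch below is a minimal pure-Python version under stated assumptions: the salaries are made up, values are clipped to a hypothetical range of 0 to 100,000, and the number of people is treated as public. It measures the average error of a private mean at three values of ε.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials, each with mean `scale`,
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of values clipped to [lower, upper]; assumes len(values) is public."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Changing one person's value moves the mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

salaries = [52000, 48000, 61000, 75000, 39000, 58000]  # hypothetical data
true_mean = sum(salaries) / len(salaries)
rng = random.Random(0)

errors_by_epsilon = {}
for epsilon in (0.1, 1.0, 10.0):
    errors = [
        abs(private_mean(salaries, 0, 100000, epsilon, rng) - true_mean)
        for _ in range(2000)
    ]
    errors_by_epsilon[epsilon] = sum(errors) / len(errors)
    print(f"epsilon={epsilon}: average error about {errors_by_epsilon[epsilon]:,.0f}")
```

With only six people the noise is very large at low ε, because each person has a big influence on the mean. The same ε gives a far more accurate answer on a larger dataset, since the sensitivity shrinks as the number of people grows.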

🎯 Your Turn: Catch a Proxy Variable

Removing a protected column isn't enough — test whether a proxy leaks it

Python

# 🎯 YOUR TURN — catch a proxy variable.
# We removed "gender", but a proxy may still leak it.
# Fill in every ___ then run it.

applicants = [
    # (club_member, gender, hired?)  "club_member" is the suspect proxy
    (1, "M", 1), (1, "M", 1), (1, "F", 1), (1, "F", 0),
    (0, "M", 0), (0, "M", 1), (0, "F", 0), (0, "F", 0),
]

# A proxy is dangerous when it strongly predicts a protected attribute.
# Check: what share of club members are male?
members = [a for a in applicants if a[0] == 1]
# 👉
...

5. Accountability & Harmful-Misuse Risk

Accountability means a specific, named human owns the model's outcomes — not "the algorithm". When a decision is wrong, there must be a person to appeal to, a log of what the model did, and a clear path to override it. "The model said so" is never an acceptable answer.

Concretely, an accountable system has:

An owner: a named person or team responsible for harms, not just accuracy.
An audit trail: logged inputs, outputs, and model versions so you can reconstruct any decision.
A human in the loop: someone who can review and reverse high-stakes outcomes.
A model card: a short document stating the model's intended use, limits, and known risks.
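
The audit-trail item above can be made concrete with one log entry per decision. The sketch below is only an illustration: the field names, the model version string, and the owner address are all hypothetical, and the inputs are assumed to have been redacted already.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_version, features, decision, owner):
    """Build one log entry that lets a reviewer reconstruct a decision."""
    # Sorting keys makes the hash independent of dictionary ordering.
    payload = json.dumps(features, sort_keys=True)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "owner": owner,                    # who to appeal to
        "input_hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "inputs": features,                # assumed already redacted
        "decision": decision,
        "overridden_by": None,             # filled in if a human reverses it
    }

entry = audit_record(
    model_version="credit-v1.3",           # hypothetical version label
    features={"income": 4, "late_payments": 2},
    decision="deny",
    owner="risk-team@example.com",         # hypothetical owner
)
line = json.dumps(entry)                   # one JSON line per decision
print(line)
```

Appending one such line per decision to write-once storage gives a reviewer the inputs, output, and model version needed to reconstruct and, if necessary, override the outcome.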

You also have to think about misuse — harm caused by using the model outside its intended purpose. A face-matcher built for unlocking phones could be repurposed for mass surveillance; a text generator could mass-produce scams or disinformation. Mitigation means restricting access, rate-limiting, watermarking outputs, refusing dangerous requests, and red-teaming the system before release.

6. Common Mistakes (And How to Fix Them)

❌ Training on biased data and trusting it

Feeding in years of skewed historical decisions teaches the model to copy them.

✅ Fix: audit the data first — check per-group base rates and rebalance or reweight before training.
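
One common reweighting scheme gives each (group, label) combination the weight "expected frequency if group and label were independent, divided by observed frequency". The sketch below applies it to made-up rows; the function names are illustrative, and the resulting weights would be passed to whatever sample-weight option your training code supports.

```python
from collections import Counter

# Hypothetical training rows: (group, label).
rows = [("A", 1)] * 6 + [("A", 0)] * 2 + [("B", 1)] * 1 + [("B", 0)] * 3

def reweighing_weights(rows):
    """Weight for each (group, label) cell: expected count / observed count."""
    n = len(rows)
    group_counts = Counter(g for g, _ in rows)
    label_counts = Counter(y for _, y in rows)
    cell_counts = Counter(rows)
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * cell_counts[(g, y)])
        for (g, y) in cell_counts
    }

def weighted_base_rate(rows, weights, group):
    """Share of positive labels in a group after applying the weights."""
    total = sum(weights[(g, y)] for g, y in rows if g == group)
    positive = sum(weights[(g, y)] for g, y in rows if g == group and y == 1)
    return positive / total

weights = reweighing_weights(rows)
for group in ("A", "B"):
    raw = sum(y for g, y in rows if g == group) / sum(1 for g, _ in rows if g == group)
    print(f"{group}: raw base rate {raw:.2f}, "
          f"weighted base rate {weighted_base_rate(rows, weights, group):.2f}")
```

After weighting, both groups have the same positive-label rate, so the model no longer sees group membership as a shortcut to the label. This treats the symptom in the data; it does not tell you why the original rates differed.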

❌ Optimising only for accuracy

A 95%-accurate model can still fail a minority group completely, because accuracy averages over everyone.

✅ Fix: track per-group metrics (selection rate, true-positive rate) alongside overall accuracy.
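
A few lines are enough to show how an overall figure hides a group-level failure. The rows below are made up so that the model is perfect on a large group and wrong on every qualified member of a small one.

```python
# Hypothetical evaluation rows: (group, true_label, predicted_label).
rows = (
    [("A", 1, 1)] * 60 + [("A", 0, 0)] * 35   # group A: every prediction correct
    + [("B", 1, 0)] * 5                       # group B: every qualified person rejected
)

def accuracy(rows):
    """Share of rows where the prediction matches the true label."""
    return sum(y == p for _, y, p in rows) / len(rows)

def true_positive_rate(rows, group):
    """Share of a group's truly positive rows that the model predicted positive."""
    preds = [p for g, y, p in rows if g == group and y == 1]
    return sum(preds) / len(preds)

print(f"overall accuracy: {accuracy(rows):.0%}")
print(f"TPR group A: {true_positive_rate(rows, 'A'):.0%}")
print(f"TPR group B: {true_positive_rate(rows, 'B'):.0%}")
```

Overall accuracy comes out at 95% while the true-positive rate for group B is zero, which is why the per-group number has to be reported next to the headline one.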

❌ Shipping with no audit

"It passed validation" is not the same as "it's fair and safe in the real world".

✅ Fix: run a fairness audit (the 4/5ths rule), write a model card, and re-audit on a schedule after launch.

❌ Dropping a protected column and assuming you're done

Removing "gender" does nothing if a proxy variable (name, ZIP code, club membership) still encodes it.

✅ Fix: test remaining features for correlation with the protected attribute, then drop or decorrelate the proxies too.
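
One simple proxy test asks how well the protected attribute can be guessed from a single feature, compared with guessing blind. The sketch below uses made-up ZIP codes and group labels; `proxy_strength` is an illustrative name, and the check is done in-sample, so treat the result as a warning sign, not a measurement.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (zip_code, protected_group). The protected column is kept
# for auditing only and is never given to the model.
rows = [
    ("10001", "X"), ("10001", "X"), ("10001", "X"), ("10001", "Y"),
    ("20002", "Y"), ("20002", "Y"), ("20002", "Y"), ("20002", "X"),
]

def proxy_strength(rows):
    """Return (accuracy guessing the group from the feature, blind-guess accuracy)."""
    by_value = defaultdict(Counter)
    for value, group in rows:
        by_value[value][group] += 1
    # Best guess per feature value: the most common group for that value.
    guessed_right = sum(c.most_common(1)[0][1] for c in by_value.values())
    # Blind guess: always answer the overall most common group.
    majority = Counter(g for _, g in rows).most_common(1)[0][1]
    return guessed_right / len(rows), majority / len(rows)

with_feature, baseline = proxy_strength(rows)
print(f"guess from ZIP: {with_feature:.0%} correct, blind guess: {baseline:.0%}")
```

Here ZIP code lifts the guess from 50% to 75%, so it carries information about the protected attribute and deserves a closer look before it is used as a feature.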

📋 Quick Reference — Ethical AI

Concept	What it means
Demographic parity	Equal selection rates across groups
Equalised odds	Equal true-positive (and false-positive) rates across groups
Disparate impact / 4/5ths rule	min(rate) / max(rate); below 0.8 = adverse impact
Proxy variable	A feature that secretly encodes a protected attribute
SHAP / LIME	Attribute one prediction to its features (explainability)
PII	Data that identifies a real person; redact before training
Differential privacy (ε)	Add calibrated noise so no single person's data noticeably changes the result; lower ε = more privacy
Accountability	A named owner, audit trail, and human override

❓ Frequently Asked Questions

Q: What is the difference between bias and fairness in AI?

A: Bias is a systematic error that pushes a model's outputs in one direction, for example consistently scoring one group lower. Fairness is the goal: making sure those errors do not fall harder on some groups than others. You measure fairness with metrics like demographic parity (equal selection rates) and equalised odds (equal true-positive and false-positive rates across groups, so qualified people are approved, and unqualified people wrongly approved, at the same rate in every group).

Q: What is the four-fifths (80%) rule?

A: It is a US EEOC guideline for spotting adverse impact. Divide the selection rate of the lowest-selected group by that of the highest-selected group. If that disparate-impact ratio is below 0.8, regulators generally treat it as evidence of adverse impact that you must investigate and either justify or fix. It is a screening test, not proof of discrimination on its own.

Q: What is a proxy variable and why is it dangerous?

A: A proxy is a feature that secretly stands in for a protected attribute. ZIP code can encode race; first name can encode gender. Dropping the protected column does not help if proxies remain — the model just learns the same bias through the back door. You have to test for it, not assume removal worked.

Q: What do SHAP and LIME do?

A: Both are explainability tools that answer 'why did the model decide this?'. They attribute a single prediction to the features that pushed it up or down (for example, '+0.30 from income, -0.15 from age'). SHAP is grounded in game theory, and its attributions add up to the gap between this prediction and the model's average output; LIME fits a quick local approximation around one prediction. They turn a black box into something you can audit and challenge.

Q: What is differential privacy in one sentence?

A: It adds carefully calibrated random noise to results so that adding or removing any single person's data barely changes the output, which strictly limits what anyone can learn about an individual from what the model releases. The privacy budget epsilon (ε) controls the trade-off: lower ε means more privacy and less accuracy.

Q: Why is optimising only for accuracy a problem?

A: A model can hit 95% accuracy while being deeply unfair, because accuracy averages over everyone and hides what happens to small groups. If 90% of your data is Group A, the model can ignore Group B entirely and still look great. Always track per-group metrics alongside overall accuracy.

🎯 Mini-Challenge: Build a Fairness Auditor

Time to fly solo. Using only what you've learned, write a small program that audits a model's decisions for two regions and flags adverse impact. The starter block has the brief and the expected output — the logic is up to you.

Mini-Challenge: Fairness Auditor

Write the selection-rate and disparate-impact logic yourself

Python

# 🎯 MINI-CHALLENGE: build a fairness auditor
#
# You are given decisions from a model. Audit it for fairness.
#
# decisions = [
#     ("north", 1), ("north", 1), ("north", 0), ("north", 1),
#     ("south", 0), ("south", 0), ("south", 1), ("south", 0),
# ]
#
# 1. Write selection_rate(rows, group) -> approvals / total for that group
# 2. Compute the rate for "north" and "south"
# 3. Compute the disparate-impact ratio = lower rate / higher rate
# 4. Print each rate as a percentage and the ratio to
...

🎉 Lesson Complete!

You can now spot where bias enters a system, measure it with demographic parity, equalised odds, and the disparate-impact ratio, explain a single prediction the SHAP/LIME way, protect privacy with PII redaction and differential privacy, and name who's accountable when things go wrong. That is the toolkit of a fair referee.

🚀 Up next: the Final Project — put everything from this course together into a complete, responsible end-to-end ML system.
