Lesson 16 โข Advanced
Advanced Normalization (BCNF, 4NF, 5NF)
Eliminate data anomalies with higher normal forms โ learn when to normalize and when denormalization is the right choice.
๐ฏ What You'll Learn
- 1NF, 2NF, 3NF review with practical examples
- Boyce-Codd Normal Form (BCNF)
- 4NF: multi-valued dependencies
- 5NF: join dependencies
- When to denormalize for performance
Why Normalize?
Normalization is the process of organizing tables to reduce redundancy and prevent data anomalies (insert, update, and delete problems).
๐ Real-World Analogy
Imagine storing a customer's address on every order. If they move, you need to update 500 orders โ and if you miss one, you have conflicting data. Normalization stores the address once in a customers table, and orders just reference the customer ID.
| Form | Rule | Eliminates |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Arrays in cells |
| 2NF | No partial dependencies | Redundant data in composite keys |
| 3NF | No transitive dependencies | A depends on B depends on C |
| BCNF | Every determinant is a candidate key | Subtle 3NF violations |
| 4NF | No multi-valued dependencies | Independent multi-valued facts |
| 5NF | No join dependencies | Lossless decomposition issues |
1NF & 2NF
Eliminate repeating groups and partial dependencies
-- UNNORMALIZED: everything in one table (messy!)
-- orders: order_id, customer_name, customer_email,
-- product1_name, product1_qty, product2_name, product2_qty
-- 1NF: Eliminate repeating groups
-- Each cell has one value, no arrays or lists
CREATE TABLE order_items_1nf (
order_id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
product_name VARCHAR(100),
quantity INT,
price DECIMAL(10,2)
);
-- Problem: customer data repeated on every row
-- 2NF:
...3NF โ No Transitive Dependencies
Separate data that depends on non-key attributes
-- 3NF: Remove transitive dependencies
-- Bad: employee table has department_name AND department_location
-- department_location depends on department_name, not employee
-- Before 3NF:
-- employees(id, name, dept_name, dept_location, dept_manager)
-- After 3NF: separate the transitive dependency
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
name VARCHAR(50),
location VARCHAR(100),
manager_id INT
);
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100
...BCNF โ Boyce-Codd Normal Form
Ensure every determinant is a candidate key
-- BCNF (Boyce-Codd Normal Form)
-- Stricter than 3NF: every determinant is a candidate key
-- Problem scenario: course scheduling
-- (student, course) -> professor
-- professor -> course (each prof teaches one course)
-- This violates BCNF because professor -> course
-- but professor is NOT a candidate key
-- Fix: split into two tables
CREATE TABLE professor_courses (
professor_id INT PRIMARY KEY,
professor_name VARCHAR(100),
course_id INT UNIQUE -- each prof teaches one course
...When to Denormalize
Normalization isn't always the answer. Sometimes you intentionally denormalize for performance:
โ Denormalize when:
Read-heavy workloads where JOINs are too expensive (dashboards, analytics, caching tables)
โ Don't denormalize when:
Write-heavy workloads where data consistency is critical (financial systems, inventory)
๐ก Pro Tip
Start normalized (3NF), measure performance, then selectively denormalize only the bottlenecks. Premature denormalization creates maintenance nightmares.
4NF & 5NF
Handle multi-valued and join dependencies
-- 4NF: Eliminate multi-valued dependencies
-- A student can have multiple hobbies AND multiple languages
-- These are independent of each other
-- Bad (violates 4NF):
-- student_info(student_id, hobby, language)
-- Creates cartesian product: 3 hobbies ร 2 languages = 6 rows!
-- Good (4NF):
CREATE TABLE student_hobbies (
student_id INT,
hobby VARCHAR(50),
PRIMARY KEY (student_id, hobby)
);
CREATE TABLE student_languages (
student_id INT,
language VARCHAR(50),
PRIMARY K
...๐ Quick Reference
| Concept | Key Rule |
|---|---|
| 1NF | One value per cell, no arrays |
| 2NF | Full dependency on entire primary key |
| 3NF | No column depends on another non-key column |
| BCNF | Every determinant is a candidate key |
| Denormalize | Only after measuring performance bottlenecks |
๐ Lesson Complete!
You now understand normalization from 1NF through 5NF and know when denormalization is appropriate. Next, dive deeper into indexing internals!
Sign up for free to track which lessons you've completed and get learning reminders.