    Lesson 15 • Intermediate

    Unsupervised Learning

    Find hidden groups in data that has no labels. By the end you'll run one K-Means step by hand, cluster with scikit-learn, choose K with the elbow and silhouette methods, reach for DBSCAN and hierarchical clustering, and know where association rules and anomaly detection fit in.

    What You'll Learn in This Lesson

    • What unsupervised learning is and why it needs no labels
    • Run one K-Means step (assign + update) in plain Python
    • Cluster data in two lines with scikit-learn's KMeans
    • Choose K with the elbow method and silhouette score
    • Pick DBSCAN or hierarchical clustering when K-Means fails
    • Where association rules and anomaly detection fit in

    🌍 Real-World Analogy: sorting without labels

    Imagine someone hands you a shoebox of 1,000 photos with no folders, no tags, nothing. Nobody tells you the categories. Yet within minutes you make piles: beaches, faces, food, pets. You never knew the labels in advance — you discovered them by noticing which photos look alike.

    That is unsupervised learning. Supervised learning is like sorting photos that already have a label on the back; unsupervised learning is sorting blank-backed photos into piles you invent yourself. The algorithm only sees the raw data and groups similar things together — which is exactly what clustering means.

    📊 The four jobs of unsupervised learning

    • Clustering — group similar points (K-Means, DBSCAN, hierarchical)
    • Dimensionality reduction — compress many features into a few (PCA, t-SNE)
    • Anomaly detection — flag the rare, odd points (Isolation Forest)
    • Association rules — find what co-occurs (Apriori: "nappies → beer")

    This lesson focuses on clustering, then tours the other three at the end.

    1. K-Means, Step by Step (the whole idea)

    K-Means is the most popular clustering algorithm, and its whole engine is two steps repeated in a loop. You pick K (how many groups you want), drop K centroids (a centroid is just the centre point of a cluster), then:

    1. Assign: every point joins its nearest centroid.
    2. Update: move each centroid to the mean (average) of the points that joined it.

    Repeat assign → update until the centroids stop moving. That's it — no labels anywhere. The code below does one full iteration on six 1-D points in plain Python so you can see exactly what happens. Read every comment, then run it.

    Try It: one K-Means step (plain Python)

    Assign 6 points to the nearest of 2 centroids, then update the centroids

    Python
    # K-MEANS BY HAND — one step, plain Python (no libraries)
    # Goal: group points into 2 clusters when NOBODY told us the groups.
    
    # Six points on a line: three small, three large.
    points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
    
    # Step 0: pick 2 starting centres (centroids). We just guess.
    c1 = 2.0   # centre of cluster 0
    c2 = 7.0   # centre of cluster 1
    
    # Step 1: ASSIGN — each point joins the NEAREST centroid.
    labels = []
    for p in points:
        d1 = abs(p - c1)        # distance to centre 1
        d2 = abs(p
    ...
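
    The single step above can be extended into the full assign → update loop. The block below is a minimal sketch, not a definitive implementation: the helper name kmeans_1d, the max_iters cap, and the rule of keeping an empty cluster's old centre are illustrative assumptions. It reuses the six points and the starting guesses 2.0 and 7.0 from the example.

```python
# A minimal K-Means loop in plain Python (1-D points).
# kmeans_1d is a hypothetical helper written for this sketch only.

def kmeans_1d(points, centroids, max_iters=100):
    """Repeat assign -> update until the centroids stop moving."""
    labels = []
    for _ in range(max_iters):
        # ASSIGN: each point takes the index of its nearest centroid
        # (ties go to the lower index, like `d1 <= d2` in the lesson).
        labels = [
            min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            for p in points
        ]
        # UPDATE: move each centroid to the mean of the points that joined it.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Assumption: an empty cluster simply keeps its old centre.
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:   # nothing moved, so we have converged
            break
        centroids = new_centroids
    return labels, centroids


points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels, centroids = kmeans_1d(points, [2.0, 7.0])

print("Labels:", labels)         # Labels: [0, 0, 0, 1, 1, 1]
print("Centroids:", centroids)   # Centroids: [1.5, 8.5]
```

    On this data the loop settles after the first update: the second pass produces the same centroids, so it stops.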

    2. The Real Tool: scikit-learn's KMeans

    You'll never hand-code the loop in real work — scikit-learn does it in two lines and runs the whole assign/update cycle for you. The same six points, now clustered by KMeans:

    Try It: KMeans with scikit-learn

    The same clustering in two lines, with the expected output worked out

    Python
    # THE REAL TOOL — scikit-learn does the assign/update loop for you.
    from sklearn.cluster import KMeans
    import numpy as np
    
    # Same six 1-D points as before, reshaped into rows (KMeans wants 2-D input).
    X = np.array([1.0, 1.5, 2.0, 8.0, 8.5, 9.0]).reshape(-1, 1)
    
    # n_clusters = K (how many groups). n_init=10 reruns with different starts
    # and keeps the best — this avoids a bad random start. random_state makes it
    # reproducible so you get the same answer every run.
    model = KMeans(n_clusters=2, n_in
    ...

    fit_predict(X) trains the model and hands back the cluster id of every point in one call. inertia_ is the total spread within clusters — smaller is tighter — and you'll use it for the elbow method in Section 5.
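
    To make "total spread within clusters" concrete, the sketch below recomputes inertia by hand as the sum of squared distances from each point to its own centroid, then compares it with inertia_. It assumes the same six points and the parameters described above; the variable names own_centres and manual_inertia are illustrative.

```python
from sklearn.cluster import KMeans
import numpy as np

# Same six 1-D points, reshaped into rows.
X = np.array([1.0, 1.5, 2.0, 8.0, 8.5, 9.0]).reshape(-1, 1)

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)            # one cluster id per row of X

# Look up each point's own centroid, then sum the squared distances.
own_centres = model.cluster_centers_[labels]          # shape (6, 1)
manual_inertia = float(((X - own_centres) ** 2).sum())

# Cluster ids are arbitrary: the small points may be 0 or 1.
print("Labels:", labels)
print("Centres:", np.sort(model.cluster_centers_.ravel()))
print("inertia_:", model.inertia_, "| by hand:", manual_inertia)
```

    Each cluster contributes 0.25 + 0 + 0.25 = 0.5, so both numbers should come out at about 1.0.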

    Your Turn 1: finish the assign step

    One blank to fill. Each point must join whichever centre is nearer. The line for c1 is already written, so mirror it for c2.

    🎯 Your Turn: the assign step

    Fill the ___ so every point joins its nearest centroid

    Python
    # 🎯 YOUR TURN — finish the ASSIGN step of K-Means.
    # Each point should join whichever centre is nearer.
    
    points = [2.0, 3.0, 10.0, 11.0]
    c1 = 1.0     # centre of cluster 0
    c2 = 9.0     # centre of cluster 1
    
    labels = []
    for p in points:
        d1 = abs(p - c1)
        d2 = ___          # 👉 distance from p to c2 (copy the line above, swap c1 for c2)
        nearest = 0 if d1 <= d2 else 1
        labels.append(nearest)
    
    print("Labels:", labels)
    
    # ✅ Expected output:
    # Labels: [0, 0, 1, 1]

    Your Turn 2: finish the update step

    Now move each centre to the mean of its points. The c1 line is done — write the matching one for c2.

    🎯 Your Turn: the update step

    Fill the ___ so each centre becomes the mean of its cluster

    Python
    # 🎯 YOUR TURN — finish the UPDATE step.
    # Move each centre to the MEAN (average) of the points assigned to it.
    
    points = [2.0, 3.0, 10.0, 11.0]
    labels = [0, 0, 1, 1]    # from the assign step
    
    group0 = [p for p, lab in zip(points, labels) if lab == 0]
    group1 = [p for p, lab in zip(points, labels) if lab == 1]
    
    c1 = sum(group0) / len(group0)
    c2 = ___                  # 👉 the mean of group1 (mirror the c1 line above)
    
    print("New c1:", c1)
    print("New c2:", c2)
    
    # ✅ Expected output:
    # New c1: 2.5
    
    ...

    3. DBSCAN: Clusters of Any Shape + Outliers

    K-Means has two weaknesses: you must pick K, and it only finds round blobs. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) addresses both. It grows clusters wherever points are densely packed, finds any shape, decides the number of clusters itself, and labels lonely points as noise (id -1), which gives you a basic form of anomaly detection for free.

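    Two knobs control it: eps (the radius within which two points count as neighbours) and min_samples (how many points, counting the point itself in scikit-learn, must fall within eps for that point to be a dense core).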

    Try It: DBSCAN finds clusters and noise

    Two blobs plus one outlier — DBSCAN labels the outlier as noise

    Python
    # DBSCAN — clusters by DENSITY. Finds any shape AND labels outliers as noise.
    from sklearn.cluster import DBSCAN
    import numpy as np
    
    # A tight blob, a second tight blob, and one lonely outlier far away.
    X = np.array([
        [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],     # blob A
        [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],     # blob B
        [50.0, 50.0],                           # outlier
    ])
    
    # eps = neighbourhood radius. min_samples = points needed to form a dense core.
    model = DBSCAN(eps=1.0, min_samples=2)
    ...
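
    Because the result depends so heavily on eps, it can help to run the same data at several radii. The sketch below is one way to do that, assuming the seven points from the example above; the three eps values and the results dictionary are illustrative choices.

```python
from sklearn.cluster import DBSCAN
import numpy as np

X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],     # blob A
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],     # blob B
    [50.0, 50.0],                           # outlier
])

results = {}
for eps in (0.1, 1.0, 100.0):
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
    n_clusters = len(set(labels) - {-1})    # -1 is noise, not a cluster
    n_noise = int((labels == -1).sum())
    results[eps] = (n_clusters, n_noise)
    print(f"eps={eps:>5}: labels={labels.tolist()}  clusters={n_clusters}  noise={n_noise}")

# eps=0.1   -> 0 clusters, 7 noise points (radius smaller than any gap)
# eps=1.0   -> 2 clusters, 1 noise point  (the two blobs plus the outlier)
# eps=100.0 -> 1 cluster,  0 noise points (everything is one neighbourhood)
```

    Too small a radius turns every point into noise; too large a radius swallows the outlier and merges the blobs.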

    4. Hierarchical Clustering: a Tree of Groups

    Hierarchical (agglomerative) clustering starts with every point as its own tiny cluster, then repeatedly merges the two closest clusters until one remains. The record of merges is a tree called a dendrogram. The trick: you don't choose K up front — you cut the tree at any height afterwards. Cut high for few big clusters, cut low for many small ones.

    Try It: hierarchical clustering

    Merge points one pair at a time, then cut the tree into 2 clusters

    Python
    # HIERARCHICAL CLUSTERING — start with every point alone, merge the closest.
    # Builds a tree (dendrogram); cut it at any height to choose K.
    from scipy.cluster.hierarchy import linkage, fcluster
    import numpy as np
    
    # Five points: two near 1, two near 8, one in between.
    X = np.array([[1.0], [1.5], [8.0], [8.5], [4.5]])
    
    # 'ward' merges the pair that least increases total spread (a good default).
    Z = linkage(X, method="ward")
    
    # Each row of Z = "merged cluster A and B at this distance, new size = 
    ...

    Each row of the merge table shows which two clusters joined and at what distance. The big jump in distance on the last merges is your cue for where to cut — that's the dendrogram telling you the natural number of groups.
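
    The "big jump" rule can be turned into a few lines of code. The sketch below is a simple heuristic under stated assumptions, not a standard SciPy feature: it finds the largest increase between consecutive merge distances and cuts the tree just before that merge. It uses the five points from the example; the names jumps, biggest and k are illustrative.

```python
from scipy.cluster.hierarchy import linkage, fcluster
import numpy as np

# Five points: two near 1, two near 8, one in between.
X = np.array([[1.0], [1.5], [8.0], [8.5], [4.5]])
Z = linkage(X, method="ward")

merge_distances = Z[:, 2]             # column 2 holds the distance of each merge
jumps = np.diff(merge_distances)      # increase from one merge to the next
biggest = int(np.argmax(jumps))       # position of the largest jump

# Stopping before the merge that follows the biggest jump
# leaves n_points - (biggest + 1) clusters.
k = X.shape[0] - (biggest + 1)

labels = fcluster(Z, t=k, criterion="maxclust")   # ids start at 1
print("Merge distances:", np.round(merge_distances, 3))
print("Suggested K:", k)
print("Labels:", labels)
```

    Here the last merge is by far the most expensive, so the heuristic suggests two clusters, with the in-between point 4.5 joining the low group.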

    5. Choosing K: Elbow + Silhouette

    K-Means won't pick K for you, so try a range of values and compare two numbers. The elbow method plots inertia against K: inertia keeps falling as K grows, so look for the "elbow" where it stops falling fast. The silhouette score (from -1 to 1) measures how tight and well separated the clusters are; pick the K with the highest score. When both methods point to the same K, you can be fairly confident in it.

    Try It: find the best K

    Compare inertia (elbow) and silhouette across K = 2 to 5

    Python
    # HOW MANY CLUSTERS? Compare K with the ELBOW and SILHOUETTE methods.
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    import numpy as np
    
    # Three clearly separated blobs in 1-D -> the "true" answer is K = 3.
    X = np.array([1, 1.2, 0.8, 5, 5.1, 4.9, 9, 9.2, 8.8]).reshape(-1, 1)
    
    print("K  inertia  silhouette")
    for k in range(2, 6):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        sil = silhouette_score(X, km.labels_)   # 1.0 = perfect, near 0 = ov
    ...

    🧭 Beyond Clustering: Association & Anomaly Detection

    Clustering is the headline act, but two other unsupervised jobs power real products — and neither needs labels:

    🛒 Association rules

    Find items that show up together in baskets — "customers who buy nappies also buy beer." The Apriori algorithm mines these rules from transaction logs to drive recommendations and shelf layout.
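
    A rule such as "nappies → beer" is usually judged by two numbers: support (how often the items appear together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both in plain Python on five made-up baskets. It is not the Apriori algorithm itself, which adds an efficient search over candidate itemsets; the baskets and the helper names support and confidence are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets with the antecedent, the fraction that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"nappies", "beer"})            # 2 of 5 baskets
rule_confidence = confidence({"nappies"}, {"beer"})    # 2 of the 3 nappies baskets

print(f"support(nappies, beer)     = {rule_support:.2f}")     # 0.40
print(f"confidence(nappies → beer) = {rule_confidence:.2f}")  # 0.67
```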

    🚨 Anomaly detection

    Flag the rare, weird points — fraud, failing sensors, intrusions. Isolation Forest scores how easily each point is "isolated"; DBSCAN's -1 noise label is a simpler version of the same idea.
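
    As a rough illustration of the "how easily isolated" idea, the sketch below fits scikit-learn's IsolationForest to the two-blobs-plus-outlier data from the DBSCAN section and reads score_samples, where lower scores mean more anomalous. The choice of 100 trees and the variable names are assumptions made for this sketch.

```python
from sklearn.ensemble import IsolationForest
import numpy as np

X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],     # blob A
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],     # blob B
    [50.0, 50.0],                           # outlier
])

forest = IsolationForest(n_estimators=100, random_state=0).fit(X)

# Lower score = fewer random splits needed to isolate the point = more anomalous.
scores = forest.score_samples(X)
most_anomalous = int(np.argmin(scores))

print("Scores:", np.round(scores, 3))
print("Most anomalous row:", most_anomalous, X[most_anomalous])
```

    The far-away point at [50, 50] should receive the lowest score, matching the row DBSCAN labelled as noise.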

    You'll meet dimensionality reduction (PCA, t-SNE) — squeezing many features into a few for plotting and speed — in the very next lesson.

    6. Common Errors (And How to Fix Them)

    These four mistakes trip up nearly every beginner. Spotting them early saves hours.

    ❌ Wrong K — guessing the number of clusters

    Pick K=2 when the data really has 5 groups and K-Means happily merges distinct groups into mush. There's no error message — just bad clusters.

    ✅ Fix: never hard-code K blindly. Run the elbow + silhouette loop from Section 5 first, or switch to DBSCAN, which infers the number of clusters from density, or to hierarchical clustering, which lets you pick K after inspecting the dendrogram.

    ❌ Unscaled features — one column dominates

    Cluster on [income (0–100000), age (18–90)] and income's huge range drowns out age entirely — distance becomes "income distance".

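    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # X is your (n_samples, n_features) array, e.g. columns [income, age]
    X_scaled = StandardScaler().fit_transform(X)   # mean 0, std 1 per feature
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_scaled)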

    ✅ Fix: always scale features before any distance-based clustering.
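
    The effect is easy to see on a tiny example. The sketch below uses four made-up customers and asks which one is nearest to the first, before and after scaling; the data and the helper nearest_to_first are hypothetical.

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Hypothetical customers: [income, age]
X = np.array([
    [30_000, 25],   # row 0: the reference customer
    [30_400, 70],   # row 1: similar income, very different age
    [31_000, 26],   # row 2: similar age, slightly higher income
    [90_000, 68],   # row 3: different on both
], dtype=float)

def nearest_to_first(data):
    """Index of the row closest (Euclidean) to row 0, excluding row 0 itself."""
    dists = np.linalg.norm(data[1:] - data[0], axis=1)
    return int(np.argmin(dists)) + 1

X_scaled = StandardScaler().fit_transform(X)

raw_nearest = nearest_to_first(X)            # income gap of 400 beats 1000
scaled_nearest = nearest_to_first(X_scaled)  # age now counts as much as income

print("Nearest to row 0, raw:   ", raw_nearest)     # 1
print("Nearest to row 0, scaled:", scaled_nearest)  # 2
```

    On the raw numbers a 45-year age gap barely registers next to a few hundred units of income; after scaling, the customer of almost the same age and income becomes the nearest neighbour.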

    ❌ Assuming spherical clusters — K-Means on the wrong shape

    K-Means assumes round, similarly-sized blobs. On crescents, rings, or stretched bands it slices straight through the real structure and the clusters look nonsensical.

    ✅ Fix: if your clusters aren't roughly spherical, use DBSCAN (density, any shape) or a Gaussian Mixture Model instead.
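
    To see the difference on a non-spherical shape, the sketch below runs both algorithms on scikit-learn's synthetic two-crescents dataset (make_moons). The true labels exist only because the data is generated, and they are used purely to score the result with the adjusted Rand index (1.0 means a perfect match). The eps and min_samples values are assumptions tuned to this dataset, not general defaults.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescents; y_true is only used for scoring.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

ari_kmeans = adjusted_rand_score(y_true, km_labels)
ari_dbscan = adjusted_rand_score(y_true, db_labels)

print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN  ARI: {ari_dbscan:.2f}")
```

    K-Means splits the plane with a straight boundary and cuts both crescents in half, while DBSCAN follows each crescent along its dense curve.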

    ❌ No random init control — unstable, irreproducible results

    K-Means starts from random centroids, so a single run can land in a bad local optimum and give different clusters every time.

    # ❌ one start, different answer each run:
    KMeans(n_clusters=3)
    
    # ✅ many starts (keeps the best) + reproducible:
    KMeans(n_clusters=3, n_init=10, random_state=42)

    ✅ Fix: set n_init=10 and a fixed random_state.
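
    A quick way to confirm that a setup is reproducible is to fit twice with the same settings and compare. The sketch below does this on three made-up Gaussian blobs; the data generation and variable names are illustrative assumptions.

```python
from sklearn.cluster import KMeans
import numpy as np

# Three hypothetical 2-D blobs centred near 0, 5 and 10.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(30, 2)) for loc in (0.0, 5.0, 10.0)])

# Two independent fits with identical settings and seed.
a = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
b = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

same_centres = bool(np.allclose(a.cluster_centers_, b.cluster_centers_))
same_labels = bool(np.array_equal(a.labels_, b.labels_))

print("Same centres:", same_centres)
print("Same labels: ", same_labels)
```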

    📋 Quick Reference

    Algorithm / tool | Use it when | Watch out for
    KMeans(n_clusters=K) | Round blobs, you can pick K, big data | Must choose K; scale features; set n_init
    DBSCAN(eps, min_samples) | Any shape, unknown K, want outliers | Sensitive to eps / min_samples
    linkage + fcluster | Want a dendrogram, cut K afterwards | Slow on large data (~O(n²) to O(n³))
    silhouette_score | Compare K values (higher = better) | Slower than inertia on big data
    .inertia_ (elbow) | Spot the K where spread flattens | Always falls; read the elbow, not the minimum
    StandardScaler | Before any distance-based clustering | Forget it and one feature dominates

    Frequently Asked Questions

    Q: What is the difference between supervised and unsupervised learning?

    Supervised learning trains on labelled examples — every row comes with the correct answer (this email is spam, this house sold for £300k), and the model learns to reproduce those answers. Unsupervised learning gets no labels at all. It looks at the raw features and discovers structure on its own — which points sit close together, which look unusual. Use unsupervised learning when you do not know the groups in advance, or when labelling would be too expensive.

    Q: How do I choose K, the number of clusters, for K-Means?

    K-Means cannot choose K for you, so you try several and compare. The elbow method plots inertia (total within-cluster spread) against K: it always falls as K rises, but you look for the 'elbow' where it stops falling sharply — that K captures the structure without over-splitting. The silhouette score (-1 to 1, higher is better) measures how tight and well-separated the clusters are; pick the K with the highest score. When both methods agree, you can be fairly confident. If you want clusters chosen automatically, use DBSCAN instead.

    Q: Why must I scale my features before clustering?

    K-Means and most other clustering algorithms rely on distances, and a distance is dominated by whichever feature has the biggest numeric range. If income runs 0–100000 and age runs 18–90, income alone decides almost every cluster and age is effectively ignored. Standardise first, for example with StandardScaler, which rescales each feature to mean 0 and standard deviation 1, so every feature contributes fairly. Skipping this step is one of the most common clustering mistakes.

    Q: When should I use DBSCAN instead of K-Means?

    Use DBSCAN when your clusters are not round blobs (think crescents, rings, or long curved bands), when you do not know how many clusters there are, or when you need outliers flagged. DBSCAN groups by density and labels low-density points as noise (-1), so it finds arbitrary shapes and detects anomalies for free. K-Means is faster and simpler, but it assumes roughly spherical, similarly-sized clusters and forces every point into a group — including outliers.

    Q: What else can unsupervised learning do besides clustering?

    Two big jobs. Association rule mining finds items that co-occur — the classic 'customers who buy nappies also buy beer' from shopping baskets (the Apriori algorithm) — which powers recommendations and store layout. Anomaly detection finds the rare, unusual points: fraudulent transactions, failing machines, network intrusions. Algorithms like Isolation Forest score how 'odd' each point is, and DBSCAN's noise label is a simple form of the same idea. Both work without any labels.

    Mini-Challenge: segment your customers

    No blanks this time — just a comment outline. Build the array, fit KMeans, and print the labels. The low spenders should share one group, the high spenders the other.

    🎯 Mini-Challenge: customer segments

    Write it yourself from the outline — expected output is in the comments

    Python
    # 🎯 MINI-CHALLENGE: segment customers with scikit-learn KMeans
    # Data: [annual_spend, visits_per_month] for 6 customers.
    #
    # 1. from sklearn.cluster import KMeans  and  import numpy as np
    # 2. Build X with np.array([...]) using the rows below.
    # 3. Fit KMeans with n_clusters=2, n_init=10, random_state=0.
    # 4. Use fit_predict(X) to get the labels, then print them.
    #
    # Rows:
    #   [100, 1], [120, 2], [110, 1],   <- low spenders
    #   [900, 9], [950, 8], [880, 10]   <- high spenders
    #
    # ✅ Expected: tw
    ...

    🎉 Lesson Complete!

    You can now find hidden structure in unlabelled data. You ran a K-Means step by hand (assign + update), clustered in two lines with scikit-learn, chose K with the elbow and silhouette methods, reached for DBSCAN and hierarchical clustering when blobs aren't enough, and saw where association rules and anomaly detection fit.

    🚀 Up next: Dimensionality Reduction — squeeze high-dimensional data down to two or three features so you can visualise it and train faster, using PCA and t-SNE.
