
    Lesson 6 • Intermediate

    Decision Trees & Random Forests

    Build interpretable tree models and combine them into powerful random forest ensembles.

    ✅ What You'll Learn

    • How decision trees split data using information gain
    • Random Forests: ensemble learning with bootstrap aggregation
    • Overfitting: how to detect and prevent it
    • When to use trees vs linear models

    🌳 How Decision Trees Work

    🎯 Real-World Analogy: A decision tree works like a game of 20 Questions. "Is the animal bigger than a cat?" → Yes → "Does it live in water?" → No → "Does it have stripes?" → Yes → "It's a tiger!" Each question splits the possibilities until you reach an answer.

    The algorithm picks the best question at each step — the one that reduces uncertainty the most. This is measured by Information Gain (how much entropy decreases after the split).
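To make those two quantities concrete, here is a minimal sketch with a hand-rolled helper (not from any library) that computes entropy for a two-class node and the information gain of a hypothetical split:

```python
import math

# Entropy H(S) = -sum(p * log2(p)) over the class proportions;
# information gain = parent entropy minus the size-weighted
# entropy of the children after the split.
def entropy(pos, neg):
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

parent = entropy(5, 5)  # 5 yes / 5 no -> 1.0, maximum uncertainty
# A split producing children (4 yes, 1 no) and (1 yes, 4 no)
child = entropy(4, 1)   # ≈ 0.722
gain = parent - (5 / 10) * child - (5 / 10) * child
print(f"Information gain: {gain:.3f}")  # 0.278
```

Any split with positive gain reduces uncertainty; the algorithm greedily picks the question with the highest gain at each node.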

    Try It: Decision Tree with Information Gain

    Build a decision tree for the classic 'play tennis' problem

    Python
    import numpy as np
    
    # Decision Tree: A flowchart that makes predictions
    # "Is it raining? → Yes → Take umbrella. No → Wear sunglasses."
    
    # Dataset: Should we play tennis? (a slice of the classic dataset)
    # Features: [outlook, temperature, humidity, wind]
    data = [
        {"outlook": "sunny",    "temp": "hot",  "humidity": "high", "wind": "weak",   "play": "no"},
        {"outlook": "sunny",    "temp": "hot",  "humidity": "high", "wind": "strong", "play": "no"},
        {"outlook": "overcast", "temp": "hot",  "humidity": "high", "wind": "weak",   "play": "yes"},
        {"outlook": "rain",     "temp": "mild", "humidity": "high", "wind": "weak",   "play": "yes"},
    ]
    
    def entropy(labels):
        # H = -sum(p * log2(p)) over the class proportions
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-np.sum(p * np.log2(p)))
    
    # Info gain of "outlook" = base entropy - weighted entropy of each branch
    labels = [row["play"] for row in data]
    gain = entropy(labels)  # 2 yes / 2 no -> base entropy 1.0
    for value in {row["outlook"] for row in data}:
        branch = [row["play"] for row in data if row["outlook"] == value]
        gain -= len(branch) / len(data) * entropy(branch)
    print(f"Info gain (outlook): {gain:.3f}")  # 1.000 (a perfect split here)

    Try It: Random Forest Ensemble

    Combine multiple trees with bootstrap sampling and feature randomisation

    Python
    import numpy as np
    
    # Random Forest: Many trees vote together
    # "The wisdom of crowds" — ensemble learning
    
    np.random.seed(42)
    
    # Simulate 5 decision trees, each seeing different data
    n_trees = 5
    n_samples = 20
    
    # Generate dataset
    X = np.random.randn(n_samples, 3)  # 20 samples, 3 features
    y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)
    
    print("=== Random Forest Simulation ===")
    print(f"Dataset: {n_samples} samples, {X.shape[1]} features")
    print(f"Number of trees: {n_trees}")
    print()
    
    # Each tree: a depth-1 "stump" trained on a bootstrap sample
    # (drawn with replacement) using one randomly chosen feature
    votes = np.zeros((n_trees, n_samples), dtype=int)
    for t in range(n_trees):
        idx = np.random.choice(n_samples, n_samples, replace=True)  # bootstrap
        feat = np.random.choice(3)                                  # feature randomisation
        acc = ((X[idx, feat] > 0).astype(int) == y[idx]).mean()
        # Keep the stump direction that fits its bootstrap sample better
        votes[t] = (X[:, feat] > 0).astype(int) if acc >= 0.5 else (X[:, feat] <= 0).astype(int)
        print(f"Tree {t + 1}: split on feature {feat}, bootstrap accuracy {max(acc, 1 - acc):.2f}")
    
    # Majority vote: the forest predicts the class most trees chose
    forest_pred = (votes.sum(axis=0) > n_trees / 2).astype(int)
    print(f"\nForest accuracy: {(forest_pred == y).mean():.2f}")
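Why does averaging help at all? If each tree's error were independent with the same variance, averaging n of them would divide the variance of the combined error by n. A quick numpy sketch of that statistical idea, using synthetic errors rather than real trees:

```python
import numpy as np

# Averaging n independent, equal-variance errors divides the
# variance of the combined error by n (here n = 25).
rng = np.random.default_rng(0)
errors = rng.normal(0.0, 1.0, size=(10_000, 25))  # 25 "trees", unit-variance error

single = errors[:, 0].var()           # ≈ 1.0
averaged = errors.mean(axis=1).var()  # ≈ 1/25 = 0.04
print(f"One predictor: variance ≈ {single:.3f}")
print(f"25-predictor average: variance ≈ {averaged:.3f}")
```

In a real forest the trees' errors are correlated, so the reduction is smaller, which is exactly why random forests work to decorrelate trees through bootstrapping and feature randomisation.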

    Try It: Overfitting vs Underfitting

    See how model complexity affects training vs test performance

    Python
    import numpy as np
    
    # Overfitting: When your model memorises instead of learning
    # The #1 problem in machine learning
    
    np.random.seed(42)
    
    # True pattern: y = 2x + noise
    X_train = np.linspace(0, 10, 15)
    y_train = 2 * X_train + np.random.randn(15) * 2
    
    X_test = np.linspace(0, 10, 5)
    y_test = 2 * X_test + np.random.randn(5) * 2
    
    # Model 1: Simple (degree 1) - might underfit
    coeffs_1 = np.polyfit(X_train, y_train, 1)
    pred_train_1 = np.polyval(coeffs_1, X_train)
    pred_test_1 = np.polyval(coeffs_1, X_test)
    
    # Model 2: Complex (degree 8) - enough wiggle room to chase the noise
    coeffs_2 = np.polyfit(X_train, y_train, 8)
    pred_train_2 = np.polyval(coeffs_2, X_train)
    pred_test_2 = np.polyval(coeffs_2, X_test)
    
    def mse(y_true, y_pred):
        return float(np.mean((y_true - y_pred) ** 2))
    
    print("=== Train vs Test MSE ===")
    print(f"Degree 1: train {mse(y_train, pred_train_1):.2f}, test {mse(y_test, pred_test_1):.2f}")
    print(f"Degree 8: train {mse(y_train, pred_train_2):.2f}, test {mse(y_test, pred_test_2):.2f}")
    print("Overfitting signature: training error falls while test error rises")
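For decision trees specifically, the standard lever against overfitting is pruning: capping tree depth so the model cannot memorise every training point. A minimal sketch using scikit-learn (assuming it is installed; the dataset here is synthetic with deliberately flipped labels, so memorising the noise hurts):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise (flip_y) to make memorisation costly
X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for depth in (2, 5, None):  # None = grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_tr, y_tr)
    train, test = clf.score(X_tr, y_tr), clf.score(X_te, y_te)
    print(f"max_depth={depth}: train {train:.2f}, test {test:.2f}, gap {train - test:.2f}")
```

A large train-test gap at full depth is the same overfitting signature as in the polynomial comparison; a small max_depth trades a little training accuracy for better generalisation.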

    📋 Quick Reference

    Concept       | Description                 | Key Point
    Decision Tree | Flowchart of if/else splits | Easy to interpret
    Entropy       | Measure of disorder         | 0 = pure, 1 = max mixed (two classes)
    Info Gain     | Entropy reduction           | Higher = better split
    Random Forest | Ensemble of trees           | More robust, less overfitting
    Bootstrap     | Sample with replacement     | Each tree sees different data
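In practice you would reach for scikit-learn rather than simulate trees by hand. A minimal sketch comparing a single unpruned tree against a 100-tree forest on synthetic data (parameters here are illustrative, and scikit-learn is assumed to be installed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification task with 10% label noise
X, y = make_classification(n_samples=500, n_features=12, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(f"Single tree: train {tree.score(X_tr, y_tr):.2f}, test {tree.score(X_te, y_te):.2f}")
print(f"Forest:      train {forest.score(X_tr, y_tr):.2f}, test {forest.score(X_te, y_te):.2f}")
```

The forest typically generalises better than the single memorising tree, at the cost of interpretability: you can read one tree's splits, but not a hundred at once.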

    🎉 Lesson Complete!

    You've mastered tree-based models! Next, dive into neural networks — the foundation of deep learning.
