Lesson 2 • Beginner
Python for Machine Learning
Master NumPy, Pandas, and Matplotlib — the essential Python toolkit every ML engineer uses daily.
✅ What You'll Learn
- NumPy arrays, vectorised operations, and matrix math
- Pandas DataFrames for data manipulation and analysis
- Data visualisation concepts with Matplotlib/Seaborn
- The scikit-learn 5-step ML workflow
🛠️ The Python ML Stack
🎯 Real-World Analogy: If ML is cooking, then NumPy is your knife (fast precision operations), Pandas is your prep station (organise ingredients), Matplotlib is your food photography (present results), and Scikit-learn is your recipe book (proven algorithms).
📐 NumPy
Fast arrays and matrix math. The foundation the rest of the stack builds on.
📊 Pandas
DataFrames for structured data. Load, clean, filter, aggregate.
📈 Matplotlib
Charts and plots. Visualise data before and after modelling.
🧪 Scikit-learn
ML algorithms. Train, predict, evaluate in 3 lines.
Try It: NumPy Fundamentals
Learn arrays, vectorised math, and matrix operations
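As a rough sketch of the three ideas named above (arrays, vectorised math, and matrix operations), the snippet below standardises a small array and multiplies a matrix by a vector. The sample values and the variable names (`scores`, `A`, `v`) are made up for illustration and are not part of the lesson's own example.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Statistical operations work on the whole array at once
mean = scores.mean()   # arithmetic mean
std = scores.std()     # population standard deviation (ddof=0)

# Vectorised standardisation: no explicit Python loop
z_scores = (scores - mean) / std

# Matrix operations
A = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])
product = A @ v                        # matrix-vector product
transposed = A.T                       # transpose
reshaped = np.arange(6).reshape(2, 3)  # 0..5 laid out as 2 rows x 3 columns

print("mean:", mean, "std:", std)
print("z-scores:", z_scores)
print("A @ v =", product)
print("reshaped shape:", reshaped.shape)
```

Standardising features this way (subtract the mean, divide by the standard deviation) is a common preprocessing step, which is one reason these array-wide operations come up so often in ML code.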
import numpy as np
# NumPy: The foundation of all ML in Python
# Array math is often 10-100x faster than looping over Python lists
# (the exact speedup depends on the operation and the array size)
# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("1D Array:", arr)
print("2D Matrix shape:", matrix.shape)
print()
# Vectorised operations (no loops needed!)
print("arr * 2 =", arr * 2)
print("arr + 10 =", arr + 10)
print("arr ** 2 =", arr ** 2)
print()
# Statistical operations (essential for ML)
data = np.array([23, 45,
...
Try It: Pandas Data Manipulation
Create DataFrames, filter, group, and handle missing data
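The four operations named above can be sketched in a few lines. This is an illustration under assumptions: the columns mirror the lesson's dataset, but the missing salary (`np.nan`) is a hypothetical gap added here so that `fillna` has something to do, and filling with the median is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Columns mirror the lesson's dataset; the NaN is a hypothetical gap
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "age": [25, 30, 35, 28, 42],
    "salary": [50000, 60000, np.nan, 55000, 90000],
    "department": ["Engineering", "Marketing", "Engineering",
                   "Marketing", "Engineering"],
})

# Handle missing data: fill the gap with the column median
median_salary = df["salary"].median()
df["salary"] = df["salary"].fillna(median_salary)

# Filter: keep rows where a boolean condition is True
over_28 = df[df["age"] > 28]

# Group: average salary per department
avg_salary = df.groupby("department")["salary"].mean()

print(over_28[["name", "age"]])
print(avg_salary)
```

The median is used here because it is less sensitive to outliers than the mean; dropping the row with `dropna` would be another option when the dataset is large enough to spare it.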
import pandas as pd
import numpy as np
# Pandas: Work with structured data like a pro
# Create a dataset
data = {
'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'age': [25, 30, 35, 28, 42],
'salary': [50000, 60000, 75000, 55000, 90000],
'department': ['Engineering', 'Marketing', 'Engineering', 'Marketing', 'Engineering']
}
df = pd.DataFrame(data)
print("=== Full Dataset ===")
print(df)
print()
# Quick stats (first thing you do with any dataset)
print("=== Statistical Su
...
Try It: Data Visualisation Concepts
Understand scatter plots, histograms, and correlation
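To make histograms and correlation concrete without a plotting library, the sketch below computes the numbers a chart would display. It reuses the lesson's study-hours data recipe; the choice of 5 bins and the text-based bar rendering are assumptions made for illustration.

```python
import numpy as np

# Same data recipe as the lesson's scatter plot example
np.random.seed(42)
study_hours = np.random.uniform(1, 10, 20)
exam_scores = 5 * study_hours + np.random.randn(20) * 5 + 30

# Histogram: count how many scores fall into each of 5 equal-width bins
counts, bin_edges = np.histogram(exam_scores, bins=5)

# Correlation: Pearson's r between hours and scores (ranges from -1 to 1)
r = np.corrcoef(study_hours, exam_scores)[0, 1]

# Render the histogram as text, one row per bin
for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * int(count)}")
print(f"Pearson r = {r:.2f}")
```

With Matplotlib installed locally, `plt.scatter(study_hours, exam_scores)` and `plt.hist(exam_scores, bins=5)` would draw the same data as charts.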
import numpy as np
# Data visualisation concepts for ML
# (Matplotlib/Seaborn run locally — here we simulate the logic)
# In ML, you visualise data to understand patterns BEFORE modelling
# Example: Generating data for a scatter plot
np.random.seed(42)
study_hours = np.random.uniform(1, 10, 20)
exam_scores = 5 * study_hours + np.random.randn(20) * 5 + 30
print("=== Scatter Plot Data (Study Hours → Exam Score) ===")
print(f"{'Hours':>8} {'Score':>8}")
print("-" * 18)
for h, s in sorted(zip(st
...
Try It: The 5-Step ML Pipeline
Prepare data, split, train, predict, and evaluate a model
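For comparison with a simulated version, here is a minimal sketch of the same five steps using scikit-learn itself, assuming it is installed. The data recipe matches the lesson's; `LinearRegression`, the 80/20 split, and `random_state=42` are choices made here for illustration, not something the lesson prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 1: Prepare data (y = 2.5x + 5 + noise)
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.flatten() + np.random.randn(100) * 2 + 5

# Step 2: Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Train the model on the training set only
model = LinearRegression()
model.fit(X_train, y_train)

# Step 4: Predict on data the model has not seen
y_pred = model.predict(X_test)

# Step 5: Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = model.score(X_test, y_test)  # R^2 on the test set

print(f"Learned slope: {model.coef_[0]:.2f}, intercept: {model.intercept_:.2f}")
print(f"Test MSE: {mse:.2f}, R^2: {r2:.2f}")
```

Evaluating on the held-out test set rather than the training set is what makes the score an estimate of how the model behaves on new data.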
import numpy as np
# Scikit-learn workflow: the 5-step ML pipeline
# (Simulated here — install scikit-learn to run for real)
# Step 1: Prepare data
np.random.seed(42)
X = np.random.rand(100, 1) * 10 # 100 samples, 1 feature
y = 2.5 * X.flatten() + np.random.randn(100) * 2 + 5 # y = 2.5x + 5 + noise
print("Step 1: Data prepared")
print(f" Features shape: {X.shape}")
print(f" Target shape: {y.shape}")
# Step 2: Split into training and test sets
split = 80
X_train, X_test = X[:split], X[spl
...
📋 Quick Reference
| Library | Purpose | Key Functions |
|---|---|---|
| numpy | Array math | array, dot, mean, std, reshape |
| pandas | Data manipulation | DataFrame, groupby, merge, fillna |
| matplotlib | Visualisation | plot, scatter, hist, show |
| sklearn | ML algorithms | fit, predict, score, train_test_split |
💡 Pro Tip: Google Colab is free to use and comes with NumPy, Pandas, Matplotlib, and Scikit-learn pre-installed, plus optional GPU access. No setup needed. Just open colab.research.google.com and start coding.
🎉 Lesson Complete!
You now know the essential Python ML stack! Next, learn how to clean and prepare data for machine learning.