
Lesson 7 • Intermediate

Neural Networks Introduction

By the end of this lesson you'll be able to compute a single neuron's output by hand, write ReLU and sigmoid in plain Python, and explain how layers learn by adjusting weights.

What You'll Learn in This Lesson

✓ You'll be able to describe a neuron as weighted sum + bias + activation
✓ You'll be able to compute a neuron's output in plain Python with lists
✓ You'll be able to write the ReLU and sigmoid activation functions yourself
✓ You'll be able to explain what a layer is and how layers stack
✓ You'll be able to trace a forward pass from inputs to prediction
✓ You'll be able to explain how learning nudges weights to cut error

Before you start: You should be comfortable with Python lists, loops, and functions. If decision trees are still fuzzy, revisit Lesson 6: Decision Trees first. All runnable exercises here use plain Python — no numpy required.

🧠 Real-World Analogy: Brain Neurons

Your brain has billions of neurons. Each one receives little electrical signals from its neighbours through connections called synapses. Some connections are strong and some are weak — they decide how much each incoming signal counts. When the combined signal crosses a threshold, the neuron fires and passes a signal on to the next neurons.

An artificial neuron copies this idea with arithmetic. The "synapse strengths" become numbers called weights, the firing threshold becomes a bias, and the "fire or not" decision becomes an activation function. Learning is just gradually turning the strength of each connection up or down until the whole network responds the way you want.

1. The Neuron: Weighted Sum + Bias + Activation

A neuron (also called a perceptron when it's on its own) is the smallest building block of a neural network. It takes some inputs and produces a single number. It does this in three tiny steps:

Weighted sum — multiply each input by its weight and add the results together.
Add a bias — add one extra number that shifts the total up or down.
Activation — pass the total through a function that decides the output.

z = (x1·w1 + x2·w2 + … + xn·wn) + bias

output = activation(z)

The weights and bias are the neuron's knowledge. Everything a network learns ends up stored as weights and biases. Here is a single neuron written in plain Python — read each comment, then run it.

Worked Example: One Neuron in Plain Python

Compute weighted sum + bias + a step activation by hand

Python

# A single neuron (a "perceptron") — plain Python, no libraries
# A neuron does 3 tiny steps:
#   1. weighted sum:  multiply each input by its weight and add them up
#   2. add a bias:    a number that shifts the result up or down
#   3. activation:    squash the result through a function (here: a step)

# The neuron's "knowledge" lives in these numbers.
inputs  = [1.0, 0.0, 1.0]      # 3 features going in (e.g. yes/no signals)
weights = [0.7, 0.3, 0.9]      # how much each input matters
bias   
...
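The listing above is cut off before the end. Here is a complete, runnable sketch of the same neuron. It assumes `bias = -1.0`, the value the numpy reference version in section 4 uses for these exact inputs and weights, and it uses a step activation as the original comments describe; the helper name `step` is our own choice.

```python
# A single neuron with a step activation, in plain Python.
# Assumption: bias = -1.0 (the value used by the numpy reference in section 4).
inputs  = [1.0, 0.0, 1.0]   # 3 input features
weights = [0.7, 0.3, 0.9]   # how much each input matters
bias    = -1.0              # shifts the weighted sum down

def step(z):
    # Step activation: "fire" (1) when z is positive, otherwise stay quiet (0)
    return 1 if z > 0 else 0

# 1) weighted sum and 2) add the bias
z = sum(x * w for x, w in zip(inputs, weights)) + bias

# 3) activation
output = step(z)

print("z =", round(z, 3))   # z = 0.6
print("output =", output)   # output = 1
```

Because z = 0.7 + 0.9 - 1.0 = 0.6 is positive, the step function fires and the neuron outputs 1.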

2. Activation Functions: ReLU and Sigmoid

The activation function is what makes a network powerful. It introduces non-linearity — a fancy way of saying "the output can bend and curve instead of being one straight line." Two activations cover almost everything a beginner needs:

ReLU (Rectified Linear Unit): returns the input if it's positive, otherwise 0. It's fast and is the default choice for hidden layers.
Sigmoid: squashes any number into the range 0 to 1 using an S-shaped curve. Perfect for an output that represents a probability ("how likely is this a cat?").

Both are just a couple of lines of plain Python — the only thing you need from the standard library is math.exp for sigmoid.

Worked Example: ReLU and Sigmoid

See how each activation reshapes the same inputs

Python

import math   # only the standard library — no numpy

# Activation functions decide what a neuron "fires".
# Without them, stacking neurons would just be one big straight line.

def relu(x):
    # ReLU = "Rectified Linear Unit": keep positives, zero out negatives
    return x if x > 0 else 0.0

def sigmoid(x):
    # Sigmoid squashes ANY number into the range 0..1 (an S-curve)
    return 1 / (1 + math.exp(-x))

# Try them on a few numbers so you can see the shape
for x in [-2.0, -0.5, 0.0, 0.5, 2
...
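The demo loop above is also cut off. A complete sketch that prints both activations for the same five sample inputs could look like this (the f-string layout is our own choice):

```python
import math   # standard library only

def relu(x):
    # Keep positive values; replace zero and negatives with 0.0
    return x if x > 0 else 0.0

def sigmoid(x):
    # Squash any real number into the range 0..1 (an S-curve)
    return 1 / (1 + math.exp(-x))

# Print both activations side by side for a few sample inputs
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x = {x:5.1f}   relu = {relu(x):4.1f}   sigmoid = {sigmoid(x):.3f}")
```

You should see ReLU leave 0.5 and 2.0 unchanged and turn the other inputs into 0.0, while sigmoid maps -2.0 to about 0.119, 0.0 to exactly 0.5, and 2.0 to about 0.881.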

🎯 Your Turn: Finish the Neuron's Forward Pass

Fill in the blanks to compute z and the neuron's output

Python

# 🎯 YOUR TURN — finish the neuron's forward pass
# Fill in each ___ . Expected output is at the bottom so you can self-check.

inputs  = [2.0, 3.0]
weights = [0.5, -1.0]
bias    = 1.0

# 1) Start z at the bias value
z = ___                       # 👉 replace ___ with the bias variable

# 2) Add input * weight for each pair
for x, w in zip(inputs, weights):
    z = z + ___               # 👉 replace ___ with  x * w

# 3) Apply a step activation: 1 if z is positive, else 0
output = 1 if z ___ 0 e
...

🎯 Your Turn: Implement ReLU and Sigmoid

Write the two activation functions from scratch

Python

import math

# 🎯 YOUR TURN — implement the two activations from scratch.
# Fill in the ___ blanks, then run to compare against the expected output.

def relu(x):
    # Return x when it is positive, otherwise return 0.0
    return ___ if x > 0 else 0.0     # 👉 replace ___ with  x

def sigmoid(x):
    # The S-curve: 1 / (1 + e^(-x)).  math.exp(n) computes e^n.
    return 1 / (1 + math.exp(___))   # 👉 replace ___ with  -x

print("relu(-3)   =", relu(-3))
print("relu(4)    =", relu(4))
print("sig
...

3. Layers and the Forward Pass

One neuron can only draw a single straight boundary, which is too weak for most real problems. The fix is to use many neurons arranged in layers:

Input layer — your raw features (e.g. pixel values, sensor readings).
Hidden layer(s) — neurons that each look for a different pattern in the inputs.
Output layer — produces the final prediction.

The forward pass is simply running data through the network from left to right: every neuron in a layer computes weighted sum + bias + activation, and its outputs become the inputs to the next layer. Given enough hidden neurons, a network like this can approximate almost any continuous function, and that is where its power comes from.

inputs → [hidden layer: many neurons] → [output layer] → prediction
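The forward pass described above can be sketched in plain Python with one reusable helper. `dense_layer` is a hypothetical name (borrowed from the common term "dense layer"), and the 2 → 3 → 1 weights are made up for illustration rather than learned:

```python
import math

def relu(z):
    # Hidden-layer activation: keep positives, zero out negatives
    return z if z > 0 else 0.0

def sigmoid(z):
    # Output activation: squash into 0..1
    return 1 / (1 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    # A layer is a group of neurons that all read the same inputs.
    # weights[j] holds neuron j's weights and biases[j] holds its bias.
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias  # weighted sum + bias
        outputs.append(activation(z))                                  # activation
    return outputs

# Hypothetical 2 -> 3 -> 1 network; the weights are made up, not trained
hidden_weights = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
hidden_biases  = [0.0, -0.1, 0.2]
output_weights = [[1.0, -0.5, 0.7]]
output_biases  = [0.1]

x = [1.0, 2.0]                                                            # input layer: raw features
hidden = dense_layer(x, hidden_weights, hidden_biases, relu)              # hidden layer
prediction = dense_layer(hidden, output_weights, output_biases, sigmoid)  # output layer

print("hidden     =", [round(h, 3) for h in hidden])      # [0.1, 1.8, 0.0]
print("prediction =", [round(p, 3) for p in prediction])  # [0.332]
```

Notice that the hidden layer's list of outputs is passed straight in as the next layer's inputs; a deeper network would simply call `dense_layer` more times in a row.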

Why hidden layers matter: a problem like XOR (output 1 only when the two inputs differ) cannot be solved by a single neuron. Add one hidden layer and it becomes easy — each hidden neuron learns a piece of the pattern, and the output neuron combines them.
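To see the XOR claim in action, here is a sketch of a 2-2-1 network whose weights were picked by hand rather than trained: one hidden neuron acts like OR, the other like AND, and the output neuron fires for "OR but not AND". The specific weights and the `step` helper are illustrative assumptions.

```python
def step(z):
    # Fire (1) if z is positive, otherwise 0
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    # Hidden layer: two neurons with hand-picked (not learned) weights
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if at least one input is 1 (OR)
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only if both inputs are 1 (AND)
    # Output neuron: "OR but not AND", i.e. exactly one input is 1
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))
```

This prints 0, 1, 1, 0 for the four input pairs. No single neuron can produce that pattern, but two hidden neurons each draw one straight line and the output neuron combines them.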

4. How Learning Adjusts Weights

A fresh network starts with random weights, so its first predictions are basically guesses. Training fixes that with a repeating loop:

Forward pass: run an example through the network to get a prediction.
Measure error: compare the prediction to the correct answer (the "loss").
Backpropagation: work out how much each weight contributed to the error.
Update: nudge every weight and bias a little in the direction that lowers the error.

That nudge size is controlled by the learning rate. Repeat this loop over thousands of examples and the weights slowly settle into values that make good predictions. You don't have to compute the gradients by hand — libraries like TensorFlow and PyTorch do it for you. Here's the same neuron idea written the professional way, plus a tiny Keras network, shown as a read-only reference.

Worked Example: numpy & Keras Version (reference)

The same maths the way professionals write it: read it through and compare with the expected output in the comments

Python

# The SAME idea, written the way professionals do it.
# numpy does the weighted sum for a whole layer in one line.
import numpy as np

inputs  = np.array([1.0, 0.0, 1.0])
weights = np.array([0.7, 0.3, 0.9])
bias    = -1.0

z = np.dot(inputs, weights) + bias   # weighted sum + bias, vectorised
output = 1 / (1 + np.exp(-z))        # sigmoid activation
print(round(float(output), 3))       # 0.646

# Expected output:
# 0.646


# A whole network in a few lines with Keras (TensorFlow).
# This builds 2
...
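The four-step training loop from section 4 can also be sketched in plain Python for a single sigmoid neuron learning logical OR. Several choices here are assumptions for illustration: the OR dataset, the learning rate of 0.5, 1,000 epochs, the random seed, and the use of a cross-entropy loss, whose gradient through a sigmoid simplifies neatly to `pred - target`. With only one neuron, backpropagation reduces to that single chain-rule step.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Training data for logical OR: ([x1, x2], target)
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

random.seed(0)                                        # reproducible "random" start
weights = [random.uniform(-1, 1) for _ in range(2)]
bias = random.uniform(-1, 1)
learning_rate = 0.5                                   # size of each nudge

for epoch in range(1000):
    for x, target in data:
        # 1) forward pass
        z = sum(xi * wi for xi, wi in zip(x, weights)) + bias
        pred = sigmoid(z)
        # 2) error signal: for sigmoid + cross-entropy, dLoss/dz = pred - target
        error = pred - target
        # 3) + 4) gradient of each parameter is error * its input; step against it
        weights = [wi - learning_rate * error * xi for wi, xi in zip(weights, x)]
        bias -= learning_rate * error

for x, target in data:
    pred = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
    print(x, "target", target, "prediction", round(pred, 3))
```

After training, the prediction for `[0.0, 0.0]` should sit close to 0 and the other three close to 1. Frameworks such as Keras and PyTorch automate exactly these gradient calculations for networks with many layers.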

5. Common Errors (And How to Fix Them)

These four mistakes trip up nearly everyone who builds their first network.

❌ No non-linearity = a glorified linear model

If every layer uses no activation (or only a linear one), stacking layers collapses into a single straight line. The network can't learn curves and will fail on problems like XOR.

✅ Fix: put a non-linear activation (ReLU) on every hidden layer:

# ❌ hidden = weighted_sum            # linear — no power
# ✅ hidden = relu(weighted_sum)       # adds the curve the network needs
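A quick way to convince yourself of the collapse: stack two neurons with no activation and compare them with one neuron whose weight is `w2 * w1` and whose bias is `w2 * b1 + b2`. The numbers below are arbitrary, chosen only for illustration:

```python
# Two stacked "layers" with no activation, one neuron each (arbitrary weights)
w1, b1 = 2.0, 1.0     # first layer
w2, b2 = -3.0, 0.5    # second layer

def two_linear_layers(x):
    hidden = w1 * x + b1          # no activation here
    return w2 * hidden + b2

# The same function as ONE linear neuron
w_combined = w2 * w1              # -6.0
b_combined = w2 * b1 + b2         # -2.5

for x in [-1.0, 0.0, 2.0]:
    print(x, two_linear_layers(x), w_combined * x + b_combined)
```

Both columns match for every input (3.5, -2.5 and -14.5), so the extra layer added no expressive power. Wrapping `hidden` in `relu(...)` breaks this equivalence.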

❌ Vanishing gradients

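Sigmoid and tanh flatten out when their input is far from zero (large positive or large negative), so their slope (gradient) becomes almost 0. Backpropagation multiplies these slopes together layer after layer, so in deep networks the update signal shrinks to nothing and the early layers stop learning.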

✅ Fix: use ReLU in hidden layers; keep sigmoid for the final output only.

# hidden layers -> relu(z)            # gradient stays healthy
# output layer  -> sigmoid(z)         # 0..1 probability is fine here
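The numbers behind this problem are easy to check. The sigmoid's slope is `s * (1 - s)`, which is at most 0.25 (at x = 0) and tiny for inputs far from zero, while ReLU's slope is exactly 1 for any positive input. The sketch below prints both; the 10-layer figure is a simplified best case that ignores the weights, which also enter the product during backpropagation:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_slope(x):
    # Derivative of sigmoid: s * (1 - s); its maximum is 0.25, at x = 0
    s = sigmoid(x)
    return s * (1 - s)

def relu_slope(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:4.1f}   sigmoid slope = {sigmoid_slope(x):.6f}   relu slope = {relu_slope(x)}")

# Backpropagation multiplies one slope per layer. Even in sigmoid's best
# case (0.25 per layer), 10 layers shrink the signal enormously:
print("0.25 ** 10 =", 0.25 ** 10)   # 9.5367431640625e-07
```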

❌ Not normalizing inputs

Feeding raw values on wildly different scales (e.g. age 0–100 next to salary 0–100000) makes training unstable — the big numbers dominate the weighted sum.

✅ Fix: scale features to a similar range before training.

# scale each feature to 0..1 (min-max scaling)
low, high = min(feature), max(feature)
normalized = [(v - low) / (high - low) for v in feature]
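Applied to a whole dataset, you scale each feature (column) separately. This sketch wraps the formula in a hypothetical `min_max_scale` helper, adds a guard for a constant feature (where `high - low` would be 0), and uses made-up age and salary values:

```python
def min_max_scale(column):
    # Rescale one feature (a list of numbers) to the range 0..1
    low, high = min(column), max(column)
    if high == low:                  # constant feature: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]

ages     = [25, 40, 60, 33]          # made-up example data
salaries = [30000, 52000, 98000, 41000]

print([round(v, 3) for v in min_max_scale(ages)])
print([round(v, 3) for v in min_max_scale(salaries)])
```

After scaling, both features live in 0..1, so neither dominates the weighted sum just because of its units. A common practice is to compute `low` and `high` on the training data only and reuse them when scaling new inputs.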

❌ Learning rate too big

If each weight update is too large, the network overshoots the good values and the loss bounces around or explodes to nan instead of going down.

✅ Fix: start small (e.g. 0.01) and only increase if learning is too slow.

# learning_rate = 10.0    # ❌ loss jumps around / becomes nan
learning_rate = 0.01      # ✅ steady, reliable improvement
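You can watch overshooting happen on the simplest possible loss, `loss(w) = w ** 2` (gradient `2 * w`, best value at `w = 0`). This toy function and the starting value `w = 5.0` are assumptions for illustration, not a real network:

```python
def gradient_descent(learning_rate, steps=10):
    # Minimise loss(w) = w ** 2, whose gradient is 2 * w and whose best value is w = 0
    w = 5.0
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient   # the weight update
    return w

print("lr = 0.01 ->", gradient_descent(0.01))   # creeps slowly toward 0
print("lr = 0.1  ->", gradient_descent(0.1))    # gets close to 0
print("lr = 1.5  ->", gradient_descent(1.5))    # overshoots and blows up
```

For this particular loss each update multiplies `w` by `1 - 2 * learning_rate`, so any learning rate above 1.0 makes `|w|` grow every step instead of shrinking it. With 1.5 the factor is -2, so after ten steps `w` has reached 5 × 2¹⁰ = 5120.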

📋 Quick Reference

Term	What it is	In code / formula
Weight	How much an input matters	`x * w`
Bias	Shifts the sum up/down	`z = ... + bias`
Weighted sum (z)	Inputs·weights + bias	`sum(x*w) + bias`
ReLU	Keep positives, zero negatives	`x if x > 0 else 0`
Sigmoid	Squash into 0..1	`1/(1+math.exp(-x))`
Layer	A group of neurons	input / hidden / output
Forward pass	Inputs → prediction	layer by layer
Learning rate	Size of each weight nudge	`w += lr * ...`

❓ Frequently Asked Questions

Q: What is a neuron (perceptron) in a neural network?

A: It is a tiny function that multiplies each input by a weight, adds those products together, adds a bias number, and passes the result through an activation function. The output is the neuron's decision.

Q: What is the bias for?

A: The bias shifts the weighted sum up or down before activation, so the neuron can fire at a different threshold. Without it, the weighted sum is always 0 when every input is 0, which forces the neuron's decision boundary through the origin and limits what the network can learn.

Q: Why do neural networks need activation functions?

A: Activation functions add non-linearity. If you remove them, stacking layers collapses into a single straight-line (linear) model that can only solve linearly separable problems. ReLU and sigmoid let the network bend and curve to fit complex data.

Q: What is the difference between ReLU and sigmoid?

A: ReLU returns the input if it is positive and 0 otherwise — fast and the default for hidden layers. Sigmoid squashes any number into 0..1, which is ideal for an output that represents a probability, but it causes vanishing gradients in deep hidden layers.

Q: What is a forward pass?

A: The forward pass is running inputs through the network layer by layer to produce a prediction: for each neuron you compute weighted sum + bias, apply the activation, then feed the results into the next layer.

Q: How does a neural network learn?

A: It compares its prediction to the correct answer to measure error, then nudges every weight and bias a little in the direction that reduces that error. Repeating this over many examples (using backpropagation and gradient descent) is what we call training.

🎯 Mini-Challenge: A Neuron with a Real Activation

Time to fly with less support. Build a 2-input neuron that ends with a sigmoid activation. Only a comment outline is given — fill in the logic yourself, then check against the expected output in the comments.

Mini-Challenge

Write the whole neuron from the brief — no scaffolding

Python

import math

# 🎯 MINI-CHALLENGE: a 2-input neuron with a real activation
# Brief:
#   1. Define sigmoid(x)  ->  1 / (1 + math.exp(-x))
#   2. inputs  = [1.0, 1.0]   weights = [2.0, 2.0]   bias = -3.0
#   3. Compute z = bias + sum of input*weight   (use a loop or zip)
#   4. Pass z through sigmoid to get the output (a probability 0..1)
#   5. print("z =", z)  and  print("output =", round(output, 3))
#
# ✅ Expected (inputs 1,1):
# z = 1.0
# output = 0.731

# your code here

🎉

Lesson complete — you understand how neurons think!

You can now compute a neuron as weighted sum + bias + activation, write ReLU and sigmoid in plain Python, describe how layers stack into a forward pass, and explain how training nudges weights to shrink error. These are the exact foundations every deep learning model is built on.

🚀 Up next: Deep Learning Fundamentals — stack many layers, train with backpropagation, and tackle real datasets.
