Lesson 35 • Advanced
Advanced NLP: BERT, T5 & LLaMA
Fine-tune BERT for classification, T5 for text generation, and understand when to use encoder vs decoder models for different NLP tasks.
✅ What You'll Learn
- BERT: masked language modeling and bidirectional understanding
- T5: framing every NLP task as text-to-text
- Encoder vs decoder vs encoder-decoder architectures
- Choosing the right model for each NLP task
📝 Understanding vs Generating Text
🎯 Real-World Analogy: BERT is like a detective who reads the entire crime scene report (both directions) to understand what happened. GPT is like a storyteller who writes one word at a time, only looking at what they've written so far. T5 is like a translator who reads the entire source text first, then writes the translation. Each excels at different tasks because of how they process text.
The key architectural decision in NLP is whether your model should understand (encoder/BERT), generate (decoder/GPT), or transform (encoder-decoder/T5) text. This determines which tasks it excels at.
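The difference comes down to the attention mask. The following sketch (a toy illustration, not any library's actual implementation) shows it with NumPy: an encoder like BERT lets every token attend to every other token, while a decoder like GPT applies a causal mask so each token sees only the tokens before it.

```python
import numpy as np

seq_len = 4

# Encoder (BERT-style) mask: every position can attend to every other
encoder_mask = np.ones((seq_len, seq_len), dtype=int)

# Decoder (GPT-style) causal mask: position i sees only positions <= i
decoder_mask = np.tril(np.ones((seq_len, seq_len), dtype=int))

print("Encoder (bidirectional) mask:")
print(encoder_mask)
print("Decoder (causal) mask:")
print(decoder_mask)

# Token at position 1: the encoder sees all 4 tokens,
# the decoder sees only positions 0 and 1.
print("Visible to token 1 (encoder):", encoder_mask[1].sum())  # 4
print("Visible to token 1 (decoder):", decoder_mask[1].sum())  # 2
```

An encoder-decoder model like T5 combines both: bidirectional attention over the input, causal attention over the output it generates.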
Try It: BERT & Masked Language Modeling
See how BERT learns language by predicting masked words
import numpy as np

# BERT: Bidirectional Encoder Representations from Transformers
# Pre-train by masking tokens, then fine-tune for any NLP task
np.random.seed(42)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def simulate_mlm(sentence, mask_rate=0.15):
    """Simulate BERT's Masked Language Modeling: randomly mask
    words and record the (position, word) pairs the model must predict."""
    words = sentence.split()
    masked = words.copy()
    targets = []
    for i in range(len(words)):
        if np.random.rand() < mask_rate:
            targets.append((i, words[i]))
            masked[i] = "[MASK]"
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog"
masked, targets = simulate_mlm(sentence, mask_rate=0.3)
print("Original:", sentence)
print("Masked:  ", " ".join(masked))
print("Targets: ", targets)
Try It: T5 Text-to-Text
Frame any NLP task as text input → text output
import numpy as np

# T5: Text-to-Text Transfer Transformer
# Every NLP task is framed as text → text
np.random.seed(42)

print("=== T5: Everything is Text-to-Text ===")
print()
print("T5's key insight: frame EVERY task as 'input text → output text'")
print()
tasks = [
    ("Translation",
     "translate English to French: The house is wonderful.",
     "La maison est merveilleuse."),
    ("Summarization",
     "summarize: State authorities dispatched emergency relief to flood victims...",
     "Authorities sent emergency relief to flood victims."),
    ("Classification",
     "sst2 sentence: This movie was absolutely fantastic!",
     "positive"),
]
for name, inp, out in tasks:
    print(f"{name}:")
    print(f"  Input:  {inp}")
    print(f"  Output: {out}")
    print()
⚠️ Common Mistake: Using GPT for classification tasks. BERT is often 10-100× cheaper and faster for classification, NER, and similarity tasks. GPT's strength is generation. Use the simplest model that solves your task — don't use a cannon to kill a mosquito.
💡 Pro Tip: Use Hugging Face's pipeline() function for instant NLP — it's one line of code for sentiment analysis, NER, summarization, and more. For fine-tuning, use the Trainer API with LoRA for parameter-efficient training on your custom data.
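The LoRA idea mentioned in the tip can be sketched in plain NumPy: instead of updating the full weight matrix W, you train two small matrices A and B whose product is a low-rank update. The shapes, init, and scaling factor below are illustrative toy values, not a real Trainer/PEFT configuration.

```python
import numpy as np

np.random.seed(0)

d, r = 8, 2           # hidden size 8, LoRA rank 2 (toy numbers)
alpha = 4             # LoRA scaling hyperparameter

W = np.random.randn(d, d)         # frozen pretrained weight
A = np.random.randn(r, d) * 0.01  # trainable, small random init
B = np.zeros((d, r))              # trainable, zero init

def lora_forward(x, W, A, B, alpha, r):
    """y = x W^T + (alpha/r) * x (BA)^T — the base weight stays frozen."""
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = np.random.randn(1, d)
print("Frozen params:   ", W.size)            # 64
print("Trainable params:", A.size + B.size)   # 32

# Because B starts at zero, LoRA is an exact no-op before training:
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)
```

At realistic sizes the savings are dramatic: for a 768×768 BERT weight, rank-8 LoRA trains about 12K parameters instead of 590K.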
📋 Quick Reference
| Task | Best Model | Why |
|---|---|---|
| Classification | BERT | Bidirectional context |
| NER | BERT / spaCy | Token-level understanding |
| Summarization | T5 / BART | Encoder-decoder structure |
| Chat / QA | GPT / LLaMA | Autoregressive generation |
| Translation | T5 / mBART | Cross-lingual transfer |
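The table above can be condensed into a toy lookup helper. The mapping simply echoes the table; in practice you would pick a concrete checkpoint (e.g. on the Hugging Face Hub), not an architecture family name.

```python
# Toy task → model-family lookup mirroring the quick-reference table.
BEST_MODEL = {
    "classification": "BERT",
    "ner": "BERT / spaCy",
    "summarization": "T5 / BART",
    "chat": "GPT / LLaMA",
    "translation": "T5 / mBART",
}

def pick_model(task: str) -> str:
    """Return a reasonable default model family for a common NLP task."""
    return BEST_MODEL.get(task.lower(), "unknown task")

print(pick_model("Classification"))  # BERT
print(pick_model("summarization"))   # T5 / BART
```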
🎉 Lesson Complete!
You now understand the NLP model landscape! Next, learn how to build RAG systems that combine LLMs with knowledge retrieval.