Lesson 21 • Advanced
The Java Stream API
Stop writing loops to filter, transform, and summarise data. Learn to build readable stream pipelines that say what you want, not how to loop for it.
Before You Start
You should be comfortable with Collections (List, Map) and the basics of lambda expressions: the short x -> x * 2 functions you pass to stream operations. Most streams start from a collection, so if you can create a List<String> you're ready.
What You'll Learn in This Lesson
- ✓Create streams with .stream(), Stream.of, and IntStream.range
- ✓Chain intermediate ops: filter, map, sorted, distinct, limit
- ✓Finish with terminal ops: collect, forEach, reduce, count, toList
- ✓Use Collectors: toList, toMap, groupingBy, joining
- ✓Understand lazy evaluation — why nothing runs until the end
- ✓Use parallel streams safely (and know when not to)
🏭 Real-World Analogy: A Factory Assembly Line
A stream pipeline works exactly like a factory assembly line. Raw materials (your collection) enter at one end. They pass through a series of stations — one keeps only the good parts (filter), one reshapes each part (map), one puts them in order (sorted). At the very end, a worker boxes up the finished goods (collect).
💡 The key insight: the conveyor belt does not move until someone at the end asks for a finished product. The middle stations (intermediate operations) are just instructions written on a clipboard. Only the final terminal operation switches the belt on. That is lazy evaluation, and it is what makes streams efficient.
1️⃣ Creating a Stream
A stream is a one-shot sequence of elements you push through a pipeline. You don't store data in a stream — you flow data through one. There are three common ways to start a stream:
- collection.stream(): the usual starting point, from any List<T>, Set, etc.
- Stream.of(a, b, c): for loose values that are not already in a collection.
- IntStream.range(0, n): a stream of ints, replacing index-based for loops. The end is exclusive; use rangeClosed to include it.
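As a small sketch of the third option, the snippet below swaps an index-based for loop for IntStream.range and contrasts it with rangeClosed. The class name RangeDemo and its two helper methods are illustrative assumptions, not part of the lesson's own code; toList() assumes Java 16 or newer.

```java
import java.util.List;
import java.util.stream.IntStream;

class RangeDemo {
    // Hypothetical helper: label each element with its index,
    // the job an index-based for loop would normally do.
    static List<String> indexed(List<String> items) {
        return IntStream.range(0, items.size())          // 0 .. size-1 (end is exclusive)
                .mapToObj(i -> i + ": " + items.get(i))  // int index -> String label
                .toList();
    }

    // Hypothetical helper: sum 1..n with rangeClosed, which includes the end value.
    static int sumUpTo(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    public static void main(String[] args) {
        System.out.println(indexed(List.of("red", "green", "blue"))); // [0: red, 1: green, 2: blue]
        System.out.println(sumUpTo(4));                                // 10
    }
}
```

Note that range(1, 4) would stop at 3, while rangeClosed(1, 4) includes 4.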
A stream can be consumed only once. To process the same data again, call .stream() on the original collection a second time.
2️⃣ Intermediate Operations (the Recipe)
Intermediate operations each take a stream and return a new stream, so you can chain them. They are lazy — calling them just records a step; no data moves yet. The ones you'll reach for constantly:
| Operation | What it does | Example |
|---|---|---|
| filter | Keep elements matching a test | .filter(n -> n > 10) |
| map | Transform each element | .map(String::toUpperCase) |
| sorted | Order the elements | .sorted() |
| distinct | Drop duplicates | .distinct() |
| limit | Keep only the first N | .limit(3) |
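To see several rows of this table working together, here is a minimal sketch that picks the three smallest distinct values from a list. The names TopThree and smallestThree are made up for illustration, and toList() assumes Java 16 or newer.

```java
import java.util.List;

class TopThree {
    // Hypothetical helper: the three smallest distinct values, in ascending order.
    static List<Integer> smallestThree(List<Integer> numbers) {
        return numbers.stream()
                .distinct()   // drop duplicates
                .sorted()     // natural (ascending) order
                .limit(3)     // keep only the first three
                .toList();    // terminal op: runs the pipeline
    }

    public static void main(String[] args) {
        System.out.println(smallestThree(List.of(7, 2, 9, 2, 5, 7, 1))); // [1, 2, 5]
    }
}
```

The order of the steps matters: putting limit(3) before sorted() would take the first three elements encountered and only then sort them.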
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;
public class Main {
public static void main(String[] args) {
// ── Three ways to CREATE a stream ──────────────────────────────
List<String> names = List.of("Alice", "Bob", "Charlie", "Bob");
// 1) From a collection: .stream()
names.stream().forEach(n -> System.out.println("From list: " + n));
// 2) From values: Stream.of(...)
Stream.of("red", "green", "blue")
.forEach(c -> System.out.println("Stream.of: " + c));
// 3) From a range of ints: IntStream.range(start, end) (end is EXCLUSIVE)
IntStream.range(1, 4) // 1, 2, 3 (NOT 4)
.forEach(i -> System.out.println("range: " + i));
// ── INTERMEDIATE ops build a recipe (lazy, nothing runs yet) ───
// ── The TERMINAL op (collect) runs the whole pipeline ──────────
List<String> result = names.stream()
.filter(n -> n.length() > 3) // keep names longer than 3 chars ("Bob" is out)
.distinct() // drop any duplicate values
.map(String::toUpperCase) // transform each surviving element
.sorted() // order alphabetically
.toList(); // TERMINAL: collect into a List (Java 16+)
System.out.println("Pipeline result: " + result);
}
}
Output:
From list: Alice
From list: Bob
From list: Charlie
From list: Bob
Stream.of: red
Stream.of: green
Stream.of: blue
range: 1
range: 2
range: 3
Pipeline result: [ALICE, CHARLIE]
3️⃣ Terminal Operations & Collectors (the Finished Product)
A terminal operation is what finally runs the pipeline and produces a result. After it runs, the stream is consumed. The common ones:
- collect(...): gather elements into a collection, driven by a Collector.
- toList(): a shortcut (Java 16+) for collecting into an unmodifiable List.
- forEach(...): perform a side effect (like printing) on each element.
- reduce(...): fold every element into a single value (a sum, a max).
- count(): how many elements made it through.
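The "a sum, a max" remark about reduce can be sketched as follows. ReduceDemo and its methods are hypothetical names; the point is that reduce without a starting value returns an Optional, because an empty stream has no maximum.

```java
import java.util.List;
import java.util.Optional;

class ReduceDemo {
    // Hypothetical helper: largest value via reduce.
    // No identity value is given, so the result is an Optional.
    static Optional<Integer> largest(List<Integer> numbers) {
        return numbers.stream().reduce(Integer::max);
    }

    // With an identity value (0), reduce returns a plain result, even for an empty list.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(largest(List.of(4, 9, 2)));  // Optional[9]
        System.out.println(largest(List.of()));         // Optional.empty
        System.out.println(sum(List.of(4, 9, 2)));      // 15
    }
}
```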
The real power lives in the Collectors class, which you pass to collect:
- Collectors.toList(): gather into a List.
- Collectors.toMap(keyFn, valueFn): build a Map lookup.
- Collectors.groupingBy(classifier): bucket elements by a key, like SQL GROUP BY.
- Collectors.joining(", "): concatenate strings with a separator.
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
public class Main {
record Product(String name, String category, double price) {}
public static void main(String[] args) {
List<Product> products = List.of(
new Product("Apple", "Fruit", 0.50),
new Product("Banana", "Fruit", 0.30),
new Product("Carrot", "Veg", 0.20),
new Product("Milk", "Dairy", 1.20),
new Product("Cheese", "Dairy", 3.50));
// count() — how many items survive the filter?
long cheapCount = products.stream()
.filter(p -> p.price() < 1.00)
.count();
System.out.println("Items under $1: " + cheapCount);
// reduce() — fold every price into one running total
double total = products.stream()
.map(Product::price)
.reduce(0.0, Double::sum); // start at 0.0, keep adding
System.out.printf("Total price: $%.2f%n", total);
// Collectors.toList() — gather names into a List
List<String> nameList = products.stream()
.map(Product::name)
.collect(Collectors.toList());
System.out.println("Names: " + nameList);
// Collectors.joining() — concatenate with a delimiter
String joined = products.stream()
.map(Product::name)
.collect(Collectors.joining(", ", "[", "]"));
System.out.println("Joined: " + joined);
// Collectors.toMap() — build a name -> price lookup
Map<String, Double> priceOf = products.stream()
.collect(Collectors.toMap(Product::name, Product::price));
System.out.println("Milk costs: $" + priceOf.get("Milk"));
// Collectors.groupingBy() — like SQL GROUP BY
Map<String, List<String>> byCategory = products.stream()
.collect(Collectors.groupingBy(Product::category,
Collectors.mapping(Product::name, Collectors.toList())));
new TreeMap<>(byCategory).forEach((cat, items) ->
System.out.println(cat + ": " + String.join(", ", items)));
}
}
Output:
Items under $1: 3
Total price: $5.70
Names: [Apple, Banana, Carrot, Milk, Cheese]
Joined: [Apple, Banana, Carrot, Milk, Cheese]
Milk costs: $1.2
Dairy: Milk, Cheese
Fruit: Apple, Banana
Veg: Carrot
🎯 Your Turn #1: Filter Big Numbers
Finish the pipeline below. Fill in the filter predicate so only numbers greater than 10 survive. The expected output is in the comment — run it on your machine and check.
import java.util.List;
public class Main {
public static void main(String[] args) {
// 🎯 YOUR TURN — fill in the blanks marked with ___
List<Integer> numbers = List.of(5, 12, 8, 21, 3, 17, 9);
// 1) Keep only the numbers greater than 10
// 👉 replace ___ with a filter predicate
List<Integer> big = numbers.stream()
.filter(n -> ___)
.toList();
// 2) Print the kept numbers
System.out.println("Big numbers: " + big);
// ✅ Expected output:
// Big numbers: [12, 21, 17]
}
}
🎯 Your Turn #2: Map Words to Lengths
This time fill in two blanks: the intermediate operation that transforms each element, and the expression that gives a word's length. Compare with the expected output in the comment.
import java.util.List;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
// 🎯 YOUR TURN — fill in the blanks marked with ___
List<String> words = List.of("sky", "ocean", "sun", "mountain");
// 1) Transform each word to its length, then collect into a List
// 👉 replace the first ___ with the op that transforms each element
// 👉 replace the second ___ with the length of the string (use w.length())
List<Integer> lengths = words.stream()
.___(w -> ___)
.collect(Collectors.toList());
System.out.println("Lengths: " + lengths);
// ✅ Expected output:
// Lengths: [3, 5, 3, 8]
}
}
4️⃣ Lazy Evaluation & Parallel Streams
Lazy evaluation means the pipeline does nothing until a terminal operation pulls data through. Intermediate operations are fused into a single pass, so .filter().map().filter() does not create three temporary lists — each element flows through the whole chain at once. Short-circuiting operations like limit and findFirst can stop early, so the source is never fully processed.
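One way to observe this is to log when elements actually flow. The sketch below uses peek, a debugging hook, to record each element that passes; LazyDemo and trace are hypothetical names chosen for this illustration, and toList() assumes Java 16 or newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyDemo {
    // Hypothetical helper: returns a log of events in the order they happened.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        // Building the pipeline only records the steps; no element moves yet.
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4, 5)
                .peek(n -> log.add("saw " + n))  // records each element that flows past
                .filter(n -> n % 2 == 0)         // keep even numbers
                .limit(1);                       // short-circuit after the first survivor

        log.add("pipeline built");               // logged before any "saw" entry

        List<Integer> result = pipeline.toList(); // terminal op: elements flow now
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [pipeline built, saw 1, saw 2, result [2]]
    }
}
```

The log shows two things: nothing was seen until toList() ran, and elements 3, 4, and 5 were never touched because limit(1) was already satisfied.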
A parallel stream (.parallel() or collection.parallelStream()) splits the work across CPU cores using the shared ForkJoinPool. It can be dramatically faster — but only sometimes.
import java.util.List;
import java.util.stream.IntStream;
public class Main {
public static void main(String[] args) {
// ── LAZY EVALUATION proof ──────────────────────────────────────
// map() prints a tag as each element passes. Because findFirst() is a
// short-circuiting terminal operation, the pipeline stops at the first
// match (20) and never maps 3, 4, or 5. Nothing runs until findFirst() pulls.
System.out.println("Lazy demo:");
java.util.Optional<Integer> first = List.of(1, 2, 3, 4, 5).stream()
.map(n -> {
System.out.println(" mapping " + n);
return n * 10;
})
.filter(n -> n > 15)
.findFirst(); // pulls only until the first match
System.out.println("First match: " + first.get());
// ── PARALLEL stream ────────────────────────────────────────────
// Splits work across CPU cores via the ForkJoinPool. Only worth it
// for LARGE datasets and CPU-heavy, stateless work. The RESULT of a
// reduction is deterministic; the processing ORDER is not.
long sum = IntStream.rangeClosed(1, 1_000_000)
.parallel() // opt in to parallelism
.filter(n -> n % 2 == 0) // even numbers only
.mapToLong(n -> (long) n)
.sum(); // associative reduction — safe in parallel
System.out.println("Sum of evens 1..1,000,000: " + sum);
}
}
Output:
Lazy demo:
mapping 1
mapping 2
First match: 20
Sum of evens 1..1,000,000: 250000500000
Mini-Challenge: Big Spenders
Time to fly solo. The starter below has only a comment outline — no filled-in logic. Build a single pipeline that finds the customers who spent $100 or more, sorted alphabetically. The expected output is in the comments.
import java.util.List;
public class Main {
record Order(String customer, double amount) {}
public static void main(String[] args) {
// 🎯 MINI-CHALLENGE: Big spenders
List<Order> orders = List.of(
new Order("Ana", 120.0),
new Order("Ben", 45.0),
new Order("Cara", 200.0),
new Order("Dan", 80.0));
// 1. Build a stream from "orders"
// 2. Keep only orders with amount >= 100
// 3. Map each surviving order to its customer name
// 4. Sort the names alphabetically
// 5. Collect into a List and print it
//
// ✅ Expected output:
// [Ana, Cara]
// your code here
}
}
Common Errors (and How to Fix Them)
- ❌ Reusing a consumed stream: IllegalStateException: stream has already been operated upon or closed. A stream is one-shot. Don't store it in a variable and run two terminal ops on it; call list.stream() again to get a fresh one.
- ❌ Side effects inside map(): printing or mutating an external list from map(x -> { list.add(x); return x; }) looks fine sequentially but corrupts data under parallel(). Keep map pure; do side effects in forEach.
- ❌ Forgetting the terminal operation: list.stream().filter(...).map(...); compiles and runs but does nothing, because intermediate ops are lazy. Nothing happens until you add .collect(...), .forEach(...), or .toList().
- ❌ Parallel misuse: calling .parallel() on a 10-item list, or reducing into a shared ArrayList, is slower or outright buggy. Use parallel only for large, stateless, CPU-bound work with an associative reduction.
- ❌ Duplicate keys in toMap(): IllegalStateException: Duplicate key when two elements map to the same key. Pass a merge function: Collectors.toMap(k, v, (a, b) -> a).
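The duplicate-key error and its fix can be sketched like this. ToMapMergeDemo and both helper methods are hypothetical; the merge function shown keeps the earlier value, which is only one possible policy.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Hypothetical helper: true if the two-argument toMap rejects the input.
    static boolean failsWithoutMerge(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(w -> w.charAt(0), w -> w));
            return false;
        } catch (IllegalStateException e) {
            return true; // two words shared the same first letter
        }
    }

    // Hypothetical helper: first letter -> first word seen with that letter.
    static Map<Character, String> firstWordByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(
                        w -> w.charAt(0),            // key: first letter
                        w -> w,                      // value: the word itself
                        (first, second) -> first));  // on a duplicate key, keep the earlier value
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana");
        System.out.println(failsWithoutMerge(words));            // true
        System.out.println(firstWordByInitial(words).get('a'));  // apple
    }
}
```

Swapping the merge function to (first, second) -> second would keep the later value instead.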
📋 Quick Reference
| Goal | Code | Type |
|---|---|---|
| Stream from a list | list.stream() | Source |
| Stream from values | Stream.of(a, b, c) | Source |
| Range of ints | IntStream.range(0, n) | Source |
| Keep matching | .filter(pred) | Intermediate |
| Transform each | .map(fn) | Intermediate |
| Order / dedupe / cap | .sorted() .distinct() .limit(n) | Intermediate |
| Collect to list | .toList() | Terminal |
| Group by key | .collect(groupingBy(fn)) | Terminal |
| Build a map | .collect(toMap(k, v)) | Terminal |
| Join strings | .collect(joining(", ")) | Terminal |
| Fold to one value | .reduce(0, Integer::sum) | Terminal |
| Run in parallel | .parallel() | Modifier |
❓ Frequently Asked Questions
What is the difference between an intermediate and a terminal operation?
Intermediate operations (filter, map, sorted, distinct, limit) return a new stream and are lazy — they only describe the pipeline. A terminal operation (collect, forEach, reduce, count, toList) actually runs the pipeline and produces a result or side effect. A stream does nothing until a terminal operation is called.
Why can't I reuse a stream after calling a terminal operation?
A stream is a one-shot pipeline, not a collection. Once a terminal operation consumes it, the stream is closed and calling another operation throws IllegalStateException: stream has already been operated upon or closed. If you need to process the data again, create a fresh stream from the original collection.
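A minimal sketch of both the failure and the fix, using hypothetical names (ReuseDemo, secondUseFails, countTwice):

```java
import java.util.List;
import java.util.stream.Stream;

class ReuseDemo {
    // Hypothetical helper: true if a second terminal op on the same stream throws.
    static boolean secondUseFails(List<String> names) {
        Stream<String> stream = names.stream();
        stream.count();              // first terminal op consumes the stream
        try {
            stream.count();          // second terminal op on the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;             // "stream has already been operated upon or closed"
        }
    }

    // The fix: ask the collection for a fresh stream each time.
    static long countTwice(List<String> names) {
        return names.stream().count() + names.stream().count();
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob");
        System.out.println(secondUseFails(names)); // true
        System.out.println(countTwice(names));     // 4
    }
}
```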
Should I use collect(Collectors.toList()) or the newer toList()?
Both gather a stream into a List. toList() (Java 16+) is shorter and returns an unmodifiable list. collect(Collectors.toList()) returns a list with no guarantee about mutability or type. Use toList() for new code unless you specifically need a mutable list or an older Java version.
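The difference shows up as soon as you try to modify the result. The sketch below assumes Java 16 or newer; ToListDemo and its methods are hypothetical names. It also uses Collectors.toCollection(ArrayList::new), a standard collector not covered elsewhere in this lesson, as one way to request a list that is guaranteed to be mutable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListDemo {
    // Hypothetical helper: true if the list from Stream.toList() rejects add().
    static boolean toListRejectsAdd() {
        List<String> fixed = Stream.of("a", "b").toList();
        try {
            fixed.add("c");
            return false;
        } catch (UnsupportedOperationException e) {
            return true; // toList() returns an unmodifiable list
        }
    }

    // Hypothetical helper: asks for an ArrayList explicitly, so add() is safe.
    static List<String> mutableCopy() {
        List<String> editable = Stream.of("a", "b")
                .collect(Collectors.toCollection(ArrayList::new));
        editable.add("c");
        return editable;
    }

    public static void main(String[] args) {
        System.out.println(toListRejectsAdd()); // true
        System.out.println(mutableCopy());      // [a, b, c]
    }
}
```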
When are parallel streams actually faster?
Only when you have a large dataset (tens of thousands of elements or more) and stateless, CPU-bound work that splits cleanly. For small collections the cost of splitting and merging across threads outweighs the gain, so a sequential stream wins. Never use parallel streams with shared mutable state.
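As a rough sketch of the safe pattern, the pipeline below keeps every step stateless and lets the terminal operation assemble the result, instead of adding to a shared list from inside forEach. ParallelCollectDemo and squares are hypothetical names, and the input size is only for demonstration, not a performance claim.

```java
import java.util.List;
import java.util.stream.IntStream;

class ParallelCollectDemo {
    // Hypothetical helper: squares of 1..n, computed with a parallel stream.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()          // opt in to parallelism
                .map(i -> i * i)     // stateless: depends only on its input
                .boxed()             // int -> Integer so the result can be a List
                .toList();           // the stream builds the list; no shared mutable state
    }

    public static void main(String[] args) {
        List<Integer> result = squares(1_000);
        System.out.println(result.size());   // 1000
        System.out.println(result.get(0));   // 1
        System.out.println(result.get(999)); // 1000000
    }
}
```

Because toList() respects encounter order, the result is in the same order as a sequential run even though the work may be split across threads.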
Can I put a print statement or other side effect inside map()?
You can, but you shouldn't. map() is for pure transformations — input in, transformed value out. Side effects belong in forEach() (a terminal op) or peek() (for debugging only). Side effects in map() break badly under parallel streams and make pipelines hard to reason about.
🎉 Lesson Complete!
You can now build complete stream pipelines — create a stream with .stream(), Stream.of, or IntStream.range, chain intermediate ops like filter and map, and finish with a terminal op such as collect, reduce, or toList. You also understand lazy evaluation and when parallel streams help.
Next: Lambda Expressions Deep Dive — functional interfaces, method references, and the lambdas that power every stream operation you just learned.