Lesson 24 • Advanced
Collections Internals & Performance
Look under the hood of Java's collections. After this lesson you'll know exactly how ArrayList, LinkedList, HashMap, TreeMap, and HashSet store data — and you'll pick the right one by its Big-O instead of by guessing.
What You'll Learn in This Lesson
- ✓How ArrayList grows its backing array and why add() is amortized O(1)
- ✓How LinkedList's doubly-linked nodes make end operations O(1) but get() O(n)
- ✓How HashMap turns a key into a bucket using hashCode() and equals()
- ✓What load factor, resizing, and Java 8 treeify actually do
- ✓How TreeMap's red-black tree keeps keys sorted in O(log n)
- ✓Which collection to reach for, read straight off the Big-O table
Before You Start
You should already know how to use Collections and Generics. This lesson explains the machinery inside them, so you understand why each one is fast or slow — not just how to call add() and get().
🏬 A Real-World Analogy: Four Ways to Store Things
Every collection is just a different strategy for storing items, with different trade-offs:
- ArrayList is a row of numbered lockers. Go straight to locker 7 instantly — but to squeeze a new locker in at the front, every locker behind it has to shuffle down one space.
- LinkedList is a conga line of people holding hands. Anyone can join or leave at either end instantly, but to reach the 47th person you must count along the whole line.
- HashMap is a coat-check. You hand over a coat (a key), the attendant runs it through a formula to pick a hook number, and retrieval is near-instant — as long as the formula spreads coats across many hooks.
- TreeMap is a library shelf kept permanently in alphabetical order. Inserting takes a little work to keep the order, but you can always grab the first, last, or "next after K" book.
1️⃣ ArrayList — A Resizing Array
An ArrayList wraps a plain array (the backing array). Because the items sit in one contiguous block, jumping to get(i) is a single calculation — that's O(1) random access.
Arrays can't grow, so when the backing array fills up, ArrayList allocates a bigger one (about 1.5x the size), copies everything across, and throws the old one away. Each individual copy is O(n), but because each resize buys a lot of headroom, the copies are rare. Averaged over many add() calls the cost is constant — this is amortized O(1).
Inserting or removing in the middle is the slow case: every element after the gap must shift one slot, which is O(n). Read the resizing happen in the output below.
import java.util.Arrays;
public class Main {
// A teaching re-implementation of ArrayList's backing array + resize logic.
// The real java.util.ArrayList does exactly this, just with Object[] + generics.
static class SimpleArrayList {
private int[] data = new int[4]; // backing array (real default is 10)
private int size = 0; // how many slots are actually used
private void ensureCapacity() {
if (size == data.length) { // array is full
int oldCap = data.length;
int newCap = oldCap + (oldCap >> 1); // grow ~1.5x
data = Arrays.copyOf(data, newCap); // O(n) copy
System.out.println(" Resized: " + oldCap + " -> " + newCap);
}
}
void add(int item) {
ensureCapacity();
data[size++] = item; // amortized O(1) — copy is rare
}
void addAt(int index, int item) {
ensureCapacity();
// Shift every element right to make room — this is the O(n) part.
for (int i = size; i > index; i--) data[i] = data[i - 1];
data[index] = item;
size++;
}
int get(int index) {
return data[index]; // O(1) — jump straight to the slot
}
int size() { return size; }
int capacity() { return data.length; }
@Override public String toString() {
return Arrays.toString(Arrays.copyOf(data, size));
}
}
public static void main(String[] args) {
SimpleArrayList list = new SimpleArrayList();
System.out.println("ADDING 10 ITEMS (watch the array grow):");
for (int i = 1; i <= 10; i++) list.add(i * 10);
System.out.println(" Final: " + list);
System.out.println(" size=" + list.size() + " capacity=" + list.capacity());
System.out.println("\nGET is O(1) — direct index:");
System.out.println(" get(0) = " + list.get(0));
System.out.println(" get(7) = " + list.get(7));
System.out.println("\nINSERT at front is O(n) — shifts everything:");
list.addAt(0, 5);
System.out.println(" After addAt(0, 5): " + list);
}
}ADDING 10 ITEMS (watch the array grow):
Resized: 4 -> 6
Resized: 6 -> 9
Resized: 9 -> 13
Final: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
size=10 capacity=13
GET is O(1) — direct index:
get(0) = 10
get(7) = 80
INSERT at front is O(n) — shifts everything:
After addAt(0, 5): [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]2️⃣ LinkedList — A Chain of Nodes
A LinkedList stores no array at all. Each value lives in its own node, and every node holds a pointer to the previous and next node. Java's LinkedList is doubly-linked and keeps a reference to both the head and the tail.
That makes adding or removing at either end O(1) — you just re-wire a couple of pointers, nothing shifts. But there's no index math: to reach get(47) you must walk node by node from the nearest end, which is O(n). The output prints how many nodes it walks.
public class Main {
// A teaching doubly-linked list, like the structure inside java.util.LinkedList.
// Each node knows its value plus its previous and next neighbours.
static class SimpleLinkedList {
private static class Node {
String val;
Node prev, next;
Node(String val) { this.val = val; }
}
private Node head, tail;
private int size = 0;
void addLast(String val) { // O(1) — tail is tracked
Node node = new Node(val);
node.prev = tail;
if (tail != null) tail.next = node;
else head = node;
tail = node;
size++;
}
void addFirst(String val) { // O(1) — head is tracked
Node node = new Node(val);
node.next = head;
if (head != null) head.prev = node;
else tail = node;
head = node;
size++;
}
String removeFirst() { // O(1) — no shifting needed
if (head == null) return null;
String val = head.val;
head = head.next;
if (head != null) head.prev = null;
else tail = null;
size--;
return val;
}
String get(int index) { // O(n) — must walk the chain
Node curr = head;
int steps = 0;
for (int i = 0; i < index && curr != null; i++) {
curr = curr.next;
steps++;
}
System.out.println(" (walked " + steps + " nodes to reach index " + index + ")");
return curr == null ? null : curr.val;
}
int size() { return size; }
@Override public String toString() {
StringBuilder sb = new StringBuilder();
for (Node c = head; c != null; c = c.next) {
if (sb.length() > 0) sb.append(" <-> ");
sb.append(c.val);
}
return sb.toString();
}
}
public static void main(String[] args) {
SimpleLinkedList list = new SimpleLinkedList();
list.addLast("B");
list.addLast("C");
list.addFirst("A"); // cheap insert at the front!
System.out.println("List: " + list + " (size=" + list.size() + ")");
System.out.println("\nremoveFirst() returned: " + list.removeFirst());
System.out.println("List now: " + list);
System.out.println("\nget(1) — O(n) walk:");
System.out.println(" value = " + list.get(1));
}
}List: A <-> B <-> C (size=3)
removeFirst() returned: A
List now: B <-> C
get(1) — O(n) walk:
(walked 1 nodes to reach index 1)
value = C3️⃣ HashMap — Buckets, hashCode & equals
A HashMap is an array of buckets. To store a key it calls the key's hashCode(), squeezes that number into the bucket range, and drops the entry there. Lookup repeats the same calculation and scans only that one bucket — so on average put() and get() are O(1).
When two different keys land in the same bucket that's a collision, and the bucket holds a little list of entries. equals() then decides which entry is really your key. The load factor (size / capacity, default 0.75) controls growth: once the map passes 0.75 full it doubles the bucket array and rehashes everything to keep collisions low.
hashCode() hurts less than it used to.import java.util.*;
public class Main {
// A teaching HashMap: an array of buckets, each bucket a list of entries.
// hashCode() picks the bucket; equals() finds the exact key inside it.
static class SimpleHashMap {
record Entry(String key, String value) {}
private final List<List<Entry>> buckets = new ArrayList<>();
private final int capacity;
private int size = 0;
SimpleHashMap(int capacity) {
this.capacity = capacity;
for (int i = 0; i < capacity; i++) buckets.add(new ArrayList<>());
}
// Same idea as String.hashCode(): h = h*31 + char, then map into range.
private int bucketFor(String key) {
return (key.hashCode() & 0x7fffffff) % capacity; // strip sign, then mod
}
void put(String key, String value) {
List<Entry> bucket = buckets.get(bucketFor(key));
for (int i = 0; i < bucket.size(); i++) {
if (bucket.get(i).key().equals(key)) { // equals() decides duplicate
bucket.set(i, new Entry(key, value)); // replace existing
return;
}
}
bucket.add(new Entry(key, value)); // average O(1) insert
size++;
}
String get(String key) {
for (Entry e : buckets.get(bucketFor(key))) { // scan only one bucket
if (e.key().equals(key)) return e.value();
}
return null;
}
void printBuckets() {
for (int i = 0; i < capacity; i++) {
List<Entry> bucket = buckets.get(i);
if (bucket.isEmpty()) continue;
StringBuilder sb = new StringBuilder();
for (Entry e : bucket) {
if (sb.length() > 0) sb.append(", ");
sb.append(e.key()).append("=").append(e.value());
}
System.out.println(" Bucket[" + i + "]: " + sb
+ (bucket.size() > 1 ? " <- collision" : ""));
}
}
int size() { return size; }
int capacity() { return capacity; }
}
public static void main(String[] args) {
SimpleHashMap map = new SimpleHashMap(8);
map.put("name", "Alice");
map.put("city", "NYC");
map.put("job", "Developer");
map.put("team", "Backend");
map.put("name", "Bob"); // same key -> replaces, does NOT add
System.out.println("BUCKETS:");
map.printBuckets();
System.out.println("\nLOOKUPS (average O(1)):");
System.out.println(" get(\"name\") = " + map.get("name"));
System.out.println(" get(\"job\") = " + map.get("job"));
System.out.println(" get(\"missing\") = " + map.get("missing"));
double loadFactor = (double) map.size() / map.capacity();
System.out.printf("%nLOAD FACTOR = size/capacity = %d/%d = %.2f%n",
map.size(), map.capacity(), loadFactor);
System.out.println("Real HashMap resizes (doubles) once this passes 0.75.");
}
}BUCKETS:
Bucket[1]: city=NYC
Bucket[2]: job=Developer
Bucket[3]: team=Backend
Bucket[6]: name=Bob
LOOKUPS (average O(1)):
get("name") = Bob
get("job") = Developer
get("missing") = null
LOAD FACTOR = size/capacity = 4/8 = 0.50
Real HashMap resizes (doubles) once this passes 0.75.4️⃣ TreeMap (Red-Black Tree) & HashSet
A TreeMap keeps its keys in sorted order using a red-black tree — a binary search tree that rebalances itself on every insert so it never degrades into a slow lopsided shape. Every operation costs O(log n): a touch slower than HashMap's O(1), but you get ordered iteration plus firstKey(), lastKey(), ceilingKey(), and range views for free.
A HashSet is simply a HashMap where you only care about the keys — the values are a hidden dummy object. That's why a set's elements are unique and why add() and contains() are average O(1). (A TreeSet is the same trick over a TreeMap, giving sorted, O(log n) behaviour.)
import java.util.*;
public class Main {
public static void main(String[] args) {
// HashMap: super fast, but iteration order is unpredictable.
Map<String, Integer> hash = new HashMap<>();
hash.put("banana", 3);
hash.put("apple", 5);
hash.put("cherry", 1);
System.out.println("HashMap order: " + hash);
// TreeMap: a red-black tree (a self-balancing binary search tree).
// Keys are always kept in sorted order -> O(log n) per operation.
TreeMap<String, Integer> tree = new TreeMap<>(hash);
System.out.println("TreeMap order: " + tree);
// Sorted-only powers you get for that O(log n) cost:
System.out.println("first key: " + tree.firstKey());
System.out.println("last key: " + tree.lastKey());
System.out.println("ceiling('b'): " + tree.ceilingKey("b")); // >= "b"
System.out.println("headMap('c'): " + tree.headMap("cherry")); // keys < cherry
// HashSet = a HashMap where only the keys matter. Duplicates are dropped.
Set<String> seen = new HashSet<>();
String[] words = {"red", "blue", "red", "green", "blue"};
for (String w : words) seen.add(w); // add() is average O(1)
System.out.println("\nUnique words: " + seen);
System.out.println("contains('red'): " + seen.contains("red"));
}
}HashMap order: {banana=3, cherry=1, apple=5}
TreeMap order: {apple=5, banana=3, cherry=1}
first key: apple
last key: cherry
ceiling('b'): banana
headMap('c'): {apple=5, banana=3}
Unique words: [red, green, blue]
contains('red'): true🎯 Your Turn #1 — Make a Valid HashMap Key
Fill in the two blanks so a Point works as a HashMap key. Remember the contract: equal objects must return equal hash codes. The expected output is in the comments so you can self-check.
import java.util.*;
public class Main {
// 🎯 YOUR TURN — make Point usable as a HashMap key.
// A HashMap finds a key by: bucketFor(hashCode()) THEN equals().
// If you override one but not the other, lookups silently fail.
static class Point {
int x, y;
Point(int x, int y) { this.x = x; this.y = y; }
@Override
public boolean equals(Object o) {
if (!(o instanceof Point p)) return false;
// 👉 two points are equal when BOTH coordinates match
return ___; // replace ___ with: x == p.x && y == p.y
}
@Override
public int hashCode() {
// 👉 build ONE int from both fields (Objects.hash does this safely)
return ___; // replace ___ with: Objects.hash(x, y)
}
}
public static void main(String[] args) {
Map<Point, String> cities = new HashMap<>();
cities.put(new Point(0, 0), "Origin");
// A brand-new Point with the same coordinates must find the value:
String found = cities.get(new Point(0, 0));
System.out.println("Found: " + found);
// ✅ Expected output:
// Found: Origin
// (Without correct equals + hashCode this prints "Found: null")
}
}Found: Origin🎯 Your Turn #2 — Choose the Right Collection
Each comment describes a job. Replace each ___ with the collection whose internals best fit that job. The hint on each line tells you the answer — match it to what you learned above.
import java.util.*;
public class Main {
// 🎯 YOUR TURN — pick the right collection for each job, then fill the blank.
public static void main(String[] args) {
// 1) You need fast lookup by id, no ordering needed.
// 👉 best choice: HashMap
Map<Integer, String> users = new ___<>(); // replace ___ with HashMap
users.put(1, "Alice");
users.put(2, "Bob");
System.out.println("user 2: " + users.get(2));
// 2) You need keys ALWAYS sorted (e.g. a leaderboard by score).
// 👉 best choice: TreeMap
Map<Integer, String> ranked = new ___<>(); // replace ___ with TreeMap
ranked.put(30, "Bronze");
ranked.put(10, "Gold");
ranked.put(20, "Silver");
System.out.println("sorted: " + ranked);
// 3) You need to remember which ids you've already seen — uniqueness only.
// 👉 best choice: HashSet
Set<Integer> seen = new ___<>(); // replace ___ with HashSet
seen.add(1); seen.add(1); seen.add(2);
System.out.println("unique count: " + seen.size());
// ✅ Expected output:
// user 2: Bob
// sorted: {10=Gold, 20=Silver, 30=Bronze}
// unique count: 2
}
}user 2: Bob
sorted: {10=Gold, 20=Silver, 30=Bronze}
unique count: 2🧩 Mini-Challenge — Word Frequency Counter
Now with the scaffolding removed. You get only a comment outline and the expected output. Count each word, then print the counts in alphabetical order — the right Map gives you the sorting for free.
import java.util.*;
public class Main {
public static void main(String[] args) {
// 🎯 MINI-CHALLENGE: Word frequency counter
// Given the words below, count how many times each word appears,
// then print the counts with the words in alphabetical order.
//
// 1. Choose a Map that keeps keys SORTED (which one keeps order?).
// 2. For each word: getOrDefault(word, 0) + 1, then put it back.
// 3. Print the map — sorted order comes for free from the right Map.
//
// ✅ Expected output:
// {apple=3, banana=2, cherry=1}
String[] words = {
"banana", "apple", "cherry", "apple", "banana", "apple"
};
// your code here
}
}TreeMap<String, Integer> so keys stay sorted, then for each word docounts.put(w, counts.getOrDefault(w, 0) + 1). Printing the map gives {apple=3, banana=2, cherry=1}.Common Errors & How to Fix Them
equals() but not hashCode() (or returning a constant) makes map.get(key) return null for a key you definitely added. Fix: always override both together — use Objects.hash(...) over the same fields you compare in equals(), or let your IDE / a record generate them.hashCode() while the object is a key, and it now hashes to a different bucket — you can never look it up again. Fix: use immutable keys (String, Integer, records with final fields).list.remove(x) inside a for (T x : list) loop throws this at runtime. Fix: use list.removeIf(...), or iterate with an explicit Iterator and call it.remove().LinkedList and then calling get(i) in a loop turns an O(n) job into O(n²); using a plain List.contains() for membership checks is O(n) per call. Fix: ArrayList for indexed access, HashSet for membership, ArrayDeque for a queue.Pro Tips
💡 Pre-size when you know the count: new ArrayList<>(10000) or new HashMap<>(expected * 4 / 3) skips repeated resize-and-copy work.
💡 Default to ArrayList: contiguous memory is CPU-cache friendly, so it beats LinkedList for almost everything. Reach for LinkedList (or ArrayDeque) only for true queue/deque workloads.
💡 Records are perfect keys: a record auto-generates correct, immutable equals() and hashCode() — ideal as HashMap keys with zero boilerplate.
📋 Quick Reference — Big-O of Each Operation
| Collection | Backed By | Access / get | Search / contains | Insert | Delete |
|---|---|---|---|---|---|
| ArrayList | Dynamic array | O(1) | O(n) | O(1)* / O(n) mid | O(n) |
| LinkedList | Doubly-linked nodes | O(n) | O(n) | O(1) ends | O(1) ends |
| HashMap | Hash table + buckets | O(1) avg | O(1) avg | O(1) avg | O(1) avg |
| HashSet | HashMap (keys only) | — | O(1) avg | O(1) avg | O(1) avg |
| TreeMap | Red-black tree | O(log n) | O(log n) | O(log n) | O(log n) |
| TreeSet | Red-black tree | — | O(log n) | O(log n) | O(log n) |
* ArrayList add at the end is amortized O(1); "avg" means average case — a degenerate hashCode() can make hash collections O(n), or O(log n) once a bucket treeifies.
❓ Frequently Asked Questions
Why is ArrayList add() called O(1) if it sometimes copies the whole array?
That copy only happens when the backing array is full, and each resize roughly doubles capacity, so the expensive copies get rarer and rarer. Spread across all the adds, the average cost per add stays constant — this is called amortized O(1).
When should I actually use LinkedList instead of ArrayList?
Almost never as a general list. ArrayList wins for random access and iteration because its memory is contiguous and CPU-cache friendly. LinkedList only pays off when you frequently add or remove at the ends and use it as a Queue or Deque.
Why do I have to override both equals() and hashCode()?
HashMap first uses hashCode() to find the bucket, then equals() to find the exact key inside it. If two equal objects return different hash codes they land in different buckets, so the map never sees them as the same key and lookups return null.
What does the load factor 0.75 mean?
Load factor is size divided by capacity. When a HashMap passes 0.75 full it doubles its bucket array and rehashes everything. 0.75 is the default trade-off between wasted memory (low values) and more collisions slowing lookups (high values).
What is treeify and when does it happen?
Since Java 8, if a single bucket collects more than 8 entries (and the table is at least 64 buckets), that bucket converts its linked list into a red-black tree. Worst-case lookups in that bucket then drop from O(n) to O(log n), limiting the damage from poor hash codes.
What is a ConcurrentModificationException and how do I avoid it?
It is thrown when you structurally modify a collection (add or remove) while iterating it with a for-each loop. Use the iterator's own remove() method, collect changes and apply them after the loop, or call removeIf() instead.
🎉 Lesson Complete!
You can now explain how each core collection stores its data: ArrayList's resizing backing array, LinkedList's pointer-chained nodes, HashMap's buckets driven by hashCode() and equals(), TreeMap's self-balancing red-black tree, and HashSet riding on top of a HashMap. Best of all, you can read the right choice straight off the Big-O table.
Next up: Advanced HashMap, TreeMap & LinkedHashMap — ordering guarantees, navigable maps, and the tricks that make these data structures sing in real code.
Sign up for free to track which lessons you've completed and get learning reminders.