Lesson 8 • Intermediate
Text Processing 🔍
By the end of this lesson you'll be able to search, slice, summarise, and transform text from the command line — and chain those tools into a real log-analysis pipeline that answers questions in seconds, not minutes.
What You'll Learn
- Search files with grep and its key flags (-i, -r, -n, -v, -E)
- Locate files by name and age with find (and combine it with grep)
- Slice columns with cut and count things with sort | uniq -c
- Count lines/words with wc and peek with head, tail, and tail -f
- Find-and-replace with sed — and avoid the -i in-place trap
- Read columns with a gentle awk intro ($1, $2 fields)
This lesson builds on pipes and redirection (| and > from the previous lesson): the whole lesson is about piping these tools together. Every command below is real shell. Run them in your own terminal (macOS, Linux, or WSL/Git Bash on Windows), or paste them into the free bash runner linked under each block. The Output panels show exactly what you should see.
grep is the sieve that keeps only the lines you want; cut is the knife that trims each line to one column; sort and uniq -c are the scales that group and weigh; sed is the seasoning that swaps one ingredient for another. The pipe | is the conveyor belt: each tool does one small job and passes its output to the next. Master the line and you can answer almost any "what's in this file?" question.
Setup: a log file to practise on
Every example in this lesson works on the same web-server log. Run this block first to create access.log in your current folder. Each line has the same shape: date time LEVEL METHOD url status user — handy because each piece sits in its own column.
# Create a realistic web-server log to practise on.
# (A heredoc just writes these lines verbatim into access.log.)
cat > access.log <<'EOF'
2026-06-15 09:01:12 INFO GET /home 200 alice
2026-06-15 09:01:30 INFO GET /products 200 bob
2026-06-15 09:02:05 WARN GET /search 200 alice
2026-06-15 09:02:41 ERROR POST /login 500 bob
2026-06-15 09:03:10 INFO GET /products 200 carol
2026-06-15 09:03:55 ERROR GET /cart 404 alice
2026-06-15 09:04:22 INFO POST /login 200 carol
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
2026-06-15 09:05:48 WARN GET /products 200 alice
EOF
# How many lines did we write?
wc -l access.log
Output:
9 access.log
1️⃣ grep: search for patterns
grep ("Global Regular Expression Print") scans a file line by line and prints every line that matches your pattern. A handful of flags do most of the work: -i ignores case, -n adds line numbers, -v inverts the match (lines that don't match), -r searches a whole folder, and -E turns on extended regex so | means "or".
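These flags also combine. Here is a minimal sketch, assuming a throwaway sample file: the temp folder, the file name sample.log, and the lowercase "warn" line are made up for illustration and are not part of access.log.

```shell
# Build a small sample log in a temp folder (hypothetical data for illustration).
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.log" <<'EOF'
09:01:12 INFO GET /home 200
09:02:05 warn GET /search 200
09:02:41 ERROR POST /login 500
09:05:48 WARN GET /products 200
EOF

# -i and -n together: case-insensitive match, with line numbers.
grep -in "warn" "$demo_dir/sample.log"

# -v and -c together: COUNT the lines that do NOT contain INFO.
not_info=$(grep -vc "INFO" "$demo_dir/sample.log")
echo "non-INFO lines: $not_info"

# -E and -c together: count lines containing ERROR or WARN (case-sensitive,
# so the lowercase "warn" line is not counted).
error_or_warn=$(grep -Ec "ERROR|WARN" "$demo_dir/sample.log")
echo "ERROR or WARN lines: $error_or_warn"
```

Single-letter flags can be bundled behind one dash, so -in means the same as -i -n.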
# grep prints every line that MATCHES a pattern.
# 1) Plain match: every line containing ERROR
grep "ERROR" access.log
echo "--- -i: case-insensitive ---"
# -i ignores case, so 'warn', 'WARN', 'Warn' all match
grep -i "warn" access.log
echo "--- -n: show line numbers ---"
grep -n "ERROR" access.log
echo "--- -v: INVERT — lines that do NOT match ---"
# Everything that is NOT an INFO line
grep -v "INFO" access.log
echo "--- -E: extended regex (ERROR or WARN) ---"
# -E turns on extended regex so | means OR. Without -E you'd
# have to write the clunky \| instead.
grep -E "ERROR|WARN" access.log
echo "--- -r: search a whole folder recursively ---"
# -r walks every file under logs/ ; -n adds file:line numbers
# (this assumes a logs/ folder holding a copy of access.log)
grep -rn "ERROR" logs/
Output:
2026-06-15 09:02:41 ERROR POST /login 500 bob
2026-06-15 09:03:55 ERROR GET /cart 404 alice
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
--- -i: case-insensitive ---
2026-06-15 09:02:05 WARN GET /search 200 alice
2026-06-15 09:05:48 WARN GET /products 200 alice
--- -n: show line numbers ---
4:2026-06-15 09:02:41 ERROR POST /login 500 bob
6:2026-06-15 09:03:55 ERROR GET /cart 404 alice
8:2026-06-15 09:05:01 ERROR POST /checkout 500 bob
--- -v: INVERT — lines that do NOT match ---
2026-06-15 09:02:05 WARN GET /search 200 alice
2026-06-15 09:02:41 ERROR POST /login 500 bob
2026-06-15 09:03:55 ERROR GET /cart 404 alice
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
2026-06-15 09:05:48 WARN GET /products 200 alice
--- -E: extended regex (ERROR or WARN) ---
2026-06-15 09:02:05 WARN GET /search 200 alice
2026-06-15 09:02:41 ERROR POST /login 500 bob
2026-06-15 09:03:55 ERROR GET /cart 404 alice
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
2026-06-15 09:05:48 WARN GET /products 200 alice
--- -r: search a whole folder recursively ---
logs/access.log:4:2026-06-15 09:02:41 ERROR POST /login 500 bob
logs/access.log:6:2026-06-15 09:03:55 ERROR GET /cart 404 alice
logs/access.log:8:2026-06-15 09:05:01 ERROR POST /checkout 500 bob
🎯 Your Turn: find the 500 errors
Fill in the blank with the flag that prints line numbers, then run it and check the output matches.
# 🎯 YOUR TURN — find every line that is a 500 error.
# Goal: show the line numbers of lines containing "500".
# 1) Add the flag that prints line numbers
grep ___ "500" access.log # 👉 replace ___ with the line-number flag
# ✅ Expected output:
# 4:2026-06-15 09:02:41 ERROR POST /login 500 bob
# 8:2026-06-15 09:05:01 ERROR POST /checkout 500 bob
Output:
4:2026-06-15 09:02:41 ERROR POST /login 500 bob
8:2026-06-15 09:05:01 ERROR POST /checkout 500 bob
2️⃣ find: locate files
Where grep searches inside files, find searches for files by name, type, size, or age. Give it a starting folder (. means "here") and conditions like -name "*.log" or -mtime -1 (changed in the last day). With -exec you can even run another command — like grep — on every file it finds.
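Because find results depend on whatever is on your disk, it can help to try the idea in a sandbox first. The sketch below assumes a temp folder with three made-up files (app.log, quiet.log, notes.txt); none of these names come from the lesson's setup.

```shell
# Build a tiny folder tree in a temp location (all names are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/logs"
echo "09:02:41 ERROR POST /login 500" > "$demo_dir/logs/app.log"
echo "09:01:12 INFO GET /home 200"    > "$demo_dir/quiet.log"
echo "ERROR in a text file"           > "$demo_dir/notes.txt"

# Every regular file ending in .log, at any depth.
find "$demo_dir" -type f -name "*.log" | sort

# Only the .log files that contain ERROR.
# grep -l prints just the filename; {} stands for each file find returns.
matches=$(find "$demo_dir" -type f -name "*.log" -exec grep -l "ERROR" {} \;)
echo "$matches"
```

notes.txt contains ERROR too, but find never hands it to grep because the name does not end in .log. That is the division of labour: find chooses the files, grep looks inside them.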
# find walks a directory tree looking for files by NAME, type,
# size, or age — it finds files, grep finds text inside them.
# Every .log file anywhere under the current folder (. means "here")
find . -name "*.log"
echo "--- only files (not folders) modified in the last day ---"
# -type f = regular files only; -mtime -1 = changed < 1 day ago
find . -type f -mtime -1
echo "--- combine find + grep: search inside everything find returns ---"
# -exec runs grep on each match; {} is the filename, \; ends -exec
find . -name "*.log" -exec grep -l "ERROR" {} \;
Output:
./access.log
./logs/access.log
--- only files (not folders) modified in the last day ---
./access.log
./logs/access.log
--- combine find + grep: search inside everything find returns ---
./access.log
./logs/access.log
Note: this output assumes a logs/ subfolder also exists. Online runners are sandboxed, so file listings will differ.
3️⃣ cut, sort, uniq -c & wc: slice and summarise
Most logs are columns. cut -d' ' -f3 splits on a space and keeps field 3, which is the log level in access.log (date and time are fields 1 and 2). The classic counting trick is sort | uniq -c: uniq only collapses adjacent duplicates, so you must sort first to group identical lines together; -c then prefixes each with its count. Add another sort -rn to rank biggest-first, and wc -l to count lines.
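To see why the sort step matters, compare the number of groups uniq -c reports with and without it. This is a small sketch on made-up input (five interleaved level names), not on access.log.

```shell
# Hypothetical input: the same two values, deliberately interleaved.
levels=$(printf 'INFO\nERROR\nINFO\nERROR\nINFO\n')

# Without sort: no two identical lines are adjacent, so nothing collapses.
unsorted_groups=$(printf '%s\n' "$levels" | uniq -c | wc -l | tr -d ' ')

# With sort: identical lines sit together, so uniq -c can count them.
sorted_groups=$(printf '%s\n' "$levels" | sort | uniq -c | wc -l | tr -d ' ')

echo "groups without sort: $unsorted_groups"
echo "groups with sort:    $sorted_groups"

# The full idiom: count each value, then rank biggest first.
printf '%s\n' "$levels" | sort | uniq -c | sort -rn
```

Without sort, every line becomes its own group of one; with sort, the five lines collapse into two groups, INFO (3) and ERROR (2).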
# These tools slice text into columns and summarise it.
# cut -d' ' splits on a space; -f3 keeps field 3 (the log LEVEL).
echo "--- cut: pull out the log level (field 3) ---"
cut -d' ' -f3 access.log
echo "--- sort + uniq -c: COUNT how many of each level ---"
# uniq only collapses ADJACENT duplicates, so you MUST sort first.
# -c prefixes each line with its count.
cut -d' ' -f3 access.log | sort | uniq -c
echo "--- sort -rn: sort those counts, biggest first ---"
# -n = numeric sort, -r = reverse (descending)
cut -d' ' -f3 access.log | sort | uniq -c | sort -rn
echo "--- wc: count lines, then count just the errors ---"
wc -l access.log
grep -c "ERROR" access.log
Output:
--- cut: pull out the log level (field 3) ---
INFO
INFO
WARN
ERROR
INFO
ERROR
INFO
ERROR
WARN
--- sort + uniq -c: COUNT how many of each level ---
3 ERROR
4 INFO
2 WARN
--- sort -rn: sort those counts, biggest first ---
4 INFO
3 ERROR
2 WARN
--- wc: count lines, then count just the errors ---
9 access.log
3
4️⃣ head & tail: peek at the ends
head -3 shows the first 3 lines; tail -2 shows the last 2. They're perfect on the end of a pipeline to keep just the "top N" results. The killer feature is tail -f ("follow"): it stays open and prints new lines as they're written, so you can watch a live server log in real time (press Ctrl+C to stop).
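tail -f is hard to show in a static listing because it never exits on its own. The sketch below is one way to try it safely, assuming a made-up file live.log in a temp folder: it runs tail -f in the background, appends a line, then stops tail with kill (the scripted stand-in for Ctrl+C). The one-second sleeps are arbitrary pauses that give tail time to notice the change.

```shell
# Watch a log grow for a moment, then stop (file names are hypothetical).
demo_dir=$(mktemp -d)
echo "09:00:00 INFO server started" > "$demo_dir/live.log"

# Start following in the background and capture whatever tail prints.
tail -f "$demo_dir/live.log" > "$demo_dir/seen.txt" &
tail_pid=$!

sleep 1
# Simulate the server appending a new entry while tail is watching.
echo "09:00:05 ERROR POST /login 500" >> "$demo_dir/live.log"
sleep 1

# The scripted equivalent of pressing Ctrl+C.
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

# seen.txt holds the original line plus the one appended later.
cat "$demo_dir/seen.txt"
```

In day-to-day use you would simply run tail -f on the real log in one terminal and leave it open.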
# head and tail peek at the START or END of a file.
echo "--- head -3: first 3 lines ---"
head -3 access.log
echo "--- tail -2: last 2 lines ---"
tail -2 access.log
echo "--- top 3 busiest pages (a real one-liner) ---"
# field 5 is the URL; count, sort descending, keep the top 3
cut -d' ' -f5 access.log | sort | uniq -c | sort -rn | head -3
# tail -f FOLLOWS a file live: new lines appear as they're written.
# Perfect for watching a server log in real time. Ctrl+C to stop.
# tail -f access.log
Output:
--- head -3: first 3 lines ---
2026-06-15 09:01:12 INFO GET /home 200 alice
2026-06-15 09:01:30 INFO GET /products 200 bob
2026-06-15 09:02:05 WARN GET /search 200 alice
--- tail -2: last 2 lines ---
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
2026-06-15 09:05:48 WARN GET /products 200 alice
--- top 3 busiest pages (a real one-liner) ---
3 /products
2 /login
1 /search
5️⃣ sed: find and replace
sed ("stream editor") transforms text as it flows past. Its bread-and-butter is substitution: sed 's/old/new/' replaces the first match on each line, and adding /g replaces every match. By default it prints the result and leaves your file untouched — -i rewrites the file in place with no undo, so always preview first.
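The difference between the first match and every match only shows up on a line with more than one match, which access.log does not have. The sketch below uses a made-up line with two GETs. It then shows one common safeguard for in-place edits that is not covered elsewhere in this lesson: attaching a suffix directly to -i (here -i.bak) so sed keeps a backup copy. The file name sample.log is hypothetical.

```shell
# One sample line with two matches (hypothetical data).
line="GET /products GET /cart"

# Without g: only the FIRST GET on the line changes.
first_only=$(echo "$line" | sed 's/GET/FETCH/')
# With g: EVERY GET on the line changes.
every_one=$(echo "$line" | sed 's/GET/FETCH/g')
echo "$first_only"
echo "$every_one"

# A safer in-place edit: -i.bak keeps the original as <file>.bak.
# The suffix is attached directly to -i, a form that both GNU sed
# and the BSD sed shipped with macOS accept.
demo_dir=$(mktemp -d)
echo "09:02:41 ERROR POST /login 500" > "$demo_dir/sample.log"
sed -i.bak 's/ERROR/CRITICAL/' "$demo_dir/sample.log"
cat "$demo_dir/sample.log"        # edited copy
cat "$demo_dir/sample.log.bak"    # untouched original
```

Previewing without -i is still the first line of defence; the backup is only a fallback if the preview was misread.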
# sed is a stream editor. Its #1 job is find-and-replace:
# sed 's/old/new/' replaces the FIRST match on each line
# sed 's/old/new/g' replaces EVERY match (g = global)
echo "--- swap GET for FETCH (first match per line) ---"
sed 's/GET/FETCH/' access.log | head -3
echo "--- delete every INFO line, keep the rest ---"
# /pattern/d deletes lines matching the pattern
sed '/INFO/d' access.log
# ⚠️ sed prints to the screen and leaves the file untouched.
# Add -i to rewrite the file IN PLACE — there is no undo:
# sed -i 's/ERROR/CRITICAL/' access.log
Output:
--- swap GET for FETCH (first match per line) ---
2026-06-15 09:01:12 INFO FETCH /home 200 alice
2026-06-15 09:01:30 INFO FETCH /products 200 bob
2026-06-15 09:02:05 WARN FETCH /search 200 alice
--- delete every INFO line, keep the rest ---
2026-06-15 09:02:05 WARN GET /search 200 alice
2026-06-15 09:02:41 ERROR POST /login 500 bob
2026-06-15 09:03:55 ERROR GET /cart 404 alice
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
2026-06-15 09:05:48 WARN GET /products 200 alice
6️⃣ awk: a gentle intro to fields
awk is a tiny language for columnar text. It reads each line and splits it into fields you reference by number: $1 is the first field, $2 the second, and $0 the whole line. awk '{print $1, $2}' prints the first two columns; put a /pattern/ before the braces to act only on matching lines. That's already enough to do real work.
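If you are ever unsure which number a column has, let awk tell you. This sketch prints every field of one log line next to its number. It relies on NF, awk's built-in count of fields on the current line, which this lesson does not otherwise use.

```shell
# One line in the lesson's log format.
line="2026-06-15 09:02:41 ERROR POST /login 500 bob"

# NF is awk's built-in count of fields on the current line.
# Loop from 1 to NF and print each field next to its number.
echo "$line" | awk '{ for (i = 1; i <= NF; i++) print "$" i " = " $i }'

# Grab single fields by number once you know where they sit.
level=$(echo "$line" | awk '{print $3}')
user=$(echo "$line" | awk '{print $7}')
echo "level=$level user=$user"
```

For this log format the map is: $1 date, $2 time, $3 level, $4 method, $5 URL, $6 status, $7 user. cut -f uses the same numbering when the delimiter is a single space.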
# awk reads each line as fields split on whitespace:
# $1 = first field, $2 = second, ... $0 = the whole line.
# Single quotes stop the shell from expanding $2 inside the echo text.
echo '--- print the time ($2) and the user (last field) ---'
awk '{print $2, $7}' access.log | head -4
echo '--- only ERROR lines, show URL ($5) and status ($6) ---'
# A /pattern/ before {...} filters to matching lines first
awk '/ERROR/ {print $5, $6}' access.log
echo "--- count requests per user with an awk tally ---"
# Build up a count keyed by user ($7), print the totals at the END
awk '{count[$7]++} END {for (u in count) print count[u], u}' access.log | sort -rn
Output:
--- print the time ($2) and the user (last field) ---
09:01:12 alice
09:01:30 bob
09:02:05 alice
09:02:41 bob
--- only ERROR lines, show URL ($5) and status ($6) ---
/login 500
/cart 404
/checkout 500
--- count requests per user with an awk tally ---
4 alice
3 bob
2 carol
🎯 Your Turn: requests per user
This pipeline is almost done. Fill in the one missing step so the counts come out right (remember the rule about uniq).
# 🎯 YOUR TURN — build a mini log-analysis pipeline.
# Goal: count how many requests EACH user made, busiest first.
# Pull out the user (field 7) -> sort -> count uniques -> sort by count.
cut -d' ' -f7 access.log | ___ | uniq -c | sort -rn
# 👆 replace ___ : uniq -c only collapses
# ADJACENT duplicates, so what must run first?
# ✅ Expected output:
# 4 alice
# 3 bob
# 2 carol
Output:
4 alice
3 bob
2 carol
Putting It Together: a log-analysis report
Here's a single script that answers the questions an on-call engineer actually asks — total traffic, a breakdown by level, every error with its line number, the busiest page, and who hit the most errors. Every line uses a tool from this lesson. Read it top to bottom; you understand all of it now.
# === A real log-analysis report, built from this lesson's tools ===
echo "TOTAL REQUESTS:"
wc -l < access.log # < feeds the file to wc (no filename printed)
echo
echo "REQUESTS BY LEVEL:"
cut -d' ' -f3 access.log | sort | uniq -c | sort -rn
echo
echo "ALL ERRORS (with line numbers):"
grep -nE "ERROR" access.log
echo
echo "TOP PAGE:"
cut -d' ' -f5 access.log | sort | uniq -c | sort -rn | head -1
echo
echo "ERRORS PER USER:"
awk '/ERROR/ {print $7}' access.log | sort | uniq -c | sort -rn
Output:
TOTAL REQUESTS:
9
REQUESTS BY LEVEL:
4 INFO
3 ERROR
2 WARN
ALL ERRORS (with line numbers):
4:2026-06-15 09:02:41 ERROR POST /login 500 bob
6:2026-06-15 09:03:55 ERROR GET /cart 404 alice
8:2026-06-15 09:05:01 ERROR POST /checkout 500 bob
TOP PAGE:
3 /products
ERRORS PER USER:
2 bob
1 alice
Pro Tips
- 💡 The counting idiom is worth memorising: cut -d' ' -f<N> file | sort | uniq -c | sort -rn answers "what are the most common values in column N?" for any file.
- 💡 Build pipelines one stage at a time. Add a | step, check the output, then add the next. It's far easier than debugging a 5-stage pipe all at once.
- 💡 Preview before you destroy. Run sed and find -exec rm commands without the destructive part first to confirm they target the right lines/files.
- 💡 grep -c counts matching lines; wc -l counts all lines. Use the right one for the question you're asking.
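If you find yourself typing the counting idiom often, one option is to wrap it in a small shell function. The name top_values below is a hypothetical helper, not a standard command, and the sample file uses a shortened log format (no date column) made up for this sketch.

```shell
# top_values is a hypothetical helper name, not a standard command.
# Usage: top_values <column-number> <file>
top_values() {
  cut -d' ' -f"$1" "$2" | sort | uniq -c | sort -rn
}

# Sample data in a temp folder (shortened, made-up log format).
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.log" <<'EOF'
09:01:12 INFO GET /home 200 alice
09:02:41 ERROR POST /login 500 bob
09:03:10 INFO GET /products 200 carol
09:05:01 ERROR POST /checkout 500 bob
09:05:48 INFO GET /products 200 alice
EOF

# In this sample, column 2 is the level.
top_values 2 "$demo_dir/sample.log"

# Keep only the winner: first line of the ranking, second word on it.
top_level=$(top_values 2 "$demo_dir/sample.log" | head -1 | awk '{print $2}')
echo "most common level: $top_level"
```

The function body is the same four-stage pipeline, so you can still build and check it one stage at a time before wrapping it.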
Common Errors (and the fix)
- Regex | matches nothing: plain grep "ERROR|WARN" looks for the literal text ERROR|WARN. Add -E for extended regex so | means "or": grep -E "ERROR|WARN".
- Counts look wrong / duplicates not merged: you ran uniq without sorting. uniq only collapses adjacent duplicates, so always sort first: sort file | uniq -c.
- sed -i wiped your file and there's no undo: run it without -i first to preview. On macOS, -i also needs an argument: use sed -i '' 's/a/b/' file (an empty string) or you'll get an error such as "command c expects \ followed by text".
- grep: folder: Is a directory: you searched a folder without -r. Add it: grep -r "text" folder/.
- Sorting numbers gives 1, 10, 2, 3…: that's alphabetical order. Use sort -n for numeric order (and -rn for biggest-first).
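The last error above is easy to reproduce. A quick sketch on four made-up numbers shows the three orderings side by side; paste -sd' ' - is used only to join the sorted lines onto one row for display.

```shell
# Four made-up numbers, one per line.
nums=$(printf '10\n2\n1\n3\n')

# Plain sort compares character by character, so "10" lands before "2".
alphabetical=$(printf '%s\n' "$nums" | sort | paste -sd' ' -)
# -n compares by numeric value.
numeric=$(printf '%s\n' "$nums" | sort -n | paste -sd' ' -)
# -rn is numeric and reversed: biggest first.
biggest_first=$(printf '%s\n' "$nums" | sort -rn | paste -sd' ' -)

echo "sort:     $alphabetical"
echo "sort -n:  $numeric"
echo "sort -rn: $biggest_first"
```

This is why the counting idiom ends in sort -rn and not plain sort -r: the counts that uniq -c adds need to be compared as numbers.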
📋 Quick Reference
| Command | What it does |
|---|---|
| grep -in "x" f | Find "x" (case-insensitive) with line numbers |
| grep -rE "a|b" dir/ | Recursively find a OR b (extended regex) |
| grep -v "x" f | Lines that do NOT contain "x" |
| find . -name "*.log" | All .log files under the current folder |
| cut -d' ' -f4 f | Keep field 4 (space-separated) |
| sort | uniq -c | Count each unique line (sort first!) |
| sort -rn | Numeric sort, biggest first |
| wc -l f | Count lines in a file |
| head -3 / tail -2 | First 3 / last 2 lines |
| tail -f f | Follow a file live (new lines stream in) |
| sed 's/old/new/g' f | Replace every "old" with "new" |
| awk '{print $1, $2}' f | Print fields 1 and 2 of each line |
Frequently Asked Questions
Q: What's the difference between grep and find?
find locates FILES by name, type, size, or age (e.g. every *.log file in a folder). grep searches the TEXT inside files for a pattern. They pair up perfectly: use find to choose which files, then grep to search within them with find ... -exec grep ... {} \;.
Q: Why do I need to sort before uniq?
uniq only collapses duplicate lines that are next to each other. If identical lines are scattered through the file it won't see them as duplicates. Running sort first groups all identical lines together, so the pipeline cut ... | sort | uniq -c gives correct counts.
Q: When do I need grep -E?
Use -E (extended regular expressions) when you want features like alternation a|b, grouping (ab)+, or + and ? quantifiers without backslashes. Plain grep uses basic regex where you'd have to escape them as \|, \+ and so on. -E keeps patterns readable: grep -E "ERROR|WARN".
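You can check the difference by counting matches both ways. This is a sketch on a three-line made-up file; the || true only keeps the script going when grep finds nothing, since grep exits with a non-zero status in that case.

```shell
# Three-line sample file in a temp folder (hypothetical data).
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.log" <<'EOF'
09:01:12 INFO GET /home 200
09:02:05 WARN GET /search 200
09:02:41 ERROR POST /login 500
EOF

# Basic regex: | is an ordinary character, so this looks for the
# literal text ERROR|WARN and finds nothing.
plain_count=$(grep -c "ERROR|WARN" "$demo_dir/sample.log" || true)

# Extended regex: | means OR.
extended_count=$(grep -Ec "ERROR|WARN" "$demo_dir/sample.log")

# Grouping with ( ) also needs -E: GET or POST, followed by /login.
login_count=$(grep -Ec "(GET|POST) /login" "$demo_dir/sample.log")

echo "plain: $plain_count  extended: $extended_count  login: $login_count"
```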
Q: Is sed -i safe to run?
Be careful: -i edits the file in place and there is no undo. Always run the command WITHOUT -i first to preview the result on screen, and ideally keep the file in version control. On macOS, -i even requires an argument (use sed -i '' 's/.../.../' file) — another reason to test first.
Q: Should I learn awk if I know grep, cut, and sed?
For simple jobs grep/cut/sed are quicker. Reach for awk when you need columns AND logic together — filtering on a field's value, doing arithmetic, or tallying with arrays (count[$7]++). It's a tiny programming language built for tabular text, and a little goes a long way.
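As a rough illustration of "columns AND logic together", the sketch below filters on a field's numeric value and does a little arithmetic. It assumes the lesson's column layout ($5 is the URL, $6 the status) and a four-line sample file made up for this example. NR is awk's built-in count of lines read so far.

```shell
# Four lines in the lesson's log format (hypothetical sample).
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.log" <<'EOF'
2026-06-15 09:01:12 INFO GET /home 200 alice
2026-06-15 09:02:41 ERROR POST /login 500 bob
2026-06-15 09:03:55 ERROR GET /cart 404 alice
2026-06-15 09:05:01 ERROR POST /checkout 500 bob
EOF

# Filter on a field's VALUE: keep lines whose status ($6) is 500 or higher.
awk '$6 >= 500 {print $5, $6}' "$demo_dir/sample.log"

# Arithmetic: how many requests returned a 4xx or 5xx status, out of all lines?
# In the END block, NR holds the total number of lines read.
error_share=$(awk '$6 >= 400 {bad++} END {printf "%d/%d", bad, NR}' "$demo_dir/sample.log")
echo "failed requests: $error_share"
```

grep "500" could not do the first job reliably, because it matches 500 anywhere on the line; awk compares only the status column.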
Q: Do these tools work on Windows?
They're native to macOS and Linux. On Windows, use WSL (Windows Subsystem for Linux) or Git Bash, both of which ship these exact commands. PowerShell has its own equivalents (Select-String for grep, Where-Object for filtering), but the Unix tools above are the universal standard.
Mini-Challenge: profile a log file
No blanks this time — just a brief and an outline. Build the pipeline yourself on access.log (or any log you have), run it, and check it against the expected snippet in the comments. This is exactly the kind of throwaway analysis engineers run every day.
# 🎯 MINI-CHALLENGE: profile your own log file.
# Using ONLY the tools from this lesson (grep, cut, sort,
# uniq -c, wc, head, awk), produce a short report that shows:
#
# 1. The total number of requests (hint: wc -l)
# 2. A count of each HTTP method (GET / POST is field 4)
# 3. The single busiest user (count field 7, take head -1)
# 4. Every WARN or ERROR line, numbered (hint: grep -nE "WARN|ERROR")
#
# ✅ Example of what #3 should print:
# 4 alice
# your pipeline here
🎉 Lesson Complete, and that's the Command Line course!
- ✅ grep searches text: -i (case), -n (numbers), -v (invert), -r (recursive), -E (extended regex)
- ✅ find locates files by name/age and can -exec another command on each
- ✅ cut slices columns; sort | uniq -c | sort -rn counts and ranks them
- ✅ wc -l counts lines; head/tail peek at the ends; tail -f follows live logs
- ✅ sed 's/old/new/g' find-and-replaces (mind the -i in-place trap)
- ✅ awk reads fields $1, $2… for columnar logic and tallies
This was the final lesson in the Command Line course — you can now navigate the filesystem, manage files, control permissions, write small scripts, chain commands with pipes, and slice through text like a pro. Where next? Take these skills into the Git course to version-control your projects, or revisit the CLI course overview to fill any gaps. The terminal is now yours. 🚀