If you only learn three Unix tools beyond the basics, learn grep, sed, and awk. Together they handle almost every text-processing task you will hit at the shell.
grep — Finding Lines
grep "error" file.log
grep -i "error" file.log # case-insensitive
grep -v "DEBUG" file.log # invert match
grep -n "error" file.log # line numbers
grep -c "error" file.log # count matching lines (not individual matches)
grep -l "error" *.log # files containing match
grep -L "error" *.log # files NOT containing match
grep -A 3 "error" file.log # 3 lines after each match
grep -B 2 "error" file.log # 2 lines before
grep -C 2 "error" file.log # 2 lines context (both)
grep -r "TODO" src/ # recursive
grep -E "error|warn" file.log # extended regex (alternation)
grep -F "1.2.3" file.log # fixed string (no regex)
grep -q "error" file.log && echo "found" # silent test
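To try a few of the flags above without touching real logs, the following minimal sketch builds a throwaway file and captures the results in variables. The log contents and the variable names are made up for illustration.

```shell
# Build a small throwaway log so the flags can be tried safely.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO  service started
DEBUG cache warm
ERROR disk full
info  retrying
Error: timeout after 30s
EOF

# -c counts matching lines; adding -i makes the match case-insensitive.
exact_count=$(grep -c "ERROR" "$log")
any_case_count=$(grep -ci "error" "$log")

# -v drops DEBUG lines; -n prefixes each surviving line with its line number.
first_non_debug=$(grep -vn "DEBUG" "$log" | head -n 1)

# -q prints nothing and reports the result through the exit status.
if grep -q "disk full" "$log"; then
  status="found"
else
  status="missing"
fi

echo "$exact_count $any_case_count $status"
echo "$first_non_debug"
rm -f "$log"
```

Because -q communicates only through the exit status, it fits naturally into if conditions and && chains in scripts.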
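ripgrep (rg) is usually faster on large directory trees, searches recursively by default, and respects .gitignore. Install it where you can. The common flags above (-i, -v, -n, -c, -l, -A/-B/-C, -F, -q) carry over, but a few letters mean something else in rg (for example -r, -L, and -E), so check rg --help before reusing those.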
Regular Expressions, Briefly
| Pattern | Meaning |
|---|---|
| . | any single character |
| * | zero or more of previous |
| + | one or more (extended regex) |
| ? | zero or one (extended regex) |
| ^ / $ | line start / end |
| [abc] / [^abc] | any of / none of |
| [0-9] | character class range |
| \d \w \s | digit/word/whitespace (PCRE; grep -P in GNU grep) |
| (a\|b) | alternation (extended regex) |
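The difference between a regex dot and a literal dot is easy to miss, so here is a small sketch that counts matches under each interpretation. The three sample strings and the variable names are invented for the demonstration.

```shell
# Three sample strings, one per line; only the first is a real version number.
versions=$(printf '%s\n' "1.2.3" "1x2y3" "10203")

# In a regex, "." matches any single character, so all three lines match.
regex_hits=$(printf '%s\n' "$versions" | grep -c "1.2.3")

# -F treats the pattern as a literal string, so only the real version matches.
fixed_hits=$(printf '%s\n' "$versions" | grep -cF "1.2.3")

# Escaping the dots gives the same result without -F.
escaped_hits=$(printf '%s\n' "$versions" | grep -c '1\.2\.3')

# Alternation needs -E (extended regex).
either_hits=$(printf '%s\n' "$versions" | grep -cE "x2|020")

echo "regex=$regex_hits fixed=$fixed_hits escaped=$escaped_hits either=$either_hits"
```

When the search term comes from user input or contains punctuation, -F is usually the safer default.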
sed — Stream Editing
The most common sed use is substitution. Without -i, sed writes the result to stdout and leaves the file unchanged:
sed 's/old/new/' file.txt # first occurrence per line
sed 's/old/new/g' file.txt # all occurrences
sed 's|/usr/bin|/usr/local/bin|g' file.txt # alternate delimiter
sed -i 's/old/new/g' file.txt # in-place edit (GNU sed)
sed -i.bak 's/old/new/g' file.txt # in-place, original saved as file.txt.bak
# Multiple expressions
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
# Delete lines
sed '/^#/d' file.txt # delete comment lines
sed '/^$/d' file.txt # delete empty lines
sed '5,10d' file.txt # delete lines 5..10
# Print specific lines
sed -n '10,20p' file.txt # only lines 10..20
sed -n '/error/p' file.txt # only matching lines
For anything more complex than substitute, delete, or print, reach for awk or a general-purpose language.
macOS gotcha
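BSD sed (the macOS default) requires an argument after -i: the backup suffix, which may be empty. GNU sed (Linux) treats the suffix as optional. On macOS, either write sed -i '' 's/old/new/g' file.txt or install gnu-sed, which Homebrew provides as gsed.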
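One way to sidestep the GNU/BSD difference is to always attach a backup suffix directly to -i, a form both implementations accept. The sketch below applies that to a made-up config file; the file contents and variable names are assumptions for illustration only.

```shell
# Throwaway config file with a comment, a path, a blank line, and a setting.
conf=$(mktemp)
printf '%s\n' '# sample config' 'path=/usr/bin/tool' '' 'mode=old' > "$conf"

# -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
sed -i.bak -e 's|/usr/bin|/usr/local/bin|g' -e 's/old/new/g' "$conf"

# The untouched original is preserved next to the edited file.
backup_mode=$(grep '^mode=' "$conf.bak")

# Without -i, sed only writes to stdout; here comments and blank lines are dropped.
stripped=$(sed -e '/^#/d' -e '/^$/d' "$conf")

echo "$stripped"
rm -f "$conf" "$conf.bak"
```

Delete the .bak file once you have confirmed the edit did what you expected.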
awk — Column-Oriented Computation
awk is a tiny language built around: split each line into fields, run a block for each line, optionally aggregate at the end.
awk '{print $1}' file.txt # print first column
awk '{print $NF}' file.txt # print last column
awk -F: '{print $1}' /etc/passwd # custom delimiter
awk '{print $1, $3}' file.txt # multiple fields
awk 'NR==1' file.txt # first line only
awk 'NR>1' file.txt # skip header
# Pattern + action
awk '/error/ {print $0}' app.log
awk -F'\t' '$3 > 100 {print}' data.tsv # split on tabs only, so fields may contain spaces
awk -F, '$1=="alice" {print $3}' people.csv
# Arithmetic
awk '{sum += $2} END {print sum}' data.txt # sum a column
awk '{sum += $2; n++} END {if (n) print sum/n}' data.txt # average (skipped on empty input)
# Group and count
awk '{count[$1]++} END {for (k in count) print count[k], k}' access.log \
| sort -rn
# Fields and built-in variables
# $1, $2, ... = fields
# $0 = whole line
# NR = current record (line) number
# NF = number of fields on this line
# FS = field separator (input)
# OFS = field separator (output)
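The variables above combine naturally in small aggregations. As a rough sketch, the following sums a column per key, reformats output with OFS, and computes a guarded average. The data set and the variable names are hypothetical.

```shell
# Hypothetical whitespace-separated data: a header row, then user and request count.
data=$(printf '%s\n' "user requests" "alice 3" "bob 5" "alice 4")

# NR>1 skips the header; the array is keyed by the first field.
# awk's for-in order is unspecified, so sort makes the output stable.
totals=$(printf '%s\n' "$data" |
  awk 'NR>1 {sum[$1] += $2} END {for (u in sum) print u, sum[u]}' | sort)

# OFS controls what the comma in print emits between fields.
csv=$(printf '%s\n' "$data" | awk -v OFS=, 'NR>1 {print $1, $2, NF}')

# Guard the average so empty input does not divide by zero.
avg=$(printf '%s\n' "$data" | awk 'NR>1 {s += $2; n++} END {if (n) print s/n}')

echo "$totals"
echo "$csv"
echo "average: $avg"
```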
A Comparison
| Task | Tool |
|---|---|
| Find lines matching pattern | grep |
| Replace text in a file | sed -i 's/.../.../g' |
| Extract column N | awk '{print $N}' or cut -fN |
| Sum / average a column | awk |
| Count occurrences per group | awk + sort \| uniq -c |
| Anything multi-step or fragile | Python or jq |
Worked Example: Web Log Top Lists
# Top 10 IPs by request count
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
# Top 10 URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head
# Average response size for 200 responses
awk '$9==200 {sum+=$10; n++} END {if (n) print sum/n}' access.log
# 404 URLs
awk '$9==404 {print $7}' access.log | sort | uniq -c | sort -rn
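# Errors in the last hour (GNU date; needs a line logged in that exact minute)
# \$p keeps the shell from expanding $p inside the double quotes
sed -n "/$(date -d '1 hour ago' '+%H:%M')/,\$p" /var/log/syslog | grep -i error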
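The field numbers used above ($1, $7, $9, $10) assume the common Apache/nginx access log layout. A minimal sketch with four invented log lines in that layout shows how the positions line up and what the pipelines return.

```shell
# Four made-up requests in the assumed layout:
# $1 = client IP, $7 = URL, $9 = status, $10 = response size in bytes.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1000
10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 200
10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 3000
10.0.0.1 - - [10/Oct/2023:13:56:09 +0000] "GET /missing HTTP/1.1" 404 200
EOF

# Busiest client: count per IP, sort descending, keep the IP from the top row.
top_ip=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

# Average size of 200 responses, printed only when at least one exists.
avg_200=$(awk '$9==200 {sum += $10; n++} END {if (n) print sum/n}' "$log")

# URLs that returned 404, with the counting done inside awk.
not_found=$(awk '$9==404 {c[$7]++} END {for (u in c) print c[u], u}' "$log")

echo "top ip: $top_ip, avg 200 size: $avg_200, 404s: $not_found"
rm -f "$log"
```

If your server uses a custom log format, print a single line with awk '{for (i = 1; i <= NF; i++) print i, $i}' first to confirm which field is which.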
JSON: Use jq
For JSON, don't try awk. Use jq:
jq '.users[] | select(.active) | .email' data.json
kubectl get pods -o json | jq -r '.items[].metadata.name'
curl -s api/users | jq '.[].id'
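To experiment with these filters without a real API, you can feed jq an inline document. The sketch below uses a made-up payload shaped like the .users[] example and checks that jq is installed first, since it is not part of most base systems.

```shell
# Hypothetical payload shaped like the .users[] example above.
payload='{"users":[{"email":"a@example.com","active":true},{"email":"b@example.com","active":false}]}'

if command -v jq >/dev/null 2>&1; then
  # -r prints raw strings instead of JSON-quoted ones.
  active_emails=$(printf '%s\n' "$payload" | jq -r '.users[] | select(.active) | .email')
  # length applied to an array gives the element count.
  user_count=$(printf '%s\n' "$payload" | jq '.users | length')
  echo "$active_emails ($user_count users total)"
else
  echo "jq is not installed" >&2
fi
```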
When to Stop Using awk
If your awk script has more than 10 lines, has functions, or processes JSON or multi-line records — switch to Python. Awk's strength is one-liners; past that point, the code becomes hard to read.
Cert Mapping
| Cert | Scope |
|---|---|
| RHCSA / LFCS | Routine log inspection and config edits |
| AWS SAA | CloudWatch log filters; quick parsing of API output |
The next lesson covers what makes a script production-quality: functions, error handling, and exit codes.