5 min read·Lesson 6 of 10

grep, sed, and awk

The three indispensable text-processing tools — what each is best at, and the patterns you will use most often.

If you only learn three Unix tools beyond the basics, learn these. Together they handle almost every text-processing task you will hit at the shell.

grep — Finding Lines

grep "error" file.log
grep -i "error" file.log         # case-insensitive
grep -v "DEBUG" file.log         # invert match
grep -n "error" file.log         # line numbers
grep -c "error" file.log         # count matches
grep -l "error" *.log            # files containing match
grep -L "error" *.log            # files NOT containing match
grep -A 3 "error" file.log       # 3 lines after each match
grep -B 2 "error" file.log       # 2 lines before
grep -C 2 "error" file.log       # 2 lines context (both)
grep -r "TODO" src/              # recursive
grep -E "error|warn" file.log    # extended regex (alternation)
grep -F "1.2.3" file.log         # fixed string (no regex)
grep -q "error" file.log && echo "found"   # silent test

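ripgrep (rg) is usually faster, searches directories recursively by default, and respects .gitignore. Install it where you can; most of the patterns above transfer directly, though a few flags differ (for example, rg -L follows symlinks rather than listing non-matching files).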

Regular Expressions, Briefly

Pattern            Meaning
.                  any character
*                  zero or more of previous
+                  one or more (extended regex)
?                  zero or one (extended regex)
^ / $              line start / end
[abc] / [^abc]     any of / none of
[0-9]              character class range
\d \w \s           digit / word / whitespace (PCRE, e.g. grep -P)
(a|b)              alternation (extended regex)
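
To see a few of these patterns in action, here is a small sketch using grep on made-up sample lines. The file contents and variable names are illustrative assumptions, not taken from a real log.

```shell
# Hypothetical sample data, written to a temp file for illustration.
sample=$(mktemp)
printf '%s\n' \
  'error: disk full' \
  'warning: disk at 91%' \
  'info: backup ok' \
  'ERROR 42' \
  '# comment' > "$sample"

# ^ anchors to line start; (a|b) is alternation under -E.
starts_with_level=$(grep -cE '^(error|warning):' "$sample")

# [0-9]+ means one or more digits; $ anchors to line end.
ends_with_digits=$(grep -E '[0-9]+$' "$sample")

# [^#] as the first character: lines that do not start with '#'.
non_comment=$(grep -c '^[^#]' "$sample")

echo "$starts_with_level"   # count of lines starting with error: or warning:
echo "$ends_with_digits"    # the line(s) ending in digits
echo "$non_comment"         # count of non-comment lines

rm -f "$sample"
```

Matching is case-sensitive here, so 'ERROR 42' is not counted by the first pattern; add -i if that is not what you want.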

sed — Stream Editing

The most common use of sed is substitution. By default it writes the result to standard output; add -i to edit the file in place:

sed 's/old/new/' file.txt          # first occurrence per line
sed 's/old/new/g' file.txt         # all occurrences
sed 's|/usr/bin|/usr/local/bin|g' file.txt   # alternate delimiter
sed -i 's/old/new/g' file.txt      # in-place edit (Linux)
sed -i.bak 's/old/new/g' file.txt  # in-place with backup

# Multiple expressions
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt

# Delete lines
sed '/^#/d' file.txt           # delete comment lines
sed '/^$/d' file.txt           # delete empty lines
sed '5,10d' file.txt           # delete lines 5..10

# Print specific lines
sed -n '10,20p' file.txt       # only lines 10..20
sed -n '/error/p' file.txt     # only matching lines

For anything more complex than substitute, delete, or print — reach for awk or a real language.

macOS gotcha

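BSD sed (the macOS default) requires an argument after -i, the backup suffix, while GNU sed (Linux) treats the suffix as optional. On macOS, either install gnu-sed (Homebrew installs it as gsed) or pass an empty suffix: sed -i '' 's/old/new/g' file.txt.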
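
One way to sidestep the difference, assuming you do not mind a temporary backup file, is to attach the suffix directly to -i, a form both GNU and BSD sed accept. The file name and contents below are placeholders.

```shell
# Hypothetical config file for illustration.
conf=$(mktemp)
printf 'host=old.example.com\nport=80\n' > "$conf"

# -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
sed -i.bak 's/old/new/g' "$conf"

# The original is preserved as "$conf.bak"; remove it once the edit looks right.
edited=$(cat "$conf")
backup=$(cat "$conf.bak")
rm -f "$conf.bak"

echo "$edited"
rm -f "$conf"
```

The trade-off is an extra file per edit, which is usually acceptable in a script that cleans up after itself.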

awk — Column-Oriented Computation

awk is a tiny language built around: split each line into fields, run a block for each line, optionally aggregate at the end.

awk '{print $1}' file.txt              # print first column
awk '{print $NF}' file.txt             # print last column
awk -F: '{print $1}' /etc/passwd       # custom delimiter
awk '{print $1, $3}' file.txt          # multiple fields
awk 'NR==1' file.txt                   # first line only
awk 'NR>1' file.txt                    # skip header

# Pattern + action
awk '/error/ {print $0}' app.log
awk -F'\t' '$3 > 100 {print}' data.tsv   # tab-delimited input
awk -F, '$1=="alice" {print $3}' people.csv

# Arithmetic
awk '{sum += $2} END {print sum}' data.txt        # sum a column
awk '{sum += $2; n++} END {if (n) print sum/n}' data.txt # average (guarded against empty input)

# Group and count
awk '{count[$1]++} END {for (k in count) print count[k], k}' access.log \
  | sort -rn

# Fields and built-in variables
# $1, $2, ... = fields
# $0          = whole line
# NR          = current record (line) number
# NF          = number of fields on this line
# FS          = field separator (input)
# OFS         = field separator (output)
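
The model of splitting into fields, running a block per line, and aggregating at the end can be sketched in one short program. The CSV content and column meanings here are invented for illustration.

```shell
# Hypothetical CSV: name,dept,salary
data=$(mktemp)
printf '%s\n' \
  'name,dept,salary' \
  'alice,eng,120' \
  'bob,ops,90' \
  'carol,eng,150' > "$data"

report=$(awk '
  BEGIN   { FS = ","; OFS = "\t" }            # runs once, before any input
  NR == 1 { next }                            # skip the header line
          { total[$2] += $3; count[$2]++ }    # runs for every remaining line
  END     {                                   # runs once, after the last line
    for (d in total) print d, count[d], total[d] / count[d]
  }
' "$data" | sort)

# One line per department: name, row count, average salary (tab-separated).
echo "$report"
rm -f "$data"
```

The trailing sort is there because awk does not guarantee any order when iterating over an array with for (k in array).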

A Comparison

Task                               Tool
Find lines matching pattern        grep
Replace text in a file             sed -i 's/.../.../g'
Extract column N                   awk '{print $N}' or cut -fN
Sum / average a column             awk
Count occurrences per group        awk + sort | uniq -c
Anything multi-step or fragile     Python or jq
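
The tools also chain naturally: grep to select lines, sed to normalize them, awk to aggregate. The following is a sketch on invented log lines; the log format is an assumption made up for this example.

```shell
# Hypothetical application log: date, time, level, component, message.
log=$(mktemp)
printf '%s\n' \
  '2024-01-01 10:00:01 ERROR db: timeout after 30s' \
  '2024-01-01 10:00:02 INFO  web: request ok' \
  '2024-01-01 10:00:03 ERROR web: upstream reset' \
  '2024-01-01 10:00:04 ERROR db: timeout after 31s' > "$log"

# grep keeps only ERROR lines,
# sed drops the colon after the component name so it is a clean field,
# awk counts errors per component (field 4), and sort ranks them.
summary=$(
  grep 'ERROR' "$log" |
    sed 's/: / /' |
    awk '{count[$4]++} END {for (c in count) print count[c], c}' |
    sort -rn
)

echo "$summary"
rm -f "$log"
```

awk alone could do all three steps here; splitting the work is mostly a readability choice, and each stage can be run and inspected on its own.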

Worked Example: Web Log Top Lists

# Top 10 IPs by request count
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head

# Top 10 URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head

# Average response size for 200 responses
awk '$9==200 {sum+=$10; n++} END {if (n) print sum/n}' access.log

# 404 URLs
awk '$9==404 {print $7}' access.log | sort | uniq -c | sort -rn

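# Errors in the last hour (GNU date; needs a log line stamped with that exact minute)
# \$p is escaped so the shell does not expand $p inside the double quotes
sed -n "/$(date -d '1 hour ago' +%H:%M)/,\$p" /var/log/syslog | grep -i error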
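
The field numbers in these one-liners assume the Apache/nginx "combined" log layout, where whitespace splitting puts the client IP in $1, the request path in $7, the status in $9, and the response size in $10. A sketch with fabricated entries (the IPs and paths are made up) shows how the fields line up:

```shell
# Hypothetical access log in combined format.
access=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1000 "-" "curl/8.0"' \
  '10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /missing HTTP/1.1" 404 150 "-" "curl/8.0"' \
  '10.0.0.1 - - [01/Jan/2024:10:00:02 +0000] "GET /about.html HTTP/1.1" 200 3000 "-" "curl/8.0"' > "$access"

# Busiest client; the final awk strips the padding that uniq -c adds.
top_ip=$(awk '{print $1}' "$access" | sort | uniq -c | sort -rn | head -n 1 \
  | awk '{print $2, $1}')

# Average size of 200 responses, guarded against having no matches.
avg_200=$(awk '$9 == 200 {sum += $10; n++} END {if (n) print sum / n}' "$access")

# Paths that returned 404.
not_found=$(awk '$9 == 404 {print $7}' "$access")

echo "$top_ip"
echo "$avg_200"
echo "$not_found"
rm -f "$access"
```

If your server uses a custom log format, check a sample line first; the field positions will shift.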

JSON: Use jq

For JSON, don't try awk. Use jq:

jq '.users[] | select(.active) | .email' data.json
kubectl get pods -o json | jq -r '.items[].metadata.name'
curl -s api/users | jq '.[].id'

When to Stop Using awk

If your awk script has more than 10 lines, has functions, or processes JSON or multi-line records — switch to Python. Awk's strength is one-liners; past that point, the code becomes hard to read.

Cert Mapping

Cert            Scope
RHCSA / LFCS    Routine log inspection and config edits
AWS SAA         CloudWatch log filters; quick parsing of API output

The next lesson covers what makes a script production-quality: functions, error handling, and exit codes.

Key Takeaways

  • grep finds lines, sed transforms lines, awk computes per-line and aggregates.
  • Use grep -E (or ripgrep) for extended regex; -F for fixed strings.
  • sed is mostly s/old/new/g and a few utility moves.
  • awk is a small language — perfect for column-oriented data.
  • When awk gets long, reach for Python instead.
