grep, sed, and awk — Bash and Shell Scripting | CertQnA

If you only learn three Unix tools beyond the basics, learn these. Together they handle almost every text-processing task you will hit at the shell.

grep — Finding Lines

grep "error" file.log
grep -i "error" file.log         # case-insensitive
grep -v "DEBUG" file.log         # invert match
grep -n "error" file.log         # line numbers
grep -c "error" file.log         # count matching lines
grep -l "error" *.log            # files containing match
grep -L "error" *.log            # files NOT containing match
grep -A 3 "error" file.log       # 3 lines after each match
grep -B 2 "error" file.log       # 2 lines before
grep -C 2 "error" file.log       # 2 lines context (both)
grep -r "TODO" src/              # recursive
grep -E "error|warn" file.log    # extended regex (alternation)
grep -F "1.2.3" file.log         # fixed string (no regex)
grep -q "error" file.log && echo "found"   # silent test
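Note that -c counts matching lines, not individual matches. A throwaway log makes the difference visible. This is a minimal sketch; the file and its contents are made up for illustration:

```shell
# Hypothetical sample log, written to a temporary directory.
dir=$(mktemp -d)
log="$dir/file.log"
printf '%s\n' \
  'INFO start' \
  'error: disk full; error: retry failed' \
  'DEBUG cache miss' \
  'ERROR timeout' > "$log"

# -c counts matching LINES: only line 2 contains lowercase "error".
lines=$(grep -c "error" "$log")

# -o prints each match on its own line, so wc -l counts OCCURRENCES.
# tr strips the padding some wc implementations add.
occurrences=$(grep -o "error" "$log" | wc -l | tr -d ' ')

# -i also matches "ERROR" on line 4.
lines_ci=$(grep -ic "error" "$log")

echo "lines=$lines occurrences=$occurrences case-insensitive=$lines_ci"
```

Both GNU and BSD grep support -o, so the occurrence count works on Linux and macOS alike.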

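ripgrep (rg) is usually faster, searches recursively by default, and respects .gitignore. Install it where you can; most of the flags above carry over, with a few exceptions: rg -L follows symlinks (use --files-without-match instead of grep -L), and rg needs no -E because its regex syntax already supports alternation (in rg, -E selects a file encoding).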

Regular Expressions, Briefly

Pattern	Meaning
`.`	any character
`*`	zero or more of previous
`+`	one or more (extended regex)
`?`	zero or one (extended regex)
`^` / `$`	line start / end
`[abc]` / `[^abc]`	any of / none of
`[0-9]`	character class range
`\d` `\w` `\s`	digit/word/whitespace (PCRE)
`(a|b)`	alternation (extended regex)
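The "(extended regex)" rows matter in practice: without -E, grep uses basic regex, where ? and + are literal characters. A small sketch on a made-up word list:

```shell
# Sample input: one word per line.
words=$(printf '%s\n' color colour colouur cat dog)

# Extended regex (-E): ? makes the preceding "u" optional.
ere=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# Same pattern as a basic regex: ? is literal, so nothing matches.
# grep then exits 1; "|| true" keeps that from aborting under set -e.
bre=$(printf '%s\n' "$words" | grep '^colou?r$' || true)

# Alternation plus anchors: lines that are exactly "cat" or "dog".
alt=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$')

# + needs at least one "u": counts colour and colouur.
plus=$(printf '%s\n' "$words" | grep -cE '^colou+r$')
```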

sed — Stream Editing

The most common sed use is substitution, printed to stdout by default or applied to the file in place with -i:

sed 's/old/new/' file.txt          # first occurrence per line
sed 's/old/new/g' file.txt         # all occurrences
sed 's|/usr/bin|/usr/local/bin|g' file.txt   # alternate delimiter
sed -i 's/old/new/g' file.txt      # in-place edit (Linux)
sed -i.bak 's/old/new/g' file.txt  # in-place with backup
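To see the effect of the g flag and of -i.bak side by side, here is a sketch against a throwaway file (name and contents are made up):

```shell
dir=$(mktemp -d)
printf 'old old\nkeep old\n' > "$dir/file.txt"

# Without g, only the first "old" on each line is replaced.
first=$(sed 's/old/new/' "$dir/file.txt")

# With g, every occurrence on each line is replaced.
all=$(sed 's/old/new/g' "$dir/file.txt")

# -i.bak rewrites file.txt and keeps the original as file.txt.bak.
# The attached-suffix form is accepted by both GNU and BSD sed.
sed -i.bak 's/old/new/g' "$dir/file.txt"
```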

# Multiple expressions
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt

# Delete lines
sed '/^#/d' file.txt           # delete comment lines
sed '/^$/d' file.txt           # delete empty lines
sed '5,10d' file.txt           # delete lines 5..10

# Print specific lines
sed -n '10,20p' file.txt       # only lines 10..20
sed -n '/error/p' file.txt     # only matching lines
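A typical combination is stripping a config file down to its active settings. A sketch on made-up input:

```shell
# Hypothetical config: lines 1..5, with a comment, a blank line, another comment.
conf=$(printf '%s\n' '# comment' 'port=8080' '' '# another' 'host=localhost')

# Two -e expressions in one pass: drop comment lines, then empty lines.
clean=$(printf '%s\n' "$conf" | sed -e '/^#/d' -e '/^$/d')

# -n turns off automatic printing; 4,5p prints only lines 4 and 5.
lines45=$(printf '%s\n' "$conf" | sed -n '4,5p')
```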

For anything more complex than substitute, delete, or print, reach for awk or a full scripting language.

macOS gotcha

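BSD sed (the macOS default) requires an argument after -i: a backup suffix, or an empty string for no backup. GNU sed (Linux) accepts a suffix only when it is attached, as in -i.bak, which is why the attached form works on both. For an in-place edit with no backup on macOS, either write sed -i '' 's/old/new/g' file.txt, or install GNU sed (Homebrew's gnu-sed formula, which installs it as gsed).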
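If a script has to run on both systems, one option is a small wrapper that detects which sed it has. A sketch, assuming that GNU sed accepts --version and BSD sed rejects it; the function name sed_inplace is hypothetical, not a standard command:

```shell
# Hypothetical helper: in-place edit with no backup file on GNU or BSD sed.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"       # GNU sed: -i with no attached suffix means no backup
  else
    sed -i '' "$@"    # BSD sed: empty-string suffix means no backup
  fi
}

dir=$(mktemp -d)
printf 'color=red\n' > "$dir/settings.txt"
sed_inplace 's/red/blue/' "$dir/settings.txt"
cat "$dir/settings.txt"   # color=blue
```

Feature detection like this avoids hard-coding an OS check, so it also does the right thing when GNU sed is installed on macOS.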

awk — Column-Oriented Computation

awk is a tiny language built around one loop: split each line into fields, run the pattern-action blocks against that line, and optionally aggregate in an END block once the input is exhausted.

awk '{print $1}' file.txt              # print first column
awk '{print $NF}' file.txt             # print last column
awk -F: '{print $1}' /etc/passwd       # custom delimiter
awk '{print $1, $3}' file.txt          # multiple fields
awk 'NR==1' file.txt                   # first line only
awk 'NR>1' file.txt                    # skip header
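A quick run of the field and line selectors on made-up input:

```shell
# Hypothetical whitespace-separated table with a header row.
table=$(printf '%s\n' 'name dept salary' 'alice eng 120' 'bob ops 95')

# NR>1 skips the header; $1 and $NF are the first and last fields.
body=$(printf '%s\n' "$table" | awk 'NR>1 {print $1, $NF}')

# -F: splits on colons, as in an /etc/passwd line (user:x:uid:gid:...).
user=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{print $1, $3}')
```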

# Pattern + action
awk '/error/ {print $0}' app.log
awk -F'\t' '$3 > 100 {print}' data.tsv
awk -F, '$1=="alice" {print $3}' people.csv
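By default awk splits on runs of spaces and tabs, so in a tab-separated file a field that itself contains a space shifts every later field number. A sketch with made-up rows, comparing default splitting with -F'\t':

```shell
# Tab-separated rows; the first field contains a space.
rows=$(printf 'Alice Smith\teng\t150\nBob Jones\tops\t80\n')

# Default splitting: "Smith" becomes $2, so $3 is the text "eng" or "ops".
# Comparing that text with 100 is a string comparison, and both rows pass.
wrong=$(printf '%s\n' "$rows" | awk '$3 > 100 {print $1}')

# Tab splitting: $3 is the numeric column, as intended.
right=$(printf '%s\n' "$rows" | awk -F'\t' '$3 > 100 {print $1}')
```

The same caveat applies to -F, on CSV: a quoted field containing a comma will also shift the columns, so real CSV is better handled by a CSV-aware tool.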

# Arithmetic
awk '{sum += $2} END {print sum}' data.txt        # sum a column
awk '{sum += $2; n++} END {if (n) print sum/n}' data.txt # average (skips empty input)
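A sketch with made-up numbers, including the empty-input case. Without a guard, sum/n divides by zero there, which some implementations (gawk, for example) treat as a fatal error:

```shell
data=$(printf 'a 10\nb 20\nc 30\n')

sum=$(printf '%s\n' "$data" | awk '{sum += $2} END {print sum}')

# if (n) skips the division when no lines were read.
avg=$(printf '%s\n' "$data" | awk '{sum += $2; n++} END {if (n) print sum/n}')

# With empty input the guarded version prints nothing instead of failing.
empty_avg=$(printf '' | awk '{sum += $2; n++} END {if (n) print sum/n}')
```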

# Group and count
awk '{count[$1]++} END {for (k in count) print count[k], k}' access.log \
  | sort -rn
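Here is the group-and-count idiom on a made-up log excerpt whose first field is a client IP. The trailing sort is needed because the iteration order of for (k in count) is unspecified:

```shell
# Hypothetical access-log excerpt.
log=$(printf '%s\n' '10.0.0.1 GET /' '10.0.0.2 GET /a' '10.0.0.1 GET /b' '10.0.0.1 GET /c')

# count[] is an associative array keyed by the first field.
top=$(printf '%s\n' "$log" \
  | awk '{count[$1]++} END {for (k in count) print count[k], k}' \
  | sort -rn)
```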

# Built-in variables
# $1, $2, ... = fields
# $0          = whole line
# NR          = current record (line) number
# NF          = number of fields on this line
# FS          = field separator (input)
# OFS         = field separator (output)
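FS and OFS are usually set in a BEGIN block so they apply before the first line is read. One detail that can be surprising: OFS only appears in the output once awk rebuilds $0, and the common trick to force that is assigning a field to itself. A sketch on made-up input:

```shell
# Colon-separated in, comma-separated out. $1=$1 rebuilds $0 using OFS.
csv=$(printf 'root:x:0\ndaemon:x:1\n' | awk 'BEGIN {FS=":"; OFS=","} {$1=$1; print}')

# NR is the record number, NF the number of fields on that record.
counts=$(printf 'a b c\nd e\n' | awk '{print NR, NF}')
```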

A Comparison

Task	Tool
Find lines matching pattern	grep
Replace text in a file	sed -i 's/.../.../g'
Extract column N	awk '{print $N}' or cut -fN
Sum / average a column	awk
Count occurrences per group	awk | sort | uniq -c
Anything multi-step or fragile	Python (or jq for JSON)
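awk and cut are not quite interchangeable for extracting a column. cut -f splits on single tab characters by default, while awk's default splits on runs of spaces and tabs. A quick check on a space-aligned line (made up for illustration):

```shell
# Space-aligned columns with no tabs, as in many command outputs.
line='alice    eng   120'

# awk collapses runs of blanks, so $2 is "eng".
by_awk=$(printf '%s\n' "$line" | awk '{print $2}')

# cut -d' ' treats every single space as a delimiter, so field 2 is empty.
by_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# cut's default delimiter is tab; a line with no tab is printed unchanged.
cut_default=$(printf '%s\n' "$line" | cut -f2)
```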

Worked Example: Web Log Top Lists

# Top 10 IPs by request count
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head

# Top 10 URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head

# Average response size for 200 responses
awk '$9==200 {sum+=$10; n++} END {if (n) print sum/n}' access.log

# 404 URLs
awk '$9==404 {print $7}' access.log | sort | uniq -c | sort -rn

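# Errors in the last hour (GNU date -d). \$ stops the shell expanding $p;
# output starts at the first line containing that HH:MM, or is empty if none does.
sed -n "/$(date -d '1 hour ago' +%H:%M)/,\$p" /var/log/syslog | grep -i error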
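The field numbers in these pipelines appear to assume the combined log format that Apache and Nginx use by default, where whitespace splitting gives $1 = client IP, $7 = request path, $9 = status code, and $10 = response size. A sketch that runs three of the pipelines on made-up log lines:

```shell
dir=$(mktemp -d)
log="$dir/access.log"
# Hypothetical entries in combined log format.
cat > "$log" <<'EOF'
10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1000 "-" "curl/8.0"
10.0.0.2 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 150 "-" "curl/8.0"
10.0.0.1 - - [10/Oct/2024:13:55:38 +0000] "GET /about.html HTTP/1.1" 200 3000 "-" "curl/8.0"
EOF

# Busiest IP (uniq -c pads the count with leading spaces).
top_ip=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n 1)

# Average size of 200 responses: (1000 + 3000) / 2.
avg_200=$(awk '$9==200 {sum+=$10; n++} END {if (n) print sum/n}' "$log")

# Paths that returned 404.
not_found=$(awk '$9==404 {print $7}' "$log")
```

If your server uses a custom log format, check which field holds what before trusting these numbers.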

JSON: Use jq

For JSON, don't try awk. Use jq:

jq '.users[] | select(.active) | .email' data.json
kubectl get pods -o json | jq -r '.items[].metadata.name'
curl -s api/users | jq '.[].id'

When to Stop Using awk

If your awk script grows past about 10 lines, defines functions, or processes JSON or multi-line records, switch to Python. The strength of awk is one-liners; beyond that, the code quickly becomes hard to read.

Cert Mapping

Cert	Scope
RHCSA / LFCS	Routine log inspection and config edits
AWS SAA	CloudWatch log filters; quick parsing of API output

The next lesson covers what makes a script production-quality: functions, error handling, and exit codes.