The Unix philosophy is: small tools, doing one thing well, connected by pipes. The shell makes that connection trivial. Mastering pipes and redirection turns the shell into a programmable data-processing environment.
The Three Streams
| FD | Name | Default |
|---|---|---|
| 0 | stdin | keyboard |
| 1 | stdout | terminal |
| 2 | stderr | terminal |
Programs write normal output to stdout and errors to stderr. They read input from stdin. Redirection rewires these streams.
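As a small illustration of the table above, the sketch below sends each output stream to its own file. The function `emit` and the temporary paths are hypothetical, invented for this example; `emit` stands in for any program that writes to both streams.

```shell
# emit is a hypothetical stand-in for a program that uses both output streams.
emit() {
  echo "result line"          # written to stdout (FD 1)
  echo "warning line" >&2     # written to stderr (FD 2)
}

tmp=$(mktemp -d)              # scratch directory for the example

# Route each stream to a different file.
emit > "$tmp/out.txt" 2> "$tmp/err.txt"

out_text=$(cat "$tmp/out.txt")
err_text=$(cat "$tmp/err.txt")
rm -rf "$tmp"

echo "stdout captured: $out_text"
echo "stderr captured: $err_text"
```

Because the two streams are separate file descriptors, a caller can keep results and diagnostics apart without the program doing anything special.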
Output Redirection
ls > files.txt # write stdout to file (overwrite)
ls >> files.txt # append
ls 2> errors.txt # write stderr to file
ls 2>> errors.txt # append stderr
ls > out.txt 2> err.txt # split streams
ls > both.txt 2>&1 # merge stderr into stdout, then to file
ls &> both.txt # Bash shorthand for the above
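The order matters because the shell processes redirections left to right. > both.txt 2>&1 works: stdout is pointed at the file first, then stderr is made a copy of it. 2>&1 > both.txt does not: stderr is duplicated from stdout while stdout still points at the terminal, and only then is stdout redirected to the file, so errors still appear on the terminal.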
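The ordering rule can be checked with a short experiment. This is a sketch under assumptions: `emit` is a hypothetical command that writes to both streams, and a command substitution plays the role of the terminal so that anything that "leaks" can be captured.

```shell
# emit is a hypothetical command that writes one line to each stream.
emit() { echo "out"; echo "err" >&2; }

tmp=$(mktemp -d)

# Correct order: stdout goes to the file, then stderr is pointed at the same place.
emit > "$tmp/right.txt" 2>&1
right=$(cat "$tmp/right.txt")

# Reversed order: stderr is duplicated from the current stdout (here the
# command substitution's pipe, standing in for the terminal), and only
# afterwards is stdout moved to the file.
leaked=$(emit 2>&1 > "$tmp/wrong.txt")
wrong=$(cat "$tmp/wrong.txt")

rm -rf "$tmp"
echo "right.txt holds: $right"
echo "wrong.txt holds: $wrong"
echo "leaked past the file: $leaked"
```

In the reversed form the file receives only the stdout line, and the stderr line escapes to wherever stdout pointed before the redirection.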
Discard Output
noisy_command > /dev/null # toss stdout
noisy_command 2> /dev/null # toss stderr only
noisy_command > /dev/null 2>&1 # toss both
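A common use of discarding both streams is testing whether something succeeds when only the exit status matters. The sketch below wraps `command -v` in a hypothetical helper named `have`; the helper name and the deliberately nonexistent tool name are assumptions made for the example.

```shell
# have is a hypothetical helper: succeeds if the named command exists.
# Both streams are discarded because only the exit status is needed.
have() {
  command -v "$1" > /dev/null 2>&1
}

if have sort; then found_sort=yes; else found_sort=no; fi
if have no_such_tool_xyz; then found_missing=yes; else found_missing=no; fi

echo "sort available: $found_sort"
echo "no_such_tool_xyz available: $found_missing"
```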
Input Redirection
wc -l < file.txt
sort < input.txt > sorted.txt
# Heredoc
cat <<EOF
Multi-line
text
EOF
# Here-string
grep "error" <<< "$LOG_LINE"
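One detail of heredocs worth seeing in action is that quoting the delimiter turns off expansion inside the body. The following is a minimal sketch; the variable `name` and its value are invented for the example.

```shell
name="world"

# Unquoted delimiter: the shell expands $name inside the heredoc body.
expanded=$(cat <<EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat <<'EOF'
Hello, $name
EOF
)

echo "$expanded"
echo "$literal"
```

The quoted form is the safer default when the body contains dollar signs or backticks that should reach the command untouched, such as an embedded script or a template.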
Pipes
The pipe (|) wires stdout of one command to stdin of the next.
ps aux | grep nginx
cat access.log | sort | uniq -c | sort -rn | head -n 20
# Top 20 IP addresses by request count:
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -n 20
Each command runs as a separate process, and the processes run concurrently: data flows from one to the next through a kernel buffer as it is produced. There is no temporary file.
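One way to see that the stages stream data rather than run one after another is to pipe an endless producer into a consumer that stops early. This sketch assumes the standard `yes` utility is available.

```shell
# yes prints "y" forever. head exits after three lines, the pipe closes,
# and yes is terminated when it next tries to write to the closed pipe.
first_three=$(yes | head -n 3)
line_count=$(printf '%s\n' "$first_three" | wc -l)

echo "$first_three"
echo "lines received: $line_count"
```

If the shell had to collect all of the producer's output before starting `head`, this pipeline would never finish.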
Pipefail
By default, the exit code of a pipeline is the exit code of the last command. If cat file | sort fails because cat couldn't open the file, you would still get the exit code from sort, which is 0. (With grep as the last command the problem is easier to miss, because grep itself returns non-zero when nothing matches.) set -o pipefail changes that:
set -o pipefail
cat /nonexistent | sort
echo "$?" # non-zero: the failure in cat propagates
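Always set pipefail in serious scripts. Combined with set -e, your script stops at the first failing command or pipeline, with some exceptions: set -e ignores failures in conditions such as if and while tests, and in every command of an && or || list except the last.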
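The effect of pipefail can be compared directly using `true` and `false` as stand-ins for real commands. This is a sketch for Bash; the variable names are invented for the example, and each status is captured with `||` so the snippet also behaves under set -e.

```shell
#!/usr/bin/env bash
# Default behaviour: the pipeline's status is that of the last command.
set +o pipefail
default_status=0
false | true || default_status=$?

# With pipefail: a failure anywhere in the pipeline makes the pipeline fail.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?

# When several commands fail, Bash reports the rightmost non-zero status.
rightmost_status=0
(exit 3) | (exit 5) | true || rightmost_status=$?

set +o pipefail   # restore the default for anything that follows
echo "default=$default_status pipefail=$pipefail_status rightmost=$rightmost_status"
```

Note that the reported status comes from the rightmost failing command, not the first one to fail.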
tee
Sometimes you want a stream to go to a file and continue down the pipe.
long_command | tee log.txt | grep error
./build.sh 2>&1 | tee build.log # capture stdout and stderr
sudo something | sudo tee -a /var/log/something.log # tee itself needs root to write under /var/log
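The sketch below shows both halves of what tee does: the full stream lands in a file while the same data continues to the next stage. The sample input and the file name `full.log` are invented for the example.

```shell
tmp=$(mktemp -d)

# tee saves every line to full.log and passes the same lines on to grep.
error_count=$(
  printf 'ok\nerror: disk full\nok\n' \
    | tee "$tmp/full.log" \
    | grep -c error
)

# The saved copy is complete, not just the lines grep matched.
saved_lines=$(wc -l < "$tmp/full.log")

rm -rf "$tmp"
echo "matching lines: $error_count, lines saved: $saved_lines"
```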
Process Substitution
Sometimes you want a command's output to look like a file to another command:
diff <(sort file1) <(sort file2)
# Compare list of installed packages on two hosts:
diff <(ssh host1 'dpkg -l') <(ssh host2 'dpkg -l')
# Loop over output without subshell pitfall:
while IFS= read -r line; do
  echo "> $line"
done < <(find . -type f)
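<(...) runs the command and substitutes a path that reads from its output: usually /dev/fd/N on Linux, or a temporary FIFO on systems without /dev/fd. >(...) works the other way, giving a path that another command can write to.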
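The "subshell pitfall" mentioned in the loop example can be made concrete with a counter. This is a sketch for Bash with default options (in particular, with the lastpipe option off); the variable names are invented for the example.

```shell
#!/usr/bin/env bash
# Piping into while: the loop runs in a subshell, so the counter it
# updates is a copy that is thrown away when the pipeline ends.
count_pipe=0
printf 'a\nb\nc\n' | while IFS= read -r line; do
  count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell and reads
# from the substituted path, so the counter survives.
count_ps=0
while IFS= read -r line; do
  count_ps=$((count_ps + 1))
done < <(printf 'a\nb\nc\n')

echo "after pipe: $count_pipe"
echo "after process substitution: $count_ps"
```

Other shells differ here: some run the last stage of a pipeline in the current shell, so the first loop may behave differently outside Bash.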
xargs
Many commands take arguments, not stdin. xargs bridges the gap by turning stdin into arguments.
find . -name "*.bak" | xargs rm
echo "a b c" | xargs -n 1 echo
find . -name "*.log" -print0 | xargs -0 gzip
-print0 with xargs -0 is the safe form: it separates filenames with NUL bytes, so spaces and newlines in names don't break things.
Many xargs uses can be replaced by find -exec:
find . -name "*.bak" -exec rm {} +
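To see why the NUL-separated form matters, the sketch below counts how many arguments xargs produces for two files, one of which has a space in its name. The file names are invented, and the example assumes the temporary directory path itself contains no whitespace. Nothing is deleted; `echo` stands in for `rm`.

```shell
tmp=$(mktemp -d)
touch "$tmp/my notes.bak" "$tmp/plain.bak"

# Whitespace-delimited input: "my notes.bak" is split into two arguments.
naive_args=$(find "$tmp" -name '*.bak' | xargs -n 1 echo | wc -l)

# NUL-delimited input: each filename arrives as exactly one argument.
safe_args=$(find "$tmp" -name '*.bak' -print0 | xargs -0 -n 1 echo | wc -l)

rm -rf "$tmp"
echo "naive argument count: $naive_args"
echo "safe argument count: $safe_args"
```

With `rm` in place of `echo`, the naive form would try to remove paths that do not exist and leave the real file behind.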
Useful Filter Commands
| Command | Job |
|---|---|
| sort | sort lines; -n numeric, -r reverse, -u unique |
| uniq | collapse adjacent duplicates; -c counts |
| cut | extract columns: cut -d: -f1 /etc/passwd |
| tr | character translate: tr A-Z a-z |
| tac | reverse line order |
| head / tail | first / last N lines |
| wc | count lines, words, bytes |
| tee | write stream to file and stdout |
| jq | JSON processor (install it separately) |
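Several of these filters are typically chained. The sketch below finds the most frequent name in a small colon-separated list, ignoring case; the sample records are invented for the example.

```shell
top_user=$(
  printf 'Alice:admin\nbob:dev\nALICE:ops\nalice:qa\n' \
    | cut -d: -f1 \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n 1
)

# Prints the count followed by the name (uniq -c pads the count with spaces).
echo "$top_user"
```

The first `sort` is required because uniq only collapses adjacent duplicates; the second orders the counted lines so the largest count comes first.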
A Composed Example
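# Find the 5 slowest endpoints in a web log, ranked by average response time
# (assumes field 7 is the request path and the last field is the response time)
awk '{print $7, $NF}' access.log \
  | awk '{sum[$1]+=$2; count[$1]++} END {for (k in sum) print sum[k]/count[k], k}' \
  | sort -rn \
  | head -n 5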
That is a single pipeline that would take considerably more code in many general-purpose languages. This kind of composition is the shell's superpower.
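To try the pipeline without a real server log, the sketch below builds a tiny sample file first. Everything in the sample is invented: the log layout is an assumption (combined log format with a response time in milliseconds appended as the last field), as are the addresses, paths, and timings.

```shell
tmp=$(mktemp -d)

# Hypothetical log: field 7 is the path, the last field is the response time.
cat > "$tmp/access.log" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 100
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /c HTTP/1.1" 200 512 400
10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "GET /a HTTP/1.1" 200 512 300
10.0.0.3 - - [01/Jan/2024:00:00:04 +0000] "GET /b HTTP/1.1" 200 512 50
10.0.0.2 - - [01/Jan/2024:00:00:05 +0000] "GET /c HTTP/1.1" 200 512 600
EOF

# Average the response time per path and keep the slowest one.
slowest=$(
  awk '{print $7, $NF}' "$tmp/access.log" \
    | awk '{sum[$1]+=$2; count[$1]++} END {for (k in sum) print sum[k]/count[k], k}' \
    | sort -rn \
    | head -n 1
)

rm -rf "$tmp"
echo "$slowest"
```

On a real log, check which field holds the timing before trusting the result, since many default log formats do not record response time at all.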
Cert Mapping
| Cert | Scope |
|---|---|
| RHCSA / LFCS | Heavy use of pipes and redirection on tasks |
| AWS SAA | Log filtering on EC2, parsing kubectl output |
The next lesson goes deep on the three power tools that handle most text processing: grep, sed, and awk.