Most shell scripts in production are fragile because they were never told to fail loudly. A small set of disciplines turns them into reliable automation.
The Standard Header
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
| Setting | Effect |
|---|---|
set -e | Exit on any command failure |
set -u | Error on unset variables |
set -o pipefail | Pipeline fails if any command fails |
IFS=$'\n\t' | Word-split only on newline/tab; safer for filenames with spaces |
This combination is sometimes called "Bash strict mode". It is non-negotiable for production scripts.
Caveats of set -e
- Doesn't trigger inside
ifconditions — that's the point. - Doesn't trigger for the left side of
&&,||, or for commands prefixed with!. - Use
cmd || truewhen you intentionally want to ignore a failure.
Functions
greet() {
echo "Hello, $1"
}
greet "alice"
greet "bob"
Define with name + parens; no need to declare arguments. Inside the function, $1, $2, $@, $# refer to function arguments, not the script's.
Returning Values
Functions return an exit code, not a value. To return a string or number, echo to stdout and capture with $().
add() {
echo $(( $1 + $2 ))
}
result=$(add 3 4)
echo "$result" # 7
# Returning success/failure
file_has_errors() {
grep -q error "$1"
}
if file_has_errors app.log; then
echo "errors found"
fi
Local Variables
Without local, variables in functions are global. Always declare locals:
my_func() {
local name="$1"
local count=0
for i in 1 2 3; do
count=$((count + 1))
done
echo "$count"
}
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Generic error |
| 2 | Misuse / bad arguments |
| 64 | EX_USAGE — bad invocation |
| 65 | EX_DATAERR — bad input data |
| 66 | EX_NOINPUT — input file missing |
| 69 | EX_UNAVAILABLE — service unavailable |
| 77 | EX_NOPERM — permission denied |
| 126 | Command not executable |
| 127 | Command not found |
| 128+N | Killed by signal N (e.g. 130 = Ctrl-C) |
Pick exit codes that downstream automation can act on. CI systems differentiate retryable from fatal failures based on these.
trap — Cleanup on Exit
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# whatever you do here, the temp file is removed on exit
do_something_with "$tmp"
trap registers code to run on a signal. EXIT is special — it fires whether you exit normally, on error, or on Ctrl-C. Use it for cleanup.
# Cleanup with multiple resources
cleanup() {
rm -f "$tmp1" "$tmp2"
kill "$bg_pid" 2>/dev/null || true
}
trap cleanup EXIT INT TERM
Error Helpers
err() { echo "ERROR: $*" >&2; }
die() { err "$*"; exit 1; }
[[ -f /etc/hosts ]] || die "missing /etc/hosts"
command -v jq >/dev/null || die "jq is required"
Send errors to stderr (>&2) so they don't pollute stdout that the caller might be parsing.
Validating Inputs
Fail at the boundary, with clear messages:
main() {
local target="${1:-}"
[[ -n "$target" ]] || die "Usage: $0 <target-dir>"
[[ -d "$target" ]] || die "Not a directory: $target"
[[ -w "$target" ]] || die "Not writable: $target"
# ...
}
main "$@"
Logging
log() { printf '[%s] %s\n' "$(date -u +%FT%TZ)" "$*"; }
warn() { printf '[%s] WARN: %s\n' "$(date -u +%FT%TZ)" "$*" >&2; }
err() { printf '[%s] ERROR: %s\n' "$(date -u +%FT%TZ)" "$*" >&2; }
log "Starting backup"
warn "Skipping unreadable file"
err "Backup failed"
Putting It Together
#!/usr/bin/env bash
set -euo pipefail
readonly LOG="/var/log/myscript.log"
log() { printf '[%s] %s\n' "$(date -u +%FT%TZ)" "$*" | tee -a "$LOG"; }
die() { log "FATAL: $*"; exit 1; }
cleanup() {
log "cleaning up"
rm -f "$tmp"
}
main() {
local src="${1:?usage: $0 SRC DEST}"
local dest="${2:?usage: $0 SRC DEST}"
[[ -d "$src" ]] || die "src not a directory: $src"
[[ -d "$dest" ]] || die "dest not a directory: $dest"
tmp=$(mktemp)
trap cleanup EXIT
log "syncing $src -> $dest"
rsync -a "$src/" "$dest/" >> "$tmp" || die "rsync failed"
log "done; bytes: $(wc -c < "$tmp")"
}
main "$@"
That template handles 90% of production shell scripts: strict mode, clear logging, input validation, cleanup on exit, real error messages.
Cert Mapping
| Cert | Scope |
|---|---|
| RHCSA / LFCS | Writing reliable admin scripts |
| AWS / Azure | Bootstrap / user data scripts that must not silently fail |
The next lesson covers parsing CLI arguments and flags properly.