Errors, Logging, and Testing — Python for DevOps and Cloud | CertQnA

The difference between a script you can deploy and one you can't is usually error handling and tests. Python gives you good tools for both — this lesson covers the patterns you'll use every week.

Exceptions

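import json
import logging

log = logging.getLogger(__name__)

# content and path are assumed to be defined by the surrounding code
try:
    data = json.loads(content)
except json.JSONDecodeError as e:
    log.error("invalid JSON in %s: %s", path, e)
    raise   # re-raise to caller
except Exception:
    log.exception("unexpected error parsing %s", path)
    raise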

Rules:

Catch the most specific exception you can handle. Reserve except Exception: for top-level safety nets.
Never use a bare except: clause, because it also catches KeyboardInterrupt and SystemExit.
log.exception() includes the full traceback automatically — use it inside except blocks.
Re-raise (raise) when you can't actually recover — let the caller decide.
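
As a minimal sketch of how these rules fit together, the example below recovers from one specific error where it occurs, lets an unrecoverable one propagate, and keeps except Exception: for a single safety net in the entry point. The names parse_port, run, and main, the default port, and the exit codes are hypothetical choices for illustration.

```python
import logging

log = logging.getLogger(__name__)


def parse_port(raw: str, default: int = 8080) -> int:
    """Parse a port number, recovering from the one error we can handle."""
    try:
        return int(raw)
    except ValueError:
        # Specific and recoverable: fall back to a default and carry on.
        log.warning("invalid port %r, using default %d", raw, default)
        return default


def run(argv: list) -> None:
    """Do the real work; errors we cannot fix here simply propagate."""
    port = parse_port(argv[1] if len(argv) > 1 else "")
    if not 0 < port < 65536:
        raise RuntimeError(f"port out of range: {port}")
    log.info("starting on port %d", port)


def main(argv: list) -> int:
    """Entry point: the only place that catches Exception broadly."""
    try:
        run(argv)
    except Exception:
        # Safety net: log the full traceback, then report failure.
        log.exception("unhandled error")
        return 1
    return 0
```

A script would typically finish with sys.exit(main(sys.argv)) so that the shell or CI job sees a non-zero exit code on failure.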

Custom Exception Classes

Domain-specific exceptions make code clearer than passing magic strings around:

class DeploymentError(Exception):
    """Base class for deployment failures."""

class ConfigurationError(DeploymentError):
    """Bad or missing configuration."""

class HealthCheckFailed(DeploymentError):
    """New version did not pass health checks."""

def deploy(env: str) -> None:
    if not config_valid(env):
        raise ConfigurationError(f"missing required keys for {env}")
    if not pass_healthcheck(env):
        raise HealthCheckFailed(f"healthcheck timed out for {env}")

Callers can then catch DeploymentError for any failure, or specific subclasses for specific recovery.
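
A caller might use the hierarchy roughly as follows. This is a sketch only: run_deploy and the outcome strings are hypothetical, and the exception classes are repeated so the snippet stands on its own.

```python
class DeploymentError(Exception):
    """Base class for deployment failures."""


class ConfigurationError(DeploymentError):
    """Bad or missing configuration."""


class HealthCheckFailed(DeploymentError):
    """New version did not pass health checks."""


def run_deploy(deploy, env: str) -> str:
    """Call deploy(env) and map each failure type to an outcome."""
    try:
        deploy(env)
    except HealthCheckFailed:
        # Specific recovery: the release went out but is unhealthy.
        return "rolled-back"
    except DeploymentError as exc:
        # Any other deployment failure, including ConfigurationError.
        return f"aborted: {exc}"
    return "ok"
```

The subclass handler has to come first. Python tries except clauses in order, so a leading except DeploymentError would also swallow HealthCheckFailed.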

Context Managers

with isn't just for files. Anything that needs setup-and-teardown can be a context manager — locks, database connections, temp directories, timing blocks:

from contextlib import contextmanager
import time

@contextmanager
def timed(label: str):
    start = time.monotonic()
    try:
        yield
    finally:
        log.info("%s took %.2fs", label, time.monotonic() - start)

with timed("fetch all repos"):
    repos = fetch_all_repos()
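
A context manager can also hand a value to the block through yield. The sketch below, built around a hypothetical working_directory helper, changes into a directory for the duration of the block and always changes back, which is the same setup-and-teardown shape as the timer above.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily change the current directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()    # value bound by "as" in the with statement
    finally:
        os.chdir(previous)  # runs even if the block raises


with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp) as here:
        (here / "build.log").write_text("ok\n")
    created = (Path(tmp) / "build.log").exists()
```

On Python 3.11 and later, contextlib.chdir provides this behaviour out of the box; the hand-written version is shown only to illustrate the pattern.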

The logging Module

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)-7s %(name)s: %(message)s",
)

log = logging.getLogger(__name__)

log.debug("verbose detail")
log.info("normal operation")
log.warning("something unexpected, recoverable")
log.error("something failed")
log.critical("system is on fire")

Levels in order of severity: DEBUG < INFO < WARNING < ERROR < CRITICAL. Set the threshold once at startup; everything below it is filtered out.
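
One possible way to set the threshold once at startup is to read it from the environment, so the same script can run quietly in production and verbosely during debugging. In this sketch, resolve_level and the LOG_LEVEL variable name are assumptions, not a standard.

```python
import logging
import os


def resolve_level(name: str, default: int = logging.INFO) -> int:
    """Map a level name such as "debug" to its numeric value."""
    level = logging.getLevelName(name.upper())
    # getLevelName returns an int for known names and a string otherwise.
    return level if isinstance(level, int) else default


# LOG_LEVEL is a hypothetical variable name chosen for this sketch.
logging.basicConfig(level=resolve_level(os.environ.get("LOG_LEVEL", "INFO")))
log = logging.getLogger("deploy")

log.debug("hidden unless LOG_LEVEL=DEBUG")
log.warning("shown at the default INFO threshold")
```

Unknown names fall back to the default instead of raising, which keeps a typo in the environment from crashing the script at startup.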

Lazy formatting

# Right — message is only formatted if the log line is actually emitted
log.info("processed %d items in %s", count, region)

# Less efficient: the f-string is built even when INFO is filtered out
log.info(f"processed {count} items in {region}")

For low-traffic scripts the difference is negligible; for hot loops it matters.
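
The difference can be observed directly. In this small demonstration, the hypothetical Expensive class counts how often it is turned into a string, and both calls log at DEBUG while the threshold is INFO.

```python
import logging

log = logging.getLogger("lazy-demo")
log.setLevel(logging.INFO)  # DEBUG records on this logger are discarded


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "<expensive report>"


lazy, eager = Expensive(), Expensive()

log.debug("report: %s", lazy)   # level check fails first, nothing is formatted
log.debug(f"report: {eager}")   # f-string is built before debug() is called

print(lazy.renders, eager.renders)  # prints: 0 1
```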

Don't log secrets

Be paranoid about what ends up in logs — request bodies, environment dumps, tracebacks containing args. Filter or redact at the logger level.
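
A logging.Filter is one place to do that redaction. The sketch below masks key=value pairs whose key looks sensitive; the pattern, the key names, and the RedactSecrets class are illustrative assumptions and would need tuning for real log formats.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key looks sensitive.
SECRET_RE = re.compile(r"\b(password|token|secret|api_key)=(\S+)", re.IGNORECASE)


class RedactSecrets(logging.Filter):
    """Mask sensitive values before any handler formats the record."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()   # merge msg with its args
        record.msg = SECRET_RE.sub(r"\1=***", message)
        record.args = ()                # already merged above
        return True                     # keep the record, now redacted


log = logging.getLogger("redact-demo")
log.addFilter(RedactSecrets())
```

A filter attached to a logger is not applied to records that propagate up from child loggers, so attaching it to each handler gives broader coverage. This sketch also leaves exception tracebacks untouched, so those need separate care.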

Testing with pytest

pip install pytest

# app.py
def slugify(text: str) -> str:
    return text.lower().replace(" ", "-").strip("-")

# test_app.py
import pytest
from app import slugify

def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_strips_dashes():
    assert slugify("--Hello--") == "hello"

@pytest.mark.parametrize("text,expected", [
    ("foo", "foo"),
    ("FOO", "foo"),
    ("foo bar baz", "foo-bar-baz"),
    ("", ""),
])
def test_cases(text, expected):
    assert slugify(text) == expected

pytest -v

Pytest auto-discovers files named test_*.py (or *_test.py) and, inside them, functions starting with test_. parametrize generates one test per row, so each case is reported and can fail independently.

Fixtures

Fixtures are pytest's answer to setup/teardown. They're functions that produce values your tests can use:

import pytest
from pathlib import Path

# load_config is the function under test; import it from your own module.

@pytest.fixture
def tmp_config(tmp_path: Path) -> Path:
    cfg = tmp_path / "config.toml"
    cfg.write_text('region = "us-east-1"\n')
    return cfg

def test_loads_config(tmp_config):
    config = load_config(tmp_config)
    assert config["region"] == "us-east-1"

tmp_path is a built-in fixture that gives each test its own throwaway directory. Pytest ships with a number of others, including monkeypatch, capsys, and caplog.

Mocking

Don't hit AWS in unit tests. unittest.mock patches functions and methods inline:

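from datetime import datetime, timedelta, timezone
from unittest.mock import patch, MagicMock

from myscript import cleanup_old_snapshots

def test_cleanup_dry_run():
    # A snapshot well past the one-day cutoff, so it counts as old
    old_date = datetime.now(timezone.utc) - timedelta(days=30)
    fake_ec2 = MagicMock()
    fake_ec2.get_paginator.return_value.paginate.return_value = [
        {"Snapshots": [{"SnapshotId": "snap-1", "StartTime": old_date}]},
    ]

    with patch("myscript.boto3.client", return_value=fake_ec2):
        deleted = cleanup_old_snapshots(max_age_days=1, dry_run=True)

    assert deleted == 1
    fake_ec2.delete_snapshot.assert_not_called()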

For AWS specifically, moto simulates the actual AWS APIs in-process — far more realistic than hand-rolled mocks.
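
The lesson does not show cleanup_old_snapshots itself, so the following is only one possible shape for it. It differs from the test above in one deliberate way: the EC2 client is passed in as a parameter (an assumption made here) instead of being created with boto3.client inside the function, which lets a test hand over a MagicMock without any patching.

```python
from datetime import datetime, timedelta, timezone
from unittest.mock import MagicMock


def cleanup_old_snapshots(ec2, max_age_days: int, dry_run: bool = True) -> int:
    """Count snapshots older than the cutoff; delete them unless dry_run."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    matched = 0
    paginator = ec2.get_paginator("describe_snapshots")
    for page in paginator.paginate(OwnerIds=["self"]):
        for snap in page["Snapshots"]:
            if snap["StartTime"] < cutoff:
                matched += 1
                if not dry_run:
                    ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
    return matched


def test_cleanup_deletes_when_not_dry_run():
    old_date = datetime.now(timezone.utc) - timedelta(days=30)
    fake_ec2 = MagicMock()
    # Mirrors the call chain get_paginator(...).paginate(...) in the function.
    fake_ec2.get_paginator.return_value.paginate.return_value = [
        {"Snapshots": [{"SnapshotId": "snap-1", "StartTime": old_date}]},
    ]

    deleted = cleanup_old_snapshots(fake_ec2, max_age_days=1, dry_run=False)

    assert deleted == 1
    fake_ec2.delete_snapshot.assert_called_once_with(SnapshotId="snap-1")
```

Each return_value in the mock setup stands for one call in the function, which is why the chain in the test follows the chain in the code so closely.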

Test Layout

myproject/
  src/
    myproject/
      __init__.py
      cli.py
      services.py
  tests/
    __init__.py
    test_services.py
    conftest.py     # shared fixtures
  pyproject.toml

Run from the project root:

pytest -v --cov=myproject

--cov requires pytest-cov and reports test coverage. Aim for high coverage on business logic; integration tests cover the rest.

What to Test

Pure logic (parsing, transformations, decisions) — easy and high-value
Edge cases: empty input, max sizes, retries, timeouts
Each error branch — assert the right exception is raised
Critical interactions with cloud APIs via moto / mocks
End-to-end smoke tests in CI against a sandbox account
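
For the error-branch item, pytest.raises is the usual tool. A minimal sketch, assuming a hypothetical require_region helper and a standalone ConfigurationError class:

```python
import pytest


class ConfigurationError(Exception):
    """Bad or missing configuration."""


def require_region(config: dict) -> str:
    """Hypothetical helper: return the region or fail loudly."""
    try:
        return config["region"]
    except KeyError:
        raise ConfigurationError("missing required key: region") from None


def test_returns_region():
    assert require_region({"region": "us-east-1"}) == "us-east-1"


def test_missing_region_raises():
    # match is a regular expression searched against str(exception).
    with pytest.raises(ConfigurationError, match="region"):
        require_region({})
```

The test fails if no exception is raised or if a different exception type escapes, so it checks both that the branch exists and that it raises the right error.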

Don't test the standard library or third-party SDKs themselves — test your code's behaviour around them.