Running Infrastructure-as-Code in Pipelines

Infrastructure-as-code (Terraform, Bicep, Pulumi, CloudFormation, OpenTofu) belongs in CI just as much as application code. Running it in pipelines makes infra changes reviewable, repeatable, and auditable.

The Plan / Apply Lifecycle

Terraform-style tools split changes into two steps:

Plan — show what would change. Read-only.
Apply — actually make the changes.

The CI pattern follows naturally:

PR opened    → run terraform plan → post diff as a comment
PR merged    → run terraform apply on main

A GitHub Actions Example

name: Terraform

on:
  pull_request:
    paths: [infra/**]
  push:
    branches: [main]
    paths: [infra/**]

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./infra
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-plan
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform fmt -check
      - run: terraform validate
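      - id: plan
        run: terraform plan -no-color -input=false -out=tfplan
      # Render the saved plan as text so a later step can post infra/plan.txt to the PR
      - run: terraform show -no-color tfplan > plan.txt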

  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production       # waits for approval if the environment has required reviewers
    defaults:
      run:
        working-directory: ./infra
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-apply
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
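      - run: terraform init -input=false
      # Re-plans against the merged commit, so the result can differ from the plan shown on the PR
      - run: terraform apply -auto-approve -input=false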

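Note the two distinct IAM roles. The plan role is read-only, apart from whatever access the backend needs for state locking; only the apply role can change resources. Because pull request runs can assume only the plan role, even a hijacked PR run cannot mutate infrastructure, provided each role's trust policy restricts which workflow runs may assume it.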
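The separation only holds if each role's trust policy is scoped to the right kind of run. The sketch below builds two such trust policies as Python dictionaries, keyed on the subject claim of the GitHub OIDC token. The account ID, organisation, and repository names are hypothetical placeholders, and the subject formats should be checked against your own token claims before relying on them.

```python
import json

OIDC_HOST = "token.actions.githubusercontent.com"


def trust_policy(account_id: str, subject: str) -> dict:
    """Build an IAM trust policy that only the given OIDC subject can assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{OIDC_HOST}:sub": subject,
                    }
                },
            }
        ],
    }


# Hypothetical account and repository, for illustration only.
ACCOUNT = "123456789012"
REPO = "example-org/example-repo"

# Pull request runs present the subject "repo:<org>/<repo>:pull_request".
plan_policy = trust_policy(ACCOUNT, f"repo:{REPO}:pull_request")

# Jobs that declare `environment: production` present an environment subject.
apply_policy = trust_policy(ACCOUNT, f"repo:{REPO}:environment:production")

print(json.dumps(apply_policy, indent=2))
```

With policies shaped like these, a run triggered by a pull request has no subject that matches the apply role, so the assume-role call fails before Terraform ever starts.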

Remote State and Locking

Local .tfstate files do not work for teams. Use a remote backend:

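Backend	How state is stored and locked
S3 + DynamoDB	State in S3, lock in DynamoDB. Classic AWS setup; recent Terraform releases can also lock with a lock file in the bucket itself.
Terraform Cloud / HCP Terraform	SaaS with managed state, runs, policy, and team workflows.
Azure Storage	State in a blob container, locked with a blob lease.
GCS	Native locking on the bucket.
OpenTofu	Not a backend but a community fork of Terraform; it works with the same backends.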

Locking prevents two pipeline runs from corrupting state when they apply concurrently. Always enable it.
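To make the idea concrete, here is a toy in-memory model of a state lock. It is not how any real backend is implemented; it only mimics the "write the lock entry only if none exists" behaviour that lock tables rely on. The class names, state path, and run IDs are invented for illustration.

```python
class LockHeldError(Exception):
    """Raised when another run already holds the state lock."""


class StateLock:
    """In-memory stand-in for a lock table keyed by state path."""

    def __init__(self) -> None:
        self._holders: dict[str, str] = {}

    def acquire(self, state_path: str, run_id: str) -> None:
        # Conditional write: succeed only if no lock entry exists yet.
        holder = self._holders.get(state_path)
        if holder is not None:
            raise LockHeldError(f"{state_path} is locked by {holder}")
        self._holders[state_path] = run_id

    def release(self, state_path: str, run_id: str) -> None:
        # Only the current holder may release the lock.
        if self._holders.get(state_path) == run_id:
            del self._holders[state_path]


STATE = "envs/prod/terraform.tfstate"  # hypothetical state path

lock = StateLock()
lock.acquire(STATE, "run-1")

try:
    lock.acquire(STATE, "run-2")  # a second pipeline run starts concurrently
    second_run_blocked = False
except LockHeldError:
    second_run_blocked = True

lock.release(STATE, "run-1")
lock.acquire(STATE, "run-2")  # succeeds once the first run has finished
```

The second run fails fast instead of writing over state that the first run is still modifying, which is the behaviour you want from concurrent pipeline runs.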

Posting the Plan to the PR

Make the plan readable to reviewers:

- name: Comment plan
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const plan = fs.readFileSync('infra/plan.txt', 'utf8');
      const body = '### Terraform Plan\n\n```\n' + plan + '\n```';
      await github.rest.issues.createComment({
        ...context.repo,
        issue_number: context.issue.number,
        body,
      });

Reviewers approve the diff, not just the code. "What will this do to prod?" becomes a one-glance answer.
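Long plans are hard to scan, so it can help to lead the comment with a one-line summary. The sketch below counts actions in the JSON form of a plan, as produced by `terraform show -json tfplan`, and trims the text to fit a comment. The sample data, function names, and the 60,000 character budget are assumptions chosen for illustration.

```python
from collections import Counter

COMMENT_BUDGET = 60000  # assumed safe size for a PR comment body


def summarize_plan(plan: dict) -> str:
    """Count planned actions from the `resource_changes` list of a JSON plan."""
    counts = Counter()
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "create" in actions and "delete" in actions:
            counts["replace"] += 1
        elif actions == ["create"]:
            counts["add"] += 1
        elif actions == ["update"]:
            counts["change"] += 1
        elif actions == ["delete"]:
            counts["destroy"] += 1
        # "no-op" and "read" entries are ignored.
    return (
        f"Plan: {counts['add']} to add, {counts['change']} to change, "
        f"{counts['destroy']} to destroy, {counts['replace']} to replace."
    )


def fit_comment(text: str, limit: int = COMMENT_BUDGET) -> str:
    """Truncate plan text so the comment stays within the size budget."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n... (plan truncated)"


# Hand-written sample in the shape of a JSON plan; resource names are made up.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_iam_role.ci", "change": {"actions": ["no-op"]}},
    ]
}

summary = summarize_plan(sample_plan)
```

A reviewer who sees a non-zero "destroy" or "replace" count knows to read the full diff carefully before approving.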

Policy as Code

Stop bad plans before they're applied. Tools:

OPA / Conftest — write Rego rules ("no public S3 buckets")
Sentinel — Terraform Cloud's policy engine
Checkov, tfsec, Terrascan — pre-built rule packs
Open Policy Agent Gatekeeper — at the Kubernetes admission layer

- name: Run Checkov
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: infra
    framework: terraform
    soft_fail: false      # fail the build on findings
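When a rule pack does not cover a house rule, a small custom check over the JSON plan can fill the gap. The following is a minimal sketch of the "no public S3 buckets" rule mentioned above, written in Python rather than Rego. It assumes the plan was exported with `terraform show -json tfplan`, and it only inspects the `acl` attribute, so it is far less thorough than a real scanner.

```python
PUBLIC_ACLS = {"public-read", "public-read-write"}
BUCKET_TYPES = ("aws_s3_bucket", "aws_s3_bucket_acl")


def public_bucket_violations(plan: dict) -> list[str]:
    """Return addresses of planned S3 resources that request a public ACL."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") not in BUCKET_TYPES:
            continue
        # "after" is null when the resource is being deleted.
        after = rc.get("change", {}).get("after") or {}
        if after.get("acl") in PUBLIC_ACLS:
            violations.append(rc["address"])
    return violations


# Hand-written sample in the shape of a JSON plan; resource names are made up.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket_acl.site",
            "type": "aws_s3_bucket_acl",
            "change": {"actions": ["create"], "after": {"acl": "public-read"}},
        },
        {
            "address": "aws_s3_bucket_acl.logs",
            "type": "aws_s3_bucket_acl",
            "change": {"actions": ["create"], "after": {"acl": "private"}},
        },
        {
            "address": "aws_s3_bucket_acl.old",
            "type": "aws_s3_bucket_acl",
            "change": {"actions": ["delete"], "after": None},
        },
    ]
}

violations = public_bucket_violations(sample_plan)
```

In a pipeline, a wrapper script would load the plan JSON, call the function, print any violations, and exit non-zero so the job fails.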

Multi-Environment Layout

Two common patterns:

Workspaces

terraform workspace new dev
terraform workspace new prod
terraform workspace select prod
terraform apply

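One config, many state files. This is lightweight, but every environment shares the same code, so environment-specific differences have to be expressed through variables. Note that terraform apply acts on whichever workspace is currently selected.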

Per-environment directories

infra/
├── modules/
├── envs/
│   ├── dev/
│   ├── staging/
│   └── prod/

Each env is a top-level config that imports shared modules. Easy to give different envs different shapes; preferred for non-trivial setups.
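With this layout, a pipeline can plan only the environments a change touches. Here is a minimal sketch of that mapping, assuming the directory names shown above and the rule that any change under modules/ may affect every environment. The function name and file paths are hypothetical.

```python
from pathlib import PurePosixPath

ENVS = ("dev", "staging", "prod")


def affected_envs(changed_files: list[str]) -> list[str]:
    """Map changed file paths to the environments that need a plan."""
    hit = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        if parts[:2] == ("infra", "modules"):
            # A shared module changed: every environment may be affected.
            return list(ENVS)
        if len(parts) >= 3 and parts[:2] == ("infra", "envs") and parts[2] in ENVS:
            hit.add(parts[2])
    # Keep a stable order so dev is planned before prod.
    return [env for env in ENVS if env in hit]


only_dev = affected_envs(["infra/envs/dev/main.tf"])
everything = affected_envs(["infra/modules/vpc/main.tf", "infra/envs/dev/main.tf"])
nothing = affected_envs(["README.md"])
```

The resulting list can feed a job matrix so that each affected environment gets its own plan and its own PR comment.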

Drift Detection

Someone clicks something in the cloud console, or an auto-scaling group changes, and your IaC no longer matches reality. Run a scheduled terraform plan and alert if it shows changes:

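on:
  schedule:
    - cron: '0 8 * * *'   # daily at 08:00 UTC

permissions:
  id-token: write
  contents: read

jobs:
  drift:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./infra
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123:role/tf-plan   # the read-only plan role is enough
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - id: plan
        # Exit codes: 0 = no changes, 1 = error, 2 = changes present (drift)
        run: terraform plan -detailed-exitcode -no-color -input=false
        continue-on-error: true
      - if: steps.plan.outputs.exitcode == '2'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}   # assumed secret name
        run: |
          echo "Drift detected, alerting Slack"
          curl -sS -X POST -H 'Content-Type: application/json' \
            -d '{"text":"⚠️ Terraform drift in prod"}' "$SLACK_WEBHOOK"
      - if: steps.plan.outputs.exitcode == '1'
        run: exit 1   # surface real plan errors instead of swallowing them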
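The drift check hinges on the three exit codes that terraform plan returns when -detailed-exitcode is set: 0 for no changes, 1 for an error, and 2 for a non-empty diff. A small sketch of the same decision outside GitHub Actions might look like this; the function names are illustrative, and check_drift assumes terraform is installed and already initialised in the given directory.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` status into a verdict."""
    if code == 0:
        return "in-sync"  # plan succeeded and found nothing to change
    if code == 2:
        return "drift"    # plan succeeded and found changes
    return "error"        # 1, or anything unexpected, is treated as a failure


def check_drift(workdir: str) -> str:
    """Run a plan in workdir and classify the outcome."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Treating "error" separately from "drift" matters: a failed plan (expired credentials, a broken provider) should page someone for a different reason than a changed resource.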

Managed IaC Platforms

Once you have many states, dozens of pipelines, and several teams, raw GitHub Actions starts to creak. Specialised products take over:

Atlantis — open-source, comments on PRs (atlantis plan, atlantis apply)
Terraform Cloud / HCP Terraform — runs, RBAC, policy, audit
Spacelift — multi-IaC (Terraform + Pulumi + Ansible + CloudFormation)
env0, Scalr — similar managed platforms

They give you queueing, drift detection, RBAC, cost estimates, and audit logs out of the box.

Cost Estimation

Add a "what will this cost?" step to every plan:

- uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }}
- run: infracost diff --path=infra --format=table

Reviewers see "+$1,200/month" before approving, which is much harder to ignore than a lengthy plan diff.
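A cost figure can also become a gate rather than just a comment. The sketch below assumes the cost tool was asked for JSON output and that the report carries a top-level "diffTotalMonthlyCost" string; treat that field name as an assumption and verify it against the output of the version you run. The budget value and sample report are invented.

```python
from decimal import Decimal


def monthly_delta(report: dict) -> Decimal:
    """Read the monthly cost change from a JSON cost report."""
    # Assumption: the report has a top-level "diffTotalMonthlyCost" string.
    return Decimal(report.get("diffTotalMonthlyCost") or "0")


def over_budget(report: dict, limit: Decimal) -> bool:
    """True when the change adds more monthly cost than the limit allows."""
    return monthly_delta(report) > limit


# Hand-written sample; a real report contains many more fields.
sample_report = {"diffTotalMonthlyCost": "1200.00"}

needs_extra_review = over_budget(sample_report, Decimal("500"))
```

A pipeline could use the result to require an additional approver for expensive changes instead of blocking them outright.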

The Pipeline Pattern in One Sentence

Plan on PRs, apply on merge with a separate, more-privileged identity, gate with environment approvals, scan with policy-as-code, and run scheduled drift checks.