You can't improve what you can't measure. AWS provides a suite of tools for monitoring, logging, auditing, and governing your cloud environment. These tools feature heavily in certification exams and are essential in real production environments.
Amazon CloudWatch
CloudWatch is AWS's primary observability service. It collects metrics, logs, and events from virtually every AWS service.
Metrics
Most AWS services publish metrics to CloudWatch automatically. Examples: EC2 CPU utilisation, RDS connections, S3 request count, Lambda invocations and error rate.
CloudWatch Alarms
Set thresholds on any metric and trigger actions:
- Send SNS notification (email, SMS)
- Trigger Auto Scaling action (scale out/in)
- Stop, reboot, or recover an EC2 instance
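As a rough illustration of an alarm that sends an SNS notification, the sketch below only builds the request parameters for CloudWatch's PutMetricAlarm API. The instance ID, topic ARN, alarm name, and threshold are made-up placeholders, and nothing is sent to AWS.

```python
def build_cpu_alarm(instance_id: str, topic_arn: str, threshold: float = 80.0) -> dict:
    """Return keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                 # length of each evaluated period, in seconds
        "EvaluationPeriods": 2,        # periods that must breach before ALARM
        "Threshold": threshold,        # percent CPU
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],   # SNS topic notified when the alarm fires
    }


# Placeholder identifiers, for illustration only.
params = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.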
CloudWatch Logs
Centralised log storage and querying. Lambda functions, EC2 instances (via the CloudWatch Agent), ECS tasks, and API Gateway can all send logs here. Use CloudWatch Logs Insights to query logs with its purpose-built query language, which chains commands such as fields, filter, sort, and stats with pipes.
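To give a feel for a Logs Insights query, here is a minimal sketch that builds the request for the StartQuery API to find recent lines containing "ERROR". The log group name is a hypothetical placeholder, and the code only constructs the request rather than running it.

```python
import time


def build_error_query(log_group: str, minutes: int = 60, limit: int = 20) -> dict:
    """Return keyword arguments for the CloudWatch Logs StartQuery API."""
    now = int(time.time())  # StartQuery expects epoch seconds
    query = (
        "fields @timestamp, @message "
        "| filter @message like /ERROR/ "
        "| sort @timestamp desc "
        f"| limit {limit}"
    )
    return {
        "logGroupName": log_group,
        "startTime": now - minutes * 60,
        "endTime": now,
        "queryString": query,
    }


# "/aws/lambda/my-function" is a placeholder log group name.
request = build_error_query("/aws/lambda/my-function")
print(request["queryString"])
```

With boto3, the request could be submitted via `boto3.client("logs").start_query(**request)` and the results fetched later with `get_query_results`.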
CloudWatch Dashboards
Build custom dashboards with graphs of metrics across multiple services and regions for a single-pane-of-glass view.
CloudWatch Events / EventBridge
React to events from AWS services in near real time. Example: invoke a Lambda function every time an EC2 instance starts, or when an S3 object is uploaded. Amazon EventBridge is the successor, adding support for SaaS event sources.
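The "EC2 instance starts" example can be expressed as an EventBridge event pattern. The sketch below shows such a pattern together with a deliberately simplified local matcher, so you can see which events it would select. The `matches` function is an illustration written for this lesson, not EventBridge's real matching engine (which supports many more operators), and the sample event is trimmed to the fields that matter here.

```python
# Event pattern: match EC2 state-change events where the new state is "running".
EC2_RUNNING_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: a list means 'any of these values', a dict recurses."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed sample event with a placeholder instance ID.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

stopped_event = {**sample_event, "detail": {**sample_event["detail"], "state": "stopped"}}

print(matches(EC2_RUNNING_PATTERN, sample_event))   # True
print(matches(EC2_RUNNING_PATTERN, stopped_event))  # False
```

In a real rule, the pattern would be serialised to JSON and attached to an EventBridge rule whose target is the Lambda function.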
AWS CloudTrail
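CloudTrail records API activity in your AWS account: who made each call, when, from what IP address, and whether it succeeded. Think of it as an audit log for your entire AWS environment. Management events (such as creating or deleting resources) are recorded by default; data events (such as S3 object-level operations) must be enabled on a trail.
- Event history is on by default in every account and keeps the last 90 days of management events
- Configure a trail to store logs indefinitely in S3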
- Send to CloudWatch Logs for real-time alerting on suspicious activity
- Multi-region trails capture activity across all regions
Key exam point: CloudTrail is the answer to "who deleted the S3 bucket?" or "who changed the security group?" questions.
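To make the "who deleted the S3 bucket?" question concrete, the sketch below searches a CloudTrail-style log for a given API call. The records are invented sample data trimmed to a handful of fields (real records carry many more), and `who_called` is a helper written for this example.

```python
import json

# Invented sample data in the shape of a CloudTrail log file ("Records" array).
SAMPLE_LOG = json.dumps({
    "Records": [
        {
            "eventTime": "2024-03-01T10:15:00Z",
            "eventSource": "s3.amazonaws.com",
            "eventName": "DeleteBucket",
            "sourceIPAddress": "203.0.113.10",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        },
        {
            "eventTime": "2024-03-01T10:20:00Z",
            "eventSource": "s3.amazonaws.com",
            "eventName": "DeleteBucket",
            "sourceIPAddress": "203.0.113.25",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
            "errorCode": "AccessDenied",  # present only when the call failed
        },
        {
            "eventTime": "2024-03-01T11:00:00Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "AuthorizeSecurityGroupIngress",
            "sourceIPAddress": "203.0.113.25",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
        },
    ]
})


def who_called(log_json: str, event_name: str) -> list:
    """Return who/when/ip/succeeded for every record with the given eventName."""
    records = json.loads(log_json)["Records"]
    return [
        {
            "who": record["userIdentity"].get("arn", "unknown"),
            "when": record["eventTime"],
            "ip": record.get("sourceIPAddress"),
            "succeeded": "errorCode" not in record,
        }
        for record in records
        if record["eventName"] == event_name
    ]


for hit in who_called(SAMPLE_LOG, "DeleteBucket"):
    print(hit)
```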
AWS Config
AWS Config records the configuration history of your resources and evaluates them against compliance rules:
- Track what an EC2 instance's security groups looked like at a specific point in time
- Detect when an S3 bucket becomes publicly accessible
- Enforce that all EBS volumes are encrypted
- Alert when resources drift from desired state
Config provides a timeline of configuration changes — essential for compliance frameworks like PCI-DSS, HIPAA, and SOC 2.
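The two ideas here, a point-in-time lookup over configuration history and a compliance check, can be modelled locally in a few lines. This is a conceptual sketch only: the history data, the `encrypted` field, and both helper functions are simplified stand-ins invented for this example, not the format AWS Config actually returns.

```python
from bisect import bisect_right

# Hypothetical history for one instance: (ISO 8601 UTC timestamp, security groups),
# sorted oldest first. Same-format ISO timestamps sort correctly as strings.
HISTORY = [
    ("2024-01-05T09:00:00Z", ["sg-web"]),
    ("2024-02-10T14:30:00Z", ["sg-web", "sg-admin"]),
    ("2024-03-01T08:15:00Z", ["sg-web"]),
]


def config_at(history, timestamp):
    """Return the configuration in effect at `timestamp`, or None if none was recorded yet."""
    times = [recorded_at for recorded_at, _ in history]
    index = bisect_right(times, timestamp)  # number of snapshots taken at or before timestamp
    return history[index - 1][1] if index else None


def evaluate_volume(volume):
    """Mimic an 'EBS volumes must be encrypted' rule on a simplified volume record."""
    return "COMPLIANT" if volume.get("encrypted") else "NON_COMPLIANT"


print(config_at(HISTORY, "2024-02-15T00:00:00Z"))   # ['sg-web', 'sg-admin']
print(evaluate_volume({"id": "vol-1", "encrypted": False}))  # NON_COMPLIANT
```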
AWS Trusted Advisor
Trusted Advisor inspects your AWS environment against best practices across five categories:
- Cost Optimisation: Identify idle or underutilised resources (unused EC2 instances, oversized RDS, unassociated Elastic IPs)
- Performance: Highlight performance improvements (CloudFront caching, EC2 type recommendations)
- Security: Detect open security groups, missing MFA on root, public S3 buckets
- Fault Tolerance: Check for single AZ deployments, missing backups
- Service Limits: Warn when you're approaching service limits (also called service quotas)
A core set of checks (selected security checks and the service limit checks) is free for all accounts. Full access requires a Business, Enterprise On-Ramp, or Enterprise support plan.
AWS Systems Manager (SSM)
SSM provides a unified operations interface for managing EC2 instances and other managed nodes, such as on-premises servers:
- Session Manager: Browser-based shell access to EC2 instances without SSH or open inbound ports
- Run Command: Execute shell commands across multiple instances at once
- Patch Manager: Automate OS patching across your fleet
- Parameter Store: Secure configuration and secrets storage
- OpsCenter: Centralised view of operational issues
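As one example from the list above, Run Command can target instances by tag instead of by individual ID. The sketch below builds the request parameters for the SendCommand API using the AWS-provided `AWS-RunShellScript` document. The tag key, tag value, and shell command are placeholders, and nothing is actually sent.

```python
def build_run_command(tag_key: str, tag_value: str, commands: list) -> dict:
    """Return keyword arguments for the Systems Manager SendCommand API."""
    return {
        "DocumentName": "AWS-RunShellScript",  # AWS-managed document for Linux shell commands
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": commands},
    }


# Placeholder tag and command, for illustration only.
request = build_run_command("Environment", "staging", ["uptime"])
print(request)
```

With boto3, this could be sent as `boto3.client("ssm").send_command(**request)`. The targeted instances need the SSM Agent running and an instance profile that allows Systems Manager to manage them.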
AWS Health Dashboard
The AWS Health Dashboard (formerly Personal Health Dashboard) shows service disruptions and scheduled maintenance events that might affect your specific resources — more personalised than the general AWS Service Status page.
Next: billing and pricing — how AWS charges work and how to control and forecast costs.