The Google Cloud Operations Suite (formerly Stackdriver) provides integrated monitoring, logging, tracing, and diagnostics for applications running on GCP, other clouds, or on-premises. A healthy production environment requires visibility across all four signals: metrics, logs, traces, and errors.
Cloud Monitoring
Cloud Monitoring collects and visualises metrics from GCP resources, custom applications, and third-party systems.
Key capabilities:
- Metrics Explorer: Query and chart any metric in real time
- Dashboards: Custom dashboards combining metrics from multiple resources
- Uptime checks: Synthetic monitoring — verify that URLs, TCP ports, or services respond correctly from multiple global locations
- Alerting policies: Define conditions (threshold, rate of change, absence) and notification channels (email, PagerDuty, Slack, Pub/Sub)
- Service Monitoring: SLO/SLI tracking for GKE, Cloud Run, and App Engine services
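As a rough illustration of the threshold condition type mentioned above, the following Python sketch evaluates "metric above a threshold for an entire duration window" over in-memory samples. It is not Cloud Monitoring's implementation; the function name `threshold_breached`, the sample format, and the CPU values are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (timestamp, metric value); a hypothetical in-memory sample format.
Sample = Tuple[datetime, float]


def threshold_breached(samples: List[Sample], threshold: float,
                       duration: timedelta, now: datetime) -> bool:
    """Return True if every sample in the trailing window exceeds the threshold.

    An empty window returns False here; missing data would normally be
    handled by a separate "absence" condition.
    """
    window_start = now - duration
    window = [value for ts, value in samples if window_start <= ts <= now]
    return bool(window) and all(value > threshold for value in window)


# Five one-minute CPU utilisation samples (made-up values).
now = datetime(2024, 1, 1, 12, 0)
cpu = [(now - timedelta(minutes=m), v)
       for m, v in [(4, 0.91), (3, 0.95), (2, 0.93), (1, 0.97), (0, 0.92)]]

fired = threshold_breached(cpu, threshold=0.9,
                           duration=timedelta(minutes=5), now=now)
```

Requiring the whole window to breach, rather than a single point, is what keeps a short spike from paging anyone.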
Cloud Logging
Cloud Logging ingests, stores, and analyses log entries from GCP services and custom applications. Most managed GCP services send their logs automatically, with no agent to install; workloads on Compute Engine VMs typically need the Ops Agent to ship application and system logs.
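For custom applications on managed runtimes such as Cloud Run, one common approach is to write single-line JSON to stdout, where `severity` and `message` are treated as special fields and the remaining keys become structured payload. The sketch below uses only the standard library; the helper names `format_entry` and `log`, and the fields `order_id` and `http_status`, are hypothetical.

```python
import json
import sys
from typing import Any


def format_entry(severity: str, message: str, **fields: Any) -> str:
    """Serialise one log entry as a single-line JSON object."""
    entry = {"severity": severity, "message": message, **fields}
    return json.dumps(entry)


def log(severity: str, message: str, **fields: Any) -> None:
    """Write the entry to stdout, one JSON object per line."""
    print(format_entry(severity, message, **fields), file=sys.stdout, flush=True)


# Hypothetical application events.
line = format_entry("ERROR", "payment failed", order_id="A-123", http_status=500)
log("INFO", "request handled", order_id="A-123")
```

Keeping custom data in named fields rather than inside the message string makes it possible to filter on those fields later.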
Key capabilities:
- Logs Explorer: Query logs using the Logging query language, filtering by resource, severity, time range, and custom fields
- Log-based metrics: Create metrics from log patterns (e.g., count of 500 errors)
- Log sinks: Export logs to Cloud Storage (archival), BigQuery (analysis), or Pub/Sub (real-time processing)
- Log buckets: Control retention (30 days by default in the `_Default` bucket, configurable from 1 to 3,650 days; the `_Required` bucket keeps its logs for a fixed 400 days)
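To show what a query by resource, severity, and time range can look like, here is a small Python sketch that assembles a Logging query language filter string. The helper `build_filter` is hypothetical, and `cloud_run_revision` is used only as an example resource type; the same string could be pasted into the query editor or used as the basis of a log-based metric or sink filter.

```python
from datetime import datetime, timezone
from typing import Sequence


def build_filter(resource_type: str, min_severity: str, since: datetime,
                 extra: Sequence[str] = ()) -> str:
    """Compose a filter from common criteria, joined with AND."""
    # Timestamps are written in RFC 3339 format, normalised to UTC.
    ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [
        f'resource.type="{resource_type}"',
        f"severity>={min_severity}",
        f'timestamp>="{ts}"',
        *extra,
    ]
    return " AND ".join(clauses)


query = build_filter(
    "cloud_run_revision",
    "ERROR",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    extra=["httpRequest.status>=500"],
)
# query == 'resource.type="cloud_run_revision" AND severity>=ERROR '
#          'AND timestamp>="2024-01-01T00:00:00Z" AND httpRequest.status>=500'
```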
Cloud Trace
Cloud Trace is a distributed tracing system that tracks request latency across microservices. When a user request flows through multiple services, Trace shows the full call chain and identifies slow segments.
- Automatically integrated with App Engine, Cloud Run, and Cloud Functions
- Instrumented with OpenTelemetry or the Trace client libraries for custom services
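A trace can follow a request across services because each hop forwards a shared trace ID. The sketch below parses and propagates the W3C `traceparent` header, which OpenTelemetry uses by default. In practice the instrumentation library does this for you; the names `TraceContext`, `parse_traceparent`, and `child_traceparent` are hypothetical, and only header version `00` is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional


class TraceContext(NamedTuple):
    trace_id: str   # 32 lowercase hex chars, shared by every span in the request
    span_id: str    # 16 lowercase hex chars, unique per span
    sampled: bool   # whether the caller asked for this request to be recorded


_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceContext) -> str:
    """Header for an outgoing call: same trace ID, freshly generated span ID."""
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(incoming)
```

Because `outgoing` carries the same trace ID as `incoming`, spans recorded by the downstream service can be stitched into the same call chain.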
Error Reporting
Error Reporting automatically groups and aggregates application exceptions from logs, counts occurrences, and alerts when new errors appear. It supports Node.js, Python, Go, Java, Ruby, .NET, and PHP.
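The grouping idea can be approximated in a few lines of Python. This is only an illustration of grouping and counting, not Error Reporting's actual algorithm: the fingerprint used here (exception type plus the function that raised it) is an assumption, as are the helper functions `parse_amount` and `lookup`.

```python
import traceback
from collections import Counter
from typing import Tuple


def fingerprint(exc: BaseException) -> Tuple[str, str]:
    """Group key: exception type plus the name of the function that raised it."""
    frames = traceback.extract_tb(exc.__traceback__)
    origin = frames[-1].name if frames else "<unknown>"
    return (type(exc).__name__, origin)


# Two hypothetical application functions that can fail.
def parse_amount(raw: str) -> int:
    return int(raw)


def lookup(key: str) -> str:
    return {"a": "1"}[key]


counts: Counter = Counter()
for action, arg in [(parse_amount, "x"), (parse_amount, "y"), (lookup, "b")]:
    try:
        action(arg)
    except Exception as exc:
        counts[fingerprint(exc)] += 1

# counts now holds two groups:
# ("ValueError", "parse_amount") seen twice and ("KeyError", "lookup") seen once.
```

The two `ValueError`s have different messages but land in one group, which is the property that keeps a noisy error from appearing as thousands of distinct issues.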
Cloud Profiler
Cloud Profiler continuously analyses CPU and memory usage of your application in production with minimal overhead. It identifies which functions consume the most resources, enabling targeted optimisation.
Cloud Audit Logs
Cloud Audit Logs record who did what, where, and when across GCP resources. There are four types:
| Log Type | What It Captures | Enabled By Default |
|---|---|---|
| Admin Activity | API calls that modify configuration (create, delete, modify) | Yes |
| Data Access | API calls that read resource configuration or read or write user data (e.g., reading a Cloud Storage object) | No (except BigQuery) |
| System Event | Automated GCP system actions (e.g., live migration) | Yes |
| Policy Denied | Access denied because of a security policy violation (e.g., VPC Service Controls) | Yes |
Audit logs are essential for security investigation, compliance (PCI DSS, HIPAA), and forensics. Admin Activity logs cannot be disabled.
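Each audit log type is written to its own log, identified by the log name. The Python sketch below maps a full log name to the type from the table; the function `audit_log_type` and the project ID `my-project` are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote

# Log IDs used by Cloud Audit Logs, keyed to the types in the table above.
AUDIT_LOG_TYPES = {
    "cloudaudit.googleapis.com/activity": "Admin Activity",
    "cloudaudit.googleapis.com/data_access": "Data Access",
    "cloudaudit.googleapis.com/system_event": "System Event",
    "cloudaudit.googleapis.com/policy": "Policy Denied",
}


def audit_log_type(log_name: str) -> Optional[str]:
    """Return the audit log type for a full log name, or None if it is not one."""
    # Log names look like "projects/PROJECT_ID/logs/LOG_ID", where any "/"
    # inside LOG_ID is URL-encoded as "%2F".
    _, sep, log_id = log_name.partition("/logs/")
    if not sep:
        return None
    return AUDIT_LOG_TYPES.get(unquote(log_id))


kind = audit_log_type(
    "projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access")
# kind == "Data Access"
```

A classifier like this is useful when routing entries from a sink, for example sending Data Access logs to BigQuery for analysis while archiving the rest.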