5 min read·Lesson 8 of 10

APM Platforms: Datadog, New Relic, and Friends

When to buy instead of build. A pragmatic comparison of Datadog, New Relic, Dynatrace, Honeycomb, Splunk, and Elastic Observability.

Open-source observability is powerful but operationally expensive. APM (Application Performance Monitoring) vendors offer the same capabilities as a unified, hosted product. For most companies, the buy-vs-build math favours buying.

What an APM Platform Includes

  • Metrics ingestion (Prometheus-compatible or proprietary)
  • Distributed tracing (often OpenTelemetry-compatible)
  • Log aggregation
  • Real User Monitoring (RUM) — performance from the browser
  • Synthetic monitoring — uptime probes from many regions
  • Infrastructure monitoring (host agent)
  • Alerting and on-call (with PagerDuty/Slack integration)
  • Service maps, dashboards, anomaly detection

The pitch: one agent, one bill, one UI for the whole observability stack.

Datadog

The market leader. Strengths:

  • 500+ integrations — almost any tool you run has a Datadog integration.
  • Strong unified UI: pivot from a metric to a trace to a log seamlessly.
  • Excellent infrastructure and Kubernetes support.
  • Watchdog AI for anomaly detection.
  • Datadog Workflows, CSPM, security monitoring layered on top.

Weaknesses: pricing is the dominant complaint — by host, by container, by ingested GB, by custom metric, with surprising overage bills. Cost engineering is a real activity.

New Relic

Long-time APM vendor. It has since moved to a "consumption" pricing model (per user plus per ingested GB).

  • NRQL — a SQL-like query language across all data types.
  • Strong APM and language agents (especially Java, .NET, Ruby).
  • Includes browser and mobile monitoring.
  • "New Relic One" unifies all features.

Dynatrace

Enterprise focus. Famous for "OneAgent" — a single agent that auto-discovers everything on a host.

  • Davis AI — root-cause analysis based on the topology graph.
  • Strong for traditional Java/.NET enterprise stacks and SAP.
  • Less popular with smaller / cloud-native shops; pricier.

Honeycomb

Different philosophy. Built around "wide events" — every request is one rich record with hundreds of attributes. You ask questions like "p95 latency by build_id and feature_flag, broken out by region, only for users on iOS" and get an answer in seconds.

  • BubbleUp — auto-finds attributes correlated with anomalies.
  • OpenTelemetry-native.
  • Excellent for debugging unknown unknowns in microservices.
  • Smaller integrations footprint — pair with another tool for infra/RUM if needed.
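To make the wide-event idea concrete, here is a minimal sketch in plain Python. It is not Honeycomb's API or query engine; the event fields (`build_id`, `feature_flag`, `region`, `os`, `duration_ms`), the sample data, and the helper names are all hypothetical. It only illustrates the shape of the question: filter on one attribute, group by several others, and take a percentile per group.

```python
import math
from collections import defaultdict

# Hypothetical wide events: one record per request, many attributes each.
EVENTS = [
    {"build_id": "b41", "feature_flag": "new_cart", "region": "eu", "os": "iOS", "duration_ms": 120},
    {"build_id": "b41", "feature_flag": "new_cart", "region": "eu", "os": "iOS", "duration_ms": 900},
    {"build_id": "b41", "feature_flag": "new_cart", "region": "eu", "os": "Android", "duration_ms": 80},
    {"build_id": "b42", "feature_flag": "old_cart", "region": "us", "os": "iOS", "duration_ms": 95},
    {"build_id": "b42", "feature_flag": "old_cart", "region": "us", "os": "iOS", "duration_ms": 110},
]


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def p95_by(events, group_keys, where):
    """Keep events matching `where`, group by `group_keys`, return p95 duration per group."""
    groups = defaultdict(list)
    for event in events:
        if where(event):
            key = tuple(event[k] for k in group_keys)
            groups[key].append(event["duration_ms"])
    return {key: p95(durations) for key, durations in groups.items()}


# "p95 latency by build_id, feature_flag and region, only for users on iOS"
result = p95_by(
    EVENTS,
    group_keys=("build_id", "feature_flag", "region"),
    where=lambda e: e["os"] == "iOS",
)
for key, value in sorted(result.items()):
    print(key, value)
```

Because every attribute lives on the same record, adding another dimension to the question is just another entry in `group_keys`; nothing has to be pre-aggregated in advance.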

Splunk

Originally a log search company, now a full observability suite (Splunk Observability Cloud after the SignalFx acquisition).

  • Best-in-class log search, especially for security/SIEM use cases.
  • Splunk APM and Infrastructure are credible.
  • Often the right answer in regulated industries already running Splunk for security.

Elastic Observability

The Elastic Stack (Elasticsearch + Kibana) extended into APM.

  • Strong if you already run Elastic for search or logs.
  • OpenTelemetry-compatible.
  • Self-hostable or managed (Elastic Cloud).

Cloud-Native Bundled APM

Cloud | Service
AWS   | CloudWatch + X-Ray + Application Signals (newer unified APM)
Azure | Azure Monitor + Application Insights
GCP   | Cloud Operations Suite (Logging, Monitoring, Trace, Profiler)

Pros: zero ops, IAM-integrated, often the cheapest at small-to-medium scale. Cons: weaker cross-cloud visibility, lock-in, inconsistent UX across services.

Choosing

If you…                                             | Consider
Want one tool that does everything                  | Datadog
Are on a tight budget but technical                 | Grafana stack (self-host)
Already on a single cloud                           | That cloud's native suite
Have complex microservices and need deep debugging  | Honeycomb
Run a Java/.NET enterprise stack                    | Dynatrace or New Relic
Need security + observability together              | Splunk or Datadog
Already invested in Elastic                         | Elastic Observability

Pricing Models, Decoded

  • Per host: Datadog. Predictable for VMs, expensive for thousands of containers (often there is also a container surcharge).
  • Per ingested GB: most log services. Volume control is everything.
  • Per custom metric: Datadog surprises people here. Each unique combination of metric name and tag values counts as a separate metric.
  • Per user + per GB: New Relic.
  • Per event: Honeycomb.
  • Pay-as-you-go cloud: CloudWatch, Azure Monitor.
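The custom-metric point is easiest to see by counting. The sketch below is a rough illustration only: exact billing rules differ by vendor, and the metric name, tag names, and cardinalities are hypothetical. It counts distinct (metric name, tag set) combinations in a batch of points and computes the worst-case number of series one metric can produce.

```python
import math


def count_series(points):
    """Count distinct (metric name, tag set) combinations in a batch of points."""
    series = {(name, frozenset(tags.items())) for name, tags in points}
    return len(series)


def worst_case_series(tag_cardinalities):
    """Upper bound for one metric: product of the distinct values of each tag."""
    return math.prod(tag_cardinalities.values())


# Hypothetical data points for a single metric name.
points = [
    ("checkout.latency", {"region": "eu", "status": "200"}),
    ("checkout.latency", {"region": "eu", "status": "500"}),
    ("checkout.latency", {"region": "us", "status": "200"}),
    ("checkout.latency", {"region": "eu", "status": "200"}),  # repeat, not a new series
]

print(count_series(points))  # 3 distinct series from one metric name
# Adding a high-cardinality tag multiplies the bound: 5 * 6 * 10_000
print(worst_case_series({"region": 5, "status": 6, "customer_id": 10_000}))  # 300000
```

One metric name with a `customer_id` tag can therefore be billed as hundreds of thousands of metrics, which is why high-cardinality tags are usually the first thing to audit.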

Always model the bill at 2× current scale. Run a one-month POC with realistic data before committing.
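A spreadsheet works, but a few lines of code make the 2× exercise repeatable. The sketch below is a toy model under stated assumptions: the rates, the free custom-metric allowance, and the usage figures are placeholders, not any vendor's price list, and it assumes every dimension grows linearly with scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    hosts: int
    ingested_gb: float
    custom_metrics: int


@dataclass(frozen=True)
class Rates:
    """Placeholder monthly rates; replace with figures from a real quote."""
    per_host: float
    per_gb: float
    per_100_custom_metrics: float
    free_custom_metrics: int  # hypothetical flat allowance


def monthly_bill(usage, rates):
    """Sum the three pricing dimensions; only metrics above the allowance are billed."""
    billable_metrics = max(0, usage.custom_metrics - rates.free_custom_metrics)
    return (
        usage.hosts * rates.per_host
        + usage.ingested_gb * rates.per_gb
        + billable_metrics / 100 * rates.per_100_custom_metrics
    )


def scaled(usage, factor):
    """Scale every usage dimension by the same factor (a simplifying assumption)."""
    return Usage(
        hosts=round(usage.hosts * factor),
        ingested_gb=usage.ingested_gb * factor,
        custom_metrics=round(usage.custom_metrics * factor),
    )


RATES = Rates(per_host=20.0, per_gb=0.5, per_100_custom_metrics=5.0, free_custom_metrics=5_000)
current = Usage(hosts=50, ingested_gb=2_000, custom_metrics=10_000)

now = monthly_bill(current, RATES)
doubled = monthly_bill(scaled(current, 2), RATES)
print(f"now: {now:.2f}  at 2x: {doubled:.2f}  ratio: {doubled / now:.2f}")
```

With these placeholder numbers the bill more than doubles at 2× scale, because the flat allowance does not grow with usage. Real contracts have more such non-linearities (tiers, commitments, overage rates), which is the reason to model them before signing.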

Avoiding Lock-In

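OpenTelemetry is your insurance policy. Instrument your code with the OTel SDK and let the Collector export to whichever backend you pay for. If your APM vendor doubles its price, redirecting telemetry to another backend is a Collector config change rather than a re-instrumentation project. Dashboards and alert rules still have to be migrated, which is why the practices below matter.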
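As a rough illustration of what that config change looks like, the sketch below builds a minimal Collector configuration as a Python dictionary and prints it as JSON, which is also valid YAML. The section names (`receivers`, `processors`, `exporters`, `service.pipelines`) and the `otlphttp` exporter follow the Collector's configuration layout; the endpoints, the `x-api-key` header name, and the environment variable names are hypothetical placeholders, and a real vendor may need a different exporter or headers.

```python
import json


def collector_config(endpoint, api_key_env):
    """Build a minimal traces pipeline: OTLP in, batch, OTLP/HTTP out."""
    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {"batch": {}},
        "exporters": {
            "otlphttp": {
                "endpoint": endpoint,
                # Hypothetical header name; the key is read from the environment.
                "headers": {"x-api-key": "${env:" + api_key_env + "}"},
            }
        },
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlphttp"],
                }
            }
        },
    }


def changed_sections(old, new):
    """Return the top-level config sections that differ between two configs."""
    return sorted(section for section in old if old[section] != new[section])


# Placeholder endpoints for two imaginary vendors.
vendor_a = collector_config("https://otlp.vendor-a.example", "VENDOR_A_KEY")
vendor_b = collector_config("https://otlp.vendor-b.example", "VENDOR_B_KEY")

print(json.dumps(vendor_b, indent=2))
print("sections that change when switching vendor:", changed_sections(vendor_a, vendor_b))
```

Only the `exporters` section differs between the two configurations. The application code, which sends OTLP to the Collector, does not change at all.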

Avoid:

  • Vendor-specific SDKs as your primary instrumentation.
  • Heavy use of vendor-specific query languages in alert rules.
  • Storing all dashboards as JSON only in the vendor — keep them in Git too.

Buy or Build?

Most teams under 200 engineers should buy. The hidden cost of self-hosting Prometheus + Loki + Tempo + Alertmanager + Grafana at scale (patching, scaling, on-call for the observability platform itself) usually exceeds the licence fee. Self-host once you are big enough that the licence fee exceeds the cost of an SRE team dedicated to running the platform. There is no shame in either direction; pick the one that fits your stage.

Key Takeaways

  • APM = Application Performance Monitoring: metrics + logs + traces + RUM in one tool.
  • Datadog is the market leader in breadth and integrations.
  • Honeycomb specialises in high-cardinality wide-event analysis for debugging unknowns.
  • Dynatrace emphasises auto-discovery (OneAgent) and AI-driven root-cause analysis (Davis AI); New Relic offers strong language agents and NRQL across all data types.
  • Pricing is per host, per ingested GB, or per custom metric — model it before signing.
