RED Metrics: A Better Way to Understand What Your Users Are Experiencing

August 28, 2025
Tags:
Observability
OpenTelemetry

When monitoring applications, most teams start with CPU and memory. These are easy to collect and visualize. But while they help you understand system resource usage, they often fail to tell you what really matters:

👉 What is the user experiencing?

In this post, I’ll walk you through a better way to monitor services using RED metrics, and how they can help you move from system-centric to user-centric observability.

What are RED Metrics?

RED stands for:

  • Rate — how many requests are coming in (throughput)
  • Errors — how many of those requests fail
  • Duration — how long requests take (latency)

These three dimensions provide direct visibility into the behavior and performance of a request-driven system — as seen by your users.
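To make the three signals concrete, here is a minimal sketch of computing them from a window of request records. The `Request` record and field names are illustrative, not from any particular library; in practice these values come from your metrics pipeline rather than an in-memory list.

```python
from dataclasses import dataclass

# Hypothetical request record; in a real system this data comes from
# spans or access logs, not an in-memory list.
@dataclass
class Request:
    duration_ms: float  # how long the request took
    status: int         # HTTP status code

def red_summary(requests: list[Request], window_s: float) -> dict:
    """Compute Rate, Errors, and Duration over a time window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # P99: the latency below which 99% of requests in the window fall.
    p99 = durations[min(int(0.99 * total), total - 1)] if total else 0.0
    return {
        "rate_rps": total / window_s,                       # Rate
        "error_pct": 100.0 * errors / total if total else 0.0,  # Errors
        "p99_ms": p99,                                      # Duration
    }

reqs = [Request(12.0, 200), Request(48.0, 200), Request(30.0, 500), Request(8.0, 200)]
print(red_summary(reqs, window_s=2.0))
```

Note that Duration is tracked as a percentile rather than an average: averages hide the slow tail that users actually feel.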

[Dashboard: three graphs showing inbound latency (ms), inbound request rate (req/s), and inbound error rate (%).]

Why RED Metrics Matter More Than CPU and Memory

Let’s say your CPU usage is low and memory looks stable. That might seem fine at first glance — but what if:

  • Users are getting 5xx errors?
  • Pages are taking several seconds to load?
  • Your checkout API is experiencing intermittent timeouts?

These are all user-facing problems that won’t show up in CPU or memory dashboards.

On the flip side, your CPU might be spiking — but if request latency is still low and errors are near-zero, your customers may not even notice.

That’s why RED metrics are essential: they focus on what users actually care about — speed and success.

A Practical Example

Let’s take an order service.

You may define two key objectives:

  • Error Rate: Keep it under 5%
  • Latency (P99): Ensure 99% of orders complete in under 50ms

With RED metrics, you can:

  • Set alerts if error rate exceeds 5%
  • Trigger investigations when P99 latency crosses 50ms
  • Drill into specific operations (e.g. DB queries, cache lookups) to find bottlenecks

These signals help your team react before users are impacted — and fix issues faster when they are.
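The two objectives above can be expressed as a simple evaluation step, sketched below. The thresholds mirror the example; the function name and message format are illustrative, and a real setup would run this in your alerting system rather than application code.

```python
# Thresholds from the example objectives above (illustrative values).
ERROR_RATE_LIMIT_PCT = 5.0
P99_LATENCY_LIMIT_MS = 50.0

def evaluate_alerts(error_pct: float, p99_ms: float) -> list[str]:
    """Return the alerts that should fire for the current window."""
    alerts = []
    if error_pct > ERROR_RATE_LIMIT_PCT:
        alerts.append(f"error rate {error_pct:.1f}% exceeds {ERROR_RATE_LIMIT_PCT}%")
    if p99_ms > P99_LATENCY_LIMIT_MS:
        alerts.append(f"P99 latency {p99_ms:.0f}ms exceeds {P99_LATENCY_LIMIT_MS:.0f}ms")
    return alerts

# A window with elevated errors but healthy latency fires one alert.
print(evaluate_alerts(error_pct=6.2, p99_ms=41.0))
```

Keeping the error-rate check and the latency check independent matters: either one can breach on its own, and each points the investigation in a different direction.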

Start Small: Instrument Individual Operations

RED metrics work best when applied to the right level of granularity:

  • Instrument individual operations (e.g. POST /orders, GET /cart)
  • Track duration and errors for specific queries or downstream calls
  • Aggregate them at the service level
  • Then, combine across services to monitor entire user journeys

This layered approach lets you build meaningful Service Level Objectives (SLOs) that reflect real user experience — not just server stats.
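As a rough illustration of per-operation instrumentation, the decorator below records count, errors, and duration keyed by operation name. This is a hand-rolled sketch, not the OpenTelemetry API; the operation names and counter structure are hypothetical, and the per-operation counters are what you would later aggregate to the service level.

```python
import time
from collections import defaultdict

# Per-operation RED counters, keyed by operation name such as
# "POST /orders" (names here are illustrative).
counters = defaultdict(lambda: {"count": 0, "errors": 0, "durations_ms": []})

def instrumented(operation: str):
    """Decorator that records rate, errors, and duration per operation."""
    def wrap(fn):
        def inner(*args, **kwargs):
            c = counters[operation]
            c["count"] += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                c["errors"] += 1
                raise
            finally:
                c["durations_ms"].append((time.perf_counter() - start) * 1000)
        return inner
    return wrap

@instrumented("POST /orders")
def create_order(order_id: int) -> str:
    # Stand-in for real handler logic.
    return f"order {order_id} created"

create_order(1)
create_order(2)
print(counters["POST /orders"]["count"])  # 2
```

In practice an instrumentation library does this for you, but the shape is the same: every operation gets its own rate, error count, and duration samples, which roll up cleanly to service- and journey-level views.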

[Screenshot: two monitor alerts based on RED metrics.]

How to Collect RED Metrics with OpenTelemetry

If you’re using OpenTelemetry traces, you already have the data.

The trick is to enable span metrics, which automatically calculate:

  • Request rate
  • Request duration (latency)
  • Error count

You can export these metrics to:

  • Prometheus
  • Any observability platform that supports OpenTelemetry metrics

Once collected, you can:

  • Create dashboards for service-level SLOs
  • Set alerts for threshold breaches
  • Monitor full user flows across multiple services

💡 Span metrics turn distributed traces into a rich source of RED telemetry data — without extra code changes.
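In the OpenTelemetry Collector, span metrics are produced by the spanmetrics connector, which bridges the traces pipeline into a metrics pipeline. A minimal config sketch follows; it assumes you already have an OTLP receiver and a Prometheus exporter configured, and omits any connector tuning.

```yaml
connectors:
  spanmetrics:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]   # traces feed the connector...
    metrics:
      receivers: [spanmetrics]   # ...which emits RED metrics
      exporters: [prometheus]
```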

Final Thoughts

If you’re only monitoring infrastructure metrics, you’re flying blind to what really matters.

RED metrics give you a direct window into user experience — they’re the building blocks for effective alerting, debugging, and performance monitoring.

Start by instrumenting a few key operations. Define your thresholds. Set up alerts. Then roll it up into service and journey-level SLOs.

Once you adopt RED metrics, you’ll wonder how you ever operated without them.

Rajith Attapattu