Grafana Rate Limit Dashboards

A rate limiter dashboard has one job: let an on-call engineer glance at it during an incident and answer “is the limiter working, and is it blocking the right traffic?” in under ten seconds. This guide builds that dashboard panel by panel, and it assumes you have already emitted the metrics described in Metrics & Instrumentation — every panel here is just PromQL over ratelimit_* series. If you have not instrumented yet, start with Prometheus metrics for rate limiting and come back.

The five panels that matter

For a limiter doing 2,000 rps across 6 pods, these five panels cover what an operator needs during an incident. Add more than this and the dashboard stops being scannable.

[Figure: Five-panel rate limiter dashboard layout. A dashboard grid with a top row holding the 429 rate / block ratio time series and the store-error rate stat (thresholded), a middle row holding the top limited routes table / bar gauge and the limiter p99 latency time series, and a wide utilization heatmap (computed from remaining and limit, by tier and route over time) across the bottom.]

Panel-to-PromQL map

This is the dashboard in one table: each panel, the query behind it, and the operational question it answers.

Panel | Visualization | PromQL | What it tells you
--- | --- | --- | ---
429 rate / block ratio | Time series | sum(rate(ratelimit_requests_total{decision="blocked"}[5m])) / sum(rate(ratelimit_requests_total[5m])) | Fraction of traffic being rejected fleet-wide. A spike usually means abuse is being shed; a drop to zero is suspicious.
Top limited routes | Table / bar gauge | topk(10, sum by (route, tier) (rate(ratelimit_requests_total{decision="blocked"}[5m]))) | Which routes and tiers absorb the most blocks: your hotspots.
Utilization heatmap | Heatmap | 1 - sum by (tier) (ratelimit_remaining) / sum by (tier) (ratelimit_limit) | How close each tier runs to its ceiling. The query subtracts remaining / limit from 1 so that higher means fuller; bands near 1.0 mean clients are one bump from mass 429s.
Limiter p99 latency | Time series | histogram_quantile(0.99, sum by (le, route) (rate(ratelimit_decision_duration_seconds_bucket[5m]))) | Tail latency of the decision itself. A climbing p99 is the earliest store-trouble signal.
Store-error rate | Stat (thresholded) | sum(rate(ratelimit_store_errors_total[5m])) | Backing-store failures per second. Any sustained value means the limiter is degrading.

A sixth tile is worth pinning even though it is not strictly a “panel”: a stat showing sum(rate(ratelimit_fail_open_total[5m])) with a red threshold at anything above zero. When the limiter fails open the block-ratio panel goes quiet and looks fine — only this tile reveals that no limiting is happening.

Building the panels

In Grafana, each panel is a query plus a visualization choice. The two that need care are the heatmap and the p99 latency.

For the utilization heatmap, set the visualization to “Heatmap”, point it at the utilization query, and bucket the y-axis 0–1. Reading it: a horizontal band that drifts toward 1.0 over the day is a tier steadily approaching its limit — capacity-planning signal, not an incident yet. A band that slams to 1.0 instantly is a burst.
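To make the reading concrete, here is a minimal Python sketch of the arithmetic behind one heatmap cell. It assumes utilization is defined as 1 minus remaining over limit, so that 1.0 means the ceiling as described above. The tier names and gauge values are invented for illustration and are not taken from any real system.

```python
# Hypothetical per-tier gauge sums (illustrative values, not real data).
remaining = {"free": 120.0, "pro": 4500.0}
limit = {"free": 1000.0, "pro": 5000.0}


def utilization(remaining_sum: float, limit_sum: float) -> float:
    """Fraction of a tier's budget already consumed, clamped to [0, 1]."""
    if limit_sum <= 0:
        raise ValueError("limit must be positive")
    return min(1.0, max(0.0, 1.0 - remaining_sum / limit_sum))


# One value per tier, which is what a single heatmap column shows.
by_tier = {tier: utilization(remaining[tier], limit[tier]) for tier in limit}

for tier, value in by_tier.items():
    print(f"{tier}: {value:.2f}")
```

With these made-up numbers the free tier sits at roughly 0.88, close to its ceiling, while the pro tier at roughly 0.10 has plenty of headroom.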

For limiter p99 latency, the query must aggregate the _bucket series by le before histogram_quantile, never after — aggregating quantiles across instances is mathematically wrong. The sum by (le, route) does this correctly.
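The difference is easy to see with numbers. The sketch below is a simplified Python re-implementation of bucket-based quantile estimation (linear interpolation inside the bucket that contains the target rank), not Prometheus's actual code. The two pods and their bucket counts are hypothetical: one fast pod serving most traffic and one slow pod serving a little.

```python
import math

# Bucket upper bounds in seconds (the "le" label), ending with +Inf.
BOUNDS = [0.001, 0.005, 0.025, 0.1, math.inf]

# Hypothetical cumulative bucket counts per pod.
POD_A = [900, 990, 1000, 1000, 1000]  # fast pod, 1000 decisions
POD_B = [0, 10, 50, 100, 100]         # slow pod, 100 decisions


def histogram_quantile(q, bounds, cumulative):
    """Estimate quantile q from cumulative bucket counts (simplified)."""
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in zip(bounds, cumulative):
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the last finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Wrong: take p99 per pod, then average the two quantiles.
averaged_p99 = (
    histogram_quantile(0.99, BOUNDS, POD_A)
    + histogram_quantile(0.99, BOUNDS, POD_B)
) / 2

# Right: sum the buckets by le first, then take the quantile once.
merged = [a + b for a, b in zip(POD_A, POD_B)]
merged_p99 = histogram_quantile(0.99, BOUNDS, merged)

print(f"averaged per-pod p99: {averaged_p99 * 1000:.2f} ms")
print(f"p99 over summed buckets: {merged_p99 * 1000:.2f} ms")
```

For this invented data the averaged figure comes out near 52 ms while the quantile over the summed buckets is about 83.5 ms, so averaging understates the tail that real requests see.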

Dashboard JSON snippet

A panel is portable as JSON. This is the 429 block-ratio panel; the others follow the same shape with their query swapped in.

{
  "type": "timeseries",
  "title": "429 block ratio (fleet)",
  "datasource": { "type": "prometheus", "uid": "${DS_PROM}" },
  "fieldConfig": {
    "defaults": {
      "unit": "percentunit",
      "thresholds": { "steps": [
        { "color": "green", "value": null },
        { "color": "orange", "value": 0.1 },
        { "color": "red", "value": 0.3 }
      ] }
    }
  },
  "targets": [
    {
      "refId": "A",
      "expr": "sum(rate(ratelimit_requests_total{decision=\"blocked\"}[5m])) / sum(rate(ratelimit_requests_total[5m]))",
      "legendFormat": "block ratio"
    }
  ]
}

The fail-open stat tile follows the same shape:

{
  "type": "stat",
  "title": "Fail-open rate (must be 0)",
  "datasource": { "type": "prometheus", "uid": "${DS_PROM}" },
  "fieldConfig": {
    "defaults": {
      "unit": "reqps",
      "thresholds": { "steps": [
        { "color": "green", "value": null },
        { "color": "red", "value": 0.001 }
      ] }
    }
  },
  "targets": [
    { "refId": "A", "expr": "sum(rate(ratelimit_fail_open_total[5m]))" }
  ]
}

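Add tier and route template variables, populated with label_values(ratelimit_requests_total, tier) and label_values(ratelimit_requests_total, route), and reference them in each query's selector, for example {tier=~"$tier", route=~"$route"}. The whole dashboard then filters down to one slice without rewriting queries.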
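One way to keep variable selectors consistent across panels is to generate the panel JSON instead of hand-editing it. The following Python sketch builds the block-ratio panel with a shared selector. It assumes the dashboard variables are named tier and route and that regex matching (=~) suits how they are configured; the helper name block_ratio_panel is hypothetical.

```python
import json

# Assumed variable names; adjust to match the dashboard's template variables.
SELECTOR = 'tier=~"$tier", route=~"$route"'


def block_ratio_panel(datasource_uid: str = "${DS_PROM}") -> dict:
    """Build the 429 block-ratio panel with the template-variable selector."""
    blocked = (
        f'sum(rate(ratelimit_requests_total{{decision="blocked", {SELECTOR}}}[5m]))'
    )
    total = f"sum(rate(ratelimit_requests_total{{{SELECTOR}}}[5m]))"
    return {
        "type": "timeseries",
        "title": "429 block ratio (fleet)",
        "datasource": {"type": "prometheus", "uid": datasource_uid},
        "fieldConfig": {"defaults": {"unit": "percentunit"}},
        "targets": [
            {
                "refId": "A",
                "expr": f"{blocked} / {total}",
                "legendFormat": "block ratio",
            }
        ],
    }


panel = block_ratio_panel()
print(json.dumps(panel, indent=2))
```

The printed object can be pasted into a dashboard's panel JSON; the other panels could be generated the same way by swapping the expression and visualization type.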

Gotchas & edge cases

  • Never aggregate quantiles that histogram_quantile has already produced. Aggregate the _bucket counters by le, then take the quantile once. Averaging per-instance p99s produces a number that means nothing.
  • rate() window must exceed scrape interval × ~4. At a 15 s scrape, a [1m] rate has only ~4 samples and is jittery. [5m] is the safe default for dashboards.
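  • Block-ratio reads NaN at zero traffic. When the denominator is zero (no requests), the division is 0/0, which PromQL evaluates to NaN. That is correct but ugly. Appending or vector(0) on its own does not help, because NaN is still a sample and or only fills in when the left side returns nothing. If the gap bothers on-call, filter the denominator first: sum(rate(ratelimit_requests_total{decision="blocked"}[5m])) / (sum(rate(ratelimit_requests_total[5m])) > 0) or vector(0).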
  • A quiet 429 panel is not necessarily good. If blocks fall to zero while traffic is steady, check the fail-open and store-error panels before celebrating — the limiter may have stopped enforcing.
  • Heatmap utilization needs both gauges. If you emit ratelimit_remaining but not ratelimit_limit, the ratio breaks. Emit both, or hard-code the limit per tier as a recording rule.
  • Dashboards are not alerts. Nobody watches a screen at 3 a.m. Pair this with the rules in Alerting on 429 error rates.
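The zero-traffic and quiet-panel bullets above can be sketched as two small Python checks. This is an illustration of the reasoning only, not part of any limiter library; the function names and the rule that any nonzero fail-open or store-error rate counts as a warning sign are assumptions made for the example.

```python
def block_ratio(blocked_rps: float, total_rps: float) -> float:
    """Blocked fraction of traffic, reported as 0.0 when there is no traffic."""
    if total_rps <= 0:
        # The 0/0 case: return 0.0 instead of an undefined value.
        return 0.0
    return blocked_rps / total_rps


def quiet_panel_needs_a_look(
    blocked_rps: float,
    total_rps: float,
    fail_open_rps: float,
    store_error_rps: float,
) -> bool:
    """True when blocks are zero under live traffic and the limiter
    shows signs that it has stopped enforcing."""
    no_blocks_under_traffic = total_rps > 0 and blocked_rps == 0
    limiter_degraded = fail_open_rps > 0 or store_error_rps > 0
    return no_blocks_under_traffic and limiter_degraded


print(block_ratio(0.0, 0.0))
print(quiet_panel_needs_a_look(0.0, 2000.0, 3.0, 0.0))
```

The second check mirrors what on-call does by eye: a flat block ratio only counts as good news when the fail-open and store-error tiles are also at zero.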

Frequently Asked Questions

Why is my p99 latency panel wrong after aggregating across pods?

Almost certainly because you computed a quantile per pod and then averaged. Quantiles are not averageable. Aggregate the histogram _bucket series by le first with sum by (le) (rate(..._bucket[5m])), then apply histogram_quantile exactly once.

How do I show which specific API key is being throttled?

You don't, on a dashboard. Metrics are labelled by tier and route, not by raw key, to keep cardinality bounded. The dashboard tells you "the free tier on /search is being throttled"; to find the exact key, pivot to logs filtered by that route and tier.

What time range and rate window should panels use?

A [5m] rate window over a 6–24 hour dashboard time range works for nearly everything. The rate window should comfortably exceed four times the scrape interval so each point has enough samples; at a 15-second scrape, [1m] sits exactly at that 4× floor with only about four samples per point, which is too jittery for steady panels.
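The window rule is simple arithmetic, sketched here in Python. The factor of 4 comes from the rule of thumb above; the helper names are hypothetical, and the sample count is approximate because scrape timing does not align exactly with window edges.

```python
def min_rate_window_seconds(scrape_interval_s: float, factor: int = 4) -> float:
    """Smallest rate() window the 4x rule of thumb allows, in seconds."""
    return scrape_interval_s * factor


def samples_in_window(window_s: float, scrape_interval_s: float) -> int:
    """Approximate number of scrapes that land inside one rate() window."""
    return int(window_s // scrape_interval_s)


scrape = 15.0  # seconds
floor_s = min_rate_window_seconds(scrape)
for label, window in (("[1m]", 60.0), ("[5m]", 300.0)):
    count = samples_in_window(window, scrape)
    print(f"{label}: about {count} samples, floor is {floor_s:.0f}s")
```

At a 15-second scrape the floor is 60 seconds, so [1m] has no margin at all, while [5m] averages over about 20 samples.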

Should the dashboard show absolute 429 counts or the ratio?

The ratio is the primary signal because it normalizes for traffic: 500 blocks per second at 10,000 rps is a 5% block ratio and healthy shedding, while 500 blocks per second at 800 rps is 62.5% and alarming. Keep the absolute rate as a secondary series so you can still see traffic volume, but make decisions on the ratio.