Rate-Limit Response Headers

Rate-limit response headers are the contract a limiter writes onto every response so that a client knows its standing without trial and error. Getting them wrong turns a polite limiter into one that clients retry-storm. This guide sits under the Observability & Operations reference and focuses on the synchronous, per-request signal: the exact fields, how their values are computed, and how to keep them consistent when more than one component (an edge gateway and the origin app) can both touch the same response. A correct Retry-After is the single highest-leverage observability change you can ship, because it converts blind client retries into precise, scheduled backoff and directly shrinks the reject volume you would otherwise have to alert on.

There are two header families in the wild. The legacy convention (X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset, used alongside the standard HTTP Retry-After header) has no formal specification of its own but is what GitHub, Twitter, and most public APIs emit, so client SDKs already parse it. The IETF draft standardizes a RateLimit and RateLimit-Policy header pair with explicit policy semantics. They are not mutually exclusive; the practical migration path is to dual-emit both for a deprecation window.
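As a minimal sketch of dual-emitting, the function below builds both families from one set of values. The function name `dual_emit_headers` is hypothetical, the RateLimit and RateLimit-Policy syntax simply follows the example values used in this guide (the draft's field syntax has changed between revisions, so check the revision you target), and X-RateLimit-Reset is assumed here to be documented as delta-seconds.

```python
def dual_emit_headers(limit: int, remaining: int, reset_delta: int, window: int) -> dict[str, str]:
    """Build the legacy triplet and the draft pair from the same values."""
    # Clamp so neither family can ever carry a negative number.
    remaining = max(0, remaining)
    reset_delta = max(0, reset_delta)
    return {
        # Legacy family (assumption: Reset is documented as delta-seconds).
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_delta),
        # Draft family, using the syntax from this guide's example values.
        "RateLimit": f"limit={limit}, remaining={remaining}, reset={reset_delta}",
        "RateLimit-Policy": f"{limit};w={window}",
    }


headers = dual_emit_headers(limit=100, remaining=42, reset_delta=57, window=60)
```

Because both families are rendered from the same arguments, they cannot drift apart during the deprecation window.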

The header set and its semantics

Every header below is derived from a single limiter decision. The values must agree with each other: if Remaining is 0, the request that produced it either consumed the last unit of quota or was itself rejected, and in the rejected case a Retry-After should accompany the 429.

| Header | Example value | Semantics | When set |
| --- | --- | --- | --- |
| X-RateLimit-Limit | 100 | Ceiling for the current window (requests allowed) | Every response (200 and 429) |
| X-RateLimit-Remaining | 42 | Requests left in the current window after this one | Every response |
| X-RateLimit-Reset | 1718900000 or 57 | When the window resets: absolute epoch seconds or delta-seconds (see below) | Every response |
| Retry-After | 57 or Sat, 20 Jun 2026 18:13:20 GMT | Seconds to wait, or an HTTP-date | On 429 (and 503); optional on near-limit 200s |
| RateLimit | limit=100, remaining=42, reset=57 | IETF draft: current quota state in one structured field | Every response, if adopted |
| RateLimit-Policy | 100;w=60 | IETF draft: the policy (limit + window) being enforced | Every response, if adopted |

Two classes of response carry these headers. A 200, or any other success, should emit the quota state so clients can self-throttle before they hit the wall. A 429 Too Many Requests should also emit Retry-After so the client knows when to come back. A 503 from an overloaded origin may also carry Retry-After, but that is load shedding, not rate limiting, and should be a distinct signal.
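The split between success and 429 responses can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the `Decision` type and `response_for` function are hypothetical names, and Reset is assumed to use the delta-seconds encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical result of one limiter check."""
    allowed: bool
    limit: int
    remaining: int
    reset_delta: int  # seconds until the quota resets


def response_for(decision: Decision) -> tuple[int, dict[str, str]]:
    """Return (status, headers) derived from a single limiter decision."""
    wait = max(0, decision.reset_delta)
    if not decision.allowed:
        # Never tell a rejected client to retry in 0 seconds.
        wait = max(1, wait)
    headers = {
        "X-RateLimit-Limit": str(decision.limit),
        "X-RateLimit-Remaining": str(max(0, decision.remaining)),
        "X-RateLimit-Reset": str(wait),
    }
    if decision.allowed:
        return 200, headers
    # Retry-After is added only on deny and matches Reset exactly.
    headers["Retry-After"] = str(wait)
    return 429, headers
```

A 503 from load shedding would be built by a separate code path, so the two signals stay distinct.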

Mechanism: how Reset is computed

Reset is the field teams most often get wrong, because there are two incompatible encodings and the spec history is muddy. The legacy X-RateLimit-Reset was popularized as absolute Unix epoch seconds (GitHub’s convention), while the IETF RateLimit reset field and the integer form of Retry-After are delta-seconds: a relative count from now. Mixing them fails silently: a client that reads a delta as an epoch sees an instant far in the past and retries immediately, while one that reads an epoch as a delta waits for decades.

For a fixed window limiter, the reset instant is the window boundary: reset_epoch = floor(now / window) * window + window. The delta form is reset_delta = reset_epoch - now. For a token bucket, there is no hard boundary; the meaningful reset is when the next token becomes available, i.e. (1 - tokens) / refill_rate seconds when fewer than one token remains, or (capacity - tokens) / refill_rate seconds for a full refill. Emitting a fixed-window-style boundary for a bucket limiter is a common bug: the value is present but semantically meaningless, and clients that trust it will retry at the wrong time.
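The two formulas above translate directly into code. The sketch below uses hypothetical function names and assumes `now` is Unix time in seconds, `window` is the window length in seconds, and `refill_rate` is tokens per second.

```python
import math


def fixed_window_reset(now: float, window: int) -> tuple[int, int]:
    """Return (reset_epoch, reset_delta) for a fixed window limiter."""
    reset_epoch = math.floor(now / window) * window + window
    # Round the delta up so a client never retries before the boundary.
    reset_delta = math.ceil(reset_epoch - now)
    return reset_epoch, reset_delta


def token_bucket_wait(tokens: float, capacity: float, refill_rate: float) -> tuple[float, float]:
    """Return (seconds until next token, seconds until full) for a token bucket."""
    # If at least one token is already available, the wait is zero.
    next_token = max(0.0, (1 - tokens) / refill_rate)
    full_refill = max(0.0, (capacity - tokens) / refill_rate)
    return next_token, full_refill


# A 100-second window observed 57 seconds before its boundary.
print(fixed_window_reset(now=1718899943, window=100))  # (1718900000, 57)
# A bucket holding a quarter of a token, refilling at 0.5 tokens per second.
print(token_bucket_wait(tokens=0.25, capacity=10, refill_rate=0.5))  # (1.5, 19.5)
```

Which of the two bucket values to publish as Reset is a documentation decision; the next-token wait is the one that matches Retry-After on a 429.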

The invariants a correct implementation maintains:

  • Remaining is monotonic within a window: it only decreases until the window resets (or the bucket refills).
  • Remaining == 0 on a response implies the next request may be limited, and a request that is limited (429) carries Retry-After.
  • Reset (delta) and Retry-After (delta) agree on a 429: both point at the same instant the client can retry.
  • Across multiple enforced policies (e.g. 100/min and 1000/hour), Remaining reflects the most-constrained policy, and RateLimit-Policy may list all of them.

[Figure: One limiter decision producing a consistent set of response headers. A limiter decision (limit, remaining, reset_at) feeds the response headers. A 200 OK carries X-RateLimit-Limit: 100, X-RateLimit-Remaining: 42, and X-RateLimit-Reset: 57. A 429 carries X-RateLimit-Remaining: 0 plus Retry-After: 57, which is added only on deny. Reset has two encodings, epoch (1718900000) and delta (57, meaning now + 57 s), and the two must never be mixed.]
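For the multi-policy invariant, one way to pick the values to publish is sketched below. The `PolicyState` type and both function names are hypothetical, the tie-break (prefer the later reset when Remaining is equal) is an assumption rather than something the draft mandates, and the comma-separated RateLimit-Policy value assumes the header is treated as a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyState:
    """Hypothetical per-policy quota state for one request."""
    limit: int
    window: int       # seconds
    remaining: int
    reset_delta: int  # seconds


def most_constrained(states: list[PolicyState]) -> PolicyState:
    """Pick the policy with the least Remaining; ties go to the later reset."""
    return min(states, key=lambda s: (s.remaining, -s.reset_delta))


def policy_header(states: list[PolicyState]) -> str:
    """List every enforced policy in RateLimit-Policy form."""
    return ", ".join(f"{s.limit};w={s.window}" for s in states)


states = [
    PolicyState(limit=100, window=60, remaining=42, reset_delta=57),
    PolicyState(limit=1000, window=3600, remaining=7, reset_delta=1800),
]
tightest = most_constrained(states)  # the 1000/hour policy, with 7 left
```

Here Limit, Remaining, and Reset would all be taken from `tightest`, so the three values describe the same policy.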

Emitting consistently across gateway and app

The hardest operational problem is not computing one header; it is keeping the headers consistent when two components can write them. A typical stack runs an edge gateway (NGINX, Envoy, an API gateway, or a CDN) that enforces a coarse limit, and an origin application that enforces the precise per-key quota. If both write X-RateLimit-*, the client sees whichever wrote last, and the two often disagree.

Three workable policies:

  1. Single source of truth. Decide that exactly one tier owns the headers. Usually the origin (it holds the authoritative per-key counter), with the gateway forbidden from setting them. Strip any limiter headers the gateway added before the response leaves.
  2. Most-constrained wins. If both tiers must enforce, have the gateway read the origin’s headers and overwrite only when its own limit is tighter (lower Remaining). This requires the gateway to parse and compare, which most can do via a small script or filter.
  3. Separate namespaces. Emit gateway limits under one prefix and app limits under another only if clients are documented to read both. This is rare and usually more confusing than it is worth.
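The "most-constrained wins" policy can be sketched as a small merge step. This is an illustration only: `merge_most_constrained` is a hypothetical helper, header names are assumed to be already normalized to the casing shown, and real gateways would express the same comparison in their own filter or scripting language.

```python
TRIPLET = ("X-RateLimit-Limit", "X-RateLimit-Remaining", "X-RateLimit-Reset")


def merge_most_constrained(origin: dict[str, str], gateway: dict[str, str]) -> dict[str, str]:
    """Keep the origin's headers unless the gateway's limit is tighter."""
    merged = dict(origin)
    try:
        origin_remaining = int(origin["X-RateLimit-Remaining"])
    except (KeyError, ValueError):
        # Origin sent no usable value, so the gateway's headers apply.
        origin_remaining = None
    gateway_remaining = int(gateway["X-RateLimit-Remaining"])
    if origin_remaining is None or gateway_remaining < origin_remaining:
        # Replace the whole triplet so Limit, Remaining, and Reset
        # still describe one and the same policy.
        for name in TRIPLET:
            merged[name] = gateway[name]
    return merged
```

Overwriting all three fields together, rather than Remaining alone, avoids a response whose Limit comes from one tier and whose Reset comes from the other.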

Whichever you pick, the rule mirrors the guarantee that the Redis counter architecture provides for the decision itself: the headers must reflect the authoritative limiter, and the value in Remaining must equal the value your metrics recorded for the same request. Derive header and metric from one limiter call, not two.
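A minimal sketch of deriving header and metric from one call is shown below. All names are hypothetical: `check` stands in for whatever limiter client is in use, and `recorded` stands in for a metrics sink.

```python
from typing import Callable, NamedTuple


class Decision(NamedTuple):
    """Hypothetical result of one limiter check."""
    allowed: bool
    limit: int
    remaining: int
    reset_delta: int


def handle(check: Callable[[str], Decision], key: str, recorded: list[dict]) -> dict[str, str]:
    """Call the limiter once and feed both the metric and the headers from it."""
    decision = check(key)  # the only limiter call for this request
    recorded.append(
        {"key": key, "allowed": decision.allowed, "remaining": decision.remaining}
    )
    return {
        "X-RateLimit-Limit": str(decision.limit),
        "X-RateLimit-Remaining": str(decision.remaining),
        "X-RateLimit-Reset": str(decision.reset_delta),
    }
```

A second limiter call made just to populate the headers would itself consume or observe a different quota state, which is how header and metric values drift apart.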

Response contract

The headers form a contract clients depend on, so treat changes to them as API changes. The contract a well-behaved API publishes:

  • On every response, X-RateLimit-Limit and X-RateLimit-Remaining are present and reflect the most-constrained active policy.
  • On a 429, Retry-After is present, is a non-negative integer (delta-seconds) or a valid HTTP-date, and points at the same instant as X-RateLimit-Reset.
  • X-RateLimit-Reset uses one documented encoding for the life of the API version; you do not silently switch epoch ↔ delta.
  • Values are clamped to non-negative integers; Remaining never goes below 0, and Retry-After is never 0 on a 429 (round up to 1).
  • Clients are told, in API docs, exactly which headers exist and which encoding Reset uses. The frontend Retry-After parsing guide shows the parsing side of this contract and why a robust client handles both encodings defensively.
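Two of the contract rules above, rounding Retry-After up to at least 1 and accepting either delta-seconds or an HTTP-date, can be sketched with the standard library. The function names are hypothetical, and `now` is assumed to be a timezone-aware datetime.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_value(seconds_until_reset: float) -> str:
    """Server side: a whole number of seconds, never below 1 on a 429."""
    return str(max(1, math.ceil(seconds_until_reset)))


def retry_after_to_seconds(value: str, now: datetime) -> int:
    """Client side: accept delta-seconds or an HTTP-date, return seconds to wait."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    # Otherwise treat the value as an HTTP-date such as
    # "Sat, 20 Jun 2026 18:13:20 GMT".
    target = parsedate_to_datetime(value)
    return max(0, math.ceil((target - now).total_seconds()))


now = datetime(2026, 6, 20, 18, 12, 23, tzinfo=timezone.utc)
print(retry_after_value(0))                                          # 1
print(retry_after_to_seconds("57", now))                             # 57
print(retry_after_to_seconds("Sat, 20 Jun 2026 18:13:20 GMT", now))  # 57
```

Both encodings resolve to the same wait here, which is the property the contract asks Retry-After and X-RateLimit-Reset to share on a 429.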

In this area

Two detailed guides build on the mechanics above:

  • Emitting X-RateLimit Headers: a step-by-step HowTo for computing and setting the triplet plus Retry-After in Express and FastAPI middleware, with curl -i verification and the abuse-leakage tradeoff.
  • RateLimit Draft vs X-RateLimit: a field-by-field comparison of the IETF RateLimit/RateLimit-Policy headers against the legacy X-RateLimit-*, with code that dual-emits during migration.