Rate-Limit Response Headers
Rate-limit response headers are the contract a limiter writes onto every response so that a client knows its standing without trial and error. Getting them wrong turns a polite limiter into one that provokes client retry storms. This guide sits under the Observability & Operations reference and focuses on the synchronous, per-request signal: the exact fields, how their values are computed, and how to keep them consistent when more than one component (an edge gateway and the origin app) can touch the same response. A correct Retry-After is often the highest-leverage observability change you can ship. It converts blind client retries into precisely scheduled backoff and directly shrinks the reject volume you would otherwise have to alert on.
There are two header families in the wild. The legacy convention (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) has no formal standard. It is, however, what GitHub, Twitter, and many other public APIs emit, so client SDKs already parse it. It is paired with Retry-After, which HTTP itself defines. The IETF draft proposes a standardized RateLimit and RateLimit-Policy header pair with explicit policy semantics. The two families are not mutually exclusive; the practical migration path is to dual-emit both for a deprecation window.
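A minimal Python sketch of dual-emitting both families from one set of limiter values. It makes two assumptions about encodings:

- the legacy X-RateLimit-Reset carries absolute epoch seconds (GitHub's convention);
- the draft's reset parameter carries delta-seconds.

It follows the draft syntax shown in the header table, but the draft's field syntax has changed across revisions, so check the revision you target. The function name dual_emit is hypothetical.

```python
import math
import time
from typing import Optional


def dual_emit(limit: int, remaining: int, window_s: int, reset_epoch: int,
              now: Optional[float] = None) -> dict[str, str]:
    """Build legacy X-RateLimit-* and IETF-draft RateLimit headers from one decision."""
    now = time.time() if now is None else now
    remaining = max(0, remaining)                       # never advertise a negative quota
    reset_delta = max(0, math.ceil(reset_epoch - now))  # round up so clients never retry early
    return {
        # Legacy family: Reset as absolute Unix epoch seconds (assumed convention).
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_epoch),
        # Draft family: reset as delta-seconds; policy as "<limit>;w=<window>".
        "RateLimit": f"limit={limit}, remaining={remaining}, reset={reset_delta}",
        "RateLimit-Policy": f"{limit};w={window_s}",
    }


headers = dual_emit(limit=100, remaining=42, window_s=60,
                    reset_epoch=1718900000, now=1718899943)
# headers["RateLimit"] == "limit=100, remaining=42, reset=57"
```

Deriving both families from the same arguments is what keeps them from drifting apart during the deprecation window.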
The header set and its semantics
Every header below is derived from a single limiter decision, so the values must agree with each other. If Remaining is 0, the request that produced it either consumed the last unit of quota or was itself rejected. A rejected request's 429 must carry Retry-After.
| Header | Example value | Semantics | When set |
|---|---|---|---|
| X-RateLimit-Limit | 100 | Ceiling for the current window (requests allowed) | Every response (200 and 429) |
| X-RateLimit-Remaining | 42 | Requests left in the current window after this one | Every response |
| X-RateLimit-Reset | 1718900000 or 57 | When the window resets: absolute epoch seconds or delta-seconds (see below) | Every response |
| Retry-After | 57 or Thu, 20 Jun 2024 16:13:20 GMT | Seconds to wait, or an HTTP-date | On 429 (and 503); optional on near-limit 200s |
| RateLimit | limit=100, remaining=42, reset=57 | IETF draft: current quota state in one structured field | Every response, if adopted |
| RateLimit-Policy | 100;w=60 | IETF draft: the policy (limit + window) being enforced | Every response, if adopted |
The two status codes that carry these headers are:

- 200, or any success status: emit the quota state so clients can self-throttle before they hit the wall.
- 429 Too Many Requests: emit Retry-After so the client knows when to come back.

A 503 from an overloaded origin may also carry Retry-After, but that is load shedding, not rate limiting, and should be a distinct signal.
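A minimal sketch of that rule, under two assumptions:

- a hypothetical Decision record is produced by one limiter check;
- X-RateLimit-Reset uses the delta-seconds encoding, so it can equal Retry-After on a 429.

Only the outcome of the decision determines whether Retry-After is added. 503 load shedding is deliberately left out because it is a separate signal.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical result of one limiter check for one request."""
    allowed: bool
    limit: int
    remaining: int
    reset_after_s: float  # seconds until the window resets / the client may retry


def response_headers(d: Decision) -> tuple[int, dict[str, str]]:
    """Map one decision to a status code and the legacy header set."""
    reset_delta = max(0, math.ceil(d.reset_after_s))
    if not d.allowed:
        reset_delta = max(1, reset_delta)  # a 429 never says "retry in 0 seconds"
    headers = {
        "X-RateLimit-Limit": str(d.limit),
        "X-RateLimit-Remaining": str(max(0, d.remaining)),
        "X-RateLimit-Reset": str(reset_delta),  # delta-seconds encoding (assumed)
    }
    if d.allowed:
        return 200, headers  # success: quota state only, so clients can self-throttle
    headers["Retry-After"] = str(reset_delta)  # same instant as X-RateLimit-Reset
    return 429, headers
```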
Mechanism: how Reset is computed
Reset is the field teams most often get wrong, because there are two incompatible encodings and the spec history is muddy. The legacy X-RateLimit-Reset was popularized as absolute Unix epoch seconds (GitHub's convention). Retry-After and the IETF RateLimit reset field use delta-seconds, a relative count from now. Mixing them silently breaks the client's wait in one of two ways:

- an epoch value read as delta-seconds means waiting decades;
- a delta read as an epoch names an instant in 1970, so the client retries immediately.
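To make the two encodings concrete, this sketch does two things. First, it expresses one reset instant in all three forms (epoch, delta-seconds, HTTP-date) using the standard library's email.utils.formatdate. Second, it shows what a client computes when it guesses the encoding wrong. The helper names are hypothetical.

```python
import math
from email.utils import formatdate


def reset_encodings(reset_epoch: int, now: float) -> dict[str, str]:
    """Express one reset instant in each encoding discussed above."""
    return {
        "epoch": str(reset_epoch),                           # legacy X-RateLimit-Reset
        "delta": str(max(0, math.ceil(reset_epoch - now))),  # Retry-After / draft reset
        "http_date": formatdate(reset_epoch, usegmt=True),   # Retry-After date form
    }


def misread_wait(value: int, now: float, assume_epoch: bool) -> float:
    """Seconds a client would wait if it decoded `value` with the given assumption."""
    if assume_epoch:
        return max(0.0, value - now)  # a delta read as epoch lands in 1970: no wait
    return float(value)               # an epoch read as delta: decades of waiting


encodings = reset_encodings(1718900000, now=1718899943)
# {'epoch': '1718900000', 'delta': '57', 'http_date': 'Thu, 20 Jun 2024 16:13:20 GMT'}
```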
For a fixed window limiter, the reset instant is the window boundary: reset_epoch = floor(now / window) * window + window. The delta form is reset_delta = reset_epoch - now.

A token bucket has no hard boundary. The meaningful reset is when the next token becomes available:

- (1 - tokens) / refill_rate seconds for a one-token request;
- (capacity - tokens) / refill_rate for a full refill.

Emitting a fixed-window-style boundary for a bucket limiter is a common bug. The value is present but semantically meaningless, and clients that trust it will retry at the wrong time.
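Both calculations from the paragraph above, as a sketch. The function names are hypothetical. Windows are assumed to be aligned to the Unix epoch. The token-bucket version assumes each request costs one token and refill_rate is in tokens per second. The fixed-window delta is rounded up so a client never retries a fraction of a second too early.

```python
import math


def fixed_window_reset(now: float, window_s: int) -> tuple[int, int]:
    """(reset_epoch, reset_delta) for an epoch-aligned fixed window."""
    reset_epoch = math.floor(now / window_s) * window_s + window_s
    return reset_epoch, math.ceil(reset_epoch - now)


def token_bucket_reset(tokens: float, capacity: float,
                       refill_rate: float) -> tuple[float, float]:
    """(seconds until the next whole token, seconds until the bucket is full)."""
    next_token = max(0.0, (1.0 - tokens) / refill_rate)  # 0 if a token is already available
    full_refill = max(0.0, (capacity - tokens) / refill_rate)
    return next_token, full_refill


# A bucket holding 0.25 tokens, refilling at 0.5 tokens/s, capacity 10:
# the next request can succeed in 1.5 s, so Retry-After would be ceil(1.5) = 2,
# not the 19.5 s a full refill would suggest.
```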
The invariants a correct implementation maintains:
- Remaining is monotonic within a window: it only decreases until the window resets (or the bucket refills).
- Remaining == 0 on a response implies the next request may be limited, and a request that is limited (429) carries Retry-After.
- Reset (delta) and Retry-After (delta) agree on a 429: both point at the same instant the client can retry.
- Across multiple enforced policies (e.g. 100/min and 1000/hour), Remaining reflects the most-constrained policy, and RateLimit-Policy may list all of them.
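A sketch of the multi-policy rule. It assumes a hypothetical PolicyState per enforced policy, already updated for this request, with delta-seconds resets. "Most constrained" is taken here to mean fewest requests remaining. Ties are broken by the later reset, since a client blocked by two exhausted policies has to wait for both. The comma-separated RateLimit-Policy list follows the draft-style syntax from the header table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyState:
    """Hypothetical per-policy state after the current request was counted."""
    limit: int
    window_s: int
    remaining: int
    reset_delta: int


def combine_policies(states: list[PolicyState]) -> dict[str, str]:
    """Report the most-constrained policy; list every policy in RateLimit-Policy."""
    # Fewest requests left wins; on a tie, the later reset is the binding one.
    tightest = min(states, key=lambda s: (s.remaining, -s.reset_delta))
    return {
        "X-RateLimit-Limit": str(tightest.limit),
        "X-RateLimit-Remaining": str(max(0, tightest.remaining)),
        "X-RateLimit-Reset": str(tightest.reset_delta),
        "RateLimit-Policy": ", ".join(f"{s.limit};w={s.window_s}" for s in states),
    }
```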
Emitting consistently across gateway and app
The hardest operational problem is not computing one header; it is keeping the headers consistent when two components can write them. A typical stack runs two layers:

- an edge gateway (NGINX, Envoy, an API gateway, or a CDN) that enforces a coarse limit;
- an origin application that enforces the precise per-key quota.

If both write X-RateLimit-*, the client sees whichever wrote last, and the two often disagree.
Three workable policies:
- Single source of truth. Decide that exactly one tier owns the headers. Usually the origin (it holds the authoritative per-key counter), with the gateway forbidden from setting them. Strip any limiter headers the gateway added before the response leaves.
- Most-constrained wins. If both tiers must enforce, have the gateway read the origin's headers and overwrite them only when its own limit is tighter (lower Remaining). This requires the gateway to parse and compare, which most can do via a small script or filter.
- Separate namespaces. Emit gateway limits under one prefix and app limits under another only if clients are documented to read both. This is rare and usually more confusing than it is worth.
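Most-constrained-wins reduces to a compare-and-overwrite on Remaining. This sketch is plain Python standing in for whatever scripting or filter hook your gateway provides; no specific gateway API is modeled. merge_gateway_headers is hypothetical. It assumes both tiers encode X-RateLimit-Reset as delta-seconds, so the values are interchangeable.

```python
def merge_gateway_headers(origin: dict[str, str], gw_limit: int, gw_remaining: int,
                          gw_reset_delta: int) -> dict[str, str]:
    """Overwrite the origin's X-RateLimit-* only when the gateway's limit is tighter.

    Sketch assumptions: header names use this exact casing (real HTTP lookup is
    case-insensitive), and a missing or unparseable origin Remaining counts as
    "no constraint", so the gateway's values are used.
    """
    merged = dict(origin)  # never mutate the origin's response headers in place
    try:
        origin_remaining = int(origin["X-RateLimit-Remaining"])
    except (KeyError, ValueError):
        origin_remaining = None
    if origin_remaining is None or gw_remaining < origin_remaining:
        merged["X-RateLimit-Limit"] = str(gw_limit)
        merged["X-RateLimit-Remaining"] = str(max(0, gw_remaining))
        merged["X-RateLimit-Reset"] = str(gw_reset_delta)
    return merged
```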
Whichever you pick, the rule is the same as the Redis counter architecture guarantee for the decision itself: the headers must reflect the authoritative limiter, and the value in Remaining must equal the value your metrics recorded for the same request. Derive header and metric from one limiter call, not two.
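A sketch of "one limiter call, two outputs". An in-memory fixed-window counter stands in for the authoritative limiter; a production limiter would keep this state in a shared store such as Redis. handle() builds both the response headers and the metric record from the same Decision. FixedWindowLimiter, Decision, and handle are hypothetical names. The metric record is a plain dict rather than any particular metrics library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    limit: int
    remaining: int
    reset_delta: int


class FixedWindowLimiter:
    """In-memory, epoch-aligned fixed-window counter (sketch: never evicts old windows)."""

    def __init__(self, limit: int, window_s: int) -> None:
        self.limit = limit
        self.window_s = window_s
        self.counts: dict[tuple[str, int], int] = {}

    def check(self, key: str, now: float) -> Decision:
        window = math.floor(now / self.window_s)
        used = self.counts.get((key, window), 0)
        allowed = used < self.limit
        if allowed:
            used += 1
            self.counts[(key, window)] = used
        reset_delta = math.ceil((window + 1) * self.window_s - now)
        return Decision(allowed, self.limit, self.limit - used, reset_delta)


def handle(limiter: FixedWindowLimiter, key: str,
           now: float) -> tuple[int, dict[str, str], dict[str, object]]:
    """Return (status, headers, metric), all derived from a single limiter call."""
    d = limiter.check(key, now)  # the only limiter call for this request
    headers = {
        "X-RateLimit-Limit": str(d.limit),
        "X-RateLimit-Remaining": str(d.remaining),
        "X-RateLimit-Reset": str(d.reset_delta),
    }
    metric = {"key": key, "outcome": "allowed" if d.allowed else "rejected",
              "remaining": d.remaining}  # same number the client sees in the header
    if d.allowed:
        return 200, headers, metric
    headers["Retry-After"] = str(d.reset_delta)
    return 429, headers, metric
```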
Response contract
The headers form a contract clients depend on, so treat changes to them as API changes. The contract a well-behaved API publishes:
- On every response, X-RateLimit-Limit and X-RateLimit-Remaining are present and reflect the most-constrained active policy.
- On a 429, Retry-After is present, is either a positive integer of delta-seconds or a valid HTTP-date, and points at the same instant as X-RateLimit-Reset.
- X-RateLimit-Reset uses one documented encoding for the life of the API version; you never silently switch between epoch and delta.
- Values are clamped to non-negative integers: Remaining never goes below 0, and Retry-After is never 0 on a 429 (round up to 1).
- Clients are told, in the API docs, exactly which headers exist and which encoding Reset uses. The frontend Retry-After parsing guide shows the parsing side of this contract and why a robust client handles both encodings defensively.
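The contract is mechanical enough to check in tests or in a response-sampling job. A minimal sketch follows. It assumes X-RateLimit-Reset is documented as delta-seconds, so it must equal a delta-seconds Retry-After on a 429. contract_violations is a hypothetical helper. It uses the standard library's email.utils.parsedate_to_datetime to accept the HTTP-date form.

```python
from email.utils import parsedate_to_datetime


def contract_violations(status: int, headers: dict[str, str]) -> list[str]:
    """List the ways one response breaks the published header contract."""
    problems: list[str] = []
    # Every response: Limit and Remaining present as non-negative integers.
    for name in ("X-RateLimit-Limit", "X-RateLimit-Remaining"):
        value = headers.get(name, "")
        if not (value.isascii() and value.isdigit()):
            problems.append(f"{name} missing or not a non-negative integer")
    if status != 429:
        return problems
    retry_after = headers.get("Retry-After")
    if retry_after is None:
        problems.append("429 without Retry-After")
    elif retry_after.isascii() and retry_after.isdigit():
        if int(retry_after) == 0:
            problems.append("Retry-After is 0 on a 429")
        elif headers.get("X-RateLimit-Reset") != retry_after:
            # Assumes Reset is documented as delta-seconds, so the two must match.
            problems.append("Retry-After and X-RateLimit-Reset disagree")
    else:
        try:
            parsedate_to_datetime(retry_after)  # HTTP-date form; agreement not checked here
        except (TypeError, ValueError):
            problems.append("Retry-After is neither delta-seconds nor an HTTP-date")
    return problems
```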
In this area
Two detailed guides build on the mechanics above:
- Emitting X-RateLimit Headers: a step-by-step HowTo for computing and setting the triplet plus Retry-After in Express and FastAPI middleware, with curl -i verification and the abuse-leakage tradeoff.
- RateLimit Draft vs X-RateLimit: a field-by-field comparison of the IETF RateLimit/RateLimit-Policy headers against the legacy X-RateLimit-*, with code that dual-emits during migration.
Related
- Observability & Operations: the parent reference covering headers, metrics, and alerting together.
- Emitting X-RateLimit Headers: how to compute and set the headers in middleware.
- RateLimit Draft vs X-RateLimit: IETF draft headers versus the legacy convention.
- Retry-After Parsing: the client side of the contract, parsing date versus seconds.
- Metrics & Instrumentation: recording the same decision as a time series for operators.