Emitting X-RateLimit Headers Correctly

Emitting the X-RateLimit-* triplet plus Retry-After sounds trivial until you try to keep four numbers consistent with the limiter decision that produced them, and to pick a Reset encoding that doesn't send clients back at the wrong second. This page sits under the rate-limit response headers guide, which covers the field semantics; here the focus is the concrete middleware code that computes and writes the values in Express and FastAPI, and how to verify them with curl -i. The stakes are practical: a client that receives Retry-After: 60 when the real wait is 5 seconds sits idle for 55 seconds it could have spent on its quota, and a client that receives Retry-After: 0 retries instantly into the same 429.

The problem in concrete numbers

Say you enforce 100 requests per 60-second window per API key with a token bucket (capacity 100, refill ~1.67 tokens/sec). At the moment a key is exhausted, the correct Retry-After is the time for one token to refill — about 1 / 1.67 ≈ 0.6 s, which rounds up to 1. A naive implementation that copies the fixed-window boundary would emit Retry-After: 47 (seconds until the minute rolls over), telling the client to wait 47× too long. Multiply that across thousands of clients and you’ve manufactured a throughput collapse out of a correct limiter. The headers must be derived from the limiter’s own notion of when capacity returns, not from a wall-clock window.
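
The arithmetic above can be checked in a few lines. This is a sketch: `bucket_retry_after` and `fixed_window_retry_after` are hypothetical helper names, and the moment 13 seconds past the minute boundary is an assumed example chosen to reproduce the 47-second figure.

```python
import math

CAPACITY = 100
REFILL_PER_SEC = 100 / 60  # ~1.67 tokens/sec, i.e. 100 per 60 s


def bucket_retry_after(tokens: float, refill_per_sec: float) -> int:
    """Correct value: seconds until one whole token is back, never below 1."""
    return max(1, math.ceil((1 - tokens) / refill_per_sec))


def fixed_window_retry_after(now_s: float, window_s: int = 60) -> int:
    """Buggy value: seconds until the next wall-clock window boundary."""
    return max(1, math.ceil(window_s - (now_s % window_s)))


# Key just exhausted (0 tokens left), 13 s past the minute boundary.
correct = bucket_retry_after(0.0, REFILL_PER_SEC)
naive = fixed_window_retry_after(now_s=13.0)
print(correct, naive)
```

The bucket answer depends only on the token deficit and refill rate; the naive answer depends on where the wall clock happens to sit, which is why it can be off by more than an order of magnitude.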

Decision: which Reset encoding to emit

| Encoding | Example value | Pros | Cons | Use when |
|---|---|---|---|---|
| Delta-seconds | `57` (seconds from now) | Immune to clock skew; matches `Retry-After`; trivial for clients | Value changes every request (not cacheable) | New APIs; aligns Reset and Retry-After to one unit |
| Absolute epoch | `1718900000` (Unix seconds) | Stable within a window; matches GitHub convention | Breaks under client/server clock skew; clients need their own `now()` to use it | You must match an existing GitHub-style contract |
| HTTP-date (`Retry-After` only) | `Thu, 20 Jun 2024 16:13:20 GMT` | Human-readable; valid per RFC 9110 | Heaviest to parse; clock-skew-sensitive | Rarely; only if a consumer requires it |

The recommendation for a new API: emit delta-seconds for both X-RateLimit-Reset and Retry-After, so the two always agree and clock skew never enters the picture. Document the encoding explicitly; never switch it within an API version.
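
To see why delta-seconds is the simplest choice, here is a sketch that renders one reset instant in all three encodings. `reset_encodings` is a hypothetical helper; the HTTP-date form uses the standard-library `email.utils.formatdate`, which produces the RFC 9110 IMF-fixdate format when `usegmt=True`. Only the delta form is independent of the client's clock; the other two require the client to compare against its own notion of "now".

```python
from email.utils import formatdate


def reset_encodings(now_epoch: float, reset_in_s: int) -> dict:
    """Express one reset instant in each of the three encodings."""
    reset_at = int(now_epoch) + reset_in_s
    return {
        "delta_seconds": str(reset_in_s),                # recommended
        "absolute_epoch": str(reset_at),                 # GitHub-style
        "http_date": formatdate(reset_at, usegmt=True),  # Retry-After only
    }


# Fixed "now" so the result is reproducible; 57 s delta as in the table.
result = reset_encodings(now_epoch=1718899943, reset_in_s=57)
print(result)
```

With these inputs, the epoch and HTTP-date values are the same instant shown in the table's example column.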

Step-by-step implementation

Build the header emission in order, deriving every value from one limiter call.

1. Run the limiter once per request and capture `{ allowed, limit, remaining, resetSeconds }` from its single decision.
2. Clamp values: `remaining = max(0, remaining)`; when `remaining` is 0, `resetSeconds = max(1, ceil(resetSeconds))`, so a 429 never advertises a zero wait (while whole tokens remain, the middlewares below report 0).
3. Set `X-RateLimit-Limit` and `X-RateLimit-Remaining` on **every** response (success and 429).
4. Set `X-RateLimit-Reset` using the documented encoding (delta-seconds recommended).
5. On a deny, set status `429` and `Retry-After` to the same delta as `Reset`.
6. Record the same decision to your metrics counter so header and metric agree.
7. Decide whether to expose exact limits to all callers or only to authenticated ones (see the leakage gotcha below).
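
The first six steps can be condensed into one pure function. This is a sketch under assumptions: `Decision`, `emit`, and the `DECISIONS` counter are hypothetical names, and `collections.Counter` stands in for whatever metrics client you actually use. What it illustrates is structural: the status, all four headers, and the metric increment are read from a single decision object, never from a second limiter call.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One limiter result; every header and the metric derive from it."""
    allowed: bool
    limit: int
    tokens: float          # fractional tokens left after this decision
    refill_per_sec: float


# Stand-in for a real metrics counter (e.g. a labeled Prometheus Counter).
DECISIONS = Counter()


def emit(decision: Decision) -> tuple[int, dict]:
    remaining = max(0, math.floor(decision.tokens))
    if remaining > 0:
        reset = 0
    else:
        # Clamp to >= 1 so a 429 never tells clients to retry immediately.
        reset = max(1, math.ceil((1 - decision.tokens) / decision.refill_per_sec))

    headers = {
        "X-RateLimit-Limit": str(decision.limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset),  # delta-seconds
    }
    status = 200
    if not decision.allowed:
        status = 429
        headers["Retry-After"] = str(reset)  # same value as Reset

    # The same object feeds the metric, so header and metric cannot disagree.
    DECISIONS["allowed" if decision.allowed else "denied"] += 1
    return status, headers
```

For example, `emit(Decision(False, 100, 0.4, 100 / 60))` yields a 429 whose `Retry-After` and `X-RateLimit-Reset` are both `"1"`, and bumps the `denied` count by one.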

Express + ioredis

// Express middleware: emit X-RateLimit-* and Retry-After from one limiter call.
const Redis = require("ioredis");
const redis = new Redis(process.env.REDIS_URL);

const CAPACITY = 100;            // tokens
const REFILL_PER_SEC = 100 / 60; // ~1.67 tokens/sec => 100 per 60s window

// Atomic token-bucket: refill, try-consume, return tokens left. One round-trip.
const BUCKET_LUA = `
local key  = KEYS[1]
local cap  = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now  = tonumber(ARGV[3])   -- ms, from redis TIME for a single clock
local b = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(b[1]) or cap
local ts     = tonumber(b[2]) or now
if now < ts then now = ts end  -- requests can arrive out of order; never refill by negative time
tokens = math.min(cap, tokens + (now - ts) / 1000 * rate)
local allowed = 0
if tokens >= 1 then tokens = tokens - 1; allowed = 1 end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, math.ceil(cap / rate * 1000) + 1000)
-- Redis truncates Lua numbers to integers in replies, so return tokens as a string.
return { allowed, tostring(tokens) }`;

async function rateLimit(req, res, next) {
  const key = `rl:${req.header("X-API-Key") || req.ip}`;
  // Use Redis server time so reset math uses one clock, not each node's.
  const [sec, micro] = await redis.time();
  const nowMs = Number(sec) * 1000 + Math.floor(Number(micro) / 1000);

  const [allowed, tokensRaw] = await redis.eval(
    BUCKET_LUA, 1, key, CAPACITY, REFILL_PER_SEC, nowMs,
  );
  const tokens = Number(tokensRaw);

  // Derive ALL headers from this one decision.
  const remaining = Math.max(0, Math.floor(tokens));
  // 0 while whole tokens remain; otherwise seconds until the next token, clamped to >=1 so a 429 never says 0.
  const resetSeconds = remaining > 0 ? 0
    : Math.max(1, Math.ceil((1 - tokens) / REFILL_PER_SEC));

  res.set("X-RateLimit-Limit", String(CAPACITY));
  res.set("X-RateLimit-Remaining", String(remaining));
  res.set("X-RateLimit-Reset", String(resetSeconds)); // delta-seconds encoding

  if (allowed === 1) return next();

  res.set("Retry-After", String(resetSeconds)); // same instant as Reset
  res.status(429).json({ error: "rate_limited", retry_after: resetSeconds });
}

module.exports = rateLimit;

FastAPI + redis-py

# FastAPI middleware emitting the same headers from one atomic limiter call.
# Register it with app.add_middleware(RateLimitMiddleware).
import math
import redis.asyncio as redis
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

r = redis.from_url("redis://localhost:6379")

CAPACITY = 100
REFILL_PER_SEC = 100 / 60

BUCKET_LUA = """
local key  = KEYS[1]
local cap  = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now  = tonumber(ARGV[3])
local b = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(b[1]) or cap
local ts     = tonumber(b[2]) or now
if now < ts then now = ts end  -- requests can arrive out of order; never refill by negative time
tokens = math.min(cap, tokens + (now - ts) / 1000 * rate)
local allowed = 0
if tokens >= 1 then tokens = tokens - 1; allowed = 1 end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', key, math.ceil(cap / rate * 1000) + 1000)
return {allowed, tostring(tokens)}
"""

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app):
        super().__init__(app)
        self._script = r.register_script(BUCKET_LUA)

    async def dispatch(self, request, call_next):
        key = "rl:" + (request.headers.get("X-API-Key") or request.client.host)
        sec, micro = await r.time()                      # single Redis clock
        now_ms = sec * 1000 + micro // 1000

        allowed, tokens_raw = await self._script(
            keys=[key], args=[CAPACITY, REFILL_PER_SEC, now_ms],
        )
        tokens = float(tokens_raw)
        remaining = max(0, math.floor(tokens))
        reset = 0 if remaining > 0 else max(1, math.ceil((1 - tokens) / REFILL_PER_SEC))

        headers = {
            "X-RateLimit-Limit": str(CAPACITY),
            "X-RateLimit-Remaining": str(remaining),
            "X-RateLimit-Reset": str(reset),  # delta-seconds
        }
        if not allowed:
            headers["Retry-After"] = str(reset)
            return JSONResponse(
                {"error": "rate_limited", "retry_after": reset},
                status_code=429, headers=headers,
            )

        response = await call_next(request)
        for k, v in headers.items():
            response.headers[k] = v
        return response
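
Both middlewares depend on Redis, which makes the reset math awkward to unit-test. One option, sketched here under the assumption of a single process and monotonically increasing timestamps, is a hypothetical `InMemoryBucket` that mirrors the refill-and-consume steps of `BUCKET_LUA` in plain Python. It is a test double for checking header values, not a replacement for the atomic script in production.

```python
import math


class InMemoryBucket:
    """Single-process mirror of BUCKET_LUA's refill/consume logic.

    Not safe across workers; assumes now_ms never decreases per key."""

    def __init__(self, cap: float, rate: float):
        self.cap = cap
        self.rate = rate   # tokens per second
        self.state = {}    # key -> (tokens, ts_ms)

    def take(self, key: str, now_ms: int) -> tuple[int, float]:
        tokens, ts = self.state.get(key, (self.cap, now_ms))
        # Refill for the elapsed time, capped at capacity.
        tokens = min(self.cap, tokens + (now_ms - ts) / 1000 * self.rate)
        allowed = 0
        if tokens >= 1:
            tokens -= 1
            allowed = 1
        self.state[key] = (tokens, now_ms)
        return allowed, tokens
```

A test can drain the bucket at a fixed timestamp, then feed the returned `tokens` into the same `ceil((1 - tokens) / refill_rate)` formula the middlewares use and assert on the resulting Reset value.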

Gotchas & edge cases

Don’t leak exact limits to abusers. Precise X-RateLimit-Limit and Remaining hand an attacker a map of exactly how hard they can push before tripping. For unauthenticated traffic, consider omitting Remaining entirely or emitting only Retry-After on the 429; reserve the full triplet for authenticated, identified callers who are owed the contract.
Retry-After: 0 is a footgun. A zero tells well-behaved clients to retry immediately, producing a tight 429 loop. Always max(1, …).
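Headers can only be added before the response starts. With Starlette's BaseHTTPMiddleware (as above), the response returned by call_next has not sent its headers yet, so mutating response.headers works even for streaming bodies. In a pure ASGI middleware, add them by rewriting the http.response.start message; once that message has gone out, the headers are committed and cannot change.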
Reset must track the limiter, not the wall clock. For a bucket limiter, reset is “time to next token,” not “time to the minute boundary.” Computing the latter is the single most common bug.
Per-node clock skew corrupts epoch encoding. If you must emit absolute epoch, use the Redis server clock (redis.time()), not each node’s local time, or clock skew makes the value wander between nodes.
Two limiter calls = two truths. If you call the limiter once for the allow/deny and again to read remaining, a concurrent request changes the count between calls. Return everything from one atomic script.
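
The leakage advice reduces to a filter applied after the headers are computed. A minimal sketch, with `visible_headers` as a hypothetical helper and assuming your auth layer has already decided whether the caller is authenticated (note that the middlewares above fall back to the client IP, which should count as unauthenticated):

```python
def visible_headers(headers: dict, authenticated: bool) -> dict:
    """Return the subset of rate-limit headers a caller should see.

    Authenticated callers get the full set; anonymous callers get only
    Retry-After, and only when it is present (i.e. on a 429)."""
    if authenticated:
        return dict(headers)
    return {k: v for k, v in headers.items() if k == "Retry-After"}
```

Filtering at the edge keeps a single code path for computing the values, so the full triplet is still available for logging and metrics even when it is hidden from anonymous callers.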

Verification & testing

Use curl -i to inspect the raw headers, and drive the key to exhaustion to confirm the 429 path.

# Single request: confirm the triplet is present on a 200.
curl -i -s -H "X-API-Key: acct_42" https://api.example.com/v1/search | grep -i \
  -E "HTTP/|x-ratelimit|retry-after"
# HTTP/2 200
# x-ratelimit-limit: 100
# x-ratelimit-remaining: 99
# x-ratelimit-reset: 0

# Exhaust the bucket, then confirm the 429 carries Retry-After == Reset.
for i in $(seq 1 110); do
  curl -s -o /dev/null -H "X-API-Key: acct_42" https://api.example.com/v1/search
done
curl -i -s -H "X-API-Key: acct_42" https://api.example.com/v1/search | grep -i \
  -E "HTTP/|x-ratelimit-remaining|x-ratelimit-reset|retry-after"
# HTTP/2 429
# x-ratelimit-remaining: 0
# x-ratelimit-reset: 1
# retry-after: 1

Assert programmatically in a test that on a 429, Retry-After equals X-RateLimit-Reset, both are >= 1, and X-RateLimit-Remaining is 0 — the three invariants most regressions break. Cross-check the accept/reject counts against your Prometheus metrics for rate limiting so the header story and the metric story match.
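
The three invariants translate directly into a test helper. A sketch, with `check_429_invariants` as a hypothetical name: it takes a status code and a header mapping so it can be fed from any HTTP client or from parsed `curl -i` output, and it compares header names case-insensitively because HTTP/2 lowercases them.

```python
def check_429_invariants(status: int, headers: dict) -> list[str]:
    """Return the violated invariants for a 429 response (empty list = pass)."""
    if status != 429:
        return []
    h = {k.lower(): v for k, v in headers.items()}  # names are case-insensitive
    problems = []
    retry_after = h.get("retry-after")
    reset = h.get("x-ratelimit-reset")
    if retry_after is None or reset is None:
        problems.append("Retry-After and X-RateLimit-Reset must both be present")
    else:
        if retry_after != reset:
            problems.append(f"Retry-After ({retry_after}) != X-RateLimit-Reset ({reset})")
        if not (retry_after.isdigit() and int(retry_after) >= 1):
            problems.append(f"Retry-After must be an integer >= 1, got {retry_after!r}")
    if h.get("x-ratelimit-remaining") != "0":
        problems.append("X-RateLimit-Remaining must be 0 on a 429")
    return problems
```

Returning a list rather than raising lets one test report every broken invariant at once.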

Frequently Asked Questions

Should X-RateLimit-Reset be epoch or seconds-until-reset?

For a new API, emit delta-seconds (seconds from now) so it matches Retry-After and is immune to clock skew. Use absolute Unix epoch only when you must match an existing GitHub-style contract, and document whichever you choose — never switch encodings within an API version.

What Retry-After value should I send for a token bucket?

The time for the next token to refill: ceil((1 - tokens) / refill_rate), clamped to a minimum of 1 second. Do not send the time until a fixed-window boundary — a bucket has no such boundary, and that value tells clients to wait far too long.

Should I expose rate-limit headers to unauthenticated clients?

Be cautious. Exact Limit and Remaining values let an abuser tune traffic to stay just under the threshold. A safe pattern is to send the full triplet only to authenticated callers and limit unauthenticated responses to Retry-After on the 429.

Why must headers and metrics come from the same limiter call?

If you call the limiter once to decide allow/deny and again to read the remaining count, a concurrent request can change the count between the two calls, so the header and the recorded metric disagree. Return allowed, remaining, and reset from one atomic Lua script and derive both header and metric from that single result.

Rate-Limit Response Headers — the parent guide on header semantics and the gateway/app consistency problem.
RateLimit Draft vs X-RateLimit — emitting the IETF draft headers alongside the legacy triplet.
Retry-After Parsing — how clients parse the Retry-After you emit here.
Prometheus Metrics for Rate Limiting — recording the same decision as a metric.
Token Bucket Implementation — the limiter whose reset math these headers depend on.