Per-Tier Quota Enforcement With Redis
This guide builds the runnable path from an incoming API key to an enforced per-tier decision in Redis: resolve the key to a tier, load that tier’s limit, consume an atomic token bucket keyed by (account, tier), check the monthly quota, and emit the headers a client needs. It is the implementation companion to Tiered Access & Quota Enforcement, narrowed to one concrete build you can paste and run. The whole decision is a single Redis round-trip via a token bucket Lua script so it stays correct across stateless nodes.
The problem in concrete numbers
Suppose three plans share one fleet of 6 nodes behind a load balancer: free = 10 rps / 50k per month, pro = 100 rps / 5M per month, enterprise = 1,000 rps / uncapped. A pro key sending a 250-request burst should be allowed (its capacity is 300 tokens) but a free key sending the same burst should be cut at ~20. If the limiter resolved every key to one global bucket, or kept per-node buckets, a single pro key spread over 6 nodes could spend 600 rps — 6× its contract. Keying the bucket by (account, tier) in Redis and consuming atomically holds each account to exactly its tier’s number regardless of which node answers.
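The numbers above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `admitted` is hypothetical, and it assumes the burst arrives instantaneously against a full bucket, so refill during the burst is ignored.

```python
# Sanity check of the figures above. `admitted` is a hypothetical helper that
# assumes an instantaneous burst against a full bucket (no refill mid-burst).
def admitted(burst_size: int, capacity: int) -> int:
    """Requests let through from one instantaneous burst."""
    return min(burst_size, capacity)

CAPACITY = {"free": 20, "pro": 300}  # burst capacities used in this guide
PRO_RATE, NODES = 100, 6

print(admitted(250, CAPACITY["pro"]))   # 250: the pro burst fits entirely
print(admitted(250, CAPACITY["free"]))  # 20: the free burst is cut at capacity
print(PRO_RATE * NODES)                 # 600: sustained rps with one bucket per node
```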
Decision table: how to key and enforce
| Choice | Option A | Option B | Use |
|---|---|---|---|
| Bucket key | per API key | per (account, tier) | (account, tier) so all of an account’s keys share one limit |
| Atomicity | INCR + EXPIRE (two calls) | single Lua script | Lua — no read-modify-write race, one round-trip |
| Rate algorithm | fixed window | token bucket | token bucket — smooth bursts up to burst |
| Quota counter | sliding log | INCR with month TTL | INCR for the gate; sliding log only if it bills (see below) |
| Unknown tier | allow | clamp to smallest | clamp to free — never grant the largest reservoir by accident |
| Redis down | fail-closed | fail-open on rate | fail-open rate, fail-closed quota |
Step-by-step implementation
Build it in this order; each step is independently testable.
1. Load tier policies (rate, burst, quota).
2. Resolve the incoming API key to {account, tier}.
3. Compute the bucket key (account, tier) and the quota key (account, billing_month).
4. Run the atomic Lua script against both keys.
5. Map the returned decision to 200/429/402.
1–3. Resolve the key and compute keys
# Python (redis-py). Resolve key -> tier, cache briefly, derive Redis keys.
import datetime
import time

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

POLICIES = {
    "free": {"rate": 10, "burst": 20, "quota": 50_000},
    "pro": {"rate": 100, "burst": 300, "quota": 5_000_000},
    "enterprise": {"rate": 1000, "burst": 2000, "quota": -1},  # -1 = uncapped
}

_cache: dict[str, tuple[str, str, float]] = {}  # api_key -> (account, tier, exp)

def resolve_tier(api_key: str) -> tuple[str, str]:
    hit = _cache.get(api_key)
    if hit and hit[2] > time.time():
        return hit[0], hit[1]
    account, tier = lookup_account(api_key)  # your DB / auth-service call
    _cache[api_key] = (account, tier, time.time() + 30)  # 30 s cache TTL
    return account, tier

def billing_keys(account: str, tier: str) -> tuple[str, str, int]:
    # Read the clock once so the month label and the TTL cannot disagree.
    now = datetime.datetime.now(datetime.timezone.utc)
    month = now.strftime("%Y-%m")
    nxt = datetime.datetime(now.year + (now.month == 12),
                            (now.month % 12) + 1, 1,
                            tzinfo=datetime.timezone.utc)
    # Seconds to month reset; at least 1, because EXPIRE 0 deletes the key.
    ttl = max(1, int((nxt - now).total_seconds()))
    return f"rl:rate:{account}:{tier}", f"rl:quota:{account}:{month}", ttl
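The resolver above leaves `lookup_account` to your own database or auth service. For local testing, a stand-in along these lines may be enough. The table `_DEMO_KEYS` and the account IDs are invented for illustration; only the `free_demo` and `pro_demo` key names come from the test commands later in this guide.

```python
# Hypothetical stand-in for the DB / auth-service call. The account IDs are
# invented; replace this with a real lookup before deploying.
_DEMO_KEYS = {
    "free_demo": ("acct_free", "free"),
    "pro_demo": ("acct_pro", "pro"),
}

def lookup_account(api_key: str) -> tuple[str, str]:
    """Return (account, tier) for a known key, or raise KeyError."""
    try:
        return _DEMO_KEYS[api_key]
    except KeyError:
        raise KeyError(f"unknown API key: {api_key!r}") from None
```

Raising on an unknown key keeps authentication failures separate from the unknown-tier case, which is clamped to free further down.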
4. Atomic Lua: token bucket + quota in one round-trip
-- KEYS[1]=rate bucket   KEYS[2]=quota counter
-- ARGV: cap, rate(tok/s), quota(-1=uncapped), now_ms, quota_ttl_s
-- Returns: {decision, rate_remaining, quota_remaining}
local cap   = tonumber(ARGV[1])
local rate  = tonumber(ARGV[2])
local quota = tonumber(ARGV[3])
local now   = tonumber(ARGV[4])
local qttl  = tonumber(ARGV[5])

local b = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(b[1]) or cap   -- a new bucket starts full
local ts     = tonumber(b[2]) or now

-- refill since last touch, capped at burst capacity; elapsed time is clamped
-- at 0 so clock skew between calling nodes can never drain the bucket
local elapsed = math.max(0, now - ts)
tokens = math.min(cap, tokens + elapsed / 1000 * rate)

if tokens < 1 then                     -- rate exceeded
  redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
  redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
  return { 'RATE', math.floor(tokens), -1 }
end

local q_remaining = -1
if quota >= 0 then
  local used = redis.call('INCR', KEYS[2])
  if used == 1 then redis.call('EXPIRE', KEYS[2], qttl) end
  if used > quota then                 -- quota exhausted; no token is consumed
    return { 'QUOTA', math.floor(tokens), 0 }
  end
  q_remaining = quota - used
end

tokens = tokens - 1                    -- consume one token
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
return { 'OK', math.floor(tokens), q_remaining }
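To unit-test the decision flow without a Redis server, the script's logic can be modelled in plain Python. The function `decide` and its `state` dict are hypothetical test scaffolding, not part of the deployed path. The model treats negative elapsed time as zero, which is an assumption of this sketch, and it does not model key expiry.

```python
import math

def decide(state: dict, cap: float, rate: float, quota: int, now_ms: int):
    """Model of the Lua flow: returns (decision, rate_remaining, quota_remaining)."""
    tokens = state.get("tokens", cap)  # a new bucket starts full
    ts = state.get("ts", now_ms)
    elapsed = max(0, now_ms - ts)      # assumption: never refill negatively
    tokens = min(cap, tokens + elapsed / 1000 * rate)
    if tokens < 1:                     # rate exceeded, quota untouched
        state.update(tokens=tokens, ts=now_ms)
        return "RATE", math.floor(tokens), -1
    quota_remaining = -1
    if quota >= 0:
        state["used"] = state.get("used", 0) + 1
        if state["used"] > quota:      # quota exhausted, no token consumed
            return "QUOTA", math.floor(tokens), 0
        quota_remaining = quota - state["used"]
    tokens -= 1                        # consume one token
    state.update(tokens=tokens, ts=now_ms)
    return "OK", math.floor(tokens), quota_remaining

# Free tier: 30 requests at the same instant against a full 20-token bucket.
state: dict = {}
results = [decide(state, cap=20, rate=10, quota=50_000, now_ms=0)[0] for _ in range(30)]
print(results.count("OK"), results.count("RATE"))  # 20 10
```

Because rate-limited requests return before the counter is touched, only the 20 admitted requests are charged against the monthly quota.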
5. Call the script and map the decision to headers
ENFORCE = r.register_script(LUA_SOURCE)  # LUA_SOURCE = the script above

def enforce(api_key: str) -> dict:
    account, tier = resolve_tier(api_key)
    p = POLICIES.get(tier, POLICIES["free"])  # unknown tier -> smallest
    rate_key, quota_key, qttl = billing_keys(account, tier)
    decision, rate_rem, quota_rem = ENFORCE(
        keys=[rate_key, quota_key],
        args=[p["burst"], p["rate"], p["quota"], int(time.time() * 1000), qttl],
    )
    return {"decision": decision, "tier": tier,
            "rate_remaining": rate_rem, "quota_remaining": quota_rem,
            "quota_reset": qttl, "limit": p["rate"]}
# Framework layer (FastAPI-style) mapping the decision to status + headers.
from fastapi import HTTPException, Response

def apply(res: Response, d: dict) -> None:
    headers = {
        "RateLimit-Limit": str(d["limit"]),
        "RateLimit-Remaining": str(max(0, d["rate_remaining"])),
    }
    if d["quota_remaining"] >= 0:
        headers["X-Quota-Remaining"] = str(d["quota_remaining"])
        headers["X-Quota-Reset"] = str(d["quota_reset"])
    # Headers set on the injected Response are dropped when an exception is
    # raised, so error responses carry them on the HTTPException instead.
    if d["decision"] == "RATE":
        raise HTTPException(429, "rate_limited",
                            headers={**headers, "Retry-After": "1"})
    if d["decision"] == "QUOTA":
        raise HTTPException(402, "quota_exceeded", headers=headers)
    for name, value in headers.items():
        res.headers[name] = value
Gotchas & edge cases
- Sharing one bucket across an account’s keys. Keying by (account, tier) means all of an account’s API keys draw from one reservoir, which is usually what you want for billing fairness. If keys must be independent, scope per key instead; see API key scoping & rate limits.
- Quota consumed before the response is sent. The Lua INCR charges the quota at admission, so a request that later 5xxs still counted. For metered billing this is wrong: use idempotency keys and reconciliation.
- Stale tier on upgrade. A 30 s cache TTL means a freshly upgraded customer stays on the old tier for up to 30 s unless you publish an invalidation event.
- Month boundary race. Two requests crossing midnight on the 1st can land in different YYYY-MM keys; this is harmless for gating but matters if you reconcile against billing.
- PEXPIRE keeps idle buckets cheap. Without the TTL, every key an account ever used lingers in Redis memory.
Verification & testing
# Free key: capacity 20, refill 10 rps. Fire 30 quick requests -> ~20 pass, rest 429.
for i in $(seq 1 30); do
  curl -s -o /dev/null -w "%{http_code} " -H "X-API-Key: free_demo" \
    https://api.example.com/v1/ping
done; echo

# Aggregate test: 6 workers, one pro key, 3s. Sum the 200s across all six
# reports: expect roughly 600 accepted in total (300-token burst + 100 rps x 3 s),
# not ~3,600. That shows the (account, tier) bucket is global, not per-node.
seq 6 | xargs -P6 -I{} sh -c \
  'hey -z 3s -c 20 -H "X-API-Key: pro_demo" https://api.example.com/v1/ping' \
  | grep -E "Requests/sec|Status code distribution" -A4
Watch the accepted-vs-rejected ratio and Redis call latency while the test runs; wire it up per Prometheus metrics for rate limiting.
Frequently Asked Questions
Why key the bucket by account and tier instead of by API key?
Billing and fairness are per account, not per key. If an account rotates or issues several keys, keying by (account, tier) holds them all to one reservoir. Key per API key only when each key is sold as its own independent limit.
Should the rate check or the quota check run first?
Rate first. It is cheaper and rejects floods before they touch the monthly counter, so a client hammering you in a tight loop never burns quota it would otherwise consume on retries. The Lua script above only increments the quota once the rate bucket has tokens.
How do I reset the monthly quota?
Don't reset it explicitly. Key the counter by account:YYYY-MM and set EXPIRE to seconds-until-month-end on first write. The next month uses a new key that starts at zero, and Redis expires the old one automatically, so no cron sweep is needed.
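As a worked example of that arithmetic, the sketch below derives the quota key and its TTL from an explicit timestamp, so it can be tested with fixed dates. The function name `quota_key_and_ttl` is hypothetical; the key layout follows the `rl:quota:{account}:{month}` format used earlier in this guide.

```python
import datetime

UTC = datetime.timezone.utc

def quota_key_and_ttl(account: str, now: datetime.datetime) -> tuple[str, int]:
    """Return the monthly quota key and the seconds left until the month rolls over."""
    # First instant of the next month; December wraps to January of next year.
    nxt = datetime.datetime(now.year + (now.month == 12), now.month % 12 + 1, 1,
                            tzinfo=UTC)
    ttl = max(1, int((nxt - now).total_seconds()))  # EXPIRE 0 would delete the key
    return f"rl:quota:{account}:{now:%Y-%m}", ttl

# One minute before the year ends, the key has 60 seconds left to live.
print(quota_key_and_ttl("acct_pro", datetime.datetime(2024, 12, 31, 23, 59, tzinfo=UTC)))
# ('rl:quota:acct_pro:2024-12', 60)
```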
What status code should an exhausted monthly quota return?
402 Payment Required (or a 403 with a quota_exceeded reason). A spent quota is not a transient condition, so returning 429 wrongly tells clients to retry. Reserve 429 for the per-second rate limit and pair it with Retry-After.
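From the client's side, that contract might be consumed as in this sketch. The helper `retry_delay` is hypothetical, and it is simplified: it only understands the delay-in-seconds form of Retry-After (falling back to 1 s for anything else) and expects the header under its exact name.

```python
from typing import Mapping, Optional

def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait before retrying, or None if the request should not be retried."""
    if status != 429:
        return None  # 402/403 quota errors are not transient: do not retry
    try:
        return max(0.0, float(headers.get("Retry-After", "1")))
    except ValueError:
        return 1.0   # simplification: HTTP-date values fall back to 1 s

print(retry_delay(429, {"Retry-After": "1"}))  # 1.0
print(retry_delay(402, {}))                    # None
```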
Related
- Tiered Access & Quota Enforcement — the parent topic covering the two-layer model and response contract.
- API Key Scoping & Rate Limits — when to scope per key, scope, or route instead of per account.
- Billing-Critical Sliding-Log Usage — exact counters and idempotency when usage drives invoices.
- Token Bucket Implementation — the algorithm behind the rate axis.
- Redis Counter Architecture — atomic counters and key-expiration patterns.