Redis Lua vs INCR Rate Limiting
The first design decision when building a Redis counter is whether INCR plus EXPIRE is enough or whether you need a Lua script. This task sits under the Redis Counter Architecture guide, and the answer hinges on one failure: the lost-EXPIRE race, where a counter is incremented but never gets a TTL, so the key lives forever and the window never resets. Plain INCR is correct for a fixed-window counter only if you set the TTL atomically with the increment; the moment you need token-bucket refill or a sliding window, you need a script that reads, computes, and writes inside a single Redis execution.
The problem in concrete numbers
Say you enforce 600 requests/minute per API key with a fixed window: INCR rl:{key}:{minute}, and on the first hit you issue EXPIRE rl:{key}:{minute} 60. Run this as two separate commands at 5,000 RPS and a small fraction of requests (any where the process crashes, the connection drops, or a failover happens between INCR returning 1 and EXPIRE landing) leaves a key with no TTL. That key now holds a stuck count forever. The next minute uses a new key, so the stale one is invisible to the limiter logic but still consumes memory. At a few hundred bytes per orphan and thousands of keys per day, a busy gateway leaks tens of megabytes a week. Worse, when the key name does not embed the window (for example a plain rl:{key} whose TTL defines the window), a key that loses its EXPIRE pins the counter, so the customer is throttled until the key is manually deleted.
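The failure above can be reproduced without a Redis server. The following is a minimal sketch using an in-memory stand-in; `TinyStore`, `twoStepHit`, and `atomicHit` are hypothetical names invented for illustration, and the "crash" is simply an early return between the two steps.

```typescript
// In-memory stand-in for Redis. TinyStore and its methods are hypothetical.
type Entry = { count: number; ttlSeconds: number | null };

class TinyStore {
  private data = new Map<string, Entry>();

  // Mirrors INCR: creates the key at 1 with no TTL, or increments it.
  incr(key: string): number {
    const entry = this.data.get(key) ?? { count: 0, ttlSeconds: null };
    entry.count += 1;
    this.data.set(key, entry);
    return entry.count;
  }

  // Mirrors EXPIRE: attaches a TTL to an existing key.
  expire(key: string, seconds: number): void {
    const entry = this.data.get(key);
    if (entry) entry.ttlSeconds = seconds;
  }

  // Mirrors TTL: -1 means "exists but has no expiry", -2 means "no such key".
  ttl(key: string): number {
    const entry = this.data.get(key);
    if (!entry) return -2;
    return entry.ttlSeconds ?? -1;
  }
}

// Two-step client: the caller may die after INCR and before EXPIRE.
function twoStepHit(store: TinyStore, key: string, crashAfterIncr: boolean): void {
  const n = store.incr(key);
  if (crashAfterIncr) return; // simulated crash, dropped connection, or failover
  if (n === 1) store.expire(key, 60);
}

// Atomic form: INCR and EXPIRE happen as one step, as they would inside one script.
function atomicHit(store: TinyStore, key: string): void {
  const n = store.incr(key);
  if (n === 1) store.expire(key, 60);
}

const store = new TinyStore();
twoStepHit(store, "rl:acct_42:1", true);  // first hit crashes before EXPIRE
twoStepHit(store, "rl:acct_42:1", false); // later hit sees n === 2, so it never sets a TTL
atomicHit(store, "rl:acct_42:2");

const orphanTtl = store.ttl("rl:acct_42:1");  // -1: the key is orphaned
const healthyTtl = store.ttl("rl:acct_42:2"); // 60
```

Note that the second hit on the orphaned key does not repair it: because the TTL is only set when the count is 1, every later request skips the EXPIRE.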
Decision matrix
| Criterion | INCR + EXPIRE (separate) | INCR/EXPIRE in Lua (EVAL) | Token bucket / sliding window in Lua |
|---|---|---|---|
| Atomicity | None: gap between commands | Full: one server-side block | Full: read-modify-write in one block |
| Lost-EXPIRE race | Possible (orphan keys) | Impossible | Impossible |
| Round-trips per check | 1–2 | 1 | 1 |
| Algorithm fit | Fixed window only | Fixed window | Token bucket, sliding log, sliding window |
| Read-modify-write safety | Unsafe across clients | N/A (no RMW) | Safe (no interleaving) |
| Script caching | N/A | EVALSHA | EVALSHA |
| Best fit | Never in production | Simple fixed-window quotas | Burst-smoothing, billing-critical limits |
Selection rules:
- Never ship separate INCR and EXPIRE round-trips in production: the orphan-key risk has no upside over wrapping them in one tiny script.
- Use a fixed-window Lua script when a coarse "N per minute" quota is acceptable and you can tolerate the boundary burst where two adjacent windows allow up to 2N in a short span.
- Use a token-bucket or sliding-window Lua script when you need burst smoothing or boundary-accurate limits. These require reading state, computing against the clock, and conditionally writing: a read-modify-write that is only race-free inside a single EVAL.
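The 2N boundary burst mentioned in the selection rules is easy to see with numbers. Below is a minimal in-memory sketch, assuming the 600-per-minute limit used earlier; `makeFixedWindow` is a hypothetical helper, not part of any library, and timestamps are passed in by hand so the result is deterministic.

```typescript
// Fixed-window counter keyed by window index. A sketch for illustration only.
function makeFixedWindow(limit: number, windowSeconds: number) {
  const counts = new Map<number, number>();
  return (nowSeconds: number): boolean => {
    const windowId = Math.floor(nowSeconds / windowSeconds);
    const n = (counts.get(windowId) ?? 0) + 1;
    counts.set(windowId, n);
    return n <= limit;
  };
}

const allow = makeFixedWindow(600, 60);
let accepted = 0;

// 600 requests in the last second of window 0 (t = 59s)...
for (let i = 0; i < 600; i++) if (allow(59)) accepted++;
// ...then 600 more in the first second of window 1 (t = 60s).
for (let i = 0; i < 600; i++) if (allow(60)) accepted++;

// accepted is now 1200: twice the nominal limit within about two seconds.
const nextInSameWindow = allow(60); // false: window 1 is now full
```

If that burst is unacceptable for the quota in question, that is the signal to move to the token-bucket or sliding-window column of the matrix.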
Why plain INCR cannot do token bucket or sliding window
INCR is atomic by itself, but a token bucket needs to read current tokens, refill based on elapsed time, compare against cost, and write the new token count: four steps. If two clients run those steps interleaved against the same key, both read the same token count and both succeed, double-spending the bucket. The same is true of a sliding window log: you ZREMRANGEBYSCORE old entries, ZCARD to count, then conditionally ZADD, and a concurrent request can slip between the count and the add. Redis executes one Lua script to completion before running anything else (single-threaded execution), which collapses the whole read-modify-write into an indivisible step. That guarantee is the entire reason Lua exists for rate limiting.
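The double-spend described above can be shown with a deterministic interleaving. This is a sketch only: `Bucket`, `readTokens`, `spendFromSnapshot`, and `spendAtomic` are hypothetical names, refill is omitted to keep the race visible, and the "clients" are just two sequences of calls against one shared object.

```typescript
type Bucket = { tokens: number };

// Step 1 of the unsafe path: read the current token count.
function readTokens(bucket: Bucket): number {
  return bucket.tokens;
}

// Steps 2 and 3: decide from a (possibly stale) snapshot, then write back.
function spendFromSnapshot(bucket: Bucket, snapshot: number, cost: number): boolean {
  if (snapshot < cost) return false;
  bucket.tokens = snapshot - cost;
  return true;
}

// Atomic form: read, compare, and write with nothing allowed in between,
// which is what a single script execution provides.
function spendAtomic(bucket: Bucket, cost: number): boolean {
  return spendFromSnapshot(bucket, readTokens(bucket), cost);
}

// Interleaved clients: both read before either writes.
const racy: Bucket = { tokens: 1 };
const seenByA = readTokens(racy);
const seenByB = readTokens(racy);
const racyA = spendFromSnapshot(racy, seenByA, 1);
const racyB = spendFromSnapshot(racy, seenByB, 1);
// racyA and racyB are both true: one token was spent twice.

// Same two requests, executed one whole step at a time.
const safe: Bucket = { tokens: 1 };
const safeA = spendAtomic(safe, 1); // true
const safeB = spendAtomic(safe, 1); // false
```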
Step-by-step implementation
1. Fixed-window counter β INCR + EXPIRE atomically
This is the minimum safe form of INCR-based limiting: the TTL is set in the same script as the increment, so the key can never outlive its window.
-- KEYS[1] = rl:{key}:{minute} ARGV[1] = window seconds ARGV[2] = limit
local n = redis.call('INCR', KEYS[1])
if n == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[1]) -- TTL set atomically with first INCR
end
-- return current count and whether this request is allowed
return { n, (n <= tonumber(ARGV[2])) and 1 or 0 }
// Node.js: run the fixed-window script with ioredis in a single round-trip.
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL!);
const FIXED_WINDOW = `
local n = redis.call('INCR', KEYS[1])
if n == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end
return { n, (n <= tonumber(ARGV[2])) and 1 or 0 }`;
export async function fixedWindowAllow(key: string, limit = 600, window = 60) {
// Window index; equals the current minute only when window = 60.
const minute = Math.floor(Date.now() / 1000 / window);
// redis.eval sends the full script body on every call. To get EVALSHA with an
// automatic EVAL fallback, register the script with redis.defineCommand instead.
const [count, allowed] = (await redis.eval(
FIXED_WINDOW, 1, `rl:${key}:${minute}`, String(window), String(limit),
)) as [number, number];
return { ok: allowed === 1, count, remaining: Math.max(0, limit - count) };
}
2. Token bucket β read-modify-write that must be atomic
The bucket refills against the Redis server clock (TIME), so node clock skew never perturbs the result.
-- KEYS[1] = rl:tb:{key}
-- ARGV[1] = capacity ARGV[2] = refill tokens/sec ARGV[3] = cost
local cap = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local cost = tonumber(ARGV[3])
local t = redis.call('TIME') -- {seconds, microseconds}
local now = tonumber(t[1]) + tonumber(t[2]) / 1e6
local b = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(b[1]) or cap
local ts = tonumber(b[2]) or now
tokens = math.min(cap, tokens + (now - ts) * rate) -- refill since last touch
local allowed = 0
if tokens >= cost then tokens = tokens - cost; allowed = 1 end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('PEXPIRE', KEYS[1], math.ceil(cap / rate * 1000) + 1000)
return { allowed, math.floor(tokens) }
// Single round-trip token bucket; no read-modify-write race because it is one EVAL.
const TOKEN_BUCKET = `... the Lua above ...`;
export async function tokenBucketAllow(key: string, cap = 100, rate = 50, cost = 1) {
const [allowed, remaining] = (await redis.eval(
TOKEN_BUCKET, 1, `rl:tb:${key}`, String(cap), String(rate), String(cost),
)) as [number, number];
return { ok: allowed === 1, remaining };
}
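To unit-test the refill arithmetic without a Redis server, the script's math can be mirrored in a pure function. The sketch below follows the Lua step by step under the same defaults (capacity 100, refill 50 tokens/sec, cost 1); `refillAndSpend` and `BucketState` are hypothetical names, and the clock is passed in explicitly rather than read from Redis TIME.

```typescript
type BucketState = { tokens: number; ts: number };

// Mirrors the Lua token-bucket script: refill since the last touch, clamp to
// capacity, spend if possible, and always record the new timestamp.
function refillAndSpend(
  state: BucketState | null, // null models a key that does not exist yet
  now: number,               // seconds, fractional allowed
  cap: number,
  rate: number,              // tokens per second
  cost: number,
) {
  const tokensBefore = state ? state.tokens : cap;
  const ts = state ? state.ts : now;
  let tokens = Math.min(cap, tokensBefore + (now - ts) * rate);
  let allowed = false;
  if (tokens >= cost) {
    tokens -= cost;
    allowed = true;
  }
  // Same TTL as the script: time to refill from empty, plus one second of slack.
  const ttlMs = Math.ceil((cap / rate) * 1000) + 1000;
  return { allowed, state: { tokens, ts: now }, ttlMs };
}

// New key: starts full, one token spent.
const first = refillAndSpend(null, 0, 100, 50, 1); // tokens 99, ttlMs 3000

// Empty bucket touched 0.5s later: 0.5 * 50 = 25 tokens refilled, one spent.
const refilled = refillAndSpend({ tokens: 0, ts: 10 }, 10.5, 100, 50, 1); // tokens 24

// Long idle period: refill is clamped at capacity before spending.
const clamped = refillAndSpend({ tokens: 90, ts: 0 }, 100, 100, 50, 1); // tokens 99

// Empty bucket with no elapsed time: denied, token count unchanged.
const denied = refillAndSpend({ tokens: 0, ts: 10 }, 10, 100, 50, 1);
```

Keeping the pure function and the Lua in lockstep is a manual discipline; the function is a testing aid, not a replacement for running the script atomically in Redis.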
Operator checklist
- No production path issues INCR and EXPIRE as separate round-trips
- Scripts loaded once and invoked via EVALSHA, with EVAL fallback on NOSCRIPT
- Token-bucket and sliding-window scripts use redis.call('TIME') rather than client clocks
- Keys carry a hash tag (rl:tb:{acct_42})
- evicted_keys and a periodic scan for TTL-less rl:* keys are monitored
Gotchas & edge cases
- An orphan key inside the window pins the count. A lost EXPIRE on a fresh window key throttles that customer until manual deletion whenever the key name does not rotate per window, which is far worse than a slow memory leak. This is the strongest reason to never split the commands.
- EVAL arguments are strings. Redis passes every ARGV as a string; tonumber() everything inside Lua, or equality checks against numbers silently return false and ordering comparisons raise an error.
- math.random() in scripts breaks replication on old Redis. On Redis < 5 use redis.replicate_commands() or derive uniqueness from TIME and a counter instead of RNG in sorted-set members.
- Big scripts block the event loop. Redis runs Lua to completion single-threaded; keep scripts O(1) or O(log n). A sliding-log ZREMRANGEBYSCORE over a huge set can stall every other client.
- EVALSHA cache is flushed on restart/failover. Always keep the EVAL fallback path or the first call after a failover errors with NOSCRIPT.
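The EVALSHA-with-fallback pattern from the last gotcha can be sketched against a minimal client interface. `ScriptClient` and `runCached` are hypothetical names written for this example, not a real library's API; the sketch assumes the SHA was obtained earlier (for example from SCRIPT LOAD at startup) and that the server's error message contains the NOSCRIPT prefix.

```typescript
// Hypothetical minimal client surface; adapt the signatures to your Redis client.
interface ScriptClient {
  evalsha(sha: string, keys: string[], args: string[]): Promise<unknown>;
  eval(source: string, keys: string[], args: string[]): Promise<unknown>;
}

// Try the cached script first; on NOSCRIPT, send the full source once.
async function runCached(
  client: ScriptClient,
  sha: string,
  source: string,
  keys: string[],
  args: string[],
): Promise<unknown> {
  try {
    return await client.evalsha(sha, keys, args);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Anything other than a missing script is a real failure: rethrow it.
    if (!message.includes("NOSCRIPT")) throw err;
    // EVAL executes the script and also places it back in the script cache,
    // so later EVALSHA calls on this server succeed again.
    return client.eval(source, keys, args);
  }
}
```

Because the fallback is a single extra round-trip on the first call after a restart or failover, it costs little to keep in place permanently.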
Verification & testing
Drive concurrent load against one key and confirm the accepted total never exceeds the limit, and that no key survives without a TTL.
# Hammer one key, then check that every rate-limit key has a TTL (>= 0, never -1).
hey -z 5s -c 40 -H "X-API-Key: acct_42" https://api.example.com/v1/search
redis-cli --scan --pattern 'rl:*' | while read k; do
ttl=$(redis-cli ttl "$k")
[ "$ttl" = "-1" ] && echo "ORPHAN (no TTL): $k" # any output here is a lost-EXPIRE bug
done
See Prometheus metrics for rate limiting to alert on orphan-key growth and script latency continuously.
Frequently Asked Questions
Is plain INCR ever safe for rate limiting?
Only if the EXPIRE is set atomically with the increment, which in practice means wrapping both in a one-line Lua script. A bare INCR followed by a separate EXPIRE round-trip can lose the TTL and leak orphan keys, so it is never safe in production.
Why can't I build a token bucket with INCR and DECR?
A token bucket must read the current tokens, refill them based on elapsed time, compare against cost, and write the result: a read-modify-write. Two concurrent clients running those steps interleaved both read the same count and both succeed, double-spending the bucket. Only a single EVAL makes the sequence indivisible.
Does EVAL add latency over INCR?
Negligibly. A cached script invoked via EVALSHA is one round-trip, the same as or fewer than INCR plus a separate EXPIRE. The script body runs in microseconds inside Redis. You trade essentially nothing for full atomicity.
Should I use Redis server time or the node's clock in the script?
Use redis.call('TIME') inside the script so every node shares one clock. Passing each node's Date.now() makes refill and window math sensitive to clock skew between application servers, which silently distorts limits.
Related
- Redis Counter Architecture: the parent guide on key schemas, data structures, and cluster topology.
- Fixed Window Counter Drift Explained: the boundary burst a fixed-window INCR allows.
- Token Bucket Implementation: the algorithm whose refill needs an atomic script.
- Configuring Express-Rate-Limit with Redis: the default store that uses fixed-window INCR under the hood.