Designing Real SystemsLesson 5.2
How to design a rate limiter
token bucket algorithm, leaky bucket, fixed window, sliding window, distributed rate limiting, Redis atomic operations, rate limit headers
Rate Limiting Algorithms
Rate limiters protect your API from abuse and ensure fair usage. The algorithm choice affects how burst traffic is handled.
Token Bucket
Each user gets a bucket with N tokens. Each request consumes one token. Tokens refill at a fixed rate. Allows bursting up to bucket capacity.
# Redis token bucket implementation
def is_allowed(user_id, rate_per_second, burst):
key = f'rate:{user_id}'
now = time.time()
data = redis.hgetall(key)
tokens = float(data.get('tokens', burst))
last_refill = float(data.get('last_refill', now))
elapsed = now - last_refill
tokens = min(burst, tokens + elapsed * rate_per_second)
if tokens >= 1:
redis.hmset(key, {'tokens': tokens - 1, 'last_refill': now})
return True
return FalseSliding Window Counter
Track requests per user in the last N seconds. More accurate than fixed window (which allows 2x burst at window boundary).
# Redis sorted set: score = timestamp
ZADD user:123:requests {timestamp} {request_id}
ZREMRANGEBYSCORE user:123:requests 0 {timestamp - 60}
count = ZCARD user:123:requests
if count > limit: rejectDistributed Rate Limiting
With multiple API servers, use Redis as the shared counter store. Use Lua scripts for atomic check-and-increment to avoid race conditions.
