Designing Real SystemsLesson 5.2

How to design a rate limiter

token bucket algorithm, leaky bucket, fixed window, sliding window, distributed rate limiting, Redis atomic operations, rate limit headers

Rate Limiting Algorithms

Rate limiters protect your API from abuse and ensure fair usage. The algorithm choice affects how burst traffic is handled.

Token Bucket

Each user gets a bucket with N tokens. Each request consumes one token. Tokens refill at a fixed rate. Allows bursting up to bucket capacity.

# Redis token bucket implementation
def is_allowed(user_id, rate_per_second, burst):
    key = f'rate:{user_id}'
    now = time.time()
    data = redis.hgetall(key)

    tokens = float(data.get('tokens', burst))
    last_refill = float(data.get('last_refill', now))

    elapsed = now - last_refill
    tokens = min(burst, tokens + elapsed * rate_per_second)

    if tokens >= 1:
        redis.hmset(key, {'tokens': tokens - 1, 'last_refill': now})
        return True
    return False

Sliding Window Counter

Track requests per user in the last N seconds. More accurate than fixed window (which allows 2x burst at window boundary).

# Redis sorted set: score = timestamp
ZADD user:123:requests {timestamp} {request_id}
ZREMRANGEBYSCORE user:123:requests 0 {timestamp - 60}
count = ZCARD user:123:requests
if count > limit: reject

Distributed Rate Limiting

With multiple API servers, use Redis as the shared counter store. Use Lua scripts for atomic check-and-increment to avoid race conditions.