Script Valley
System Design: APIs, Caching & Scalability
Rate Limiting and Throttling (Lesson 4.5)

API gateway rate limiting vs application-level rate limiting

Two Enforcement Points

Rate limiting can be enforced at the API gateway, before requests reach application code, or inside the application as middleware. Each approach has trade-offs.

API Gateway Rate Limiting

Gateways such as Kong, AWS API Gateway, and Nginx reject over-limit requests before they reach your application. A Kong configuration might look like this:

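# Kong rate limiting plugin (declarative YAML)
plugins:
  - name: rate-limiting
    config:
      minute: 100        # max requests per client per minute
      hour: 1000         # max requests per client per hour
      policy: redis      # keep counters in Redis so all gateway nodes share them
      redis_host: redis  # hostname of the Redis server
      redis_port: 6379   # default Redis port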

This approach requires no application code and applies uniformly across microservices. The downside is coarse granularity: per-route or per-user-tier limits require additional, more complex gateway configuration.
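To make the two-window configuration above concrete, here is a minimal sketch of a fixed-window counter that enforces a per-minute and a per-hour limit together. It is an illustration under stated assumptions, not Kong's actual implementation: the `FixedWindowLimiter` class and its `allow` method are hypothetical names, and an in-memory `Map` stands in for the shared Redis counters.

```javascript
// Hypothetical sketch: an in-memory stand-in for the shared counters a gateway keeps in Redis.
class FixedWindowLimiter {
  // limits: array of { windowMs, max }, e.g. one per-minute and one per-hour limit
  constructor(limits) {
    this.limits = limits;
    // key -> request count for one client in one window.
    // Old keys are never removed here; a real store would expire them.
    this.counters = new Map();
  }

  // Returns true if the request is allowed, false if any window is at its limit.
  allow(clientId, now = Date.now()) {
    // One counter key per configured window, e.g. "client-a:60000:600"
    const keys = this.limits.map(
      ({ windowMs }) => `${clientId}:${windowMs}:${Math.floor(now / windowMs)}`
    );
    // Check every window first so a rejected request does not consume quota.
    const overLimit = keys.some(
      (key, i) => (this.counters.get(key) || 0) >= this.limits[i].max
    );
    if (overLimit) return false;
    for (const key of keys) {
      this.counters.set(key, (this.counters.get(key) || 0) + 1);
    }
    return true;
  }
}

const limiter = new FixedWindowLimiter([
  { windowMs: 60000, max: 100 },     // mirrors `minute: 100`
  { windowMs: 3600000, max: 1000 },  // mirrors `hour: 1000`
]);
```

A request passes only if it is under both limits, so a client that sends 100 requests every minute is cut off by the hourly limit once it has used 1000 requests within the hour.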

Application-Level Rate Limiting

Middleware inside your service gives you full control, with different limits per endpoint and per user tier:

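// Per-route limits in Express (assumes the express-rate-limit package)
const { rateLimit } = require('express-rate-limit');
// The limiter runs before the route handler; the handler names are placeholders
router.post('/expensive-op', rateLimit({ windowMs: 60000, max: 5 }), expensiveOpHandler);
router.get('/cheap-list',    rateLimit({ windowMs: 60000, max: 1000 }), cheapListHandler);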

This gives better granularity, but every service must implement it correctly. In a microservices architecture, use gateway limiting for baseline protection and layer application-level limiting on top for fine-grained per-route policies.
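The per-user-tier side of application-level limiting can be sketched as a small Express-style middleware. This is a minimal sketch under stated assumptions rather than a definitive implementation: `tieredRateLimit`, `TIER_MULTIPLIER`, and the `baseMax` and `now` options are hypothetical names, and the code assumes earlier authentication middleware has set `req.user` to an object with `id` and `tier` fields.

```javascript
// Hypothetical tier table: how much each tier scales the base (free-tier) limit.
const TIER_MULTIPLIER = { free: 1, pro: 10 };

// Returns Express-style middleware (req, res, next) enforcing a fixed-window limit.
// `baseMax` is the free-tier limit per window; `now` is injectable for testing.
function tieredRateLimit({ windowMs, baseMax, now = Date.now }) {
  // "userId:windowIndex" -> count. In-memory, so each process counts separately;
  // a shared store would be needed to enforce one limit across instances.
  const counts = new Map();
  return function (req, res, next) {
    // Assumption: auth middleware has already set req.user = { id, tier }.
    const { id, tier } = req.user;
    const max = baseMax * (TIER_MULTIPLIER[tier] || 1);
    const key = `${id}:${Math.floor(now() / windowMs)}`;
    const count = (counts.get(key) || 0) + 1;
    counts.set(key, count);
    if (count > max) {
      res.status(429).json({ error: 'Too Many Requests' });
      return;
    }
    next();
  };
}

// Usage (each call creates a separate counter, so limits are per route):
// router.post('/expensive-op', tieredRateLimit({ windowMs: 60000, baseMax: 5 }), handler);
```

Because each call to `tieredRateLimit` has its own counter map, an expensive route and a cheap route can carry different base limits while sharing the same tier table.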