Designing a Rate Limiter

Learn how to design a robust rate limiter using the Token Bucket algorithm to protect your APIs from overload and ensure fair resource usage.

Ask your own question

In depth

A rate limiter protects your API and backend services from excessive requests, preventing overload, resource exhaustion, and potential downtime. It acts as a crucial gatekeeper, ensuring your systems remain stable and responsive under varying traffic loads.

How a Rate Limiter Works

At its core, a rate limiter intercepts incoming requests and decides whether to allow them to proceed to your backend logic or reject them. This prevents a flood of requests, such as from a misconfigured client or a malicious attack, from overwhelming your servers, CPU, and database.

The Token Bucket Algorithm

The Token Bucket algorithm is a widely used method for implementing rate limiting. It operates on the principle of a 'bucket' that holds a certain number of 'tokens'. Each incoming request attempts to consume one token. If a token is available, the request is allowed; otherwise, it is rejected.

Refill Rate and Burst Capacity

The bucket continuously refills with tokens at a steady, predefined rate (e.g., one token per second). This refill rate determines the long-term average rate at which requests are allowed. The bucket also has a maximum capacity, which dictates how many tokens it can hold. This capacity allows for short bursts of traffic, as requests can consume multiple tokens rapidly until the bucket is empty.

For example, if a bucket has a capacity of three tokens and refills at one token per second: 1. A request arrives, consumes a token. Two tokens remain. 2. Two more requests arrive simultaneously, consuming the remaining two tokens. The bucket is now empty. 3. A fourth request arrives immediately. Since no tokens are available, it is rejected. 4. One second passes, and the bucket refills to one token, making it ready for the next allowed request.

function checkAndConsumeToken(bucket):
  if bucket.tokens > 0:
    bucket.tokens = bucket.tokens - 1
    return ALLOWED
  else:
    return REJECTED

function refillTokens(bucket, refillRate, capacity, lastRefillTime):
  currentTime = getCurrentTime()
  timeElapsed = currentTime - lastRefillTime
  tokensToAdd = timeElapsed * refillRate
  bucket.tokens = min(bucket.tokens + tokensToAdd, capacity)
  bucket.lastRefillTime = currentTime

Distributed Rate Limiting with Redis

For applications deployed across multiple servers, a simple in-memory token bucket on each server is insufficient. Each server would maintain its own independent limit, leading to inconsistent and potentially higher overall request rates than intended. To enforce a consistent, global rate limit across all instances, a shared, distributed store like Redis is essential. Redis can maintain the state of the token bucket (current token count and last refill time) centrally, ensuring all servers adhere to the same global limit.

Key Takeaways

Rate limiters protect APIs from overload by controlling incoming request volume.
The Token Bucket algorithm allows for both steady-state limits and short bursts of traffic.
Tokens are consumed by requests and refilled at a constant rate up to a maximum capacity.
Requests are rejected when no tokens are available in the bucket.
Use a shared store like Redis to implement consistent rate limiting across distributed services.

Got a different question? SeaThru generates a fresh video for any topic where systems talk or data structures move.

Ask your own question →