Designing a Rate Limiter
Designing a Rate Limiter
Learn how to design a robust rate limiter using the Token Bucket algorithm to protect your APIs from overload and ensure fair resource usage.
A rate limiter protects your API and backend services from excessive requests, preventing overload, resource exhaustion, and potential downtime. It acts as a crucial gatekeeper, ensuring your systems remain stable and responsive under varying traffic loads.
How a Rate Limiter Works
At its core, a rate limiter intercepts incoming requests and decides whether to allow them to proceed to your backend logic or reject them. This prevents a flood of requests, such as from a misconfigured client or a malicious attack, from overwhelming your servers, CPU, and database.
The Token Bucket Algorithm
The Token Bucket algorithm is a widely used method for implementing rate limiting. It operates on the principle of a 'bucket' that holds a certain number of 'tokens'. Each incoming request attempts to consume one token. If a token is available, the request is allowed; otherwise, it is rejected.
Refill Rate and Burst Capacity
The bucket continuously refills with tokens at a steady, predefined rate (e.g., one token per second). This refill rate determines the long-term average rate at which requests are allowed. The bucket also has a maximum capacity, which dictates how many tokens it can hold. This capacity allows for short bursts of traffic, as requests can consume multiple tokens rapidly until the bucket is empty.
For example, if a bucket has a capacity of three tokens and refills at one token per second: 1. A request arrives, consumes a token. Two tokens remain. 2. Two more requests arrive simultaneously, consuming the remaining two tokens. The bucket is now empty. 3. A fourth request arrives immediately. Since no tokens are available, it is rejected. 4. One second passes, and the bucket refills to one token, making it ready for the next allowed request.
function checkAndConsumeToken(bucket):
if bucket.tokens > 0:
bucket.tokens = bucket.tokens - 1
return ALLOWED
else:
return REJECTED
function refillTokens(bucket, refillRate, capacity, lastRefillTime):
currentTime = getCurrentTime()
timeElapsed = currentTime - lastRefillTime
tokensToAdd = timeElapsed * refillRate
bucket.tokens = min(bucket.tokens + tokensToAdd, capacity)
bucket.lastRefillTime = currentTimeDistributed Rate Limiting with Redis
For applications deployed across multiple servers, a simple in-memory token bucket on each server is insufficient. Each server would maintain its own independent limit, leading to inconsistent and potentially higher overall request rates than intended. To enforce a consistent, global rate limit across all instances, a shared, distributed store like Redis is essential. Redis can maintain the state of the token bucket (current token count and last refill time) centrally, ensuring all servers adhere to the same global limit.
Key Takeaways
- Rate limiters protect APIs from overload by controlling incoming request volume.
- The Token Bucket algorithm allows for both steady-state limits and short bursts of traffic.
- Tokens are consumed by requests and refilled at a constant rate up to a maximum capacity.
- Requests are rejected when no tokens are available in the bucket.
- Use a shared store like Redis to implement consistent rate limiting across distributed services.
Got a different question? SeaThru generates a fresh video for any topic where systems talk or data structures move.
Ask your own question →