Rate Limiting in Python


Rate limiting is an important mechanism for controlling resource utilization and maintaining quality of service. In Python, we can implement it with generators, queues, and a simple token bucket.

First, we’ll look at basic rate limiting. Suppose we want to limit our handling of incoming requests. We’ll use a queue to simulate these requests.

import time
from queue import Queue

def main():
    # Simulate incoming requests
    requests = Queue(maxsize=5)
    for i in range(1, 6):
        requests.put(i)

    # This limiter generator yields once every 200 milliseconds
    def limiter():
        while True:
            yield
            time.sleep(0.2)

    limit = limiter()

    # By calling next() on the limiter before serving each request,
    # we limit ourselves to 1 request every 200 milliseconds.
    while not requests.empty():
        next(limit)
        req = requests.get()
        print(f"request {req} {time.time()}")

    # We may want to allow short bursts of requests in our rate limiting scheme
    # while preserving the overall rate limit. We can accomplish this by using
    # a token bucket algorithm.
    class TokenBucket:
        def __init__(self, tokens, fill_rate):
            self.capacity = tokens
            self.tokens = tokens
            self.fill_rate = fill_rate
            self.timestamp = time.time()

        def get_token(self):
            now = time.time()
            # Refill the bucket, but never beyond its capacity
            self.tokens = min(
                self.capacity,
                self.tokens + self.fill_rate * (now - self.timestamp),
            )
            self.timestamp = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    # This bursty_limiter allows bursts of up to 3 events; it refills at
    # 5 tokens per second, i.e. one token every 200 milliseconds.
    bursty_limiter = TokenBucket(3, 5)

    # Now simulate 5 more incoming requests. The first 3 of these will
    # benefit from the burst capability of bursty_limiter.
    bursty_requests = Queue(maxsize=5)
    for i in range(1, 6):
        bursty_requests.put(i)

    while not bursty_requests.empty():
        if bursty_limiter.get_token():
            req = bursty_requests.get()
            print(f"request {req} {time.time()}")
        else:
            # No token available yet; wait briefly and retry
            time.sleep(0.1)

if __name__ == "__main__":
    main()

Running our program, we see the first batch of requests handled once every ~200 milliseconds as desired.

$ python rate_limiting.py
request 1 1653669421.2034514
request 2 1653669421.4036515
request 3 1653669421.6038516
request 4 1653669421.8040517
request 5 1653669422.0042517

For the second batch of requests, we serve the first 3 immediately because of the burstable rate limiting, then serve the remaining 2 with ~200ms delays each.

request 1 1653669422.2044518
request 2 1653669422.2044518
request 3 1653669422.2044518
request 4 1653669422.4046519
request 5 1653669422.604652
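One subtlety of the generator-based limiter above: `time.sleep(0.2)` runs each time the generator is resumed, so the 200 ms gap is measured from when the previous request *finished* being handled. If handling itself takes time, the effective period stretches. Here's a sketch of a drift-resistant variant (the `ticker` name is my own, echoing Go's `time.Tick`) that schedules each yield against absolute `time.monotonic()` deadlines:

```python
import time

def ticker(interval):
    """Yield at a fixed cadence based on time.monotonic deadlines,
    so the period does not drift when request handling takes time."""
    next_tick = time.monotonic()
    while True:
        next_tick += interval
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        yield

limit = ticker(0.2)
start = time.monotonic()
for req in range(1, 4):
    next(limit)
    # Each request lands ~0.2s after the previous deadline,
    # regardless of how long the loop body takes.
    print(f"request {req} at ~{time.monotonic() - start:.1f}s")
```

Because the next deadline is computed by accumulating `interval` rather than sleeping after the fact, a handler that takes 50 ms still sees requests spaced 200 ms apart, not 250 ms.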

This example demonstrates both basic and bursty rate limiting in Python. The time.sleep() function simulates delays, and a custom TokenBucket class implements the bursty limiting. In a real-world scenario, you might reach for an established library such as ratelimit or limits for a more robust implementation.
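Note that the TokenBucket above is not safe to share between threads: two threads could both read `self.tokens >= 1` before either decrements it. If your request handlers run concurrently, the bucket needs a lock. Here's a minimal sketch (the `ThreadSafeTokenBucket` name is my own) guarding the refill-and-take step with `threading.Lock`, and using `time.monotonic()` so wall-clock adjustments can't corrupt the refill math:

```python
import threading
import time

class ThreadSafeTokenBucket:
    """Token bucket guarded by a lock, so concurrent workers
    (e.g. threads handling requests) can share one limiter."""

    def __init__(self, capacity, fill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.fill_rate = fill_rate  # tokens added per second
        self.timestamp = time.monotonic()
        self._lock = threading.Lock()

    def get_token(self):
        # The whole refill-then-take sequence is one critical section,
        # so no two threads can spend the same token.
        with self._lock:
            now = time.monotonic()
            self.tokens = min(
                self.capacity,
                self.tokens + self.fill_rate * (now - self.timestamp),
            )
            self.timestamp = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

bucket = ThreadSafeTokenBucket(2, 10)  # burst of 2, then 1 token per 100 ms
print(bucket.get_token(), bucket.get_token(), bucket.get_token())  # → True True False
```

The lock adds a small cost per call, but it keeps the invariant that at most `capacity` requests can ever pass in a burst, no matter how many threads call `get_token()` at once.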