Rate Limiting

Rate limiting is an important mechanism for controlling resource utilization and maintaining quality of service. In this example we’ll implement it in Java using threads and blocking queues.

First, we’ll look at basic rate limiting. Suppose we want to limit our handling of incoming requests. We’ll serve these requests off a queue.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.time.Instant;

public class RateLimiting {

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> requests = new ArrayBlockingQueue<>(5);
        for (int i = 1; i <= 5; i++) {
            requests.add(i);
        }

        // This limiter queue will receive a value every 200 milliseconds;
        // it acts as the regulator in our rate limiting scheme.
        BlockingQueue<Instant> limiter = new ArrayBlockingQueue<>(1);
        Thread limiterThread = new Thread(() -> {
            try {
                while (true) {
                    TimeUnit.MILLISECONDS.sleep(200);
                    // put blocks while the queue is full, so the producer
                    // can never overflow the single-slot queue.
                    limiter.put(Instant.now());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        limiterThread.setDaemon(true); // don't prevent the JVM from exiting
        limiterThread.start();

        // By blocking on a receive from the limiter queue before serving
        // each request, we limit ourselves to 1 request every 200 milliseconds.
        while (!requests.isEmpty()) {
            limiter.take();
            int req = requests.poll();
            System.out.println("request " + req + " " + Instant.now());
        }

        // This burstyLimiter queue will allow short bursts of up to 3 requests
        // while preserving the overall rate limit.
        BlockingQueue<Instant> burstyLimiter = new ArrayBlockingQueue<>(3);

        // Fill up the queue to represent its allowed burst.
        for (int i = 0; i < 3; i++) {
            burstyLimiter.add(Instant.now());
        }

        // Every 200 milliseconds we'll try to add a new value to burstyLimiter,
        // up to its limit of 3. offer simply drops the value if the queue is full.
        Thread burstyThread = new Thread(() -> {
            try {
                while (true) {
                    TimeUnit.MILLISECONDS.sleep(200);
                    burstyLimiter.offer(Instant.now());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        burstyThread.setDaemon(true); // don't prevent the JVM from exiting
        burstyThread.start();

        // Simulate 5 more incoming requests.
        BlockingQueue<Integer> burstyRequests = new ArrayBlockingQueue<>(5);
        for (int i = 1; i <= 5; i++) {
            burstyRequests.add(i);
        }
        // Serve the bursty requests: the first 3 consume the pre-filled values
        // immediately, and the rest wait for the 200 millisecond refills.
        while (!burstyRequests.isEmpty()) {
            burstyLimiter.take();
            int req = burstyRequests.poll();
            System.out.println("request " + req + " " + Instant.now());
        }
    }
}

Running this program, we see the first batch of requests handled once every ~200 milliseconds as desired.

$ java RateLimiting
request 1 2023-10-01T12:00:00.200Z
request 2 2023-10-01T12:00:00.400Z
request 3 2023-10-01T12:00:00.600Z
request 4 2023-10-01T12:00:00.800Z
request 5 2023-10-01T12:00:01.000Z

For the second batch of requests, we serve the first 3 immediately because of the burstable rate limiting, then serve the remaining 2 with ~200ms delays each.

request 1 2023-10-01T12:00:01.200Z
request 2 2023-10-01T12:00:01.200Z
request 3 2023-10-01T12:00:01.200Z
request 4 2023-10-01T12:00:01.400Z
request 5 2023-10-01T12:00:01.600Z
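
As a variation on the bursty limiter above, the same token-bucket behavior can be wrapped in a small reusable helper built on a Semaphore and a ScheduledExecutorService. The following is only a minimal sketch of that approach; the TokenBucketLimiter class, its constructor parameters, and its main method are hypothetical and not part of the example above.

import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a token bucket with a configurable burst size and refill interval.
public class TokenBucketLimiter {
    private final Semaphore tokens;

    public TokenBucketLimiter(int burst, long refillMillis) {
        // Start with a full bucket so short bursts are served immediately.
        this.tokens = new Semaphore(burst);

        ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // don't keep the JVM alive just for refills
            return t;
        });

        // Release one token per interval, but never exceed the burst size.
        refiller.scheduleAtFixedRate(() -> {
            if (tokens.availablePermits() < burst) {
                tokens.release();
            }
        }, refillMillis, refillMillis, TimeUnit.MILLISECONDS);
    }

    // Block until a token is available, then consume it.
    public void acquire() throws InterruptedException {
        tokens.acquire();
    }

    public static void main(String[] args) throws InterruptedException {
        // Bursts of up to 3 requests, refilled once every 200 milliseconds.
        TokenBucketLimiter limiter = new TokenBucketLimiter(3, 200);
        for (int req = 1; req <= 5; req++) {
            limiter.acquire();
            System.out.println("request " + req + " " + Instant.now());
        }
    }
}

Because the bucket starts full, the first 3 requests would go through immediately and the remaining 2 would be spaced roughly 200 milliseconds apart, mirroring the bursty output shown above.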