Voice Dictation11 min read

Rota AI: How Round Robin Scheduling Saves My API Rates

Karthik Krishnan
Karthik KrishnanFounder
April 11, 202611 min read

How Round Robin Scheduling Saves My API Rates

TL;DR: Round robin is a scheduling algorithm from operating systems class. It cycles through a list of options one by one, giving each one a fair turn. I used it to rotate across multiple API keys in Rota AI so we never hit rate limits. Instead of one key getting hammered, the load spreads evenly. Here is how it works, why it matters, and how to set it up yourself.


The Problem: API Rates Are Brutal

Let me tell you about the week I almost gave up on Rota AI.

We were in beta. A few hundred users were dictating like crazy. And then the rate limit hit. 429 errors everywhere. Users were getting "too many requests" in the middle of their sentences. Imagine talking to your computer and it just... stops. Because the API said no.

I was using a single API key. One key, one rate limit, one point of failure. And we were blowing through it every few hours.

The obvious solution was to upgrade to a higher tier. More requests per minute, more tokens per day. But here is the thing about API pricing: it scales fast. The tier we needed was not double the cost. It was 10x. That was not happening for a bootstrapped side project.

So I did what any broke CS student would do. I thought about it differently.


The Lightbulb: OS Class Actually Mattered

I was sitting in my operating systems lecture when the professor explained round robin scheduling. You know the one. It is the CPU scheduling algorithm where each process gets a fixed time slot. If your slot is up, the CPU moves to the next process. Fair, predictable, no one starves.

And I remember thinking. Wait. That is exactly what I need.

Instead of one API key handling every request, what if I had multiple keys and rotated through them? Key A takes request 1. Key B takes request 2. Key C takes request 3. Then back to A. Each key gets a break between turns. No single key gets hammered. The rate limits stretch further.

I was literally taking notes on round robin but thinking about API keys. My professor would have been concerned.

But it worked. And honestly, it worked better than I expected.


How Round Robin Works with API Keys

Let me break it down simply.

Say you have 3 API keys. Without round robin, every request goes to Key 1 until it hits its limit. Then you switch to Key 2. Then Key 3. By the time Key 3 is exhausted, users have been dealing with errors for a while.

With round robin, the rotation looks like this:

  1. Request 1 goes to Key A
  2. Request 2 goes to Key B
  3. Request 3 goes to Key C
  4. Request 4 goes to Key A again
  5. Request 5 goes to Key B
  6. And so on...

Each key only handles one out of every three requests. That means each key stays well under its individual rate limit. The effective rate limit of your whole system becomes roughly the sum of all your keys' limits.

Three keys with 60 requests per minute each? That is 180 requests per minute for your app. Without paying for the highest tier. Without begging the API provider for a limit increase.

TBH, it felt like cheating. But it is just math.


How I Built It in Rota AI

The implementation was simpler than I expected. Here is the basic idea.

First, you store your API keys in a list. In our case, we keep them in a config file that gets loaded at startup. Each key is just a string. Nothing fancy.

Then you keep a counter. An index that tracks which key to use next. When a request comes in, you grab the key at the current index, send the request, and increment the index. When the index reaches the end of the list, it wraps back to zero.

That is the whole algorithm. It is literally 10 lines of code.

Here is a simplified version of what it looks like:

keys = [key_a, key_b, key_c]
index = 0

def get_next_key():
    global index
    key = keys[index]
    index = (index + 1) % len(keys)
    return key

Every time Rota AI needs to make an API call, it calls get_next_key(). The key rotates automatically. No key gets used twice in a row. The load distributes evenly.

The modulo operator (%) is what makes it wrap around. When index hits 3 (past the last key), 3 % 3 = 0, so it goes back to the first key. Clean and simple.


Why This Works Better Than Random

You might be thinking: why not just pick a random key each time? That would spread the load too, right?

It would. But random has problems that round robin does not.

With random, you can get unlucky. Maybe Key A gets picked five times in a row. Maybe Key C does not get picked for a minute. Over time it evens out, but in the short term, you can still hit rate limits on individual keys.

Round robin guarantees even distribution. Every key gets exactly the same number of requests (plus or minus one). There is no randomness, no clustering, no bad luck. It is deterministic and predictable.

I tried both approaches in testing. Random worked ok. Round robin worked perfectly. The difference was especially noticeable during burst traffic, like when a user pastes a long paragraph and the app sends multiple chunks at once. Round robin handled it smoothly. Random occasionally tripped a limit.

your mileage may vary depending on your traffic patterns, but for us, round robin was the clear winner.


Handling Key Failures Gracefully

Here is where it gets interesting. What happens when one of your keys hits its limit anyway? Maybe the API provider changed their limits. Maybe a key got revoked. Maybe you just underestimated your traffic.

In Rota AI, we handle this with a simple retry with next key logic. If a request fails with a 429 (rate limited), the system does not just give up. It grabs the next key in the rotation and retries the request immediately. The user never sees the error.

Here is how it works:

  1. Request comes in, gets assigned Key B
  2. Key B returns 429 (rate limited)
  3. System automatically retries with Key C
  4. Key C succeeds, response goes back to the user
  5. Key B gets flagged temporarily and skipped for the next few rotations

That last part is important. When a key fails, we do not just keep hammering it. We mark it as "cooling down" and skip it for a bit. This prevents a cascading failure where one exhausted key keeps getting tried and failing.

The cool down period is configurable. We set it to 60 seconds by default. After that, the key goes back into rotation. If it fails again, the cool down doubles. Exponential backoff, just like you would use for any retry logic.

this failure handling is what makes the whole system production ready. The round robin rotation is the foundation. The retry logic is the safety net.


How Many Keys Do You Actually Need?

This depends entirely on your traffic and the API provider's rate limits. But here is a rough way to figure it out.

Take your peak requests per minute. Divide by the per-key rate limit. That gives you the minimum number of keys. Then add one or two for buffer.

For example, if your peak is 180 requests per minute and each key allows 60 per minute, you need at least 3 keys. Add a buffer, and you want 4 or 5.

In Rota AI, we found that 3 keys handled our beta traffic comfortably. When we scaled up, we added more. The beauty of round robin is that adding keys is trivial. You just append to the list. No code changes needed.

Some users run Rota AI with 5 or 6 keys during heavy usage days. Others get by with 2. It depends on how much you dictate and how strict your API provider's limits are.


The Cost Savings Are Real

Let me put this in perspective.

Without round robin, we would have needed the highest tier API plan to handle our beta traffic. That was $200/month. With round robin and 3 standard keys at $20/month each, we paid $60/month total. Same capacity, one third the cost.

And it scales linearly. Need more capacity? Add another key. You are not jumping to a higher tier with a massive price increase. You are just adding another $20 key to the rotation.

For indie developers and small teams, this is huge. API costs can eat your budget alive. Round robin is one of those tricks that feels obvious in hindsight but saves you real money.


Beyond API Keys: Other Uses for Round Robin

Once I had the round robin system in place, I started finding other places to use it.

Multiple API providers. What if you do not want to rely on one provider? You can rotate between OpenAI, Anthropic, and a local model. Round robin handles the distribution. If one provider goes down, the retry logic kicks in and routes to the next one.

Load balancing across servers. If you are self hosting your own models, round robin distributes requests across your GPU machines. Same principle, different application.

Webhook delivery retries. When sending webhooks to user endpoints, round robin can rotate through retry attempts with different backoff timings.

The pattern is versatile. Once you see it, you start seeing it everywhere. It is one of those CS fundamentals that actually shows up in real world engineering.


Setting It Up Yourself

If you are building an app that hits API rate limits, here is what I would do.

First, get multiple API keys. Most providers let you create several keys on the same account. Some even give you different rate limits per key. Check your provider's docs.

Second, implement the rotation. The code is simple. A list, a counter, a modulo. You can build it in any language in under 30 minutes.

Third, add retry logic. When a key fails, try the next one. Flag exhausted keys temporarily. This is what separates a prototype from something you can trust in production.

Fourth, monitor your usage. Track how many requests each key handles. If one key is consistently hitting limits, you need more keys or you need to reduce traffic.

Fifth, automate key rotation. When a key expires or gets revoked, remove it from the list automatically. Do not let a dead key sit in your rotation causing failed requests.


The Bigger Picture

Here is what I find funny about all this. I spent a whole semester learning about CPU scheduling algorithms. Processes, time slices, context switches, priority queues. I thought it was abstract theory that would never matter in my actual work.

And then I used round robin to solve a real production problem in my own app. The same algorithm that decides which process gets the CPU next is the one that decides which API key handles your dictation request.

CS fundamentals matter. Even the stuff that feels theoretical. Especially the stuff that feels theoretical. You never know when an OS lecture is going to save your API bill.

Round robin is not fancy. It is not cutting edge. It is a decades old algorithm that solves a simple problem really well. And sometimes, that is exactly what you need.


TL;DR

Round robin scheduling rotates requests across multiple API keys one by one. Each key gets a fair turn, so no single key hits its rate limit. The effective capacity of your app becomes the sum of all your keys' limits. It is simple to implement (a list, a counter, a modulo), it saves real money on API costs, and it is more predictable than random selection. Add retry logic with cool down periods and you have a production ready system. Sometimes the best solutions come from your OS homework.

About the Author
Karthik Krishnan
Karthik KrishnanFounder

Founder & Developer

I built Rota because I didn't have $15 to pay for a dictation tool per month, so I built my own.

Related articles