AI9 min read

I Tried Voice Coding with Cursor Using Rota AI. Here Is What Happened.

Karthik Krishnan
Karthik KrishnanFounder
April 19, 20269 min read

I Tried Voice Coding with Cursor Using Rota AI. Here Is What Happened.

TL;DR: Talking to code sounds crazy but it works. I wrote about 200 lines of Python by just describing what I wanted out loud. Not faster than typing for most things but great for comments, docstrings, and boilerplate that nobody wants to write by hand.

How This Started

I was sitting at my desk at like midnight, tired from a full day of typing. My fingers ached. I had a feature I needed to build for Rota AI and I just could not bring myself to type another line.

I remembered someone on Twitter talking about voice coding. Dictating code instead of typing it. I thought it sounded ridiculous. But I was too tired to care.

So I opened Cursor, activated Rota AI, and just started talking.

The first thing I said was: write a function that takes a string and returns the number of words.

Cursor wrote it. In Python. Correct indentation and everything.

I stared at my screen for a solid ten seconds.

Then I tried something harder.

The Problem with Voice Code

Here is what nobody tells you about voice coding. The hard part is not the logic. The hard part is punctuation.

Try saying this out loud:

def parse_config_file(path: str) -> dict[str, Any]:

Now try dictating it. "Open parenthesis path colon string close parenthesis arrow dict open bracket string comma any close bracket colon."

It is painful. It is slow. It makes you want to throw your microphone out the window.

camelCase is bad enough. snake_case makes it worse. Now mix in brackets, parentheses, curly braces, and the occasional backslash. Your mouth was not designed for this.

Variable names are another nightmare. Imagine saying "user_auth_token_expiration_timestamp" out loud. Then imagine your voice tool hearing "user auth token expiration time stamp nd" and inserting an extra word into your variable name that breaks everything.

Not fun.

How Rota AI Handles It

This is where Rota AI actually surprised me. When it detects you are in VS Code or Cursor, it changes how it processes text. Instead of just dumping raw transcription, it tries to preserve code syntax.

So when I say "function parse underscore config underscore file," it outputs parse_config_file instead of "parse config file."

The brackets thing? Still imperfect. But Rota AI is smart enough to keep parentheses paired and indentation mostly correct. It knows you are in a code editor and adjusts.

Is it perfect? No. But it is way better than dictating code into Google Docs and copying it over, which is what I used to do. Shoutout to everyone who has suffered through that workflow. You know the pain.

TBH the syntax preservation alone is worth it. It turns voice coding from a joke into something you might actually use.

What Works Well

After a few weeks of experimenting, here is where voice coding actually shines:

Comments. This is the big one. I just say "explain that this function validates the user token and throws an error if it is expired" and a proper comment appears. No stopping to type. No breaking my flow. Just talking through the logic.

Docstrings. Same thing. Describe what the function does out loud and a docstring shows up. It is not going to win a writing award but it is accurate and it took three seconds.

Function descriptions. "Create a helper that reads a JSON file, parses the config, and returns a default if the key is missing." Boom. Function skeleton done. I fill in the gaps manually.

Commit messages. Oh man. I used to write "fix stuff" and "update thing" and "not sure what changed" for my commits. Now I just say "refactor the voice activity detection loop to use a sliding window instead of fixed chunks" and it types that out. My git history looks like a real project now.

Boilerplate. Import statements, class definitions, test file setups. Stuff that takes forever to type but is actually simple to describe. Voice coding crushes this stuff.

Thinking out loud. This is the underrated one. Sometimes I just talk through a problem. "Okay so we need to handle the case where the audio buffer is empty but the user is still talking because of the VAD threshold being too high." Rota AI transcribes that, I clean it up later, but it forces me to articulate the problem clearly. That alone helps me debug faster.

What Does Not Work

Let me be real about the failures.

Complex logic. The moment I try to explain nested conditionals or regex patterns out loud, everything falls apart. "If the token is null OR the expiration timestamp is less than the current time minus the grace period of thirty seconds..." My code reviewer would have questions.

Precise variable naming. I tried to name a variable max_retry_attempts_backoff_multiplier. The voice tool heard "max retry attempts back off multiplier" and I spent ten minutes debugging why my code could not find a variable that did not exist. It was called "backoffmultiplier" all one word. Classic.

Refactoring. "Move this function from line forty seven to after the utility class and rename it to use camelCase instead of snake." Nope. Just do it manually.

Debugging. "The issue is on line eighty three where the pointer dereferences null after the second callback fires asynchronously." My codebase has no idea what I am saying at this point.

Math and calculations. Tried to dictate a mathematical formula once. It was a disaster. Just type your math. Please. For everyone's sanity.

The Workflow

Here is what I actually do now. It is and a hybrid approach.

First, I think out loud. I describe what I want the function to do, what parameters it takes, what it returns. Rota AI transcribes and injects that as a comment above where I am working. Think of it like rubber duck debugging but the duck writes things down.

Then I dictate the broad strokes. "Define a class called TokenValidator. It has an init method that takes a secret key. It has a validate method that takes a token string. It has an is_expired method." The skeleton appears.

Then I switch to typing for the actual logic. Seriously. For the real implementation, my hands go back on the keyboard.

Finally, I go back to voice for cleanup. Write the docstrings, add inline comments, type the commit message. The stuff that takes effort but does not need precision.

The whole cycle is maybe sixty percent typing, forty percent talking. On a good day with lots of boilerplate, it flips to more voice than typing.

your mileage may vary depending on how clear your speech is and how loud your environment is. My roommate walking in mid-function absolutely ruins the flow.

Voice Coding Is Not Replacing Typing

I want to be clear. I am not writing all my code by voice. That would be insane.

But what voice coding does is fill the gaps. The boring parts. The documentation nobody wants to write. The commit messages that currently say "wip" because you could not be bothered.

It also changes how you think about code. When you have to say something out loud before writing it, you tend to have a clearer idea of what you are building. There is something about articulating logic with your actual voice that makes vague ideas concrete.

I write more comments now. By a lot. Because it is effortless to just explain what something does instead of crafting it through keystrokes. My codebase is more documented than it has ever been and I did not have to sacrifice development speed.

That alone makes it worth trying.

FAQ

Does voice coding actually save time?

For writing actual logic, no. You are faster with a keyboard. For comments, docstrings, and boilerplate, yes. It depends on what kind of code you write.

What microphone do I use?

The mikijo blue yeti. But any decent mic works. I tried with laptop and built-in mic and it was terrible. Do yourself favor and get something with noise cancellation.

Does it work with languages other than Python?

I have tried JavaScript and TypeScript. Works about the same. I have not tried C++ but I imagine dictating template syntax would be an exercise in suffering.

What if I have an accent?

I have an Indian accent. Rota AI handles it fine because it uses Whisper under the hood, which is pretty good with accents. did not have to train anything special.

Is this better than GitHub Copilot?

Different thing. Copilot writes code for you based on context. Rota AI gets your spoken words into the editor. They work well together actually. I dictate the plan, Copilot helps fill in the gaps, I type the tricky parts.

Can I use this for pair programming?

Funny you ask. I tried explaining code to a friend over a call while using Rota AI to dictate comments in real time. It worked. Not sure I would call it pair programming but it was close.

Try It

If you are curious, grab Rota AI, fire up Cursor, and try dictating a function out loud. Start with comments. Work your way up to docstrings. See how it feels.

It is not magic. It is and not going to replace your keyboard. But for those midnight coding sessions when your fingers are tired and your brain is still going, it is a pretty solid tool to have in the rotation.

https://github.com/krthik20050/Rota-AI


Built by Karthik Krishnan, a student at Vidya Academy of Science and Technology, Kerala. Still learning. Still building. Still talking to his computer.

About the Author
Karthik Krishnan
Karthik KrishnanFounder

Founder & Developer

I built Rota because I didn't have $15 to pay for a dictation tool per month, so I built my own.

Related articles