Just say it

One sentence. Done.

“What's on my screen?” “Click the Export button” “What's the weather in Tokyo?” “Play my focus playlist” “Read this PDF and summarize it” “Move this window to the left half” “Record a macro called inbox zero”

“Find files named invoice in Downloads” “Remember my project lives in ~/Code” “Use Codex to refactor this file” “What's on my calendar today?” “Type a reply: sounds good, ship it” “Search YouTube for lo-fi beats” “Run my morning setup macro”

Why it's different

It doesn't just answer.
It acts.

Chatbots talk. Siri shrugs. Cursor Voice reads the actual pixels in front of you and operates your Mac like a careful pair of hands.

Sees your screen

Screenshots on demand, OCR with Apple Vision, and a fresh capture after every action so it can verify what it just did — and self-correct when something didn't take.

Clicks by name, not by guess

It targets real UI elements through the macOS Accessibility tree first, falls back to on-screen text, and only simulates the mouse as a last resort.

Types & dictates

Dictate into any field, draft replies, fill forms — it types where your cursor is.

Live web answers

Searches and reads pages in real time instead of guessing from stale training data.

Remembers you

Durable local memory across sessions: your apps, paths, and habits. On your disk, not on a server.

Extend it with plugins

A plugin is one small JSON file. Install community tools in a click from the marketplace — or publish your own and it ships after an automatic safety review.

Teach it macros

Show it a routine once — “record a macro called morning setup” — and replay it forever with one sentence. Your skills, saved on your Mac.

Costs pennies, shows the meter

You bring your own OpenAI key and watch your spend live under the orb: session cost and credit remaining. No subscription, no markup.

Built for hands-free

Vision-assist describes the screen aloud for low-vision users; hands-free mode runs the whole Mac without you touching the mouse or keyboard.

How it works

Hotkey. Speak. Done.

Summon the orb

A glowing orb pops up next to your cursor — in any app, over anything. Or just say “Hey Cursor.”

Say what you want

Natural language, no commands to memorize. It looks at your screen for context, asks if it's unsure, and you can interrupt it mid-sentence.

Watch it work

It clicks, types, opens, searches — narrating just enough — then verifies the result with a fresh look at the screen.

The honest comparison

“Hey Siri” can't do this.

Siri sets timers. Cursor Voice operates your Mac. Full comparison →

Capability                       Cursor Voice   Siri
Sees what's on your screen       ✓              ✗
Clicks & types in any app        ✓              ✗
Holds a real conversation        ✓              ✗
Live web answers with sources    ✓              limited
Extensible with plugins          ✓              ✗
Open source: audit every line    ✓              ✗

Install

Running in under a minute.

Pick your flavor. Paste your OpenAI key in Settings, grant permissions once, and start talking.

$ curl -fsSL https://raw.githubusercontent.com/cursorvoice/cursor-voice/main/install.sh | bash

Downloads the latest release, installs to /Applications, and launches it. That's it.

$ brew install --cask cursorvoice/cursor-voice/cursor-voice

$ brew upgrade --cask cursor-voice # update later

The cask clears quarantine automatically — no right-click-Open dance.

Prefer dragging it in yourself?

Download the DMG

First launch: right-click the app → Open (it's self-signed, not notarized — and fully open source, so you can read exactly what it does).

It asks for mic, screen recording & accessibility — that's the whole point: hear you, see the screen, act for you. Granted once, kept across updates. A guided checklist walks you through it.

Questions

Fair questions, straight answers.

What does it cost to run?

The app is free and open source. You bring your own OpenAI API key and pay OpenAI directly per use — typical sessions cost a few cents. A live cost meter under the orb and a Usage tab show exactly what you're spending, and you can set a credit budget so there are no surprises.

Is my screen being streamed somewhere?

No. Screenshots are taken on demand, only when you ask something that needs eyes, and go directly from your Mac to OpenAI's API over your own key. There's no middle server, no account with us, and nothing is stored off your machine. Memory lives in a local file you can read and delete. See the privacy policy.

Why does it need accessibility & screen recording?

Screen recording lets it see what you see; accessibility lets it click buttons and type the way assistive tech does. Both are standard macOS permissions you grant once. Risky shell commands are blocked by default, there's a dry-run mode that narrates instead of acting, and the whole codebase is on GitHub if you want to verify any of this.

Can it really interrupt / be interrupted?

Yes. It streams audio both ways over the OpenAI Realtime API, so you can cut it off mid-sentence and it stops and listens. Echo rejection keeps its own voice, played through your speakers, from triggering an interruption, and a push-to-talk mode is there if you prefer hold-to-speak.

Is this a Siri replacement?

Different league. Siri answers trivia and sets timers; Cursor Voice operates your Mac — it reads the screen, clicks, types, manages files and windows, and holds a conversation while doing it. See the full comparison. (Not affiliated with Apple or OpenAI.)

It's an early beta — what does that mean?

It ships fast and in public: updates land weekly, sometimes daily. It occasionally misses a click, but it takes a fresh screenshot after every action to catch and fix that itself. Found a bug? Open an issue or email support@cursorvoice.app.

Your Mac just learned to listen.

Stop clicking.
Start talking.
