A native macOS assistant that lives next to your cursor. Press a hotkey, speak, and it reads your screen, clicks and types for you, and answers back in voice — powered by the OpenAI Realtime API.
macOS 14+ · Apple Silicon · your own API key
No window to switch to, no copy-paste. The orb appears at your cursor, listens, and acts on whatever's in front of you.
Streams your voice to the OpenAI Realtime API and talks back. Interrupt it mid-sentence — it stops cleanly and listens.
Captures the display so it can answer questions about what you're looking at and verify its own actions with a fresh screenshot after each one.
Clicks buttons by name through the Accessibility tree, types, scrolls, runs AppleScript and shell. Mouse simulation only as a last resort.
When a target isn't in the Accessibility tree, it OCRs the screen with the Vision framework and clicks the text directly.
Searches the web and reads pages for current information instead of guessing from stale training data.
Keeps durable facts — your preferred apps, project paths, common tasks — in local memory across sessions.
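As a rough illustration of the local memory described above, here is a minimal sketch of a remember/recall store persisted to disk. The `LocalMemory` class, the JSON file format, and the substring matching are all assumptions made for this example; the app's actual storage format is not described here.

```python
import json
import tempfile
from pathlib import Path


class LocalMemory:
    """Tiny key-value store persisted as JSON on disk (illustrative only)."""

    def __init__(self, path: Path):
        self.path = path
        # Load facts saved by an earlier session, if the file exists.
        self.facts = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts, indent=2))

    def recall(self, query: str) -> dict:
        # Naive case-insensitive substring match on keys.
        q = query.lower()
        return {k: v for k, v in self.facts.items() if q in k.lower()}


# Usage: a fact written in one "session" is visible to the next one.
store_path = Path(tempfile.mkdtemp()) / "memory.json"
LocalMemory(store_path).remember("preferred_browser", "Safari")
recalled = LocalMemory(store_path).recall("browser")
print(recalled)
```

Keeping the store in a plain local file matches the "nothing leaves your machine unless you ask" spirit of a bring-your-own-key tool, though a real implementation might need locking or a database.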
Press the hotkey (default ⌃⌥/) or say the wake word. The orb materializes right at your cursor.
Ask for anything — "what's this error", "open my downloads", "search for X and summarize it". It captures the screen if it needs context.
It picks the most reliable path — Accessibility click, OCR, AppleScript, a keyboard shortcut — and a halo follows the cursor while it works.
After each action it takes a fresh screenshot and checks the result actually happened before moving on. Then it tells you it's done.
Most assistants guess pixel coordinates from a screenshot and miss a lot. Cursor Voice doesn't guess — it targets real UI, and falls back through layers until something lands.
Reads the app's real controls and fires the button's own action directly — no mouse simulation, no coordinate math. Pixel-perfect on native macOS UI.
If a target isn't in the accessibility tree, it reads the screen with the Vision framework and clicks the text — covering web pages, Electron, anything with a label.
After every action it takes a fresh screenshot and checks the result actually happened. If the UI didn't change, it re-locates and tries again instead of plowing ahead.
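The three layers above can be sketched as a fallback loop: try each strategy in order, and accept one only if the screen actually changed afterwards. This is a simplified sketch, not the app's implementation. The `act_with_fallback` function, the `Result` type, and the "any pixel change counts as success" check are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type: a strategy tries to act on a named target and
# returns False if it could not find the target at its layer.
Strategy = Callable[[str], bool]


@dataclass
class Result:
    ok: bool
    strategy: Optional[str] = None  # name of the layer that landed, if any
    attempts: int = 0               # total strategy invocations


def act_with_fallback(
    target: str,
    strategies: Sequence[tuple[str, Strategy]],
    capture: Callable[[], bytes],
    max_retries: int = 2,
) -> Result:
    """Try each layer in order; accept it only if the screen changed."""
    attempts = 0
    for name, strategy in strategies:
        for _ in range(max_retries):
            before = capture()
            attempts += 1
            if not strategy(target):
                break  # target not found at this layer; fall through
            if capture() != before:
                return Result(True, name, attempts)
            # Action fired but nothing changed: re-locate and retry.
    return Result(False, None, attempts)


# Usage with fakes: the Accessibility layer misses, the OCR layer lands.
screen = {"frame": b"before"}

def fake_ax_click(target: str) -> bool:
    return False  # pretend the target is not in the Accessibility tree

def fake_ocr_click(target: str) -> bool:
    screen["frame"] = b"after"  # pretend the click changed the UI
    return True

result = act_with_fallback(
    "Save",
    [("accessibility", fake_ax_click), ("ocr", fake_ocr_click)],
    capture=lambda: screen["frame"],
)
print(result)
```

Comparing whole screenshots is a crude success signal; a real check would more likely look for the specific change the action was meant to cause.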
The model chooses the most direct path for each task. Clicking real UI is preferred over guessing pixels.
click_element: AXPress, no mouse sim
click_text: Vision OCR
see_screen: native capture
batch_actions: multi-step in one call
web_search + fetch_url
run_applescript
run_shell: guarded
type_text · hotkey · scroll
open_url
remember · recall
It's ad-hoc signed (no paid Apple Developer ID), so the installer and Homebrew cask strip the Gatekeeper quarantine for you.
Prefer to drag it in yourself? Grab the latest release DMG and right-click → Open on first launch.
Once it's installed, new versions land through the in-app updater — a banner in Settings, one click to install and relaunch.
It asks for Microphone, Screen Recording, and Accessibility — each unlocks one capability. Grant them, then relaunch.
It works, and it's genuinely useful — but it's a project in active development, not a finished product. Here's the honest state of things so you know what you're getting.
Clicking is good, not perfect. Accessibility-based clicks and OCR land most of the time. Unlabeled icons and pure-canvas UIs can still trip it up.
Apple Silicon and macOS 14+ only. No Intel build, no iOS, no Windows. Built and tested on one machine so far.
It controls your computer. Shell and AppleScript run with your full permissions (destructive commands are blocked). Run it on machines you trust.
Not notarized. No paid Developer ID yet, so Gatekeeper warns on first launch. The installer works around it; you can read every line of the source.
You bring the API key. It uses your own OpenAI Realtime usage — there's no hosted backend and nothing is billed by this project.
Feedback wanted. Bugs, rough edges, and ideas are exactly what this stage is for. Open an issue on GitHub.