VocaPulse is a GPU-accelerated speech-to-text application for Linux. Press a hotkey, speak, and paste clean text into any app. An optional on-device language model turns spoken thoughts into structured writing: prose, bullet points, or Markdown. Nothing you say is ever uploaded.
The recording overlay appears wherever you are working. Speech recognition runs on your own GPU. An optional second hotkey reshapes the raw transcript into the format you need, then pastes it into the window that had focus.
Capture
The overlay captures your microphone directly — robust handling for USB, Bluetooth, and built-in audio.
Transcribe
On-device speech recognition on your GPU. Custom dictionary rules apply before the text reaches your clipboard.
Restructure
An optional on-device language model reshapes speech into prose, bullet points, or Markdown — local, offline, on your hardware.
Features
Built for how you actually talk.
VocaPulse is not a browser extension, a wrapper around a cloud API, or a demo. It is a native Linux application designed to sit in your tray, wake on a hotkey, and get out of your way.
Thought-to-Structure
Speak freely. Publish cleanly.
An optional on-device language model rewrites raw dictation into one of three formats: continuous prose, bullet points, or Markdown. Choose which one runs by default, or switch per use with a dedicated hotkey. Nothing about the text — including the decision to restructure it — is visible outside your machine.
three output formats · dedicated hotkey · offline
Three customizable hotkeys
Record, restructure, paste the last transcript.
Each shortcut is remappable in Settings, including bare function keys. Recording works as a toggle or push-to-talk. The overlay never takes keyboard focus, so the paste always lands in the window and cursor position that were active when you started.
Ctrl+Shift+Space · Ctrl+Shift+S · F8
Per-user dictionary
Your vocabulary, recognized on the first try.
Define whole-word replacements for names, technical jargon, brand names, and project codenames. Rules apply before the transcript leaves the recognition engine, so the text arriving in your editor is already correct. Rules live in a plain JSON file you can sync between machines.
case-insensitive · whole-word · portable JSON
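For the curious, here is a minimal Python sketch of case-insensitive, whole-word replacement. It assumes a hypothetical rules file that is a flat JSON object mapping spoken forms to replacements. The file layout and the `load_rules` / `apply_rules` names are illustrative, not VocaPulse's documented format or code.

```python
import json
import re


def load_rules(json_text: str) -> dict:
    """Parse a hypothetical rules file shaped like {"spoken form": "replacement"}."""
    raw = json.loads(json_text)
    # Lower-case the keys so lookups are case-insensitive.
    return {spoken.lower(): replacement for spoken, replacement in raw.items()}


def apply_rules(text: str, rules: dict) -> str:
    """Replace whole-word, case-insensitive matches of every rule key in a single pass."""
    if not rules:
        return text
    # Longest keys first, so a longer phrase beats a shorter one starting at the same spot.
    alternation = "|".join(re.escape(k) for k in sorted(rules, key=len, reverse=True))
    # (?<!\w) and (?!\w) instead of \b, so keys ending in punctuation
    # such as "c++" still match as whole words.
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    # Fall back to the matched text if its lower-cased form is not a key
    # (possible with unusual Unicode case mappings).
    return pattern.sub(lambda m: rules.get(m.group(0).lower(), m.group(0)), text)
```

With the rules `{"vocal pulse": "VocaPulse", "kube": "Kubernetes"}`, the sentence "Ask Vocal Pulse about kube, not kubectl." becomes "Ask VocaPulse about Kubernetes, not kubectl." `kubectl` is left alone because a match must end at a word boundary. Compiling every rule into one pattern also means a replacement is never re-scanned by another rule.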
Encrypted transcript history
AES-256-GCM, bound to your machine.
An optional local history of every transcription. Each entry is encrypted at rest. The encryption key is derived from the machine identifier unique to your computer, so copying the database file to another machine produces an unreadable blob. Searchable locally; never uploaded.
local SQLite · AES-256-GCM · HMAC(machine-id)
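A minimal Python sketch of the key derivation suggested by `HMAC(machine-id)`. It assumes HMAC-SHA256 over the systemd machine ID, keyed with a fixed, application-specific value. The `APP_LABEL` constant and both function names are hypothetical. The AES-256-GCM encryption itself would come from a cryptography library and is not shown, because Python's standard library has no AES.

```python
import hashlib
import hmac
from pathlib import Path

# Hypothetical application-specific HMAC key; the real value, if any, is not published here.
APP_LABEL = b"vocapulse-history-v1"


def read_machine_id(path: str = "/etc/machine-id") -> str:
    """Read the systemd machine ID: 32 lowercase hex characters and a trailing newline."""
    return Path(path).read_text(encoding="ascii").strip()


def derive_history_key(machine_id: str) -> bytes:
    """Return HMAC-SHA256(key=APP_LABEL, msg=machine_id): 32 bytes, the key size AES-256 needs."""
    return hmac.new(APP_LABEL, machine_id.encode("ascii"), hashlib.sha256).digest()
```

Hashing the machine ID with an application-specific key, instead of using it directly, follows the guidance in the machine-id(5) manual page. Because the key is recomputed at startup, nothing secret has to be stored next to the database. The trade-off is that this binding protects a copied file, not the file against other processes on the same machine that can read /etc/machine-id.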
Runs on any modern GPU
One binary. AMD, NVIDIA, or Intel.
Speech recognition and language-model inference both run on Vulkan, the cross-vendor graphics and compute API supported by every major GPU vendor without proprietary toolkits. On machines without a capable GPU, the application falls back automatically to the CPU, which is still fast enough for conversational use.
Vulkan backend · CPU fallback · single binary
Works in every text field
Browser, editor, terminal, messenger.
A persistent virtual keyboard, initialized once at startup, emits keystrokes into whichever window currently has focus. No per-application integration, no browser extension, and no accessibility permissions; the one extra requirement is write access to /dev/uinput. If you can paste, VocaPulse works.
Wayland clipboard · /dev/uinput keystrokes
Who it is for
Built for the people who actually write for a living.
VocaPulse is designed around six common workflows. Not marketing personas — real patterns of daily work we watched and measured.
Writers & managers
Long-form email without the RSI
Dictate a full reply, restructure it into short paragraphs, paste directly into your email client. Faster than typing, cleaner than speaking.
Software engineers
Describe the code, not the grammar
Dictate commit messages, pull-request descriptions, and code comments. Your custom dictionary fixes project-specific terms before the text reaches your editor.
Journalists & students
Interview-to-notes in one pass
Capture what you heard from an interview or lecture directly into your notes application. Convert the raw transcript into bullet points with a single shortcut.
Thinkers out loud
Stream-of-consciousness, tidied
Speak everything you are considering. The on-device language model extracts the points worth keeping. No editor required, no context lost.
Chat-first teams
Messengers at speaking pace
Dictate directly into any composer. Push-to-talk behaves like familiar voice modes — but your words are delivered as text, not audio.
Accessibility-first
Typing-free workflows
Keyboard-driven when you want it, voice-driven when you do not. Suitable for repetitive-strain injury recovery, mobility constraints, or simply preferring to read before writing.
Privacy by architecture
Privacy isn't a setting. It's the design.
Most dictation software records your voice, sends it to a remote server, and asks you to trust a privacy policy. VocaPulse takes a different approach. Every step of the pipeline — microphone capture, speech recognition, restructuring, clipboard delivery — runs locally on your machine. Your audio is not uploaded, not retained on our infrastructure, and not retrievable by us, our staff, or any third party. It cannot be. It never leaves your device.
The sections below describe, in plain language and with specific technical choices, what that commitment actually means.
Audio is never persisted.
Voice samples exist only in volatile memory during recognition and are discarded the moment the transcript is produced. The application has no filesystem code path for writing raw audio, and no configurable option that would create one.
No cloud dependencies during use.
Routine operation — microphone capture, speech recognition, restructuring, and paste — opens zero network connections. The application contacts our servers only to check for updates on a schedule you control, and to download language models when you explicitly initiate a download.
No telemetry. No behavioral analytics.
VocaPulse does not ship crash reporting, usage pings, A/B feature flags, or behavioral tracking of any kind. We learn that something is broken only because you tell us. No session replay, no funnel analytics, no identifiers.
Transcript history, when enabled, is encrypted at rest.
Opting into local history stores each entry in a SQLite database on your machine. Every row is encrypted with AES-256-GCM. The encryption key is derived from the machine identifier unique to your computer, so the database file is unreadable if copied elsewhere.
Usage quotas are enforced client-side.
The free tier's daily word limit is counted and enforced by the application itself. Our servers neither receive nor store transcription content or word counts. We cannot audit the limit, because we collect no data that would let us. Honor system, by architecture.
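A minimal Python sketch of a purely local daily word counter. It assumes words are counted by splitting on whitespace and that the count resets when the local calendar date changes. `DailyWordQuota` and its fields are hypothetical; the free tier's actual limit and counting rules are not stated here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DailyWordQuota:
    """Hypothetical client-side word counter; no value here leaves the process."""

    limit: int                                     # words allowed per calendar day
    day: date = field(default_factory=date.today)  # day the counter belongs to
    used: int = 0                                  # words consumed so far on `day`

    def try_consume(self, transcript: str, today: Optional[date] = None) -> bool:
        """Count the transcript's words locally and accept it only if it fits today's quota."""
        if today is None:
            today = date.today()
        if today != self.day:
            # A new calendar day has started: reset the local counter.
            self.day, self.used = today, 0
        words = len(transcript.split())
        if self.used + words > self.limit:
            return False  # over quota: reject the whole transcript, count nothing
        self.used += words
        return True
```

This sketch rejects a transcript that would overshoot the limit rather than accepting part of it. Persisting the counter between sessions would be a separate step that is equally local.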
Inspectable where it matters.
The storage schema, the cryptographic parameters, and the platform integrations are documented in the public product architecture. You do not need to trust us — the security posture is inspectable.
In plain English: we sell you software, not your voice.
There is no business model in the design that could be improved by looking at your data. The software you install today is complete. The privacy posture described here is a technical property of the application, not a policy promise: reversing it would require rearchitecting the application, not a quiet change in a future release.
System requirements
Runs on the hardware you already own.
Linux-first. Wayland-native. Runs on laptops up to five years old, with graceful CPU fallback on machines without a dedicated GPU.
Operating system: Linux on Wayland (KDE Plasma 5.27+, GNOME 45+, any wlroots-based compositor)
GPU: any Vulkan-capable GPU from AMD, NVIDIA, or Intel; CPU fallback available
VRAM for restructuring: 4 GB (small model) · 8 GB (medium) · 16 GB (large). Dictation alone runs on 2 GB of VRAM or on the CPU; restructuring model size scales with available VRAM.
RAM: 4 GB minimum · 8 GB recommended
Disk: ~500 MB for the application binary plus the default recognition model
Microphone: built-in, USB, or Bluetooth. The Web Audio API handles A2DP cleanly, with no 3-second cutoffs.
Keystroke injection: /dev/uinput access, typically granted through membership in the `input` group on modern distributions
Native deb and rpm packages
Signed installers for Debian, Ubuntu, Fedora, and derivatives. First-class integration with your distribution's package manager.
Flatpak in development
Portal-based audio access raises open issues that we intend to resolve fully before publishing a Flatpak build.
AppImage intentionally not supported
The AppImage runtime cannot reliably access the required audio subsystem. We decided against shipping a format that would silently fail for a meaningful subset of users.
Pricing
Free during the beta.
VocaPulse is in active development and every feature is unlocked for early users. A commercial model will be announced — in full, with specific numbers — before any payment is ever requested.
VocaPulse does not send your voice anywhere. Microphone capture, speech recognition, and optional restructuring all run locally on your computer. Voice samples exist only in memory during processing and are discarded immediately afterward. The only network traffic VocaPulse ever initiates is an on-demand model download and a manifest check for application updates.
Start dictating on your own terms.
Private. Fast. Fully offline. Installation takes a few minutes. You will be typing with your voice within five minutes of starting.