Docs
Everything you need to use VocaPulse.
Install, first run, everyday dictation, Thought-to-Structure, settings, and the things that sometimes go wrong. One page, indexed on the left.
Install
One line detects your distribution and installs the matching package. Per-distro manual commands are listed below the one-liner for anyone who prefers to drive the package manager directly.
One-liner (recommended)
curl -sSL https://vocapulse.app/install | bash
Requires x86_64 Linux. The script downloads the latest release from GitHub, detects your distribution, and invokes your package manager. Re-run it any time to upgrade; it does nothing if you are already on the latest version.
Supported families
- Debian — Debian, Ubuntu, Linux Mint, Pop!_OS, elementary OS, Kali, Raspbian
- RPM — Fedora, RHEL, CentOS, Rocky, AlmaLinux, Oracle Linux
- Arch — Arch, CachyOS, EndeavourOS, Manjaro, Garuda, Artix
If the script doesn't recognise your distribution, use the manual commands below.
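To see which family your system would fall into, the sketch below maps the `ID` and `ID_LIKE` fields of /etc/os-release to one of the three families listed above. It is an illustration only, not the installer's actual logic: the function name `family_for` and the exact matching rules are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the VocaPulse installer): map the
# os-release ID and ID_LIKE values to a package family name.
family_for() {
  local id="$1" id_like="$2" token
  # Check the distribution's own ID first, then each ID_LIKE entry in order.
  for token in $id $id_like; do
    case "$token" in
      debian|ubuntu)      echo "debian"; return 0 ;;
      fedora|rhel|centos) echo "rpm";    return 0 ;;
      arch)               echo "arch";   return 0 ;;
    esac
  done
  echo "unknown"
}

# Report the family for this machine. ID_LIKE is optional in os-release.
if [ -r /etc/os-release ]; then
  family="$( . /etc/os-release; family_for "${ID:-}" "${ID_LIKE:-}" )"
  echo "Detected family: $family"
fi
```

A result of `unknown` suggests the manual commands are the safer route.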
Manual install
Fedora, RHEL
sudo dnf install ./VocaPulse-<version>-1.x86_64.rpm
Debian, Ubuntu
sudo apt install ./VocaPulse_<version>_amd64.deb
Arch Linux
sudo pacman -U vocapulse-<version>-1-x86_64.pkg.tar.zst
Replace <version> with the version number from the filename you downloaded. To check which version is installed, open the app and go to Settings → About.
Runtime dependencies
Package managers will resolve these for you on a normal desktop install. On a minimal system, install them explicitly.
Arch Linux / CachyOS
sudo pacman -S webkit2gtk-4.1 gtk3 gtk-layer-shell vulkan-icd-loader \
libayatana-appindicator librsvg \
gst-plugins-base gst-plugins-good gst-libav
Debian, Ubuntu
sudo apt install libwebkit2gtk-4.1-0 libgtk-3-0 libgtk-layer-shell0 \
libayatana-appindicator3-1 librsvg2-2 libvulkan1 \
gstreamer1.0-plugins-base gstreamer1.0-plugins-good \
gstreamer1.0-libav
Fedora, RHEL
sudo dnf install webkit2gtk4.1 gtk3 gtk-layer-shell \
libayatana-appindicator-gtk3 librsvg2 \
gstreamer1-plugins-base gstreamer1-plugins-good \
gstreamer1-libav vulkan-loader
Wayland compositor support
Clipboard delivery uses the wlr-data-control protocol.
| Compositor | Minimum version |
|---|---|
| KDE Plasma | 5.27+ |
| GNOME | 45+ |
| wlroots-based and compatible (Sway, Hyprland, Wayfire, River, Niri, COSMIC) | current release |
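If you are unsure whether your compositor exposes the protocol, one way to check is to look for the `zwlr_data_control_manager_v1` global in the output of `wayland-info` (shipped in the wayland-utils package on most distributions). This is a sketch that assumes `wayland-info` is installed; the helper name `has_data_control` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin mentions the
# wlr-data-control manager global.
has_data_control() {
  grep -q 'zwlr_data_control_manager_v1'
}

# Only query the compositor when the tool exists and a Wayland session is active.
if command -v wayland-info >/dev/null 2>&1 && [ -n "${WAYLAND_DISPLAY:-}" ]; then
  if wayland-info 2>/dev/null | has_data_control; then
    echo "wlr-data-control is advertised by this compositor"
  else
    echo "wlr-data-control was not found in the advertised globals"
  fi
fi
```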
Hardware requirements
| Component | Requirement |
|---|---|
| OS | Linux on Wayland |
| GPU | Vulkan-capable (AMD, NVIDIA, Intel) — CPU fallback available |
| RAM | 4 GB minimum, more for larger structure tiers |
| Disk | ~500 MB base, plus model files |
AppImage and Flatpak
AppImage is not supported. The bundled WebKitGTK runtime cannot reach PipeWire reliably, which breaks microphone capture. Use the deb, rpm, or Arch package instead. A Flatpak build is on the roadmap.
First run
Launch VocaPulse once to sign in, download the recognition model, and grant microphone access. After that, the app lives in the system tray.
Steps
- Launch VocaPulse from your application menu. The main window opens with a blocking sign-in prompt.
- Sign in. Click Sign in. The app generates a pairing code in the form XXXX-XXXX, opens vocapulse.app/device in your browser, and you approve the device against your free account. Recording is gated on a valid sign-in (see Account & devices).
- Click Download Model. The default recognition model is about 466 MB and is fetched from Hugging Face. This is a one-time download.
- When the download finishes, close the main window. The app minimizes to the system tray and keeps running in the background.
- Press Ctrl+Shift+Space anywhere on your desktop. Your compositor will show a standard microphone permission prompt — grant it once.
- Speak, press the hotkey again to stop, and your words appear in the focused application.
Tray icon
The tray icon is the entry point from here on:
- Left-click, or right-click → Open VocaPulse to open the main window.
- Right-click → Settings to jump directly to the settings pane.
- Right-click → Quit to exit.
Closing the main window does not quit the app. Use Quit from the tray menu.
Dictate
One hotkey starts recording, one hotkey stops. Text lands in whatever window has focus.
The core flow
- Focus the app you want to type into — editor, chat, terminal, browser field.
- Press Ctrl+Shift+Space.
- A small pill appears at the bottom of the screen with voice-reactive bars. Speak.
- Press Ctrl+Shift+Space again to stop.
- The pill turns blue while the recognition model transcribes, then the text is pasted into the focused app.
Pill states
| Color | Meaning |
|---|---|
| Green | Recording |
| Blue | Transcribing |
| Purple | Structuring (F9 flow) |
Paste last
Press F8 to re-paste the most recent transcription. Useful if you tabbed to the wrong window, or need the same text in a second place.
If auto-paste doesn't happen
Auto-paste depends on /dev/uinput access. If the app can't reach it, the transcribed text is placed on the clipboard and the pill flashes once. Paste manually with Ctrl+V. See the Permissions section to enable auto-paste.
Toggle vs push-to-talk
By default, recording is in toggle mode: tap to start, tap to stop.
Push-to-talk — hold the hotkey to record, release to stop — is under Settings → Keyboard → Recording mode.
Remapping the hotkey
Settings → Keyboard has live-capture fields for the record, paste-last, and structure hotkeys. Press the combination you want; the app reads it directly.
Rules:
- Function keys (F1–F12) work bare, with no modifier.
- All other keys require at least one modifier (Ctrl, Alt, Shift, or Super).
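The two rules can be expressed as a small check. The sketch below assumes combinations are written the way this page writes them, with `+` between parts (for example `Ctrl+Shift+Space`). The function name `valid_hotkey` is hypothetical and says nothing about how the app stores or validates hotkeys internally.

```shell
#!/bin/sh
# Hypothetical validator for the two remapping rules.
valid_hotkey() {
  local combo="$1" key mods
  key="${combo##*+}"   # text after the last '+', or the whole string if none

  # Rule 1: F1 through F12 are accepted with or without modifiers.
  case "$key" in
    F[1-9]|F1[0-2]) return 0 ;;
  esac

  # Rule 2: any other key needs at least one of Ctrl, Alt, Shift, Super.
  case "$combo" in
    *+*) mods="${combo%+*}" ;;
    *)   return 1 ;;
  esac
  case "+${mods}+" in
    *+Ctrl+*|*+Alt+*|*+Shift+*|*+Super+*) return 0 ;;
  esac
  return 1
}

for combo in "F9" "Ctrl+Shift+Space" "Space"; do
  if valid_hotkey "$combo"; then
    echo "$combo: accepted"
  else
    echo "$combo: rejected"
  fi
done
```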
Thought-to-Structure
Speak loosely. A local structuring model rewrites the transcript into prose, bullet points, or Markdown before it's pasted.
What it does
With the standard dictation flow, what you say is what you get. Thought-to-Structure adds a second pass: after the recognition model produces a transcript, a local structuring model reshapes it. Rambling becomes prose. A list of half-formed thoughts becomes bullet points. A rough plan becomes Markdown.
Everything stays on your machine.
Using it
- Pick a tier in Settings → Local AI → Structure and download it.
- Press F9 anywhere.
- Speak freely — jump between ideas, leave sentences unfinished, repeat yourself.
- Press F9 again to stop.
- The pill turns blue (transcribing), then purple (structuring), then the cleaned-up text is pasted.
Model tiers
| Tier | Size | Minimum hardware | Best for |
|---|---|---|---|
| Lite | ~1.07 GB | CPU or low-VRAM GPU | basic restructuring, weak hardware |
| Standard (default) | ~1.9 GB | 4 GB VRAM | most users |
| Quality | ~4.47 GB | 6 GB VRAM | best general output |
| XL | ~8.6 GB | 10 GB VRAM, 24 GB RAM | highest quality, specialists |
The app recommends a tier based on the VRAM and RAM it detects. You can override that choice.
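As a rough guide to what to expect, the sketch below applies the minimums from the table to a given amount of VRAM and RAM, in whole gigabytes. The app's actual heuristic is not documented here, so treat `recommend_tier` as a hypothetical reading of the table rather than the real selection logic.

```shell
#!/bin/sh
# Hypothetical helper: pick the largest tier whose table minimums are met.
# Arguments: VRAM in GB, RAM in GB (whole numbers).
recommend_tier() {
  local vram_gb="$1" ram_gb="$2"
  if [ "$vram_gb" -ge 10 ] && [ "$ram_gb" -ge 24 ]; then
    echo "XL"
  elif [ "$vram_gb" -ge 6 ]; then
    echo "Quality"
  elif [ "$vram_gb" -ge 4 ]; then
    echo "Standard"
  else
    echo "Lite"        # CPU or low-VRAM GPU
  fi
}

recommend_tier 8 16
```

With 8 GB of VRAM and 16 GB of RAM the sketch lands on Quality, because XL needs both 10 GB of VRAM and 24 GB of RAM.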
Options
All under Settings → Local AI → Structure:
- Structure by default — swap the two hotkeys, so Ctrl+Shift+Space structures and F9 gives you a raw transcript.
- Preload structure model — keeps the structure model warm during recording so the first F9 response is instant. Costs VRAM continuously.
- Unload when idle — frees VRAM after a chosen idle interval: 30 s, 1 m, 2 m, 5 m, 10 m, or Never.
Settings
Every preference lives in one window, auto-saved as you change it.
Opening Settings
Right-click the tray icon → Open VocaPulse, then pick Settings in the sidebar. The left column is a timeline-style sub-navigation; one pane is visible at a time on the right.
Panes
| Pane | What's in it |
|---|---|
| Local AI | Hardware acceleration (Auto / GPU / CPU), recognition model picker, structure model picker, and verified-SHA256 badges on each model row. |
| Keyboard | Record hotkey, paste-last hotkey (F8), structure hotkey (F9), and toggle vs push-to-talk mode. |
| Language | Force one of ten languages — English, German, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean — or leave it on auto-detect. |
| Options | Input device, maximum recording duration, auto-pause media during recording, sound feedback, check-for-updates toggle, and an open-models-folder button. |
| Dictionary | Custom word replacement rules. See the Dictionary section. |
| Support | Send a bug report or feature request. Requires sign-in — see Account & devices. |
| About | Version, release date, Check-for-updates button, and a System info block showing GPU, CPU, VRAM, and RAM. |
Where settings are stored
Settings are written to ~/.local/share/vocapulse/settings.json on your machine. The file is rewritten automatically whenever you change a value, with a 300 ms debounce so rapid edits don't thrash the disk.
Dictionary
A list of find-and-replace rules applied to every transcript after recognition, before paste.
What it's for
Recognition models handle common English, German, and other supported languages well. Where they struggle is consistent:
- Proper names — coworkers, product names, places.
- Domain jargon — acronyms the model has never seen.
- Systematic misspellings — a word that is always transcribed the same wrong way.
The dictionary fixes these before the text is pasted.
Adding a rule
Open Settings → Dictionary and add an entry:
| Field | Example |
|---|---|
| Find | kubectl |
| Replace | kubectl |
Add one row per correction. Rules apply in order and run on every transcript, including structured ones.
Matching
- Whole word: a rule for ai will not rewrite the inside of maintain.
- Case-insensitive: Kubectl, kubectl, and KUBECTL all match the same rule.
If a correction doesn't fire, check that the text you expect is actually what the recognition model produced — the dictionary operates on the raw transcript, not what you said.
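To get a feel for whole-word, case-insensitive matching, the sketch below reproduces both properties with GNU sed. It is an approximation for experimenting with rules on sample text, not the app's matcher: it assumes GNU sed (for `\b` and the `I` flag), and the helper `apply_rule` does not escape regex or replacement metacharacters, so keep test words to plain letters.

```shell
#!/bin/sh
# Hypothetical helper: apply one find/replace rule to stdin,
# matching whole words only and ignoring case. Requires GNU sed.
apply_rule() {
  local find="$1" replace="$2"
  sed -E "s/\b${find}\b/${replace}/gI"
}

# "ai" is replaced as a standalone word, but not inside "maintain".
echo "I maintain the ai model" | apply_rule ai AI

# Every casing of the word is normalised by the same rule.
echo "Kubectl and KUBECTL" | apply_rule kubectl kubectl
```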
Permissions
Two system-level permissions matter: `/dev/uinput` for auto-paste, and the microphone for recording.
/dev/uinput (auto-paste)
VocaPulse simulates keystrokes through /dev/uinput to paste transcribed text into the focused application. The package ships a udev rule that assigns the device to the input group. You need to be in that group.
Add your user once:
sudo usermod -aG input $USER
Log out and back in — group membership is only refreshed on a new session.
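After logging back in, a quick check confirms that the new session picked up the group and that the device is writable. The helper name `has_group` is illustrative; `id -nG` lists the groups of the current session.

```shell
#!/bin/sh
# Hypothetical helper: succeed if GROUP appears in a space-separated list.
has_group() {
  local wanted="$1" groups_list="$2"
  case " $groups_list " in
    *" $wanted "*) return 0 ;;
    *)             return 1 ;;
  esac
}

# Groups of the current session (not just what /etc/group says).
if has_group input "$(id -nG)"; then
  echo "This session is in the input group"
else
  echo "This session is not in the input group yet"
fi

# The device node must also be writable for auto-paste to work.
if [ -w /dev/uinput ]; then
  echo "/dev/uinput is writable"
else
  echo "/dev/uinput is not writable from this session"
fi
```

If the first check fails right after running usermod, the session is probably still the old one.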
Fallback
If /dev/uinput is unreachable, the app switches to clipboard-only mode automatically. The transcribed text is placed on your clipboard and you paste with Ctrl+V. No error, no prompt — just a quieter paste.
Microphone
The first time you press the record hotkey, your Wayland compositor shows a standard microphone permission prompt. Grant it once and it's remembered. Revoke it the same way you would for a browser — through your compositor's privacy settings.
Wayland compositor
Clipboard and overlay positioning both depend on Wayland protocols your compositor must implement. See the Install section for the supported list. If your compositor isn't listed, the app may start but clipboard delivery will fail silently.
Account & devices
VocaPulse requires a free account to record. Device licensing is tied to your account. Your audio, transcripts, and usage never leave the machine; the server only tracks whether a given device is allowed to run your tier.
Why there's an account at all
To enforce FREE and PRO device limits, the app needs to identify which machines belong to you. Audio, transcripts, word counts, and usage timing never leave your device. The server sees a device identifier, a tier, and a last-seen timestamp. That's it.
Sign in
The first launch opens a blocking sign-in prompt. You can also re-sign-in from Settings → Account at any point.
- Click Sign in. The app generates a pairing code in the form XXXX-XXXX.
- In your browser, go to vocapulse.app/device and sign in to your account (a passwordless email code; no credit card, no form filling).
- Enter the pairing code.
- Return to the app. The session is picked up automatically within a few seconds.
Tiers
| Tier | Active devices | Daily recording time |
|---|---|---|
| FREE | 1 | 15 minutes |
| PRO | 3 | Unlimited |
The daily quota is enforced by the app itself, on your machine. The server does not see how much you record. Pricing and final tier terms will be published in detail before any payment is requested.
Offline
The app does not need to reach the server to work. It will run offline for up to 30 days between successful check-ins. From day 20 onward, a banner reminds you to connect so the app can re-verify your tier. After 30 days without a check-in, the app falls back to FREE limits until it reaches the server again.
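The timeline can be summarised as three states. The sketch below is one reading of the paragraph above, with the boundaries at day 20 and day 30 taken from the text. The function names and state labels are hypothetical, and the app may round days differently.

```shell
#!/bin/sh
# Hypothetical helper: whole days between two Unix timestamps (seconds).
days_between() {
  echo $(( ($2 - $1) / 86400 ))
}

# Hypothetical helper: classify the offline state from days since the
# last successful check-in.
offline_state() {
  local days="$1"
  if [ "$days" -gt 30 ]; then
    echo "free-limits"   # falls back to FREE limits until the next check-in
  elif [ "$days" -ge 20 ]; then
    echo "reminder"      # banner asks you to connect
  else
    echo "ok"
  fi
}

# Example: last check-in 25 days ago.
now="$(date +%s)"
last=$(( now - 25 * 86400 ))
offline_state "$(days_between "$last" "$now")"
```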
Revoking a device
Two ways:
- From the app: Settings → Account → Sign out. The device slot frees up immediately.
- From the web: vocapulse.app/account, pick the device, remove it.
Troubleshooting
Common startup and runtime issues, in rough order of frequency.
GStreamer element autoaudiosink not found
The GStreamer "good" plugin set is missing. Install it for your distribution:
# Arch
sudo pacman -S gst-plugins-base gst-plugins-good gst-libav
# Debian / Ubuntu
sudo apt install gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-libav
# Fedora / RHEL
sudo dnf install gstreamer1-plugins-base gstreamer1-plugins-good gstreamer1-libav
App fails to start silently
Usually libayatana-appindicator is missing. Without it the tray library cannot load and the process exits without a visible window.
# Arch
sudo pacman -S libayatana-appindicator
# Debian / Ubuntu
sudo apt install libayatana-appindicator3-1
# Fedora / RHEL
sudo dnf install libayatana-appindicator-gtk3
Auto-paste does nothing
/dev/uinput access is missing. Confirm your user is in the input group:
groups
If input is not in the list, add yourself and log out fully:
sudo usermod -aG input $USER
Until that's resolved, the text is still on your clipboard — paste with Ctrl+V.
Bluetooth microphone cuts out after a few seconds
The app captures audio through the Web Audio pipeline, which negotiates A2DP and HFP profiles correctly in normal use. If the stream cuts, the headset is likely not selected as the input device. Open Settings → Options → Input device and pick the Bluetooth mic explicitly.
Sporadic crash when cycling recordings rapidly
An upstream WebKit and PipeWire race condition. Leave 1–2 seconds between stopping one recording and starting the next. A proper fix depends on an upstream patch.
AppImage doesn't work
Intentional. The bundled WebKitGTK inside an AppImage cannot reliably reach PipeWire, which breaks microphone capture. Use the deb, rpm, or Arch package.
Privacy in practice
VocaPulse is local-first by design. This section describes exactly what that means on disk and on the wire.
What leaves your machine
Only these things, and only when you trigger them:
| Traffic | When | Destination |
|---|---|---|
| Model download | You click Download for a recognition or structure model | Hugging Face |
| Device pairing code | You sign in | vocapulse.app |
| Tier and device check-in | Periodically, while signed in | vocapulse.app |
| Support submission | You send a bug report or feature request | vocapulse.app |
| Update manifest check | On startup, if enabled | vocapulse.app |
Audio, transcripts, dictionary entries, usage timing, and word counts never leave your machine. There is no telemetry setting to turn off, because there is no background telemetry channel.
Where local data lives
Everything the app stores sits under ~/.local/share/vocapulse/:
| Path | Contents |
|---|---|
| settings.json | Your preferences |
| models/ | Downloaded recognition and structure models |
| dictionary.json | Word replacement rules |
| history.db | Encrypted transcript history (if enabled) |
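To see what is actually on disk, the sketch below reports the size of each path from the table, or marks it as absent. The function name `report_data` is made up for illustration; it only reads the directory and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print size (or "absent") for each documented path.
# Optional argument: base directory, defaulting to the documented location.
report_data() {
  local base="${1:-$HOME/.local/share/vocapulse}" entry
  for entry in settings.json models dictionary.json history.db; do
    if [ -e "$base/$entry" ]; then
      # du prints "<size><TAB><path>"; keep only the size column.
      printf '%s\t%s\n' "$(du -sh "$base/$entry" | cut -f1)" "$entry"
    else
      printf '%s\t%s\n' "absent" "$entry"
    fi
  done
}

report_data
```

The models directory is usually the largest entry, since it holds every downloaded recognition and structure model.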
Transcript history
Off by default. When enabled, each entry is written to a local SQLite database and encrypted row-by-row with AES-256-GCM. The encryption key is derived from a hardware identifier unique to your machine, so copying history.db elsewhere produces an unreadable file.
Export and delete individual entries — or clear the whole history — from Settings → History.
On account deletion
Deleting your account on vocapulse.app hard-deletes your user row on the server, and the deletion cascades to all associated devices. Local data on your machine is not touched by that action. It's yours; remove it when you want:
rm -rf ~/.local/share/vocapulse/