Docs

Everything you need to use VocaPulse.

Install, first run, everyday dictation, Thought-to-Structure, settings, and the things that sometimes go wrong. One page, indexed on the left.

Install

One line detects your distribution and installs the matching package. Per-distro manual commands are listed below the one-liner for anyone who prefers to drive the package manager directly.

One-liner (recommended)

curl -sSL https://vocapulse.app/install | bash

Requires x86_64 Linux. The script downloads the latest release from GitHub, detects your distribution, and invokes your package manager. Re-run it at any time to upgrade; it does nothing if you are already on the latest version.

Supported families

Debian — Debian, Ubuntu, Linux Mint, Pop!_OS, elementary OS, Kali, Raspbian
RPM — Fedora, RHEL, CentOS, Rocky, AlmaLinux, Oracle Linux
Arch — Arch, CachyOS, EndeavourOS, Manjaro, Garuda, Artix

If the script doesn't recognise your distribution, use the manual commands below.
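To see which family your system falls into before choosing a manual command, the sketch below maps /etc/os-release to one of the three families above. The helper name detect_family and the ID / ID_LIKE matching are assumptions for illustration; the install script's real detection logic is not documented here and may differ.

```shell
# Hypothetical helper: map an os-release file to a package family.
# Usage: detect_family [path]   (defaults to /etc/os-release)
detect_family() {
    os_release="${1:-/etc/os-release}"
    # Run in a subshell so ID / ID_LIKE do not leak into the caller.
    (
        [ -r "$os_release" ] || { echo unknown; exit 0; }
        ID=""
        ID_LIKE=""
        . "$os_release"
        # Check the distro's own ID first, then the distros it derives from.
        for id in $ID $ID_LIKE; do
            case "$id" in
                debian|ubuntu)      echo debian; exit 0 ;;
                fedora|rhel|centos) echo rpm;    exit 0 ;;
                arch)               echo arch;   exit 0 ;;
            esac
        done
        echo unknown
    )
}
```

Calling detect_family with no argument reads the live system file. A result of unknown means the manual commands are the safer route.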

Manual install

Fedora, RHEL, openSUSE (with dnf installed)

sudo dnf install ./VocaPulse-<version>-1.x86_64.rpm

Debian, Ubuntu

sudo apt install ./VocaPulse_<version>_amd64.deb

Arch Linux

sudo pacman -U vocapulse-<version>-1-x86_64.pkg.tar.zst

Replace <version> with the version number in the name of the file you downloaded. To check which version you have installed, open the app and go to Settings → About.
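As a small illustration of the substitution, the sketch below builds the package filename for each family from a version string. The helper name pkg_filename and the version 1.2.3 are hypothetical; the filename patterns are the ones shown in the commands above.

```shell
# Hypothetical helper: print the package filename for a family and version.
pkg_filename() {
    family="$1"
    version="$2"
    case "$family" in
        debian) echo "VocaPulse_${version}_amd64.deb" ;;
        rpm)    echo "VocaPulse-${version}-1.x86_64.rpm" ;;
        arch)   echo "vocapulse-${version}-1-x86_64.pkg.tar.zst" ;;
        *)      echo "unknown family: $family" >&2; return 1 ;;
    esac
}

pkg_filename debian 1.2.3   # prints VocaPulse_1.2.3_amd64.deb
```

The printed name can then be passed to the package manager, for example sudo apt install "./$(pkg_filename debian 1.2.3)".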

Runtime dependencies

Package managers will resolve these for you on a normal desktop install. On a minimal system, install them explicitly.

Arch Linux / CachyOS

sudo pacman -S webkit2gtk-4.1 gtk3 gtk-layer-shell vulkan-icd-loader \
               libayatana-appindicator librsvg \
               gst-plugins-base gst-plugins-good gst-libav

Debian, Ubuntu

sudo apt install libwebkit2gtk-4.1-0 libgtk-3-0 libgtk-layer-shell0 \
                 libayatana-appindicator3-1 librsvg2-2 libvulkan1 \
                 gstreamer1.0-plugins-base gstreamer1.0-plugins-good \
                 gstreamer1.0-libav

Fedora, RHEL

sudo dnf install webkit2gtk4.1 gtk3 gtk-layer-shell \
                 libayatana-appindicator-gtk3 librsvg2 \
                 gstreamer1-plugins-base gstreamer1-plugins-good \
                 gstreamer1-libav vulkan-loader
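On a minimal system it can help to see which shared libraries are still missing before launching the app. The sketch below scans the dynamic linker cache for the libraries the packages above provide. The helper name missing_libs and the exact library file names are assumptions based on common packaging, not an official list.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and print every
# expected library that does not appear in it.
missing_libs() {
    cache=$(cat)
    for lib in libwebkit2gtk-4.1.so.0 libgtk-3.so.0 libgtk-layer-shell.so.0 \
               libayatana-appindicator3.so.1 librsvg-2.so.2 libvulkan.so.1; do
        printf '%s\n' "$cache" | grep -qF "$lib" || echo "$lib"
    done
}

# Typical use (ldconfig may live in /sbin on some systems):
#   ldconfig -p | missing_libs
```

No output means every listed library was found in the cache.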

Wayland compositor support

Clipboard delivery uses the wlr-data-control protocol.

Compositor	Minimum version
KDE Plasma	5.27+
GNOME	45+
wlroots-based — Sway, Hyprland, Wayfire, River, Niri, COSMIC	current
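If you are unsure whether your session qualifies, the following sketch checks for a Wayland session and looks for the wlr-data-control global in the output of wayland-info (from the wayland-utils package). Both helper names are hypothetical, and the check assumes the protocol is advertised under the interface name zwlr_data_control_manager_v1.

```shell
# True when the environment looks like a Wayland session.
is_wayland_session() {
    [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]
}

# Reads `wayland-info` output on stdin; true when the compositor
# advertises the wlr-data-control manager.
has_data_control() {
    grep -q 'zwlr_data_control_manager_v1'
}

# Typical use:
#   is_wayland_session && wayland-info | has_data_control && echo "supported"
```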

Hardware requirements

Component	Requirement
OS	Linux on Wayland
GPU	Vulkan-capable (AMD, NVIDIA, Intel) — CPU fallback available
RAM	4 GB minimum, more for larger structure tiers
Disk	~500 MB base, plus model files

AppImage and Flatpak

AppImage is not supported. The bundled WebKitGTK runtime cannot reach PipeWire reliably, which breaks microphone capture. Use the deb, rpm, or Arch package instead. A Flatpak build is on the roadmap.

First run

Launch VocaPulse once to sign in, download the recognition model, and grant microphone access. After that, the app lives in the system tray.

Steps

Launch VocaPulse from your application menu. The main window opens with a blocking sign-in prompt.
Sign in. Click Sign in. The app generates a pairing code in the form XXXX-XXXX and opens vocapulse.app/device in your browser, where you approve the device against your free account. Recording is gated on a valid sign-in; see Account & devices.
Click Download Model. The default recognition model is about 466 MB and fetches from Hugging Face. This is a one-time download.
When the download finishes, close the main window. The app minimizes to the system tray and keeps running in the background.
Press Ctrl+Shift+Space anywhere on your desktop. Your compositor will show a standard microphone permission prompt — grant it once.
Speak, press the hotkey again to stop, and your words appear in the focused application.

Tray icon

The tray icon is the entry point from here on:

Left-click, or right-click → Open VocaPulse to open the main window.
Right-click → Settings to jump directly to the settings pane.
Right-click → Quit to exit.

Closing the main window does not quit the app. Use Quit from the tray menu.

Dictate

One hotkey starts recording, and the same hotkey stops it. Text lands in whatever window has focus.

The core flow

Focus the app you want to type into — editor, chat, terminal, browser field.
Press Ctrl+Shift+Space.
A small pill appears at the bottom of the screen with voice-reactive bars. Speak.
Press Ctrl+Shift+Space again to stop.
The pill turns blue while the recognition model transcribes, then the text is pasted into the focused app.

Pill states

Color	Meaning
Green	Recording
Blue	Transcribing
Purple	Structuring (F9 flow)

Paste last

Press F8 to re-paste the most recent transcription. Useful if you tabbed to the wrong window, or need the same text in a second place.

If auto-paste doesn't happen

Auto-paste depends on /dev/uinput access. If the app can't reach it, the transcribed text is placed on the clipboard and the pill flashes once. Paste manually with Ctrl+V. See the Permissions section to enable auto-paste.

Toggle vs push-to-talk

By default, recording is in toggle mode: tap to start, tap to stop.

Push-to-talk — hold the hotkey to record, release to stop — is under Settings → Keyboard → Recording mode.

Remapping the hotkey

Settings → Keyboard has live-capture fields for the record, paste-last, and structure hotkeys. Press the combination you want; the app reads it directly.

Rules:

Function keys (F1–F12) work bare, with no modifier.
All other keys require at least one modifier (Ctrl, Alt, Shift, or Super).
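The two rules can be sketched as a small validator. This is an illustration only: the helper name valid_hotkey and the Ctrl+Shift+Space string format are assumptions, and the app's own capture field may accept or reject combinations differently.

```shell
# Hypothetical helper: succeed if a combo such as "Ctrl+Shift+Space"
# satisfies the two rules above.
valid_hotkey() {
    combo="$1"
    key="${combo##*+}"                 # text after the last '+'
    case "$combo" in
        *+*) mods="${combo%+*}" ;;     # everything before the last '+'
        *)   mods="" ;;
    esac
    [ -n "$key" ] || return 1
    case "$key" in
        F[1-9]|F1[0-2]) return 0 ;;    # F1 to F12 work with or without modifiers
    esac
    # Any other key needs at least one modifier.
    case "+$mods+" in
        *+Ctrl+*|*+Alt+*|*+Shift+*|*+Super+*) return 0 ;;
    esac
    return 1
}

valid_hotkey "F9" && echo "F9 ok"
valid_hotkey "Space" || echo "Space needs a modifier"
```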

Thought-to-Structure

Speak loosely. A local structuring model rewrites the transcript into prose, bullet points, or Markdown before it's pasted.

What it does

With the standard dictation flow, what you say is what you get. Thought-to-Structure adds a second pass: after the recognition model produces a transcript, a local structuring model reshapes it. Rambling becomes prose. A list of half-formed thoughts becomes bullet points. A rough plan becomes Markdown.

Everything stays on your machine.

Using it

Pick a tier in Settings → Local AI → Structure and download it.
Press F9 anywhere.
Speak freely — jump between ideas, leave sentences unfinished, repeat yourself.
Press F9 again to stop.
The pill turns blue (transcribing), then purple (structuring), then the cleaned-up text is pasted.

Model tiers

Tier	Size	Minimum hardware	Best for
Lite	~1.07 GB	CPU or low-VRAM GPU	basic restructuring, weak hardware
Standard (default)	~1.9 GB	4 GB VRAM	most users
Quality	~4.47 GB	6 GB VRAM	best general output
XL	~8.6 GB	10 GB VRAM, 24 GB RAM	highest quality, specialists

The app recommends a tier based on the VRAM and RAM it detects. You can override that choice.
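To get a feel for where your hardware lands, the sketch below walks the table from the largest tier down. It is a guess at a reasonable heuristic, not the app's actual recommendation logic; the helper name recommend_tier is hypothetical and it takes whole gigabytes of VRAM and RAM.

```shell
# Hypothetical helper: pick the largest tier whose minimum hardware is met.
# Arguments: VRAM in GB, RAM in GB (integers).
recommend_tier() {
    vram_gb="$1"
    ram_gb="$2"
    if [ "$vram_gb" -ge 10 ] && [ "$ram_gb" -ge 24 ]; then
        echo XL
    elif [ "$vram_gb" -ge 6 ]; then
        echo Quality
    elif [ "$vram_gb" -ge 4 ]; then
        echo Standard
    else
        echo Lite
    fi
}

recommend_tier 8 16   # prints Quality
```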

Options

All under Settings → Local AI → Structure:

Structure by default — swap the two hotkeys, so Ctrl+Shift+Space structures and F9 gives you a raw transcript.
Preload structure model — keeps the structure model warm during recording so the first F9 response is instant. Costs VRAM continuously.
Unload when idle — frees VRAM after a chosen idle interval: 30 s, 1 m, 2 m, 5 m, 10 m, or Never.

Settings

Every preference lives in one window, auto-saved as you change it.

Opening Settings

Right-click the tray icon → Settings to jump straight to the settings pane, or choose Open VocaPulse and pick Settings in the sidebar. The left column is a timeline-style sub-navigation; one pane is visible at a time on the right.

Panes

Pane	What's in it
Local AI	Hardware acceleration (Auto / GPU / CPU), recognition model picker, structure model picker, and verified-SHA256 badges on each model row.
Keyboard	Record hotkey, paste-last hotkey (F8), structure hotkey (F9), and toggle vs push-to-talk mode.
Language	Force one of ten languages — English, German, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean — or leave it on auto-detect.
Options	Input device, maximum recording duration, auto-pause media during recording, sound feedback, check-for-updates toggle, and an open-models-folder button.
Dictionary	Custom word replacement rules. See the Dictionary section.
Support	Send a bug report or feature request. Requires sign-in — see Account & devices.
About	Version, release date, Check-for-updates button, and a System info block showing GPU, CPU, VRAM, and RAM.

Where settings are stored

Settings are written to ~/.local/share/vocapulse/settings.json on your machine. The file is rewritten automatically whenever you change a value, with a 300 ms debounce so rapid edits don't thrash the disk.

Dictionary

A list of find-and-replace rules applied to every transcript after recognition, before paste.

What it's for

Recognition models handle common English, German, and other supported languages well. Where they struggle is consistent:

Proper names — coworkers, product names, places.
Domain jargon — acronyms the model has never seen.
Systematic misspellings — a word that is always transcribed the same wrong way.

The dictionary fixes these before the text is pasted.

Adding a rule

Open Settings → Dictionary and add an entry:

Field	Example
Find	`kubectl`
Replace	`kubectl`

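Add one row per correction. Rules apply in order and run on every transcript, including structured ones. Because matching is case-insensitive, the example rule above normalizes any capitalization of kubectl to lowercase.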

Matching

Whole word — a rule for ai will not rewrite the inside of maintain.
Case-insensitive — Kubectl, kubectl, and KUBECTL all match the same rule.

If a correction doesn't fire, check that the text you expect is actually what the recognition model produced — the dictionary operates on the raw transcript, not what you said.
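To preview how a whole-word, case-insensitive rule behaves on a piece of text, the matching can be approximated with GNU sed. This is a sketch, not the app's implementation: the helper name apply_rule is hypothetical, it relies on GNU sed's \b and I extensions, and it assumes the find and replace values contain only letters, digits, and spaces.

```shell
# Hypothetical helper: apply one find/replace rule to stdin.
# \b restricts the match to whole words; the I flag ignores case (GNU sed).
apply_rule() {
    pattern="$1"
    replacement="$2"
    sed -E "s/\\b${pattern}\\b/${replacement}/gI"
}

echo "Run KUBECTL to maintain the cluster" | apply_rule kubectl kubectl
# prints: Run kubectl to maintain the cluster
```

Feeding it the text you expected to be corrected is a quick way to check whether a rule would have matched at all.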

Permissions

Two system-level permissions matter: `/dev/uinput` for auto-paste, and the microphone for recording.

`/dev/uinput` (auto-paste)

VocaPulse simulates keystrokes through /dev/uinput to paste transcribed text into the focused application. The package ships a udev rule that assigns the device to the input group. You need to be in that group.

Add your user once:

sudo usermod -aG input $USER

Log out and back in — group membership is only refreshed on a new session.
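After logging back in, you can confirm that the new session actually picked up the group and that the device is writable. The helper name has_group is hypothetical; id -nG without a user argument reports the groups of the current session, which is what matters here.

```shell
# Hypothetical helper: succeed if group $2 appears in the
# space-separated group list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if has_group "$(id -nG)" input && [ -w /dev/uinput ]; then
    echo "auto-paste should work"
else
    echo "clipboard-only fallback expected"
fi
```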

Fallback

If /dev/uinput is unreachable, the app switches to clipboard-only mode automatically. The transcribed text is placed on your clipboard and you paste with Ctrl+V. No error, no prompt — just a quieter paste.

Microphone

The first time you press the record hotkey, your Wayland compositor shows a standard microphone permission prompt. Grant it once and it's remembered. Revoke it the same way you would for a browser — through your compositor's privacy settings.

Wayland compositor

Clipboard and overlay positioning both depend on Wayland protocols your compositor must implement. See the Install section for the supported list. If your compositor isn't listed, the app may start but clipboard delivery will fail silently.

Account & devices

VocaPulse requires a free account to record. Device licensing is tied to your account. Your audio, transcripts, and usage never leave the machine; the only thing settled with the server is whether a given device is allowed to run your tier.

Why there's an account at all

To enforce FREE and PRO device limits, the app needs to identify which machines belong to you. Audio, transcripts, word counts, and usage timing never leave your device. The server sees a device identifier, a tier, and a last-seen timestamp. That's it.

Sign in

The first launch opens a blocking sign-in prompt. You can also re-sign-in from Settings → Account at any point.

Click Sign in. The app generates a pairing code in the form XXXX-XXXX.
In your browser, go to vocapulse.app/device and sign in to your account (a passwordless email code — no credit card, no form filling).
Enter the pairing code.
Return to the app. The session is picked up automatically within a few seconds.

Tiers

Tier	Active devices	Daily recording time
FREE	1	15 minutes
PRO	3	Unlimited

The daily quota is enforced by the app itself, on your machine. The server does not see how much you record. Pricing and final tier terms will be published in detail before any payment is requested.

Offline

The app does not need to reach the server to work. It will run offline for up to 30 days between successful check-ins. From day 20 onward, a banner reminds you to connect so the app can re-verify your tier. After 30 days without a check-in, the app falls back to FREE limits until it reaches the server again.
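The offline window can be summarized as three states keyed on the number of days since the last successful check-in. The sketch below only restates the thresholds above; the helper name offline_state and the state labels are hypothetical.

```shell
# Hypothetical helper: map days since the last check-in to a state.
offline_state() {
    days="$1"
    if [ "$days" -gt 30 ]; then
        echo free-fallback     # past day 30: FREE limits until the next check-in
    elif [ "$days" -ge 20 ]; then
        echo reminder          # day 20 to 30: banner shown, tier unchanged
    else
        echo ok                # under 20 days: no banner
    fi
}

offline_state 25   # prints reminder
```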

Revoking a device

Two ways:

From the app: Settings → Account → Sign out. The device slot frees up immediately.
From the web: vocapulse.app/account, pick the device, remove it.

Troubleshooting

Common startup and runtime issues, in rough order of frequency.

GStreamer element `autoaudiosink` not found

The GStreamer "good" plugin set is missing. Install it for your distribution:

# Arch
sudo pacman -S gst-plugins-base gst-plugins-good gst-libav

# Debian / Ubuntu
sudo apt install gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-libav

# Fedora / RHEL
sudo dnf install gstreamer1-plugins-base gstreamer1-plugins-good gstreamer1-libav
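To confirm the fix, or to check before installing anything, you can ask GStreamer directly whether the element is registered. The sketch assumes the gst-inspect-1.0 tool is available, which on some distributions is packaged separately from the plugins; the helper name check_gst_element is hypothetical.

```shell
# Hypothetical helper: report whether a GStreamer element is registered.
check_gst_element() {
    if ! command -v gst-inspect-1.0 >/dev/null 2>&1; then
        echo "no-gst-inspect"      # the inspection tool itself is not installed
    elif gst-inspect-1.0 "$1" >/dev/null 2>&1; then
        echo "available"
    else
        echo "missing"
    fi
}

check_gst_element autoaudiosink
```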

App fails to start silently

Usually libayatana-appindicator is missing. Without it the tray library cannot load and the process exits without a visible window.

# Arch
sudo pacman -S libayatana-appindicator

# Debian / Ubuntu
sudo apt install libayatana-appindicator3-1

# Fedora / RHEL
sudo dnf install libayatana-appindicator-gtk3

Auto-paste does nothing

/dev/uinput access is missing. Confirm your user is in the input group:

groups

If input is not in the list, add yourself and log out fully:

sudo usermod -aG input $USER

Until that's resolved, the text is still on your clipboard — paste with Ctrl+V.

Bluetooth microphone cuts out after a few seconds

The app captures audio through the Web Audio pipeline, which negotiates A2DP and HFP profiles correctly in normal use. If the stream cuts, the headset is likely not selected as the input device. Open Settings → Options → Input device and pick the Bluetooth mic explicitly.

Sporadic crash when cycling recordings rapidly

An upstream WebKit and PipeWire race condition. Leave 1–2 seconds between stopping one recording and starting the next. A proper fix depends on an upstream patch.

AppImage doesn't work

Intentional. The bundled WebKitGTK inside an AppImage cannot reliably reach PipeWire, which breaks microphone capture. Use the deb, rpm, or Arch package.

Privacy in practice

VocaPulse is local-first by design. This section describes exactly what that means on disk and on the wire.

What leaves your machine

Only the following, each at the time listed:

Traffic	When	Destination
Model download	You click Download for a recognition or structure model	Hugging Face
Device pairing code	You sign in	vocapulse.app
Tier and device check-in	Periodically, while signed in	vocapulse.app
Support submission	You send a bug report or feature request	vocapulse.app
Update manifest check	On startup, if enabled	vocapulse.app

Audio, transcripts, dictionary entries, usage timing, and word counts never leave your machine. There is no telemetry switch to turn off, because there is no background telemetry channel.

Where local data lives

Everything the app stores sits under ~/.local/share/vocapulse/:

Path	Contents
`settings.json`	Your preferences
`models/`	Downloaded recognition and structure models
`dictionary.json`	Word replacement rules
`history.db`	Encrypted transcript history (if enabled)

Transcript history

Off by default. When enabled, each entry is written to a local SQLite database and encrypted row-by-row with AES-256-GCM. The encryption key is derived from a hardware identifier unique to your machine, so copying history.db elsewhere produces an unreadable file.

Export and delete individual entries — or clear the whole history — from Settings → History.

On account deletion

Deleting your account on vocapulse.app hard-deletes your user row and cascades all associated devices on the server side. Local data on your machine is not touched by that action. It's yours; remove it when you want:

rm -rf ~/.local/share/vocapulse/
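If you might reinstall later, you may want to keep your preferences and dictionary before removing the directory. The sketch below archives only those two files. The helper name backup_vocapulse and the default archive path are assumptions; models are left out because they can be downloaded again, and history.db is left out because its key is tied to the machine.

```shell
# Hypothetical helper: archive settings.json and dictionary.json.
# Arguments: source directory, destination archive (both optional).
backup_vocapulse() {
    src="${1:-$HOME/.local/share/vocapulse}"
    dest="${2:-$HOME/vocapulse-backup.tar.gz}"
    files=""
    for f in settings.json dictionary.json; do
        if [ -f "$src/$f" ]; then
            files="$files $f"
        fi
    done
    [ -n "$files" ] || { echo "nothing to back up in $src" >&2; return 1; }
    # Word splitting of $files is intended: it holds plain file names.
    tar -czf "$dest" -C "$src" $files
}

# Typical use, before the rm -rf above:
#   backup_vocapulse
```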

Something missing or out of date? Let us know.
