Press a key, speak, and your words land in whatever text field has focus.
A local-first dictation app for Windows. Transcription runs on your own machine -- no audio ever leaves it. A detached fork of Handy with its own defaults and a frog at the wheel.
Unsigned build -- Windows SmartScreen warns on first launch; choose More info, then Run anyway.
Three steps, all on your machine.
Hold Ctrl+Alt+Space (the Windows default) to record; release to stop. The chord is yours to change.
Silero voice-activity detection trims the silence so the engine only hears your words.
AudioBud transcribes locally and drops the text straight into the focused field.
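To make the silence-trimming step concrete, here is a minimal sketch in Python. It is not AudioBud's code and it does not call Silero. It assumes a detector has already produced one speech probability per audio frame, and the names `trim_silence`, `threshold`, and `padding` are hypothetical.

```python
from typing import List, Sequence


def trim_silence(
    frames: Sequence[bytes],
    speech_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not a documented AudioBud value
    padding: int = 1,        # assumed number of context frames on each side
) -> List[bytes]:
    """Keep frames the detector marks as speech, plus a little context."""
    if len(frames) != len(speech_probs):
        raise ValueError("need exactly one probability per frame")

    keep = [False] * len(frames)
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            # Keep the speech frame and `padding` neighbours on each side
            # so word onsets and endings are not clipped.
            lo = max(0, i - padding)
            hi = min(len(frames), i + padding + 1)
            for j in range(lo, hi):
                keep[j] = True

    return [frame for frame, kept in zip(frames, keep) if kept]
```

Only the frames that survive would be handed to the transcription engine, which is the sense in which the engine "only hears your words".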
Dictation runs entirely on your machine. The one exception is optional LLM cleanup, which sends text to a provider only when you turn it on. The list reflects what is in the app today.
Audio never leaves your machine. Transcription has no cloud step and needs no account.
Parakeet (ONNX/DirectML) and Whisper (whisper.cpp/Vulkan), plus Moonshine, SenseVoice, GigaAM, Canary, and Cohere. Each downloads from inside the app.
Parakeet V3 handles 25 European languages; Whisper adds many more and can translate to English. The interface ships in more than a dozen languages.
A configurable global hotkey, a hold-to-talk or tap-to-toggle mode, and an optional auto-submit key.
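The difference between the two hotkey modes is easiest to see as a tiny state machine. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes the operating system may send repeated key-down events while the chord is held.

```python
from enum import Enum


class Mode(Enum):
    HOLD_TO_TALK = "hold"
    TAP_TO_TOGGLE = "toggle"


class HotkeyRecorder:
    """Tracks whether recording is active for a given hotkey mode."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.recording = False
        self._pressed = False  # True while the chord is physically down

    def on_key_down(self) -> None:
        if self._pressed:
            return  # ignore auto-repeat events while the chord is held
        self._pressed = True
        if self.mode is Mode.HOLD_TO_TALK:
            self.recording = True
        else:
            # Tap-to-toggle: each fresh press flips the state.
            self.recording = not self.recording

    def on_key_up(self) -> None:
        self._pressed = False
        if self.mode is Mode.HOLD_TO_TALK:
            self.recording = False
        # In tap-to-toggle mode, releasing the chord changes nothing.
```

In hold-to-talk mode, recording lasts exactly as long as the chord is down. In tap-to-toggle mode, one press starts it and the next press stops it.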
Send a transcript through a provider of your choice with your own prompt. Off by default; your API key stays local.
Teach it the names and jargon it would otherwise miss, with a simple .txt import.
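As an illustration of what a .txt import might involve, the sketch below parses a word list under assumptions the page does not confirm: one term per line, blank lines ignored, and duplicates dropped without regard to case. The function name `parse_word_list` is hypothetical.

```python
from typing import List


def parse_word_list(text: str) -> List[str]:
    """Return unique terms in file order, assuming one term per line."""
    seen = set()
    words: List[str] = []
    for line in text.splitlines():
        term = line.strip()
        if not term:
            continue  # skip blank lines
        key = term.casefold()  # compare case-insensitively
        if key in seen:
            continue  # keep only the first spelling of a repeated term
        seen.add(key)
        words.append(term)
    return words
```

For example, a file containing `Kubernetes`, a blank line, `DirectML`, and `kubernetes` would yield `Kubernetes` and `DirectML`.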
Keep recent transcriptions and recordings, or set them to expire on your schedule.
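An expiry schedule like this usually comes down to an age limit, a count limit, or both. Here is a minimal sketch of that idea. The `Entry` type, the `prune_history` function, and both limits are hypothetical and say nothing about how AudioBud stores its history.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Entry:
    text: str
    created: datetime


def prune_history(
    entries: List[Entry],
    now: datetime,
    max_age: Optional[timedelta] = None,
    max_count: Optional[int] = None,
) -> List[Entry]:
    """Return the entries to keep, newest first."""
    kept = sorted(entries, key=lambda e: e.created, reverse=True)
    if max_age is not None:
        # Drop anything older than the age limit.
        kept = [e for e in kept if now - e.created <= max_age]
    if max_count is not None:
        # Keep only the most recent `max_count` entries.
        kept = kept[:max_count]
    return kept
```

Passing neither limit keeps everything, which corresponds to never expiring history.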
Pick the Whisper and ONNX accelerators -- auto, CPU, CUDA, DirectML, or ROCm -- and the GPU device.
Live in the system tray, start on boot, and drive a running instance from the command line.
Real panels from the running app. Shortcuts shown are customized; the Windows default is Ctrl+Alt+Space.
AudioBud is an early prototype. Here is the honest state of things.