OpenSource Voice Dictation Agent (Wispr Flow clone)
Repository · Free
Capabilities: 12 decomposed
push-to-talk voice dictation with native keyboard interception
Medium confidence
Captures audio input via the Fn key (hold-to-record) or a double-tap (hands-free toggle) using a native C++ module (fn_key_monitor.node) that hooks into macOS keyboard events at the system level, bypassing the limitations of Electron's renderer process. The native module runs in the main process and communicates via IPC to trigger audio recording without requiring application focus, enabling dictation in any macOS application.
Uses a native C++ module (fn_key_monitor.node) compiled with node-gyp to hook macOS keyboard events at the system level, enabling global Fn key capture that works across all applications without requiring app focus. Electron's built-in globalShortcut is not a substitute here: it registers accelerator combinations and does not report key-release events, so it cannot express a hold-to-record Fn key. Implements dual-mode interaction: single hold-to-record and double-tap hands-free toggle, both handled in native code before IPC marshaling.
More reliable than Wispr Flow's browser-based approach because it captures keys at the system level via native modules rather than relying on browser APIs, and supports global hotkeys without requiring the Electron window to be focused.
dual-path transcription with local whisper or cloud deepgram
Medium confidence
Implements a pluggable transcription architecture that routes audio to either local Whisper models (tiny/base/small via whisper-node-addon) for offline processing or the cloud Deepgram API for high-speed transcription. The system abstracts transcription provider selection through a configuration layer, allowing users to toggle between privacy-first local processing and speed-optimized cloud processing without code changes. Audio is buffered in the renderer process and sent to the main process via IPC, which routes it to the selected provider.
Implements a dual-path architecture with runtime provider selection rather than a compile-time choice: users can toggle between local Whisper and Deepgram via settings without rebuilding. Uses whisper-node-addon (native C++ binding to OpenAI Whisper) for local processing and the Deepgram REST API for the cloud path, with a unified IPC interface in the main process that abstracts provider differences. Configuration persisted in electron-store allows seamless switching.
More flexible than Wispr Flow (cloud-only) or Talon Voice (local-only) because it offers both paths with runtime selection, and more privacy-preserving than commercial dictation tools (Dragon, Otter) by supporting fully offline local transcription as the default.
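The runtime provider selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ProviderId`, `Transcriber`, `StubTranscriber`, and `TranscriptionRouter` are hypothetical names, and the stubs stand in for the real whisper-node-addon and Deepgram calls.

```typescript
// Hypothetical provider identifiers; the real settings keys may differ.
type ProviderId = "local-whisper" | "deepgram";

// Common shape that both transcription paths would implement.
interface Transcriber {
  transcribe(pcm: Int16Array): Promise<string>;
}

// Stand-in for a real provider (whisper-node-addon or the Deepgram API).
class StubTranscriber implements Transcriber {
  label: string;
  constructor(label: string) {
    this.label = label;
  }
  async transcribe(pcm: Int16Array): Promise<string> {
    return `[${this.label}] received ${pcm.length} samples`;
  }
}

class TranscriptionRouter {
  private providers: Record<ProviderId, Transcriber>;
  private getSelected: () => ProviderId;

  constructor(
    providers: Record<ProviderId, Transcriber>,
    getSelected: () => ProviderId,
  ) {
    this.providers = providers;
    this.getSelected = getSelected;
  }

  // The selection is read on every call, so a settings change takes
  // effect on the next dictation without a rebuild or restart.
  resolve(): Transcriber {
    return this.providers[this.getSelected()];
  }

  transcribe(pcm: Int16Array): Promise<string> {
    return this.resolve().transcribe(pcm);
  }
}
```

Passing a getter instead of a fixed value is what makes the choice a runtime one: the router never caches which provider is active.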
zero-telemetry privacy model with no analytics collection
Medium confidence
Implements a privacy-first architecture with zero telemetry: no analytics libraries, no tracking pixels, and no data collection beyond what is necessary for core functionality. The app does not send usage data, crash reports, or user behavior analytics to any external service. All processing (transcription, LLM post-processing) can be done locally without cloud connectivity, and cloud processing (Deepgram, LLM APIs) only sends audio/text when explicitly configured by the user.
Explicitly excludes all analytics and telemetry libraries from package.json and implements no tracking code; privacy is enforced by architecture rather than configuration. Supports fully offline processing (local Whisper + Ollama) as the default path, with cloud processing as an optional user-selected feature. No crash reporting, no error tracking, and no usage analytics, which keeps the data flow fully transparent.
More privacy-preserving than commercial tools (Otter, Fireflies, Wispr Flow) which collect usage analytics and store transcripts on their servers. More transparent than tools claiming privacy but using third-party SDKs for crash reporting or analytics.
ios beta support with testflight distribution
Medium confidence
Extends Jarvis to iOS via a beta version distributed through Apple TestFlight, enabling voice dictation on iPhone and iPad. The iOS implementation (ios/README.md) uses native iOS APIs for audio capture and keyboard integration, with the same dual-path architecture (local Whisper or cloud Deepgram) as the macOS version. TestFlight allows beta testing with up to 10,000 external testers before App Store release.
Extends the macOS dual-path architecture to iOS using native Swift/Objective-C APIs for audio capture and keyboard integration. Uses TestFlight for beta distribution, allowing community feedback before App Store release. Maintains feature parity with macOS version (local Whisper + Ollama, cloud Deepgram + LLM APIs) while adapting UI and interaction patterns for iOS.
More privacy-preserving than commercial iOS dictation apps (Otter, Fireflies) because it supports local-only processing. More feature-complete than iOS's built-in dictation because it adds grammar correction and filler removal via LLM post-processing.
ai-powered post-processing with filler removal and grammar correction
Medium confidence
Chains transcribed text through an LLM-based post-processing pipeline that removes filler words ('um', 'like', 'uh'), corrects grammar, adds punctuation, and enhances readability. The system supports dual-path LLM routing: a local Ollama server (models: sam860/LFM2:1.2b, llama3, mistral) for offline processing or cloud LLMs (Gemini, Claude, OpenAI) for higher quality. Post-processing is triggered automatically after transcription completes, with results cached to avoid re-processing identical transcripts.
Implements a dual-path LLM chain with provider abstraction: it routes transcribed text to either a local Ollama server or cloud LLM APIs (Gemini/Claude/OpenAI) via a unified interface. Uses prompt engineering to instruct the LLM to remove fillers, fix grammar, and add punctuation in a single pass. Caches results keyed by transcript hash to avoid re-processing identical inputs, reducing latency and API costs on repeated dictation.
More comprehensive than Wispr Flow's basic punctuation (which only adds periods) because it combines filler removal, grammar correction, and punctuation in an LLM-driven pipeline. More privacy-preserving than commercial tools (Otter, Fireflies) by supporting fully local Ollama processing, and more cost-effective than cloud-only solutions by offering a local fallback.
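The hash-keyed result cache mentioned above might look something like the sketch below. It is an assumption-laden illustration: `CachedPostProcessor`, `Cleaner`, and `fnv1a` are hypothetical names, and the 32-bit FNV-1a hash is only a dependency-free stand-in for whatever hash the project actually uses (a production build would more likely use SHA-256).

```typescript
// Dependency-free 32-bit FNV-1a over UTF-16 code units.
// Stand-in only; the listing does not say which hash the project uses.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical signature for "send transcript to an LLM, get clean text back".
type Cleaner = (transcript: string) => Promise<string>;

class CachedPostProcessor {
  private cache = new Map<string, { transcript: string; result: Promise<string> }>();
  private cleaner: Cleaner;

  constructor(cleaner: Cleaner) {
    this.cleaner = cleaner;
  }

  clean(transcript: string): Promise<string> {
    const key = fnv1a(transcript);
    const hit = this.cache.get(key);
    // Compare the stored transcript too, so a hash collision is a miss.
    if (hit && hit.transcript === transcript) return hit.result;

    const result = this.cleaner(transcript);
    this.cache.set(key, { transcript, result });
    // Drop failed calls from the cache so a later retry can succeed.
    result.catch(() => {
      const current = this.cache.get(key);
      if (current && current.result === result) this.cache.delete(key);
    });
    return result;
  }
}
```

Caching the promise rather than the resolved string also means two identical transcripts submitted back to back share one in-flight LLM request.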
ipc-based main-renderer process communication with security sandboxing
Medium confidence
Implements a secure inter-process communication (IPC) bridge between Electron's main process (native module access, file I/O, API calls) and renderer process (UI, user interactions) using ipcMain and ipcRenderer with preload script isolation. The preload script (src/preload.ts) exposes a whitelist of safe IPC channels (e.g., 'start-recording', 'transcribe-audio', 'update-settings') to the renderer, preventing direct access to Node.js APIs and enforcing context isolation. Audio buffers and settings are marshaled through IPC as serialized JSON or binary data.
Uses Electron's preload script (src/preload.ts) with context isolation enabled to expose a whitelist of safe IPC channels to the renderer, preventing direct Node.js API access while maintaining full main process capabilities. Implements channel-based message routing in main.ts that dispatches IPC calls to appropriate handlers (native modules, API clients, file I/O), with error handling and response marshaling. Audio buffers are passed as binary data through IPC using Electron's native serialization.
More secure than older Electron patterns (nodeIntegration: true) because it enforces process isolation and API whitelisting, preventing renderer process compromise from accessing file system or native modules. More maintainable than custom socket-based IPC because it uses Electron's built-in IPC with automatic serialization.
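A channel whitelist of the kind described above could be sketched as below. The three channel names come from the description; everything else (`ALLOWED_CHANNELS`, `isAllowed`, `createBridge`, the `Invoke` type) is hypothetical, and the Electron call is injected so the sketch runs without Electron installed.

```typescript
// Channel names taken from the listing; the full list in src/preload.ts is unknown.
const ALLOWED_CHANNELS = ["start-recording", "transcribe-audio", "update-settings"] as const;
type Channel = (typeof ALLOWED_CHANNELS)[number];

// Shape of ipcRenderer.invoke, injected so this sketch has no Electron dependency.
type Invoke = (channel: string, ...args: unknown[]) => Promise<unknown>;

function isAllowed(channel: string): channel is Channel {
  return (ALLOWED_CHANNELS as readonly string[]).includes(channel);
}

// Builds the object a preload script would expose to the renderer.
function createBridge(invoke: Invoke) {
  return {
    invoke(channel: string, ...args: unknown[]): Promise<unknown> {
      if (!isAllowed(channel)) {
        // Reject instead of forwarding, so the renderer can only reach
        // handlers that were deliberately whitelisted.
        return Promise.reject(new Error(`Blocked IPC channel: ${channel}`));
      }
      return invoke(channel, ...args);
    },
  };
}

// In a real preload script this would be wired up roughly as:
//   contextBridge.exposeInMainWorld("api",
//     createBridge((channel, ...args) => ipcRenderer.invoke(channel, ...args)));
// where "api" is an assumed global name.
```

Keeping the check in the preload layer means the renderer never holds a reference to `ipcRenderer` itself, only to the filtered wrapper.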
settings persistence with electron-store and onboarding flow
Medium confidence
Persists user configuration (transcription provider, LLM choice, API keys, keyboard shortcuts) to disk using electron-store, a lightweight JSON-based key-value store that can encrypt the settings file, including API keys, at rest. The onboarding interface (Onboarding Interface component) guides first-time users through provider selection (local vs cloud), API key configuration, and keyboard shortcut customization. Settings are loaded on app startup and cached in memory; changes trigger IPC updates to all processes and persist immediately to disk.
Uses electron-store for lightweight JSON-based persistence with optional encryption for sensitive data (API keys), avoiding the complexity of SQLite or external databases. Onboarding flow (Onboarding Interface component) is built as a separate Electron window that guides users through provider selection and API key configuration before main app launch. Settings changes trigger IPC broadcasts to all processes, ensuring UI and main process stay in sync without manual refresh.
Simpler than Wispr Flow's cloud-based settings sync because it uses local-only persistence, and more user-friendly than manual config file editing because it provides a guided onboarding UI. Supports both local and cloud provider configuration in a single settings schema, unlike single-path tools.
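The load, cache, update, and broadcast cycle described above can be illustrated with a small in-memory sketch. The `Settings` fields, `DEFAULTS`, and `SettingsStore` are assumptions for illustration; in the real app the saved values would come from electron-store and the listeners would be IPC sends to each window.

```typescript
// Hypothetical settings schema covering both the local and cloud paths.
interface Settings {
  transcriptionProvider: "local-whisper" | "deepgram";
  llmProvider: "ollama" | "gemini" | "claude" | "openai";
  doubleTapMs: number;
}

// Defaults follow the listing: local processing first, 300 ms double-tap window.
const DEFAULTS: Settings = {
  transcriptionProvider: "local-whisper",
  llmProvider: "ollama",
  doubleTapMs: 300,
};

class SettingsStore {
  private current: Settings;
  private listeners: Array<(settings: Settings) => void> = [];

  // `saved` stands in for whatever was read from disk at startup.
  constructor(saved: Partial<Settings>) {
    this.current = { ...DEFAULTS, ...saved };
  }

  get(): Settings {
    return { ...this.current };
  }

  // A listener stands in for an IPC broadcast to one process or window.
  onChange(listener: (settings: Settings) => void): void {
    this.listeners.push(listener);
  }

  update(patch: Partial<Settings>): void {
    this.current = { ...this.current, ...patch };
    // A real implementation would also write the change to disk here.
    for (const listener of this.listeners) listener(this.get());
  }
}
```

Merging saved values over defaults means a settings file written by an older version still loads when new fields are added later.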
native audio capture with system microphone integration
Medium confidence
Captures audio from the system microphone using Web Audio API (in renderer process) or native audio APIs (via native modules in main process), with automatic gain control and noise suppression. Audio is buffered in memory as PCM samples at 16kHz sample rate, then sent to the transcription pipeline via IPC. The system handles microphone permission requests (macOS Privacy & Security) and gracefully degrades if microphone is unavailable or denied.
Uses Web Audio API in renderer process for cross-platform compatibility but can fall back to native audio modules in main process for lower latency and better control. Buffers audio at 16kHz (standard for speech recognition) and implements basic automatic gain control to normalize microphone input levels. Handles macOS microphone permission requests gracefully with user-friendly error messages.
More integrated than browser-based Wispr Flow because it captures audio at the system level via Electron, avoiding browser tab audio limitations. More flexible than command-line tools (ffmpeg) because it provides real-time audio buffering and automatic format conversion.
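As a rough illustration of the format conversion involved, the sketch below turns Web Audio style Float32 samples into 16 kHz 16-bit PCM. Both function names are hypothetical, and the averaging downsampler is a deliberately crude stand-in for a proper low-pass resampler; it only handles integer rate ratios such as 48 kHz to 16 kHz.

```typescript
// Convert Float32 samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range input.
function floatTo16BitPcm(input: Float32Array): Int16Array {
  return Int16Array.from(input, (sample) => {
    const s = Math.max(-1, Math.min(1, sample));
    // Negative and positive halves scale differently because int16 is asymmetric.
    return s < 0 ? s * 0x8000 : s * 0x7fff;
  });
}

// Reduce the sample rate by an integer factor, averaging each group of samples.
// Averaging is a very simple anti-aliasing step, not a production-quality filter.
function downsampleByAveraging(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  const factor = fromRate / toRate;
  if (!Number.isInteger(factor) || factor < 1) {
    throw new Error(`Unsupported rate conversion: ${fromRate} -> ${toRate}`);
  }
  const out = new Float32Array(Math.floor(input.length / factor));
  for (let i = 0; i < out.length; i++) {
    const start = i * factor;
    const sum = input.subarray(start, start + factor).reduce((a, b) => a + b, 0);
    out[i] = sum / factor;
  }
  return out;
}
```

A typical call chain would be `floatTo16BitPcm(downsampleByAveraging(samples, 48000, 16000))`, assuming the microphone delivers 48 kHz audio.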
multi-architecture native module compilation for apple silicon and intel
Medium confidence
Builds and distributes native C++ modules (fn_key_monitor.node, whisper-node-addon) as separate binaries for Apple Silicon (ARM64) and Intel (x64) architectures using node-gyp and electron-builder. The build process compiles native code with platform-specific optimizations, signs binaries with Apple developer certificate, and packages both architectures into a universal macOS app. At runtime, the app loads the correct binary based on process.arch, ensuring optimal performance on each architecture.
Uses electron-builder with custom build scripts to compile native modules separately for Apple Silicon and Intel, then packages both binaries into a universal macOS app. Implements runtime architecture detection (process.arch) to load the correct binary without user intervention. Integrates Apple notarization into the build pipeline, eliminating security warnings on first launch.
More user-friendly than requiring users to compile native modules locally (like some open-source projects) because binaries are pre-built and notarized. More maintainable than maintaining separate app versions for each architecture because a single universal app bundle contains both binaries.
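Runtime architecture detection of the kind described above might reduce to a small path helper like the one below. The `nativeModulePath` function and the `<baseDir>/<arch>/<name>.node` layout are assumptions for illustration; the project's actual bundle layout is not given in this listing.

```typescript
// Architectures the listing says are shipped.
type SupportedArch = "arm64" | "x64";

function isSupportedArch(arch: string): arch is SupportedArch {
  return arch === "arm64" || arch === "x64";
}

// Map an architecture string (as reported by process.arch) to a binary path.
// The directory layout here is hypothetical.
function nativeModulePath(arch: string, baseDir: string, name: string): string {
  if (!isSupportedArch(arch)) {
    throw new Error(`Unsupported architecture: ${arch}`);
  }
  return `${baseDir}/${arch}/${name}.node`;
}

// In the Electron main process this would be used roughly as:
//   const monitor = require(nativeModulePath(process.arch, nativeDir, "fn_key_monitor"));
// where nativeDir is wherever the packaged binaries live.
```

Failing loudly on an unknown architecture is preferable to loading the wrong binary, which would otherwise surface as an opaque dlopen error.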
hands-free toggle mode with double-tap gesture detection
Medium confidence
Implements a dual-mode interaction model where users can either hold Fn to record (single-press mode) or double-tap Fn to toggle hands-free recording on/off (toggle mode). The native keyboard module (fn_key_monitor.node) detects double-tap timing (two presses within 300ms) and switches recording state accordingly. In hands-free mode, the app continues recording until the user double-taps again, enabling dictation without holding a key.
Implements double-tap detection in native C++ (fn_key_monitor.node) by tracking press/release timing, avoiding the latency of IPC-based gesture detection. Maintains hands-free state in the main process and broadcasts state changes to renderer via IPC, enabling UI indicators and preventing accidental re-triggering. Double-tap threshold (300ms) is configurable via settings.
More intuitive than Wispr Flow's single hold-to-record model because it supports hands-free toggle for long dictation sessions. More reliable than software-based gesture detection because it operates at the native keyboard event level with sub-millisecond timing precision.
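The press and release timing logic can be sketched as a small state machine. The project implements this in native C++ inside fn_key_monitor.node; the TypeScript below is only a hypothetical illustration of the same idea, with `FnGestureDetector` and the event names invented for the example.

```typescript
// Hypothetical event names for the two interaction modes.
type GestureEvent = "hold-start" | "hold-end" | "toggle-on" | "toggle-off";

class FnGestureDetector {
  private doubleTapMs: number;
  private lastPressAt: number | null = null;
  private handsFree = false;
  private holding = false;

  constructor(doubleTapMs: number = 300) {
    this.doubleTapMs = doubleTapMs;
  }

  // Call on Fn key-down with a millisecond timestamp.
  press(nowMs: number): GestureEvent | null {
    const isDoubleTap =
      this.lastPressAt !== null && nowMs - this.lastPressAt <= this.doubleTapMs;
    if (isDoubleTap) {
      this.lastPressAt = null; // consume the pair so a third tap starts fresh
      this.holding = false;
      this.handsFree = !this.handsFree;
      return this.handsFree ? "toggle-on" : "toggle-off";
    }
    this.lastPressAt = nowMs;
    // While hands-free recording is active, a single press changes nothing.
    if (this.handsFree) return null;
    this.holding = true;
    return "hold-start";
  }

  // Call on Fn key-up. Only ends a recording that a hold started.
  release(): GestureEvent | null {
    if (!this.holding) return null;
    this.holding = false;
    return "hold-end";
  }
}
```

Note that the first tap of a double-tap briefly starts and ends a hold recording; a caller would typically discard recordings shorter than the double-tap window.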
settings ui with provider selection and api key configuration
Medium confidence
Provides a React-based settings interface (Settings Interface component) where users can select transcription provider (local Whisper vs Deepgram), LLM provider (local Ollama vs cloud APIs), and configure API keys securely. The settings UI is rendered in a separate Electron window and communicates with the main process via IPC to read/write settings to electron-store. Sensitive fields (API keys) are masked in the UI and encrypted at rest in the settings file.
Implements settings as a separate Electron window (not a modal in main window) with dedicated React component tree, avoiding UI blocking. Uses IPC to communicate settings changes to the main process, which persists them to electron-store and broadcasts updates to all processes. API keys are masked in UI and encrypted at rest using electron-store's encryption feature.
More user-friendly than command-line configuration (like Talon Voice) because it provides a graphical interface. More secure than storing API keys in plaintext config files because it uses electron-store's built-in encryption, although electron-store documents that feature as obfuscation rather than a hard security boundary.
macos app signing and notarization for distribution
Medium confidence
Automates code signing and Apple notarization of the macOS app bundle using electron-builder and custom build scripts. The build process signs the app with a valid Apple developer certificate, submits it to Apple's notarization service, and waits for approval before packaging the DMG installer. Notarized apps display no security warnings when users download and launch them, improving trust and reducing support burden.
Integrates Apple notarization into electron-builder's build pipeline using custom scripts that submit the app to Apple's notarization service and poll for approval status. Stores Apple ID credentials securely in environment variables and uses app-specific passwords (not main Apple ID password) for added security. Notarization ticket is stapled to the app bundle, allowing offline verification.
More user-friendly than distributing unsigned apps because users don't see security warnings. More secure than self-signing because Apple's notarization service scans for malware and verifies developer identity.
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts sharing capabilities
Artifacts that share capabilities with OpenSource Voice Dictation Agent (Wispr Flow clone), ranked by overlap. Discovered automatically through the match graph.
Teleprompter
An on-device AI for your meetings that listens to you and makes charismatic quote suggestions.
Ermine
Local, secure, and efficient audio transcription on your...
Wave
Transform audio into text and summaries effortlessly on iOS;...
Dictation IO
Transform speech into text instantly, enhancing productivity across...
Screenpipe
An open-source tool for recording screen and audio activity with AI-powered search, automations, and support for local LLMs. #opensource
Cleft
Transforms voice to structured markdown notes, ensuring privacy and...
Best For
- macOS desktop users who spend time in multiple applications (email, documents, code editors)
- accessibility-focused users who need hands-free text input
- developers building privacy-first voice tools on Electron
- privacy-conscious users who cannot send audio to cloud services
- teams with strict data residency requirements (HIPAA, GDPR)
- developers building voice tools with configurable transcription backends
- users in regions with poor internet connectivity who need offline fallback
- privacy-conscious users who avoid commercial voice tools (Otter, Fireflies) due to data collection concerns
Known Limitations
- macOS-only desktop implementation: no Windows or Linux support due to platform-specific keyboard event APIs
- Requires accessibility permissions (macOS Security & Privacy settings), which users must grant manually
- Native module must be compiled separately for both Apple Silicon (M1/M2/M3/M4) and Intel architectures
- Fn key binding conflicts with system shortcuts on some Mac models or keyboard layouts
- Local Whisper models (tiny/base/small) have lower accuracy than cloud providers; WER is typically 10-15% higher than Deepgram
- Local transcription adds 2-5 seconds of latency on M1/M2 Macs due to model inference time; larger models (medium/large) require 8GB+ RAM and are not bundled
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.