What can 🎙️ OpenSource Voice Dictation Agent (Wispr Flow clone) do?

push-to-talk voice dictation with native keyboard interception, dual-path transcription with local whisper or cloud deepgram, zero-telemetry privacy model with no analytics collection, ios beta support with testflight distribution, ai-powered post-processing with filler removal and grammar correction, ipc-based main-renderer process communication with security sandboxing, settings persistence with electron-store and onboarding flow, native audio capture with system microphone integration, multi-architecture native module compilation for apple silicon and intel, hands-free toggle mode with double-tap gesture detection, settings ui with provider selection and api key configuration, macos app signing and notarization for distribution

🎙️ OpenSource Voice Dictation Agent (Wispr Flow clone)

Repository · Free

Open Source


12 capabilities

Capabilities: 12 decomposed

push-to-talk voice dictation with native keyboard interception

Medium confidence

Captures audio input via Fn key (hold-to-record) or double-tap (hands-free toggle) using a native C++ module (fn_key_monitor.node) that hooks into macOS keyboard events at the system level, bypassing Electron's renderer process limitations. The native module runs in the main process and communicates via IPC to trigger audio recording without application focus requirements, enabling dictation in any macOS application.

Solves for

I want to dictate text into any application without switching focus or using the mouse

I need a keyboard shortcut that works globally across all macOS apps, even when my app isn't active

I want to toggle hands-free recording mode with a double-tap gesture

Best for

macOS desktop users who spend time in multiple applications (email, documents, code editors)

accessibility-focused users who need hands-free text input

developers building privacy-first voice tools on Electron

Requires

macOS 10.13 or later

Accessibility permissions granted to application in System Preferences

Node.js 18+ for native module compilation

Limitations

macOS-only implementation — no Windows or Linux support due to platform-specific keyboard event APIs

Requires accessibility permissions (macOS Security & Privacy settings) which users must manually grant

Native module must be recompiled for both Apple Silicon (M1/M2/M3/M4) and Intel architectures

What makes it unique

Uses a native C++ module (fn_key_monitor.node) compiled with node-gyp to hook macOS keyboard events at the system level, enabling global Fn key capture that works across all applications without requiring app focus. Electron's built-in globalShortcut can register global accelerators, but it cannot bind a bare Fn key, which is presumably why a native hook is used. Implements dual-mode interaction: hold-to-record and double-tap hands-free toggle, both handled in native code before IPC marshaling.

vs alternatives

More reliable than Wispr Flow's browser-based approach because it hooks keyboard events at the system level through a native module rather than relying on browser APIs, and supports global hotkeys without requiring the Electron window to be focused.

dual-path transcription with local whisper or cloud deepgram

Medium confidence

Implements a pluggable transcription architecture that routes audio to either local Whisper models (tiny/base/small via whisper-node-addon) for offline processing or cloud Deepgram API for high-speed transcription. The system abstracts transcription provider selection through a configuration layer, allowing users to toggle between privacy-first local processing and speed-optimized cloud processing without code changes. Audio is buffered in the renderer process and sent to the main process via IPC, which routes to the selected provider.
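The runtime provider selection described above can be sketched as follows. This is a minimal illustration, not the project's code: every identifier (`TranscriptionProvider`, `selectProvider`, the settings shape) is hypothetical, and the two providers are stubs standing in for the local Whisper binding and the Deepgram client.

```typescript
// Hypothetical names throughout; none of these identifiers come from the project.
type ProviderId = "local-whisper" | "deepgram";

interface TranscriptionProvider {
  id: ProviderId;
  requiresNetwork: boolean;
  transcribe(pcm: Int16Array, language: string): Promise<string>;
}

interface TranscriptionSettings {
  provider: ProviderId;
  deepgramApiKey?: string;
}

// Stub providers: a real implementation would call the local Whisper
// binding or the Deepgram API here instead of returning a placeholder.
const localWhisper: TranscriptionProvider = {
  id: "local-whisper",
  requiresNetwork: false,
  transcribe: async (pcm) => `[local transcript of ${pcm.length} samples]`,
};

const deepgram: TranscriptionProvider = {
  id: "deepgram",
  requiresNetwork: true,
  transcribe: async (pcm) => `[cloud transcript of ${pcm.length} samples]`,
};

// Pick a provider at call time from the persisted settings, so switching
// providers needs no rebuild. Fails early if the cloud path is selected
// without an API key.
function selectProvider(settings: TranscriptionSettings): TranscriptionProvider {
  if (settings.provider === "deepgram") {
    if (!settings.deepgramApiKey) {
      throw new Error("Deepgram selected but no API key configured");
    }
    return deepgram;
  }
  return localWhisper;
}
```

A main-process IPC handler would then call something like `selectProvider(settings).transcribe(pcm, "en")`, leaving the caller unaware of which backend ran.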

Solves for

I want to transcribe speech to text without sending audio to the cloud for privacy

I need fast, accurate transcription with support for multiple languages and accents

I want to switch between local and cloud transcription based on network availability or accuracy needs

Best for

privacy-conscious users who cannot send audio to cloud services

teams with strict data residency requirements (HIPAA, GDPR)

developers building voice tools with configurable transcription backends

Requires

For local path: Node.js 18+, whisper-node-addon dependency, 2GB+ free disk space for model files

For cloud path: Deepgram API key, active internet connection

macOS 10.13+ with compatible CPU (Apple Silicon or Intel x64)

Limitations

Local Whisper models (tiny/base/small) have lower accuracy than cloud providers — WER typically 10-15% higher than Deepgram

Local transcription adds 2-5 second latency on M1/M2 Macs due to model inference time; larger models (medium/large) require 8GB+ RAM and are not bundled

Deepgram API requires valid API key and internet connectivity; no offline fallback if network fails mid-session

What makes it unique

Implements a dual-path architecture with runtime provider selection rather than compile-time choice — users can toggle between local Whisper and Deepgram via settings without rebuilding. Uses whisper-node-addon (native C++ binding to OpenAI Whisper) for local processing and Deepgram REST API for cloud path, with unified IPC interface in main process that abstracts provider differences. Configuration persisted in electron-store allows seamless switching.

vs alternatives

More flexible than Wispr Flow (cloud-only) or Talon Voice (local-only) because it offers both paths with runtime selection, and more privacy-preserving than commercial dictation tools (Dragon, Otter) by supporting fully offline local transcription as default.

zero-telemetry privacy model with no analytics collection

Medium confidence

Implements a privacy-first architecture with zero telemetry — no analytics libraries, no tracking pixels, no data collection beyond what's necessary for core functionality. The app does not send usage data, crash reports, or user behavior analytics to any external service. All processing (transcription, LLM post-processing) can be done locally without cloud connectivity, and cloud processing (Deepgram, LLM APIs) only sends audio/text when explicitly configured by the user.
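One way to make the "no analytics libraries" claim checkable is to scan the dependency manifest against a denylist. The sketch below is an assumption about how such a check could look, not something the project is known to ship; `findTelemetryDeps` and the denylist contents are illustrative.

```typescript
// Hypothetical check: flags well-known telemetry/crash-reporting packages
// if they appear in a package.json-shaped object.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Illustrative denylist; extend it to match your own policy.
const TELEMETRY_PACKAGES = [
  "@sentry/electron",
  "@sentry/node",
  "rollbar",
  "posthog-js",
  "mixpanel",
];

function findTelemetryDeps(
  manifest: PackageManifest,
  denylist: string[] = TELEMETRY_PACKAGES
): string[] {
  const declared = Object.keys({
    ...manifest.dependencies,
    ...manifest.devDependencies,
  });
  // Return offending package names in a stable order for readable CI output.
  return declared.filter((name) => denylist.indexOf(name) !== -1).sort();
}
```

Run in CI against the parsed package.json, an empty result supports the zero-telemetry claim for direct dependencies; transitive dependencies would need a lockfile scan.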

Solves for

I want to use a voice dictation app that doesn't track my usage or send data to analytics services

I need assurance that my voice recordings and transcripts are not collected or stored by the app developer

I want to use the app entirely offline without any telemetry or phone-home behavior

Best for

privacy-conscious users who avoid commercial voice tools (Otter, Fireflies) due to data collection concerns

organizations with strict data privacy policies (HIPAA, GDPR, SOC 2)

developers building privacy-first voice applications

Requires

No external dependencies for analytics or telemetry

Local-only processing (Whisper + Ollama) for full privacy, or cloud processing (Deepgram + LLM APIs) with user consent

Limitations

No crash reporting — if the app crashes, users must manually report bugs without automatic error telemetry

No usage analytics — developers cannot track feature adoption or user behavior to inform product decisions

No error tracking service (Sentry, Rollbar) — bugs may go unnoticed if users don't report them

What makes it unique

Explicitly excludes all analytics and telemetry libraries from package.json and implements no tracking code — privacy is enforced by architecture rather than configuration. Supports fully offline processing (local Whisper + Ollama) as the default path, with cloud processing as an optional user-selected feature. No crash reporting, no error tracking, no usage analytics — complete transparency about data flow.

vs alternatives

More privacy-preserving than commercial tools (Otter, Fireflies, Wispr Flow) which collect usage analytics and store transcripts on their servers. More transparent than tools claiming privacy but using third-party SDKs for crash reporting or analytics.

ios beta support with testflight distribution

Medium confidence

Extends Jarvis to iOS via a beta version distributed through Apple TestFlight, enabling voice dictation on iPhone and iPad. The iOS implementation (ios/README.md) uses native iOS APIs for audio capture and keyboard integration, with the same dual-path architecture (local Whisper or cloud Deepgram) as the macOS version. TestFlight allows beta testing with up to 10,000 external testers before App Store release.

Solves for

I want to use voice dictation on my iPhone or iPad, not just my Mac

I want to test the iOS version before it's released on the App Store

I want the same privacy-first, dual-path architecture on mobile as on desktop

Best for

iOS users who want privacy-focused voice dictation on mobile

beta testers willing to provide feedback on iOS version

developers building cross-platform voice tools

Requires

iOS 14+ (typical for modern iOS apps)

TestFlight app installed on iOS device

Apple ID for TestFlight access

Limitations

iOS version is in beta — features may be incomplete or unstable compared to macOS version

TestFlight distribution limits testing to 10,000 external testers — not available to general public

iOS native APIs differ significantly from macOS — code sharing between platforms is limited to business logic (LLM prompts, provider abstraction)

What makes it unique

Extends the macOS dual-path architecture to iOS using native Swift/Objective-C APIs for audio capture and keyboard integration. Uses TestFlight for beta distribution, allowing community feedback before App Store release. Aims for feature parity with the macOS version (local Whisper + Ollama, cloud Deepgram + LLM APIs) while adapting UI and interaction patterns for iOS; as a beta, some features may still be incomplete.

vs alternatives

More privacy-preserving than commercial iOS dictation apps (Otter, Fireflies) because it supports local-only processing. More feature-complete than iOS's built-in dictation because it adds grammar correction and filler removal via LLM post-processing.

ai-powered post-processing with filler removal and grammar correction

Medium confidence

Chains transcribed text through an LLM-based post-processing pipeline that removes filler words ('um', 'like', 'uh'), corrects grammar, adds punctuation, and enhances readability. The system supports dual-path LLM routing: local Ollama server (models: sam860/LFM2:1.2b, llama3, mistral) for offline processing or cloud LLMs (Gemini, Claude, OpenAI) for higher quality. Post-processing is triggered automatically after transcription completes, with results cached to avoid re-processing identical transcripts.
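The prompt-plus-cache flow described above might look like the sketch below. All names (`CachedPostProcessor`, `buildPrompt`, `fnv1a`) are hypothetical, the prompt wording is invented for illustration, and the `enhance` callback stands in for whichever Ollama or cloud LLM client is configured. The hash is a simple FNV-1a-style function over UTF-16 code units, chosen only to keep the example dependency-free.

```typescript
// The LLM call is injected so the same cache works for local or cloud models.
type Enhancer = (prompt: string) => Promise<string>;

// FNV-1a-style 32-bit hash over UTF-16 code units (stand-in for a real hash).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Single-pass instruction: fillers, grammar, and punctuation in one request.
function buildPrompt(transcript: string): string {
  return [
    "Clean up the dictated text below.",
    "Remove filler words (um, uh, like), fix grammar, and add punctuation.",
    "Return only the corrected text.",
    "",
    transcript,
  ].join("\n");
}

class CachedPostProcessor {
  private cache = new Map<string, { transcript: string; result: Promise<string> }>();
  private enhance: Enhancer;

  constructor(enhance: Enhancer) {
    this.enhance = enhance;
  }

  process(transcript: string): Promise<string> {
    const key = fnv1a(transcript);
    const hit = this.cache.get(key);
    // Compare the stored transcript too, so a hash collision is never
    // served as a wrong result.
    if (hit && hit.transcript === transcript) return hit.result;

    const result = this.enhance(buildPrompt(transcript));
    this.cache.set(key, { transcript, result });
    // Evict failed requests so a transient API error is not cached forever.
    result.catch(() => {
      this.cache.delete(key);
    });
    return result;
  }
}
```

Caching the promise rather than the resolved string means two identical transcripts submitted back to back share one in-flight request.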

Solves for

I want my raw speech-to-text output cleaned up automatically — remove filler words and fix grammar

I need punctuation and capitalization added to my transcripts without manual editing

I want to choose between fast local processing or high-quality cloud LLM enhancement based on my needs

Best for

users dictating long-form content (emails, documents, notes) who want publication-ready text

non-native English speakers who benefit from grammar correction

teams with strict data privacy requirements who need local LLM processing

Requires

For local path: Ollama installed and running, Node.js 18+, 4GB+ RAM for model inference

For cloud path: API key for OpenAI, Anthropic, or Google Gemini; active internet connection

macOS 10.13+

Limitations

Local Ollama models (1.2B-7B parameters) produce lower quality output than GPT-4 or Claude — grammar correction may miss complex syntax errors

Post-processing adds 1-3 second latency for local models, 2-5 seconds for cloud LLMs depending on text length and API response time

Ollama server must be running separately and configured with correct port (default 11434) — no built-in Ollama lifecycle management

What makes it unique

Implements a dual-path LLM chain with provider abstraction — routes transcribed text to either local Ollama server or cloud LLM APIs (Gemini/Claude/OpenAI) via a unified interface. Uses prompt engineering to instruct LLM to remove fillers, fix grammar, and add punctuation in a single pass. Caches results keyed by transcript hash to avoid re-processing identical inputs, reducing latency and API costs on repeated dictation.

vs alternatives

More comprehensive than Wispr Flow's basic punctuation (which only adds periods) because it combines filler removal, grammar correction, and punctuation in an LLM-driven pipeline. More privacy-preserving than commercial tools (Otter, Fireflies) by supporting fully local Ollama processing, and more cost-effective than cloud-only solutions by offering local fallback.

ipc-based main-renderer process communication with security sandboxing

Medium confidence

Implements a secure inter-process communication (IPC) bridge between Electron's main process (native module access, file I/O, API calls) and renderer process (UI, user interactions) using ipcMain and ipcRenderer with preload script isolation. The preload script (src/preload.ts) exposes a whitelist of safe IPC channels (e.g., 'start-recording', 'transcribe-audio', 'update-settings') to the renderer, preventing direct access to Node.js APIs and enforcing context isolation. Audio buffers and settings are marshaled through IPC as serialized JSON or binary data.
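The channel whitelist can be illustrated without Electron itself. In the sketch below, `createBridge` and `isAllowed` are hypothetical helpers; the three channel names are the ones quoted above, and the `invoke` parameter stands in for `ipcRenderer.invoke` so the guard logic can be read and tested in isolation.

```typescript
// Signature compatible with Electron's ipcRenderer.invoke.
type Invoke = (channel: string, payload?: unknown) => Promise<unknown>;

// Only these channels may cross from the renderer to the main process.
const ALLOWED_CHANNELS = ["start-recording", "transcribe-audio", "update-settings"] as const;
type AllowedChannel = (typeof ALLOWED_CHANNELS)[number];

function isAllowed(channel: string): channel is AllowedChannel {
  return (ALLOWED_CHANNELS as readonly string[]).indexOf(channel) !== -1;
}

// Wraps the raw invoke function so the renderer can never name an
// arbitrary channel; anything off the list is rejected before it is sent.
function createBridge(invoke: Invoke) {
  return {
    invoke(channel: string, payload?: unknown): Promise<unknown> {
      if (!isAllowed(channel)) {
        return Promise.reject(new Error(`Blocked IPC channel: ${channel}`));
      }
      return invoke(channel, payload);
    },
  };
}
```

In an actual preload script, the returned object would be exposed with `contextBridge.exposeInMainWorld` under a name of your choosing, passing `(channel, payload) => ipcRenderer.invoke(channel, payload)` as the `invoke` argument.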

Solves for

I need to safely communicate between the Electron main process (native modules, file system) and renderer process (UI) without exposing Node.js APIs

I want to ensure the renderer process cannot directly access sensitive operations like file I/O or native module calls

I need to pass audio buffers and configuration data between processes efficiently

Best for

Electron developers building secure desktop applications with native module integration

teams implementing defense-in-depth security with process isolation

developers maintaining large Electron codebases with multiple renderer processes

Requires

Electron 12+ (context isolation and preload script support)

Node.js 18+

TypeScript or JavaScript with async/await for IPC promise handling

Limitations

IPC message passing adds ~5-10ms latency per round-trip due to serialization and process boundary crossing

Large audio buffers (>10MB) may cause performance degradation or memory spikes if not streamed

Preload script whitelist must be manually maintained — adding new IPC channels requires code changes and app rebuild

What makes it unique

Uses Electron's preload script (src/preload.ts) with context isolation enabled to expose a whitelist of safe IPC channels to the renderer, preventing direct Node.js API access while maintaining full main process capabilities. Implements channel-based message routing in main.ts that dispatches IPC calls to appropriate handlers (native modules, API clients, file I/O), with error handling and response marshaling. Audio buffers are passed as binary data through IPC using Electron's native serialization.

vs alternatives

More secure than older Electron patterns (nodeIntegration: true) because it enforces process isolation and API whitelisting, preventing renderer process compromise from accessing file system or native modules. More maintainable than custom socket-based IPC because it uses Electron's built-in IPC with automatic serialization.

settings persistence with electron-store and onboarding flow

Medium confidence

Persists user configuration (transcription provider, LLM choice, API keys, keyboard shortcuts) to disk using electron-store, a lightweight JSON-based key-value store that can encrypt the settings file (including API keys) when an encryption key is configured. The onboarding interface (Onboarding Interface component) guides first-time users through provider selection (local vs cloud), API key configuration, and keyboard shortcut customization. Settings are loaded on app startup and cached in memory; changes trigger IPC updates to all processes and persist immediately to disk.
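The load-with-defaults and notify-on-change behaviour can be sketched with an in-memory stand-in. `SettingsStore` and the `Settings` shape below are hypothetical; in the app, persistence would go through electron-store and the listener would broadcast over IPC. The default values mirror ones mentioned on this page (local provider, Ollama on port 11434, 300 ms double-tap window).

```typescript
// Hypothetical settings schema for illustration only.
interface Settings {
  transcriptionProvider: "local-whisper" | "deepgram";
  llmProvider: "ollama" | "openai" | "anthropic" | "gemini";
  ollamaUrl: string;
  doubleTapMs: number;
}

const DEFAULTS: Settings = {
  transcriptionProvider: "local-whisper",
  llmProvider: "ollama",
  ollamaUrl: "http://localhost:11434",
  doubleTapMs: 300,
};

type Listener = (key: keyof Settings, value: Settings[keyof Settings]) => void;

class SettingsStore {
  private values: Settings;
  private listeners: Listener[] = [];

  // Persisted values override defaults; missing keys fall back to defaults,
  // which also covers keys added in a newer app version.
  constructor(persisted: Partial<Settings> = {}) {
    this.values = { ...DEFAULTS, ...persisted };
  }

  get<K extends keyof Settings>(key: K): Settings[K] {
    return this.values[key];
  }

  // Update the in-memory value, then notify every subscriber. A real
  // implementation would also write to disk here.
  set<K extends keyof Settings>(key: K, value: Settings[K]): void {
    this.values[key] = value;
    for (const listener of this.listeners) listener(key, value);
  }

  onChange(listener: Listener): void {
    this.listeners.push(listener);
  }
}
```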

Solves for

I want my transcription provider choice, API keys, and keyboard shortcuts to persist across app restarts

I want a guided setup flow that helps me choose between local and cloud processing on first launch

I want to update settings without restarting the app and have changes take effect immediately

Best for

users who want to configure the app once and have settings persist

teams deploying Jarvis across multiple machines who need consistent configuration

developers building Electron apps with persistent user preferences

Requires

Electron 12+

electron-store dependency (included in package.json)

Write access to user home directory

Limitations

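electron-store writes unencrypted JSON by default; API keys are encrypted only if the store is explicitly configured with an encryption key, and electron-store's own documentation describes that option as obfuscation rather than strong security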

No built-in settings versioning or migration — upgrading app versions may break settings if schema changes

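Settings are stored locally in the app's user-data directory (~/.config/jarvis-ai-assistant on Linux; on macOS, Electron's userData path is normally under ~/Library/Application Support/); there is no centralized settings server for team management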

What makes it unique

Uses electron-store for lightweight JSON-based persistence with optional encryption for sensitive data (API keys), avoiding the complexity of SQLite or external databases. Onboarding flow (Onboarding Interface component) is built as a separate Electron window that guides users through provider selection and API key configuration before main app launch. Settings changes trigger IPC broadcasts to all processes, ensuring UI and main process stay in sync without manual refresh.

vs alternatives

Simpler than Wispr Flow's cloud-based settings sync because it uses local-only persistence, and more user-friendly than manual config file editing because it provides a guided onboarding UI. Supports both local and cloud provider configuration in a single settings schema, unlike single-path tools.

native audio capture with system microphone integration

Medium confidence

Captures audio from the system microphone using Web Audio API (in renderer process) or native audio APIs (via native modules in main process), with automatic gain control and noise suppression. Audio is buffered in memory as PCM samples at 16kHz sample rate, then sent to the transcription pipeline via IPC. The system handles microphone permission requests (macOS Privacy & Security) and gracefully degrades if microphone is unavailable or denied.
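Web Audio delivers float samples, usually at the device rate (often 48 kHz), so reaching 16 kHz 16-bit PCM takes a downsample and a format conversion. The sketch below shows one simple way to do both; the function names are hypothetical, and averaging is a crude low-pass filter that a production resampler would replace.

```typescript
// Convert Web Audio float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Downsample by an integer factor by averaging each group of `factor`
// samples (for example 48 kHz -> 16 kHz with factor 3). Trailing samples
// that do not fill a whole group are dropped.
function downsample(samples: Float32Array, factor: number): Float32Array {
  if (!Number.isInteger(factor) || factor < 1) {
    throw new RangeError("factor must be a positive integer");
  }
  const out = new Float32Array(Math.floor(samples.length / factor));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) sum += samples[i * factor + j];
    out[i] = sum / factor;
  }
  return out;
}
```

For a 48 kHz capture, `floatTo16BitPCM(downsample(chunk, 3))` yields a buffer in the format the transcription step expects.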

Solves for

I want to record audio from my Mac's microphone when I press the Fn key

I need audio to be captured at high quality (16kHz, 16-bit PCM) for accurate transcription

I want the app to ask for microphone permissions on first launch and remember my choice

Best for

macOS users who want to dictate using their built-in or external microphone

users in noisy environments who benefit from automatic gain control

developers building voice-input features in Electron apps

Requires

macOS 10.13+

Microphone permission granted in System Preferences

Electron runtime with Web Audio API support (all modern versions)

Limitations

Web Audio API (renderer process) has higher latency (~100-200ms) than native audio APIs due to browser security sandboxing

Automatic gain control and noise suppression are basic implementations — may not match quality of professional audio processing tools

If microphone access is denied at the initial system prompt, it must be re-enabled manually in macOS System Preferences (Privacy & Security > Microphone); the app cannot prompt again

What makes it unique

Uses Web Audio API in renderer process for cross-platform compatibility but can fall back to native audio modules in main process for lower latency and better control. Buffers audio at 16kHz (standard for speech recognition) and implements basic automatic gain control to normalize microphone input levels. Handles macOS microphone permission requests gracefully with user-friendly error messages.

vs alternatives

More integrated than browser-based Wispr Flow because it captures audio at the system level via Electron, avoiding browser tab audio limitations. More flexible than command-line tools (ffmpeg) because it provides real-time audio buffering and automatic format conversion.

multi-architecture native module compilation for apple silicon and intel

Medium confidence

Builds and distributes native C++ modules (fn_key_monitor.node, whisper-node-addon) as separate binaries for Apple Silicon (ARM64) and Intel (x64) architectures using node-gyp and electron-builder. The build process compiles native code with platform-specific optimizations, signs binaries with Apple developer certificate, and packages both architectures into a universal macOS app. At runtime, the app loads the correct binary based on process.arch, ensuring optimal performance on each architecture.
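Runtime selection by `process.arch` reduces to mapping the architecture string to a binary path. The sketch below assumes a `native/<arch>/` directory layout, which is purely illustrative; `nativeModulePath` is a hypothetical helper and the project's real packaging layout may differ.

```typescript
// Architectures the macOS build targets.
type MacArch = "arm64" | "x64";

function isMacArch(arch: string): arch is MacArch {
  return arch === "arm64" || arch === "x64";
}

// Map an architecture (as reported by process.arch) to the path of the
// matching prebuilt binary. The directory layout is an assumption.
function nativeModulePath(arch: string, moduleName: string): string {
  if (!isMacArch(arch)) {
    throw new Error(`Unsupported architecture: ${arch}`);
  }
  return `native/${arch}/${moduleName}.node`;
}
```

The main process would then load the module with something like `require(nativeModulePath(process.arch, "fn_key_monitor"))`, failing fast with a clear message on an unexpected architecture.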

Solves for

I want to build native modules that work on both M1/M2 Macs and Intel Macs without users having to compile

I need to distribute pre-compiled native binaries that are signed and notarized by Apple

I want to ensure native modules run with optimal performance on each CPU architecture

Best for

Electron developers shipping native modules across multiple macOS architectures

teams with CI/CD pipelines that need to build and sign binaries for distribution

open-source projects that want to avoid requiring users to compile native code

Requires

Node.js 18+ with node-gyp installed

Xcode command-line tools (xcode-select --install)

Apple developer certificate for code signing

Limitations

Build process requires separate compilation for each architecture — cannot use cross-compilation, must build on native hardware or use CI/CD with architecture-specific runners

Native modules must be recompiled and re-signed whenever dependencies update (e.g., Whisper library updates) — no binary caching across versions

Apple notarization process adds 5-15 minutes to build time and requires valid Apple developer certificate and credentials

What makes it unique

Uses electron-builder with custom build scripts to compile native modules separately for Apple Silicon and Intel, then packages both binaries into a universal macOS app. Implements runtime architecture detection (process.arch) to load the correct binary without user intervention. Integrates Apple notarization into the build pipeline, eliminating security warnings on first launch.

vs alternatives

More user-friendly than requiring users to compile native modules locally (like some open-source projects) because binaries are pre-built and notarized. More maintainable than maintaining separate app versions for each architecture because a single universal app bundle contains both binaries.

hands-free toggle mode with double-tap gesture detection

Medium confidence

Implements a dual-mode interaction model where users can either hold Fn to record (single-press mode) or double-tap Fn to toggle hands-free recording on/off (toggle mode). The native keyboard module (fn_key_monitor.node) detects double-tap timing (two presses within 300ms) and switches recording state accordingly. In hands-free mode, the app continues recording until the user double-taps again, enabling dictation without holding a key.
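The timing rule (two presses within the threshold toggle hands-free mode, otherwise a press is a hold) can be expressed as a small state machine. The project is described as doing this in native C++; the TypeScript below is only an illustration of the logic, with hypothetical names (`FnGestureDetector`, the `Action` values) and timestamps passed in explicitly.

```typescript
type Action = "start-hold" | "stop-hold" | "toggle-hands-free" | "none";

class FnGestureDetector {
  private thresholdMs: number;
  private lastPressAt = -Infinity;
  private handsFree = false;
  private holding = false;

  constructor(thresholdMs = 300) {
    this.thresholdMs = thresholdMs;
  }

  // Called on Fn key-down with a millisecond timestamp.
  onPress(nowMs: number): Action {
    const isDoubleTap = nowMs - this.lastPressAt <= this.thresholdMs;
    // Reset after a double-tap so a third quick press starts a new gesture.
    this.lastPressAt = isDoubleTap ? -Infinity : nowMs;

    if (isDoubleTap) {
      this.handsFree = !this.handsFree;
      this.holding = false;
      return "toggle-hands-free";
    }
    // In hands-free mode a single press does nothing; only a double-tap exits.
    if (this.handsFree) return "none";
    this.holding = true;
    return "start-hold";
  }

  // Called on Fn key-up. Only ends a recording that a hold started.
  onRelease(): Action {
    if (!this.holding) return "none";
    this.holding = false;
    return "stop-hold";
  }

  get isHandsFree(): boolean {
    return this.handsFree;
  }
}
```

Note that the first tap of a double-tap briefly looks like a hold, so a consumer of these actions would likely discard recordings shorter than the threshold.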

Solves for

I want to record hands-free by double-tapping Fn, so I can dictate while typing or using both hands

I need visual feedback (UI indicator) showing when hands-free recording is active

I want to quickly toggle hands-free mode on/off without switching to the app window

Best for

users who dictate long passages and want to keep their hands free for other tasks

accessibility users who cannot hold a key continuously

developers building voice-first interfaces with toggle-based recording

Requires

macOS 10.13+

Fn key available (not remapped to other function)

Accessibility permissions granted

Limitations

Double-tap detection is timing-based (300ms window) — may fail if user presses Fn slowly or with variable timing

No visual feedback in the app window if it's not focused — users cannot see hands-free status without switching to app

Hands-free mode has no automatic timeout — if user forgets to double-tap to stop, recording continues indefinitely until manual stop

What makes it unique

Implements double-tap detection in native C++ (fn_key_monitor.node) by tracking press/release timing, avoiding the latency of IPC-based gesture detection. Maintains hands-free state in the main process and broadcasts state changes to renderer via IPC, enabling UI indicators and preventing accidental re-triggering. Double-tap threshold (300ms) is configurable via settings.

vs alternatives

More intuitive than Wispr Flow's single hold-to-record model because it supports hands-free toggle for long dictation sessions. More reliable than software-based gesture detection because it operates at the native keyboard event level with sub-millisecond timing precision.

settings ui with provider selection and api key configuration

Medium confidence

Provides a React-based settings interface (Settings Interface component) where users can select transcription provider (local Whisper vs Deepgram), LLM provider (local Ollama vs cloud APIs), and configure API keys. The settings UI is rendered in a separate Electron window and communicates with the main process via IPC to read/write settings to electron-store. Sensitive fields (API keys) are masked in the UI and, when an encryption key is configured, encrypted at rest in the settings file.
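Masking a key for display is a small pure function. The helper below is a hypothetical example of the idea (show only the last few characters), not the component's actual code.

```typescript
// Replace all but the last `visible` characters with asterisks.
// Keys no longer than `visible` are masked entirely so nothing leaks.
function maskApiKey(key: string, visible = 4): string {
  if (key.length <= visible) return "*".repeat(key.length);
  return "*".repeat(key.length - visible) + key.slice(-visible);
}
```

A settings field would render `maskApiKey(storedKey)` while keeping the real value out of the renderer wherever possible.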

Solves for

I want a user-friendly UI to choose between local and cloud transcription without editing config files

I need to securely enter and store my API keys (Deepgram, OpenAI, Anthropic) in the app

I want to see which provider is currently active and easily switch providers

Best for

non-technical users who want to configure the app via UI rather than config files

users managing multiple API keys for different providers

teams deploying Jarvis with standardized provider configurations

Requires

Electron 12+

React 18+ (included in package.json)

electron-store for settings persistence

Limitations

No live preview of settings changes while the settings window is open

API key validation is client-side only — invalid keys are not detected until first use

No support for multiple API key profiles or switching between accounts — only one key per provider stored

What makes it unique

Implements settings as a separate Electron window (not a modal in the main window) with a dedicated React component tree, avoiding UI blocking. Uses IPC to send settings changes to the main process, which persists them with electron-store and broadcasts updates to the other processes. API keys are masked in the UI and can be encrypted at rest using electron-store's encryption option.

vs alternatives

More user-friendly than command-line configuration (like Talon Voice) because it provides a graphical interface. More secure than storing API keys in plaintext config files because it uses electron-store's built-in encryption.

macos app signing and notarization for distribution

Medium confidence

Automates code signing and Apple notarization of the macOS app bundle using electron-builder and custom build scripts. The build process signs the app with a valid Apple developer certificate, submits it to Apple's notarization service, and waits for approval before packaging the DMG installer. Notarized apps display no security warnings when users download and launch them, improving trust and reducing support burden.

Solves for

I want to distribute a macOS app that doesn't show 'unidentified developer' warnings when users download it

I need to automate the code signing and notarization process in CI/CD so releases are fast and reliable

I want users to trust the app is legitimate and hasn't been tampered with

Best for

open-source projects distributing macOS apps to end users

teams with CI/CD pipelines that need automated app signing and notarization

developers shipping Electron apps on macOS who want to avoid security warnings

Requires

Apple developer account with valid certificate

Apple ID and app-specific password for notarization

electron-builder 24.0+

Limitations

Requires a paid Apple Developer Program membership ($99/year) and Apple ID credentials; there is no free signing option for open-source developers

Notarization process is asynchronous and can take 5-15 minutes — build time is unpredictable

If notarization fails (e.g., malware detected), the entire build fails and must be retried — no partial success

What makes it unique

Integrates Apple notarization into electron-builder's build pipeline using custom scripts that submit the app to Apple's notarization service and poll for approval status. Stores Apple ID credentials securely in environment variables and uses app-specific passwords (not main Apple ID password) for added security. Notarization ticket is stapled to the app bundle, allowing offline verification.

vs alternatives

More user-friendly than distributing unsigned apps because users don't see security warnings. More secure than self-signing because Apple's notarization service scans for malware and verifies developer identity.

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts sharing capabilities

Artifacts that share capabilities with 🎙️ OpenSource Voice Dictation Agent (Wispr Flow clone), ranked by overlap. Discovered automatically through the match graph.

Repository · 24

Teleprompter

An on-device AI for your meetings that listens to you and makes charismatic quote suggestions.

privacy-preserving local processing with no cloud transmission

real-time speech-to-text transcription with meeting context awareness

2 shared capabilities

Product · 29

Ermine

Local, secure, and efficient audio transcription on your...

local-audio-transcription

on-device-processing-execution

2 shared capabilities

Product · 30

Wave

Transform audio into text and summaries effortlessly on iOS;...

privacy-preserving local processing

on-device speech-to-text transcription

2 shared capabilities

Web App · 30

Dictation IO

Transform speech into text instantly, enhancing productivity across...

real-time browser-based speech-to-text transcription

zero-installation cross-device web access

2 shared capabilities

Repository · 27

Screenpipe

An open-source tool for recording screen and audio activity with AI-powered search, automations, and support for local LLMs. #opensource

continuous audio transcription with voice activity detection

1 shared capability

Product · 30

Cleft

Transforms voice to structured markdown notes, ensuring privacy and...

local-device speech-to-text transcription with privacy isolation

1 shared capability

Best For

✓ macOS desktop users who spend time in multiple applications (email, documents, code editors)
✓ accessibility-focused users who need hands-free text input
✓ developers building privacy-first voice tools on Electron
✓ privacy-conscious users who cannot send audio to cloud services
✓ teams with strict data residency requirements (HIPAA, GDPR)
✓ developers building voice tools with configurable transcription backends
✓ users in regions with poor internet connectivity who need offline fallback
✓ privacy-conscious users who avoid commercial voice tools (Otter, Fireflies) due to data collection concerns

Known Limitations

⚠ macOS-only implementation: no Windows or Linux support due to platform-specific keyboard event APIs
⚠ Requires accessibility permissions (macOS Security & Privacy settings) which users must manually grant
⚠ Native module must be recompiled for both Apple Silicon (M1/M2/M3/M4) and Intel architectures
⚠ Fn key binding conflicts with system shortcuts on some Mac models or keyboard layouts
⚠ Local Whisper models (tiny/base/small) have lower accuracy than cloud providers; WER typically 10-15% higher than Deepgram
⚠ Local transcription adds 2-5 second latency on M1/M2 Macs due to model inference time; larger models (medium/large) require 8GB+ RAM and are not bundled

Requirements

macOS 10.13 or later
Accessibility permissions granted to application in System Preferences
Node.js 18+ for native module compilation
Electron runtime with native module support
For local path: Node.js 18+, whisper-node-addon dependency, 2GB+ free disk space for model files
For cloud path: Deepgram API key, active internet connection
macOS 10.13+ with compatible CPU (Apple Silicon or Intel x64)
No external dependencies for analytics or telemetry

Input / Output

Accepts: keyboard events (Fn key press/release, double-tap detection), audio stream from system microphone, audio buffer (WAV/PCM format, 16kHz sample rate), language code (e.g., 'en', 'es', 'fr'), transcription provider config (local vs cloud), user configuration (provider selection), audio input (microphone), text input (settings), provider configuration (local vs cloud), API keys (if using cloud providers), raw transcribed text string, language code, LLM provider config (local Ollama vs cloud API), optional: custom system prompt for LLM, IPC channel name (string), message payload (JSON-serializable object or binary data), optional: callback function for async responses, setting key (string), setting value (string, number, boolean, object), optional: encryption key for sensitive data, microphone device (system default), recording duration (seconds), optional: audio processing settings (gain, noise suppression), C++ source code, binding.gyp configuration file, platform target (arm64 or x64), keyboard events (Fn key press/release with timing), double-tap threshold (milliseconds, default 300ms), provider selection (dropdown: 'local-whisper', 'deepgram', 'ollama', 'openai', etc.), API key (text input, masked), Ollama server URL (text input, default 'http://localhost:11434'), keyboard shortcut configuration, app bundle (Electron app directory), Apple developer certificate (p12 file), Apple ID credentials, notarization team ID

Produces: audio buffer (WAV/PCM format), trigger signals to transcription pipeline, transcribed text string, confidence scores per word (Deepgram only), language detection results, transcribed text, processed text, no telemetry data sent externally, text insertion into active text field, cleaned, punctuated text string, confidence score (cloud providers only), processing metadata (latency, token count), response data (JSON or binary), error objects on IPC failure, event emitter for streaming responses, persisted setting value, settings object (all settings), IPC event notification on settings change, audio buffer (PCM, 16kHz, 16-bit), audio metadata (duration, sample count, format), compiled .node binary file, signed and notarized macOS app bundle, DMG installer for distribution, hands-free mode state (boolean: on/off), IPC event to update UI indicator, audio recording start/stop signals, updated settings object, IPC event to main process with new configuration, validation errors (if any), signed app bundle, notarization ticket (UUID), DMG installer ready for distribution

UnfragileRank

Adoption15%(30% weight)

Quality23%(20% weight)

Ecosystem30%(15% weight)

Match Graph25%(30% weight)

Freshness75%(5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
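
The page does not say how the five components are combined. As a rough sketch, assuming the score is a plain weighted sum of the percentages listed above (an assumption, not a documented formula), the arithmetic would look like this. `RankComponent` and `weightedRank` are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch: assumes UnfragileRank is a simple weighted sum.
interface RankComponent {
  name: string;
  scorePct: number;  // component score, 0-100
  weightPct: number; // component weight, 0-100; weights should sum to 100
}

// Values copied from the listing above.
const components: RankComponent[] = [
  { name: "Adoption", scorePct: 15, weightPct: 30 },
  { name: "Quality", scorePct: 23, weightPct: 20 },
  { name: "Ecosystem", scorePct: 30, weightPct: 15 },
  { name: "Match Graph", scorePct: 25, weightPct: 30 },
  { name: "Freshness", scorePct: 75, weightPct: 5 },
];

function weightedRank(parts: RankComponent[]): number {
  const totalWeight = parts.reduce((sum, p) => sum + p.weightPct, 0);
  if (totalWeight !== 100) {
    throw new Error(`weights sum to ${totalWeight}, expected 100`);
  }
  // (450 + 460 + 450 + 750 + 375) / 100 for the values above
  const weighted = parts.reduce((sum, p) => sum + p.scorePct * p.weightPct, 0);
  return weighted / 100;
}

console.log(weightedRank(components)); // 24.85 under the weighted-sum assumption
```

Under that assumption the low Adoption and Match Graph scores dominate, since those two components carry 60% of the weight between them.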

Type: Repository

12 capabilities

Alternatives to 🎙️ OpenSource Voice Dictation Agent (Wispr Flow clone)

IntelliCode · 46 · Extension

AI-assisted development

GitHub Copilot Chat · 49 · Extension

AI chat features powered by Copilot

GitHub Copilot · 48 · Extension

Your AI pair programmer

Claude Code for VS Code · 48 · Extension

Claude Code for VS Code: Harness the power of Claude Code without leaving your IDE

Data Sources

github awesome

Capabilities (12 decomposed)

push-to-talk voice dictation with native keyboard interception

Medium confidence

Best for

macOS desktop users who spend time in multiple applications (email, documents, code editors)

accessibility-focused users who need hands-free text input

developers building privacy-first voice tools on Electron

Requires

macOS 10.13 or later

Accessibility permissions granted to application in System Preferences

Node.js 18+ for native module compilation

Limitations

macOS-only implementation — no Windows or Linux support due to platform-specific keyboard event APIs

Requires accessibility permissions (macOS Security & Privacy settings) which users must manually grant

Native module must be recompiled for both Apple Silicon (M1/M2/M3/M4) and Intel architectures

dual-path transcription with local whisper or cloud deepgram

Medium confidence

Best for

privacy-conscious users who cannot send audio to cloud services

teams with strict data residency requirements (HIPAA, GDPR)

developers building voice tools with configurable transcription backends

Requires

For local path: Node.js 18+, whisper-node-addon dependency, 2GB+ free disk space for model files

For cloud path: Deepgram API key, active internet connection

macOS 10.13+ with compatible CPU (Apple Silicon or Intel x64)

Limitations

Local Whisper models (tiny/base/small) have lower accuracy than cloud providers — WER typically 10-15% higher than Deepgram

Local transcription adds 2-5 second latency on M1/M2 Macs due to model inference time; larger models (medium/large) require 8GB+ RAM and are not bundled

Deepgram API requires valid API key and internet connectivity; no offline fallback if network fails mid-session
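
The choice between the two paths can be sketched as a small pure function. This is a minimal illustration, not the project's actual logic: the provider names `local-whisper` and `deepgram` come from the Input / Output section of this page, while `TranscriptionConfig` and `selectProvider` are hypothetical names. It assumes the local path is always available as a fallback.

```typescript
// Hypothetical sketch of provider selection with a local fallback.
type Provider = "local-whisper" | "deepgram";

interface TranscriptionConfig {
  preferred: Provider;
  deepgramApiKey?: string; // only needed for the cloud path
}

function selectProvider(config: TranscriptionConfig, isOnline: boolean): Provider {
  const key = (config.deepgramApiKey ?? "").trim();
  const cloudUsable = isOnline && key.length > 0;
  // Use the cloud path only when it was requested and can actually work;
  // otherwise fall back to local Whisper so audio never leaves the device.
  if (config.preferred === "deepgram" && cloudUsable) {
    return "deepgram";
  }
  return "local-whisper";
}

console.log(selectProvider({ preferred: "deepgram", deepgramApiKey: "key" }, false));
// prints "local-whisper" because the machine is offline
```

Re-running a check like this before each recording, rather than once at startup, is one way to soften the mid-session network failure noted above.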

zero-telemetry privacy model with no analytics collection

Medium confidence

Best for

privacy-conscious users who avoid commercial voice tools (Otter, Fireflies) due to data collection concerns

organizations with strict data privacy policies (HIPAA, GDPR, SOC 2)

developers building privacy-first voice applications

Requires

No external dependencies for analytics or telemetry

Local-only processing (Whisper + Ollama) for full privacy, or cloud processing (Deepgram + LLM APIs) with user consent

Limitations

No crash reporting — if the app crashes, users must manually report bugs without automatic error telemetry

No usage analytics — developers cannot track feature adoption or user behavior to inform product decisions

No error tracking service (Sentry, Rollbar) — bugs may go unnoticed if users don't report them

ios beta support with testflight distribution

Medium confidence

Best for

iOS users who want privacy-focused voice dictation on mobile

beta testers willing to provide feedback on iOS version

developers building cross-platform voice tools

Requires

iOS 14+ (typical for modern iOS apps)

TestFlight app installed on iOS device

Apple ID for TestFlight access

Limitations

iOS version is in beta — features may be incomplete or unstable compared to macOS version

TestFlight distribution limits testing to 10,000 external testers — not available to general public

iOS native APIs differ significantly from macOS — code sharing between platforms is limited to business logic (LLM prompts, provider abstraction)

ai-powered post-processing with filler removal and grammar correction

Medium confidence

Best for

users dictating long-form content (emails, documents, notes) who want publication-ready text

non-native English speakers who benefit from grammar correction

teams with strict data privacy requirements who need local LLM processing

Requires

For local path: Ollama installed and running, Node.js 18+, 4GB+ RAM for model inference

For cloud path: API key for OpenAI, Anthropic, or Google Gemini; active internet connection

macOS 10.13+

Limitations

Local Ollama models (1.2B-7B parameters) produce lower quality output than GPT-4 or Claude — grammar correction may miss complex syntax errors

Post-processing adds 1-3 second latency for local models, 2-5 seconds for cloud LLMs depending on text length and API response time

Ollama server must be running separately and configured with correct port (default 11434) — no built-in Ollama lifecycle management
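
For the local path, a cleanup request to a separately running Ollama server might be built as follows. This is a sketch under stated assumptions: it targets Ollama's `/api/generate` endpoint on the default port mentioned above, the model name `llama3.2` and the prompt wording are placeholders, and `buildCleanupRequest` is a hypothetical helper rather than anything taken from the project.

```typescript
// Hypothetical sketch: builds (but does not send) a cleanup request for Ollama.
interface CleanupRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Placeholder system prompt; the project's real prompt is not shown on this page.
const CLEANUP_PROMPT =
  "Remove filler words (um, uh, like), fix grammar and punctuation, " +
  "and return only the corrected text.";

function buildCleanupRequest(
  rawText: string,
  baseUrl: string = "http://localhost:11434",
  model: string = "llama3.2", // assumed model name; use whatever is pulled locally
): CleanupRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // stream: false asks Ollama for one JSON object instead of a token stream
      body: JSON.stringify({ model, system: CLEANUP_PROMPT, prompt: rawText, stream: false }),
    },
  };
}

const request = buildCleanupRequest("um so I think we should, like, ship it");
console.log(request.url); // http://localhost:11434/api/generate
```

The result can be passed to `fetch(request.url, request.init)`; with `stream: false` the cleaned text arrives in the `response` field of the returned JSON. Because Ollama's lifecycle is not managed by the app, a connection error here usually means the server is not running.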

ipc-based main-renderer process communication with security sandboxing

Medium confidence

Best for

Electron developers building secure desktop applications with native module integration

teams implementing defense-in-depth security with process isolation

developers maintaining large Electron codebases with multiple renderer processes

Requires

Electron 12+ (context isolation and preload script support)

Node.js 18+

TypeScript or JavaScript with async/await for IPC promise handling

Limitations

IPC message passing adds ~5-10ms latency per round-trip due to serialization and process boundary crossing

Large audio buffers (>10MB) may cause performance degradation or memory spikes if not streamed

Preload script whitelist must be manually maintained — adding new IPC channels requires code changes and app rebuild
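
The manually maintained whitelist mentioned above can be illustrated with a small channel guard. This is a sketch only: the channel names are invented for the example, and `guardedInvoke` is a hypothetical wrapper. In a real preload script the `invoke` argument would be Electron's `ipcRenderer.invoke`, and the wrapper would be exposed to the renderer with `contextBridge.exposeInMainWorld`.

```typescript
// Hypothetical allow-list; adding a channel means editing this array and rebuilding.
const ALLOWED_CHANNELS = [
  "recording:start",
  "recording:stop",
  "settings:get",
  "settings:set",
] as const;

type AllowedChannel = (typeof ALLOWED_CHANNELS)[number];

function isAllowedChannel(channel: string): channel is AllowedChannel {
  return ALLOWED_CHANNELS.some((allowed) => allowed === channel);
}

// Wraps any invoke-style function so unlisted channels never reach the main process.
function guardedInvoke<T>(
  invoke: (channel: string, payload: unknown) => Promise<T>,
  channel: string,
  payload: unknown,
): Promise<T> {
  if (!isAllowedChannel(channel)) {
    return Promise.reject(new Error(`Blocked IPC channel: ${channel}`));
  }
  return invoke(channel, payload);
}

console.log(isAllowedChannel("settings:get")); // true
console.log(isAllowedChannel("shell:exec"));   // false
```

Keeping the list as a typed constant means a typo in a channel name is caught at compile time in any caller that uses `AllowedChannel`.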

settings persistence with electron-store and onboarding flow

Medium confidence

Best for

users who want to configure the app once and have settings persist

teams deploying Jarvis across multiple machines who need consistent configuration

developers building Electron apps with persistent user preferences

Requires

Electron 12+

electron-store dependency (included in package.json)

Write access to user home directory

Limitations

electron-store uses unencrypted JSON by default — API keys are only encrypted if explicitly configured with a custom encryption key

No built-in settings versioning or migration — upgrading app versions may break settings if schema changes

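Settings are stored per user in Electron's userData directory (on macOS typically under ~/Library/Application Support/, in a folder named after the app; ~/.config/jarvis-ai-assistant is the Linux convention), so there is no centralized settings server for team management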
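
One way to reduce the schema-change risk noted above is an explicit version field and a migration step that runs when settings are loaded. The sketch below is purely illustrative: the `schemaVersion`, `useCloud`, `provider`, and `doubleTapMs` keys are hypothetical and are not taken from the project's settings file.

```typescript
// Hypothetical current settings shape (version 2).
interface SettingsV2 {
  schemaVersion: 2;
  provider: "local-whisper" | "deepgram";
  doubleTapMs: number;
}

// Upgrades whatever was read from disk to the current shape.
function migrateSettings(raw: Record<string, unknown>): SettingsV2 {
  const storedVersion = raw["schemaVersion"];
  // Files written before versioning existed are treated as version 1.
  const version = typeof storedVersion === "number" ? storedVersion : 1;
  if (version > 2) {
    throw new Error(`Unsupported settings schema version: ${version}`);
  }

  // Assumed v1 layout: a boolean `useCloud` flag instead of a provider name.
  const provider =
    version === 1
      ? raw["useCloud"] === true ? "deepgram" : "local-whisper"
      : raw["provider"] === "deepgram" ? "deepgram" : "local-whisper";

  const storedDoubleTap = raw["doubleTapMs"];
  const doubleTapMs = typeof storedDoubleTap === "number" ? storedDoubleTap : 300;

  return { schemaVersion: 2, provider, doubleTapMs };
}

console.log(migrateSettings({ useCloud: true }));
// { schemaVersion: 2, provider: 'deepgram', doubleTapMs: 300 }
```

Unknown or malformed values fall back to the privacy-preserving local provider and the 300ms default rather than failing the load.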

native audio capture with system microphone integration

Medium confidence

Best for

macOS users who want to dictate using their built-in or external microphone

users in noisy environments who benefit from automatic gain control

developers building voice-input features in Electron apps

Requires

macOS 10.13+

Microphone permission granted in System Preferences

Electron runtime with Web Audio API support (all modern versions)

Limitations

Web Audio API (renderer process) has higher latency (~100-200ms) than native audio APIs due to browser security sandboxing

Automatic gain control and noise suppression are basic implementations — may not match quality of professional audio processing tools

Microphone must be granted permission in macOS System Preferences (Privacy & Security > Microphone) — no in-app permission prompt
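
The Input / Output section lists 16-bit PCM as the audio buffer format, while the Web Audio API hands out samples as 32-bit floats in the range -1 to 1. A common conversion is sketched below as an illustration of that step; `floatTo16BitPcm` is a hypothetical helper, and resampling to 16kHz is assumed to happen elsewhere.

```typescript
// Converts Web Audio style Float32 samples (-1..1) to signed 16-bit PCM.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around when stored as Int16.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative and positive halves scale differently: Int16 spans -32768..32767.
    out[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  });
  return out;
}

const pcm = floatTo16BitPcm(Float32Array.of(0, 1, -1, 2));
console.log(Array.from(pcm)); // [ 0, 32767, -32768, 32767 ]
```

At 16kHz and 2 bytes per sample, one second of mono audio in this format is 32,000 bytes, which is useful when sizing buffers sent over IPC.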

multi-architecture native module compilation for apple silicon and intel

Medium confidence

Best for

Electron developers shipping native modules across multiple macOS architectures

teams with CI/CD pipelines that need to build and sign binaries for distribution

open-source projects that want to avoid requiring users to compile native code

Requires

Node.js 18+ with node-gyp installed

Xcode command-line tools (xcode-select --install)

Apple developer certificate for code signing

Limitations

Build process requires separate compilation for each architecture — cannot use cross-compilation, must build on native hardware or use CI/CD with architecture-specific runners

Native modules must be recompiled and re-signed whenever dependencies update (e.g., Whisper library updates) — no binary caching across versions

Apple notarization process adds 5-15 minutes to build time and requires valid Apple developer certificate and credentials

hands-free toggle mode with double-tap gesture detection

Medium confidence

Best for

users who dictate long passages and want to keep their hands free for other tasks

accessibility users who cannot hold a key continuously

developers building voice-first interfaces with toggle-based recording

Requires

macOS 10.13+

Fn key available (not remapped to other function)

Accessibility permissions granted

Limitations

Double-tap detection is timing-based (300ms window) — may fail if user presses Fn slowly or with variable timing

No visual feedback in the app window if it's not focused — users cannot see hands-free status without switching to app

Hands-free mode has no automatic timeout — if user forgets to double-tap to stop, recording continues indefinitely until manual stop
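
The timing-based detection described above can be sketched in a few lines. This is a minimal illustration assuming each Fn key press arrives with a millisecond timestamp; `DoubleTapDetector` is a hypothetical class, and the native module's real logic is not shown on this page.

```typescript
// Hypothetical sketch: two taps within `windowMs` count as a double-tap.
class DoubleTapDetector {
  private lastTapMs: number | null = null;
  private readonly windowMs: number;

  constructor(windowMs: number = 300) {
    this.windowMs = windowMs;
  }

  // Returns true when this tap completes a double-tap.
  registerTap(nowMs: number): boolean {
    const isDouble =
      this.lastTapMs !== null && nowMs - this.lastTapMs <= this.windowMs;
    // After a double-tap, reset so a third quick tap starts a new sequence.
    this.lastTapMs = isDouble ? null : nowMs;
    return isDouble;
  }
}

const detector = new DoubleTapDetector();
let handsFree = false;
for (const tapMs of [0, 250, 2000, 2400]) {
  if (detector.registerTap(tapMs)) handsFree = !handsFree;
}
console.log(handsFree); // true: 0 and 250 form a double-tap, 2000 and 2400 do not
```

The last two taps are 400ms apart, so they fall outside the window and hands-free mode stays on, which is the slow-press failure described in the first limitation. Exposing the window as a constructor argument matches the configurable double-tap threshold listed under Input / Output.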

settings ui with provider selection and api key configuration

Medium confidence

Best for

non-technical users who want to configure the app via UI rather than config files

users managing multiple API keys for different providers

teams deploying Jarvis with standardized provider configurations

Requires

Electron 12+

React 18+ (included in package.json)

electron-store for settings persistence

Limitations

Settings UI is modal and blocks main app interaction while open — no live preview of settings changes

API key validation is client-side only — invalid keys are not detected until first use

No support for multiple API key profiles or switching between accounts — only one key per provider stored

macos app signing and notarization for distribution

Medium confidence

Best for

open-source projects distributing macOS apps to end users

teams with CI/CD pipelines that need automated app signing and notarization

developers shipping Electron apps on macOS who want to avoid security warnings

Requires

Apple developer account with valid certificate

Apple ID and app-specific password for notarization

electron-builder 24.0+

Limitations

Requires a valid Apple developer certificate ($99/year) and Apple ID credentials; open-source projects must pay the same fee to distribute signed and notarized builds

Notarization process is asynchronous and can take 5-15 minutes — build time is unpredictable

If notarization fails (e.g., malware detected), the entire build fails and must be retried — no partial success

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
