via “multilingual speech-to-text transcription with language-agnostic encoder”
OpenAI's open-source speech recognition — 99 languages, translation, timestamps, runs locally.
Unique: Uses a single shared AudioEncoder + language-agnostic embeddings for all 99 languages rather than maintaining separate language-specific models, trained on 680,000 hours of diverse internet audio including accents, background noise, and technical language. Supports both English-only variants (tiny.en, base.en, small.en) for higher English accuracy and multilingual variants for cross-language support.
vs others: Outperforms language-specific ASR systems on diverse audio (accents, noise, technical terms) because it was trained on internet-scale heterogeneous data rather than clean studio recordings, and supports 99 languages with a single model architecture versus maintaining separate models per language.