via “robust handling of noisy and accented audio”
Robust speech recognition via large-scale weak supervision. [#opensource](https://github.com/openai/whisper)
Unique: Robustness emerges from training on 680,000 hours of diverse, weakly supervised web audio rather than from explicit noise-robustness techniques (e.g., SpecAugment, synthetic noise injection). The model learns to treat noise, accents, and technical vocabulary as natural variation in the training distribution.
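
For readers unfamiliar with "synthetic noise injection", the following is a minimal sketch of that technique: mixing a noise signal into clean speech at a chosen signal-to-noise ratio (SNR). It is not part of Whisper's training recipe as described above. The function names (`mix_at_snr`, `measured_snr_db`) and the toy sine-wave "speech" are assumptions made for illustration, and the same mixing step could equally be used to build noisy test clips for probing robustness.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mix, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    return 20 * math.log10(rms(speech) / rms(residual))


# Toy data (hypothetical): a 440 Hz tone stands in for speech, sampled at 16 kHz.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

mixed = mix_at_snr(speech, noise, snr_db=10.0)
print(f"measured SNR: {measured_snr_db(speech, mixed):.2f} dB")
```

Lower `snr_db` values give a noisier mix. A pipeline that relies on explicit noise robustness would generate many such mixes during training, whereas the approach described here depends on the variation already present in the collected audio.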
vs others: More robust to real-world audio conditions than models trained on curated datasets (e.g., LibriSpeech) because training data reflects actual web audio diversity. Outperforms specialized noise-robust models on accented and technical speech because robustness is learned across all variation types simultaneously.
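
Robustness comparisons of this kind are usually expressed as word error rate (WER) on clean versus degraded audio. The sketch below shows one way such a comparison could be computed. The `word_error_rate` helper is a hypothetical name, its normalization is deliberately simple (lowercasing and whitespace splitting), and the transcripts are made-up examples, not actual model output.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Rolling-row Levenshtein distance over words:
    # prev[j] is the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts for illustration only.
reference = "the quick brown fox jumps"
clean_hypothesis = "the quick brown fox jumps"
noisy_hypothesis = "the quick brow fox jump over"

clean_wer = word_error_rate(reference, clean_hypothesis)
noisy_wer = word_error_rate(reference, noisy_hypothesis)
print(f"clean WER: {clean_wer:.2f}, noisy WER: {noisy_wer:.2f}")
```

With the openai/whisper package installed, the hypothesis strings could come from `whisper.load_model("base").transcribe(path)["text"]`. A model is more robust in the sense used here when its WER rises less between the clean and degraded versions of the same recordings.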