via “multilingual text-to-speech synthesis with 1100+ language coverage”
text-to-speech model. 410,302 downloads.
Unique: Covers 1100+ languages with VITS models from the MMS (Massively Multilingual Speech) project, including low-resource languages (e.g., Uyghur, Amharic, Icelandic). Each language has its own VITS checkpoint with language-specific text tokenization, so synthesis is limited to the supported languages rather than being zero-shot. Most competitors cover only 10-50 languages or require expensive fine-tuning for new languages.
vs others: Covers 1100+ languages under one model family versus Google Cloud TTS (100+ languages, proprietary, paid API) and gTTS (100+ languages but lower quality), with openly released weights (CC-BY-NC 4.0, non-commercial) and local inference without cloud dependency.