Qwen3-TTS-12Hz-1.7B-CustomVoiceModel48/100 via “multilingual text-to-speech synthesis with language-aware tokenization”
text-to-speech model by undefined. 15,92,474 downloads.
Unique: Uses unified transformer encoder-decoder with language-aware attention masks and script-specific embedding layers, enabling single-model multilingual synthesis without separate language-specific models. Language tokens are injected into the attention computation, allowing dynamic language switching within streaming inference.
vs others: Supports code-switching and language mixing in single utterances (unlike most commercial TTS APIs that require separate calls per language) and maintains consistent voice identity across languages without separate speaker adaptation per language.