Mistral: Mistral Nemo | Model | 25/100 via "few-shot and zero-shot prompt adaptation"
A 12B-parameter model with a 128k-token context length, built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, and other languages.
Unique: Mistral Nemo's 12B architecture is tuned for instruction following and prompt adaptation through training on diverse instruction datasets, making it noticeably more responsive to system prompts and few-shot examples than base models. Its 128k context also accommodates longer in-context example sets than smaller-context models allow (see the sketch after this entry).
vs others: Smaller model size (12B) reduces inference latency and cost for prompt-based adaptation compared to 70B+ alternatives, while maintaining sufficient capacity for most few-shot tasks.
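To make "prompt adaptation" concrete, here is a minimal sketch of few-shot prompting against the model. It assumes the `mistralai` v1 Python SDK and the `open-mistral-nemo` model ID (both should be verified against the current Mistral documentation); the classification task, labels, and example reviews are purely illustrative.

```python
import os
from mistralai import Mistral  # assumes the mistralai v1 Python SDK is installed

# Hypothetical task: few-shot sentiment labeling. The system prompt fixes the
# behavior; alternating user/assistant turns supply the in-context examples
# the model adapts to at inference time (no fine-tuning involved).
FEW_SHOT = [
    {"role": "system",
     "content": "Label each review as positive or negative. Reply with one word."},
    {"role": "user", "content": "The battery lasts all day and charges fast."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Screen cracked within a week of normal use."},
    {"role": "assistant", "content": "negative"},
]

def classify(review: str) -> str:
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    resp = client.chat.complete(
        model="open-mistral-nemo",  # assumed model ID; check Mistral's docs
        messages=FEW_SHOT + [{"role": "user", "content": review}],
        temperature=0.0,  # keep labels as deterministic as possible
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify("Setup took five minutes and everything just worked."))
```

The same pattern scales with the context window: within the 128k-token budget, the `FEW_SHOT` list can carry far more examples than smaller-context models permit, which is the practical upside of the long context noted above.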