Browse 5 multimodal AI artifacts on Unfragile.
Visual Question Answering with real images and human questions
Massive multitask multimodal understanding (images + text)
LLaVA — vision-language model pairing a CLIP vision encoder with the Vicuna language model — vision-capable
LLaVA on Llama 3 — improved vision-language model built on a Llama 3 backbone — vision-capable
BakLLaVA — vision-language model applying the LLaVA architecture to a Mistral 7B backbone — vision-capable
© 2026 Unfragile. Stronger through disorder.