Capability
Mixture-of-Experts Code Generation with Sparse Activation
20 artifacts provide this capability.
Top Matches
via “sparse-mixture-of-experts code generation with selective parameter activation”
DeepSeek's 236B MoE model specialized for code.
Unique: Uses the DeepSeekMoE framework with dynamic, router-based expert selection to activate only 21B of 236B parameters per token (see the sketch below), reaching 90.2% on HumanEval while reducing inference memory by roughly 60% compared to a dense 236B model through sparse activation
vs others: Outperforms Llama-2-70B and Code-Llama-70B on HumanEval (90.2% vs 81.8% and 85.5%) while using 3.3x fewer active parameters, and matches GPT-4-Turbo performance with open-source weights and permissive licensing
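A minimal sketch of the router-based sparse activation idea described above, written in PyTorch. The class name, expert count, layer sizes, and top-k value are illustrative assumptions, not DeepSeek-Coder-V2's actual architecture; the point is only that a learned router sends each token to a few experts, so most expert parameters do no work on any given forward pass.

```python
# Illustrative sparse top-k routed MoE feed-forward layer (not DeepSeek's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                      # x: (batch, seq, d_model)
        tokens = x.reshape(-1, x.size(-1))     # flatten to (num_tokens, d_model)
        logits = self.router(tokens)           # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts only

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Find the tokens that routed to expert e in any of their top-k slots.
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                       # expert never selected -> no compute spent
            expert_out = expert(tokens[token_idx])
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert_out
        return out.reshape_as(x)


# With 8 experts and top_k=2, each token touches ~1/4 of the expert parameters,
# analogous (at toy scale) to the 21B-of-236B active-parameter ratio noted above.
moe = SparseMoELayer()
y = moe(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```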