Capability
Distilled Transformer Inference With Reduced Parameter Footprint
2 artifacts provide this capability.
zero-shot-classification model. 228,990 downloads.
Unique: Distilled from RoBERTa-Large specifically for NLI tasks, achieving a 15x parameter reduction while retaining >90% of the teacher model's accuracy on SNLI/MultiNLI benchmarks; most lightweight NLI alternatives either use non-distilled architectures or sacrifice accuracy more severely.
vs others: 3-5x faster CPU inference than full-size cross-encoders (RoBERTa-Large, BERT-Large); more accurate on entailment tasks than simple bi-encoder baselines thanks to its cross-encoder architecture, despite the smaller size.
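A minimal sketch of how a distilled NLI cross-encoder like this is typically used for zero-shot classification via the Hugging Face transformers pipeline. The checkpoint id below is a placeholder assumption, not a confirmed hub id; substitute the actual model identifier.

```python
from transformers import pipeline

# The zero-shot-classification pipeline wraps an NLI cross-encoder: each
# candidate label is rewritten as a hypothesis ("This example is about {label}.")
# and scored for entailment against the input text.
classifier = pipeline(
    "zero-shot-classification",
    model="distilled-nli-model-id",  # placeholder; replace with the real checkpoint
    device=-1,  # CPU; the distilled model is small enough for CPU inference
)

result = classifier(
    "The new graphics card renders 4K games at 120 frames per second.",
    candidate_labels=["technology", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score
```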