Multilingual Zero Shot Text Classification Via Natural Language Inference

1

xlm-roberta-base (Model, 54/100)

via “zero-shot cross-lingual transfer for downstream tasks”

fill-mask model. 18,165,674 downloads.

Unique: Achieves zero-shot cross-lingual transfer through large-scale multilingual pretraining on 100 languages (about 2.5 TB of filtered CommonCrawl text), which yields an implicit alignment of linguistic structures and semantic concepts across languages. Monolingual models and translation-based approaches, by contrast, require explicit alignment or a translation step. As a fill-mask checkpoint, it still has to be fine-tuned on labeled data in one language (typically English) before it can be applied to a downstream task in other languages.

vs others: Avoids the translation artifacts of translate-train and translate-test pipelines and costs less than training a separate model per language, although translation-based baselines can still score higher on some benchmarks.

2

bart-large-mnli (Model, 51/100)

via “zero-shot text classification via natural language inference”

zero-shot-classification model. 2,655,180 downloads.

Unique: Leverages BART's pre-training on denoising and seq2seq tasks combined with Multi-NLI fine-tuning to reformulate arbitrary classification as entailment reasoning, enabling true zero-shot capability without task-specific adaptation layers or fine-tuning

vs others: Explicit NLI training gives it an advantage on unseen categories over language models used without NLI fine-tuning (e.g., GPT-2), and at roughly 400M parameters it runs locally with no external API dependency, unlike GPT-3.5/4
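The entailment reformulation can be sketched roughly as follows. This is an illustration under stated assumptions, not the model's own code: the helper names `build_hypotheses`, `softmax`, and `rank_labels` are hypothetical, and the template string mirrors the default used by the Hugging Face `transformers` zero-shot pipeline. `classify_with_pipeline` calls that pipeline and downloads the model on first use, so it needs `transformers` installed and network access.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, entailment_logits):
    """Single-label mode: softmax the entailment logits across all labels."""
    probs = softmax(entailment_logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def classify_with_pipeline(text, labels):
    """Run the real model (optional dependency, imported lazily)."""
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    return classifier(text, candidate_labels=labels)
```

Each text is paired with one hypothesis per label, so adding or renaming a label only changes the hypothesis strings, not the model.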

3

all-MiniLM-L6-v2 (Model, 50/100)

via “semantic-text-classification-via-embedding-similarity”

feature-extraction model. 3,239,437 downloads.

Unique: Enables zero-shot-style text classification with semantic embeddings and prototype similarity: no training is required, only a few representative texts per class. Because the distilled MiniLM encoder captures sentence meaning, prototype-based classification is more accurate than keyword matching or rule-based approaches.

vs others: Faster to implement than training a supervised classifier; more flexible than fixed classifiers because classes can be added/modified without retraining; more accurate than keyword-based classification because it captures semantic meaning
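Prototype-based classification with embeddings might look like the sketch below. The function names and the idea of averaging example embeddings into one prototype per class are assumptions made for illustration; `embed` is any callable that maps a string to a list of floats. `minilm_embed_factory` shows one way to obtain such a callable with the `sentence-transformers` library (it downloads the model on first use).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_prototypes(examples_by_class, embed):
    """One prototype per class: the mean embedding of its example texts."""
    return {
        label: mean_vector([embed(text) for text in texts])
        for label, texts in examples_by_class.items()
    }


def classify(text, prototypes, embed):
    """Return the class whose prototype is most similar to the text."""
    vector = embed(text)
    return max(prototypes, key=lambda label: cosine(vector, prototypes[label]))


def minilm_embed_factory():
    """Build an `embed` callable backed by all-MiniLM-L6-v2 (optional dependency)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return lambda text: model.encode(text).tolist()
```

Adding a class means adding one more entry to `examples_by_class` and rebuilding the prototypes; nothing is retrained.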

4

multilingual-sentiment-analysis (Model, 49/100)

via “cross-lingual-sentiment-transfer-with-shared-embeddings”

text-classification model. 737,518 downloads.

Unique: Exploits DistilBERT's 104-language pretraining to enable zero-shot sentiment classification in languages not explicitly fine-tuned, by reusing the shared embedding space and learned classification head — avoiding language-specific model maintenance

vs others: More practical than training separate models per language (cost and complexity), but less accurate than language-specific fine-tuning; comparable to XLM-RoBERTa-based approaches but with faster inference due to DistilBERT's smaller size

5

distilbert-base-multilingual-cased-sentiments-student (Model, 48/100)

via “zero-shot-cross-lingual-transfer-inference”

text-classification model. 663,335 downloads.

Unique: Achieves zero-shot cross-lingual transfer through distillation from a multilingual mDeBERTa-v3 NLI teacher, whose zero-shot predictions on unlabeled multilingual text serve as the training targets. The student model inherits the teacher's multilingual alignment while being compact enough for production, enabling sentiment classification on languages for which no labeled data is available.

vs others: Outperforms monolingual sentiment models on cross-lingual tasks and requires no language-specific retraining, unlike traditional fine-tuned models that need labeled data per language.
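One common form of teacher-student distillation trains the student to match the teacher's probability distribution over labels. The sketch below shows that loss in plain Python; it is an assumption about the general technique, not the exact recipe used for this model, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_total for v in logits]


def soft_cross_entropy(teacher_probs, student_logits):
    """Cross-entropy between the teacher's soft labels and the student's prediction."""
    log_probs = log_softmax(student_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, log_probs))
```

The loss shrinks as the student's distribution moves toward the teacher's, so no human-labeled data is needed, only text for the teacher to score.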

6

mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 (Model, 47/100)

via “multilingual-zero-shot-text-classification”

zero-shot-classification model. 303,704 downloads.

Unique: Combines DeBERTa-v3's disentangled attention mechanism (which separates content and position representations) with fine-tuning on XNLI and the multilingual-NLI-26lang-2mil7 dataset, more than 2.7 million hypothesis-premise pairs in 27 languages, enabling zero-shot classification without language-specific fine-tuning. Unlike monolingual models or simpler multilingual baselines, this setup preserves semantic relationships across typologically diverse languages through shared NLI reasoning patterns.

vs others: Outperforms mBERT and XLM-R base on zero-shot XNLI (the underlying mDeBERTa-v3-base is reported at 79.8% average zero-shot cross-lingual accuracy, 3.6 points above XLM-R base) and requires no task-specific labeled data, unlike supervised classifiers, making it faster to deploy than fine-tuned alternatives for new domains.

7

DeBERTa-v3-large-mnli-fever-anli-ling-wanli (Model, 46/100)

via “zero-shot-classification-with-nli-entailment”

zero-shot-classification model. 225,548 downloads.

Unique: Fine-tuned on 5 diverse NLI datasets (MNLI, FEVER-NLI, ANLI, LingNLI, WANLI) totaling roughly 885K hypothesis-premise pairs, enabling robust entailment scoring across varied linguistic phenomena; DeBERTa-v3's disentangled attention (separate content and position representations) captures fine-grained semantic distinctions better than standard Transformer attention for premise-hypothesis matching

vs others: Outperforms BERT-base and RoBERTa-large on zero-shot tasks due to larger capacity (435M params) and multi-dataset NLI fine-tuning; faster inference than GPT-3.5 zero-shot while maintaining competitive accuracy on classification benchmarks

8

mDeBERTa-v3-base-mnli-xnli (Model, 45/100)

via “multilingual zero-shot text classification via natural language inference”

zero-shot-classification model. 228,003 downloads.

Unique: Combines DeBERTa-v3's disentangled attention (which separates content and position representations for better cross-lingual generalization) with NLI-based reformulation, enabling zero-shot classification across the 15 XNLI languages without language-specific adapters. The MNLI+XNLI training covers both English and cross-lingual entailment reasoning, unlike single-language zero-shot models.

vs others: Outperforms BERT-base and RoBERTa-base zero-shot classifiers by 3-8% on multilingual benchmarks due to DeBERTa's superior attention mechanism, and requires no language-specific fine-tuning unlike mBERT or XLM-R which need task adaptation for optimal performance.
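For multilingual use, the hypothesis template can be written in the target language, and several labels can be scored independently. The sketch below shows the per-label scoring used in multi-label mode (a softmax over the contradiction and entailment logits of each label). The helper names are hypothetical, the repository id `MoritzLaurer/mDeBERTa-v3-base-mnli-xnli` is an assumption about where this checkpoint is hosted, and `classify_multilingual` relies on the Hugging Face `transformers` zero-shot pipeline.

```python
import math


def entailment_probability(contradiction_logit, entailment_logit):
    """Softmax over [contradiction, entailment] for a single label."""
    peak = max(contradiction_logit, entailment_logit)
    exp_contra = math.exp(contradiction_logit - peak)
    exp_entail = math.exp(entailment_logit - peak)
    return exp_entail / (exp_contra + exp_entail)


def score_multi_label(logits_by_label):
    """Score each label independently; the scores need not sum to 1."""
    return {
        label: entailment_probability(contra, entail)
        for label, (contra, entail) in logits_by_label.items()
    }


def classify_multilingual(text, labels, template,
                          model_name="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"):
    """Run the real model with a target-language template (optional dependency)."""
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    return classifier(
        text,
        candidate_labels=labels,
        hypothesis_template=template,  # e.g. a German template: "Dieses Beispiel ist {}."
        multi_label=True,
    )
```

Because each label is judged on its own, a text can receive high scores for several labels, or for none.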

9

distilbert-base-uncased-mnli (Model, 45/100)

via “zero-shot text classification with dynamic label inference”

zero-shot-classification model. 276,486 downloads.

Unique: Uses DistilBERT (40% smaller, 60% faster than BERT) fine-tuned on MNLI entailment tasks to enable zero-shot classification via reformulation as NLI premise-hypothesis scoring, avoiding the need for task-specific labeled data while maintaining competitive accuracy on diverse domains

vs others: Faster inference than full-scale BERT-based zero-shot classifiers and more flexible than fixed-label classifiers, but less accurate than domain-specific fine-tuned models and more sensitive to label phrasing than semantic similarity approaches

10

deberta-v3-large-zeroshot-v2.0 (Model, 45/100)

via “zero-shot text classification with natural language labels”

zero-shot-classification model. 200,146 downloads.

Unique: Uses DeBERTa v3's disentangled attention mechanism (which separates content and position embeddings) combined with entailment-based reasoning, enabling more robust zero-shot classification than BERT-based alternatives; trained on diverse NLI datasets (MNLI, ANLI, FEVER) to generalize across domains without task-specific fine-tuning

vs others: Outperforms BART-large-mnli and RoBERTa-large-mnli on zero-shot benchmarks by 2-5% F1 due to DeBERTa's superior attention architecture, while maintaining similar inference speed; more accurate than simple semantic similarity approaches (e.g., sentence-transformers cosine matching) because it explicitly models entailment relationships

11

xlm-roberta-large-xnli (Model, 44/100)

via “multilingual zero-shot text classification”

zero-shot-classification model. 146,288 downloads.

Unique: Uses XLM-RoBERTa's 100+ language pretraining to enable true zero-shot classification across languages without language-specific fine-tuning, leveraging NLI task framing (premise-hypothesis entailment scoring) rather than direct classification heads, allowing arbitrary label sets at inference time

vs others: Outperforms language-specific zero-shot models (e.g., BERT-based classifiers) on non-English text and requires no fine-tuning unlike traditional classifiers, though slower than distilled models like DistilBERT for single-language tasks

12

nli-deberta-v3-small (Model, 43/100)

via “zero-shot natural language inference classification”

zero-shot-classification model. 247,798 downloads.

Unique: Uses DeBERTa-v3-small's disentangled attention mechanism (separating content and position representations) combined with cross-encoder joint encoding, achieving higher accuracy on NLI than standard BERT-based classifiers while maintaining 40% smaller model size than DeBERTa-base variants

vs others: Outperforms bi-encoder zero-shot classifiers (e.g., sentence-embedding similarity approaches) on NLI-specific tasks due to joint premise-hypothesis encoding, while being 10x faster than large language models for the same task and requiring no API calls

13

deberta-v3-base-tasksource-nli (Model, 43/100)

via “zero-shot natural language inference classification”

zero-shot-classification model. 117,720 downloads.

Unique: Trained on TaskSource's 1000+ diverse NLI datasets via extreme multi-task learning (extreme-MTL), enabling generalization across unseen classification tasks without task-specific fine-tuning. Uses DeBERTa-v3's disentangled attention mechanism which separates content and position representations, improving cross-domain transfer compared to standard BERT-style attention.

vs others: Outperforms BERT-base and RoBERTa-base on zero-shot NLI by 3-8% accuracy due to TaskSource pretraining on 1000+ datasets, and requires no labeled data unlike supervised classifiers, making it faster to deploy than fine-tuned alternatives.

14

nli-deberta-v3-base (Model, 43/100)

via “zero-shot natural language inference classification”

zero-shot-classification model. 187,439 downloads.

Unique: Uses cross-encoder architecture (joint premise-hypothesis processing) rather than bi-encoder siamese networks, enabling direct entailment classification without embedding space constraints. DeBERTa-v3-base's disentangled attention mechanism provides superior performance on NLI tasks compared to BERT-based alternatives, with 2-3% higher accuracy on SNLI/MultiNLI benchmarks while maintaining similar model size.

vs others: Outperforms BERT-based NLI models (e.g., bert-base-uncased fine-tuned on SNLI) by 2-4% accuracy due to DeBERTa's disentangled attention, and provides faster inference than larger models (RoBERTa-large) while maintaining competitive zero-shot generalization across domains.

15

nli-MiniLM2-L6-H768 (Model, 43/100)

via “zero-shot natural language inference classification”

zero-shot-classification model. 258,745 downloads.

Unique: Uses a distilled cross-encoder architecture (MiniLMv2 with 6 layers and hidden size 768) that jointly encodes premise-hypothesis pairs through a single transformer pass, enabling direct interaction modeling while maintaining <100ms inference latency on CPU. This is a balance point between bi-encoder speed and cross-encoder accuracy that most alternatives sacrifice

vs others: Faster than full-size cross-encoder NLI models (RoBERTa-Large) by 3-5x due to distillation, yet maintains competitive zero-shot entailment accuracy; slower than bi-encoder alternatives for ranking but captures semantic interactions that bi-encoders miss
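The speed trade-off between the two architectures comes down to how many transformer forward passes each needs. The sketch below counts them under the simplifying assumption that every text is scored against every label; the function names are hypothetical. `score_pairs` uses the `CrossEncoder` class from `sentence-transformers`, and the repository id `cross-encoder/nli-MiniLM2-L6-H768` is an assumption about where this checkpoint is hosted.

```python
def cross_encoder_passes(num_texts, num_labels):
    """A cross-encoder runs one joint pass per (text, label) pair."""
    return num_texts * num_labels


def bi_encoder_passes(num_texts, num_labels):
    """A bi-encoder embeds each text and each label once, then compares vectors."""
    return num_texts + num_labels


def score_pairs(pairs, model_name="cross-encoder/nli-MiniLM2-L6-H768"):
    """Score (premise, hypothesis) pairs with the real model (optional dependency)."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return model.predict(pairs)
```

For 1,000 texts and 5 labels this gives 5,000 passes for the cross-encoder versus 1,005 for a bi-encoder, which is why a small distilled cross-encoder is attractive when the joint encoding is worth its cost.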

16

DeBERTa-v3-base-mnli-fever-anli (Model, 42/100)

via “zero-shot text classification with natural language premises”

zero-shot-classification model. 64,968 downloads.

Unique: Uses DeBERTa-v3's disentangled attention mechanism (separate content and position embeddings) trained on three diverse NLI datasets (MNLI, FEVER, ANLI) to achieve superior zero-shot generalization compared to BERT-based classifiers; reformulates classification as premise-hypothesis entailment scoring rather than direct label prediction, enabling dynamic label sets without model modification

vs others: Outperforms BERT-base and RoBERTa-base on zero-shot classification benchmarks due to DeBERTa's architectural improvements and multi-dataset NLI training, while remaining computationally lighter than larger models like DeBERTa-large or T5-based classifiers

17

deberta-xlarge-mnli (Model, 42/100)

via “zero-shot task reformulation via entailment”

text-classification model. 513,435 downloads.

Unique: Leverages MNLI fine-tuning to generalize inference patterns to arbitrary task formulations without task-specific training. The disentangled attention mechanism enables the model to reason about semantic relationships in novel hypothesis-premise pairs, making zero-shot reformulation more robust than models trained only on generic language modeling objectives.

vs others: Outperforms zero-shot classification with generic language models (GPT-2, BERT) because inference-specific training enables better reasoning about entailment relationships; more efficient than prompting large language models (GPT-3) for zero-shot tasks due to smaller model size and lower latency.

18

nli-deberta-v3-large (Model, 41/100)

via “zero-shot natural language inference classification”

zero-shot-classification model. 80,926 downloads.

Unique: Uses DeBERTa v3-large's disentangled attention mechanism (which separates content and position representations) combined with cross-encoder architecture that jointly encodes premise-hypothesis pairs, enabling more nuanced semantic relationship detection than bi-encoder alternatives that embed sentences independently

vs others: Outperforms BERT-based NLI models and general-purpose zero-shot classifiers on entailment tasks due to DeBERTa's superior architectural design and training on 900K+ NLI examples; faster than ensemble approaches while maintaining competitive accuracy

19

bge-m3-zeroshot-v2.0 (Model, 41/100)

via “multilingual zero-shot text classification”

zero-shot-classification model. 56,557 downloads.

Unique: Built on BGE-M3 RetroMAE architecture trained on 53M multilingual text pairs with explicit optimization for dense retrieval and zero-shot classification across 111 languages simultaneously, unlike generic multilingual models that require task-specific fine-tuning or separate language-specific classifiers

vs others: Outperforms English-centric NLI zero-shot classifiers (e.g., facebook/bart-large-mnli) on non-English languages by 8-12% F1 due to the cross-lingual alignment of the XLM-RoBERTa backbone underlying BGE-M3, and requires no English-language fine-tuning unlike models trained primarily on English datasets

20

distilbart-mnli-12-3 (Model, 41/100)

via “cross-lingual zero-shot classification via multilingual mnli transfer”

zero-shot-classification model. 101,237 downloads.

Unique: Applies English MNLI-trained entailment reasoning to non-English text without language-specific fine-tuning. BART was pretrained mainly on English text, so any cross-lingual transfer rests on its byte-level BPE vocabulary rather than on multilingual pretraining, and quality on non-English input is likely to be limited. Distillation keeps the 12 encoder layers and reduces the decoder to 3 layers, shrinking the model for deployment in resource-constrained settings.

vs others: Simpler than maintaining separate language-specific classifiers and more practical than machine-translating text to English (which introduces translation errors). Cross-lingual transfer is weaker than language-specific fine-tuning but requires zero labeled data in target language.
