Token Classification For Named Entity Recognition

1

Gladia API 59/100

via “named entity recognition (ner) extraction”

Enterprise audio transcription API with multi-engine accuracy across 100 languages.

Unique: Integrated into a unified audio intelligence pipeline: a single API call applies NER alongside transcription, diarization, and sentiment analysis. Most NER tools operate on text only, without audio-aware context.

vs others: Bundled with transcription pricing; the alternative is a separate NER step after transcription (for example spaCy, Stanford CoreNLP, or AWS Comprehend), which adds latency and cost.

2

NLTK Repository 56/100

via “named entity recognition via chunking and classification”

Comprehensive NLP toolkit for education and research.

Unique: Combines rule-based chunking patterns (regular expressions over POS tags) with statistical classification in a single framework, so users can build custom NER through pattern engineering or train classifiers on annotated data without external dependencies.

vs others: More transparent and customizable than spaCy's neural NER for educational purposes, but significantly less accurate (~85% vs 90%+) and limited to a small fixed set of entity types; no support for modern transformer-based models.
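
The pattern-engineering route can be illustrated with a minimal sketch. It does not use NLTK's own API: `chunk_proper_nouns` and the hand-tagged sentence are hypothetical, and the rule (group consecutive proper-noun tags) is just one simple example of a pattern over POS tags.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag), Penn Treebank style


def chunk_proper_nouns(tagged: List[TaggedToken]) -> List[str]:
    """Group maximal runs of proper-noun tags (NNP/NNPS) into candidate entities."""
    chunks: List[str] = []
    current: List[str] = []
    for word, pos in tagged:
        if pos in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            # A non-proper-noun tag closes the chunk being built.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hand-tagged input (an assumption; a real pipeline would run a POS tagger first).
sentence = [("Barack", "NNP"), ("Obama", "NNP"), ("visited", "VBD"),
            ("Paris", "NNP"), ("in", "IN"), ("June", "NNP"), (".", ".")]
candidates = chunk_proper_nouns(sentence)
```

A rule this simple only proposes candidate spans; assigning a type (person, location, date) is where the statistical classifier comes in.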

3

bert-base-NER Model 50/100

via “multilingual named entity recognition via token classification”

Token-classification model. 1,811,113 downloads.

Unique: Builds on BERT's bidirectional transformer encoder with WordPiece subword tokenization, fine-tuned on the English CoNLL-2003 NER task, which gives it a stronger contextual grasp of entity boundaries than CRF-only or BiLSTM baselines. A single checkpoint supports inference on PyTorch, TensorFlow, JAX, and ONNX backends, so it can be deployed flexibly without retraining.

vs others: Outperforms rule-based NER (regex, gazetteer) by 15-25 F1 points and is competitive with spaCy's en_core_web_sm on CoNLL-2003, while offering better cross-framework portability and low inference latency on GPU hardware.
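
Because WordPiece splits rare words into subword pieces, per-piece predictions have to be mapped back to words. The sketch below shows one common convention (keep the label of the first piece), assuming the `##` continuation prefix; the function name, the example split of "Heidelberg", and the labels are illustrative assumptions rather than actual model output.

```python
from typing import List, Tuple


def merge_wordpieces(pieces: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Rejoin '##' continuation pieces and keep the first piece's label per word."""
    words: List[str] = []
    word_labels: List[str] = []
    for piece, label in zip(pieces, labels):
        if piece.startswith("##") and words:
            # Continuation piece: glue it onto the previous word, ignore its label.
            words[-1] += piece[2:]
        else:
            words.append(piece)
            word_labels.append(label)
    return list(zip(words, word_labels))


# Hypothetical subword split and per-piece predictions.
pieces = ["Wolfgang", "lives", "in", "Heidel", "##berg"]
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
merged = merge_wordpieces(pieces, labels)
```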

4

bert-large-cased-finetuned-conll03-english Fine-tune 49/100

via “named entity recognition (ner) via token classification”

Token-classification model. 1,108,389 downloads.

Unique: Uses BERT-large-cased (24 layers, 1024 hidden dimensions) fine-tuned on CoNLL-2003 English with the BIO tagging scheme, providing a production-ready checkpoint that balances model capacity with inference speed. The architecture adds a simple linear classification head (no CRF layer), so it plugs directly into the Hugging Face Transformers pipeline API and supports multiple frameworks (PyTorch, TensorFlow, JAX).

vs others: Larger and more accurate than BERT-base NER models (dbmdz/bert-base-cased-finetuned-conll03-english), with roughly 3x the parameters (about 340M vs 110M), while remaining deployable on modest hardware; outperforms spaCy's statistical NER on formal English text but needs a GPU for production throughput.
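
With a linear head and no CRF layer, decoding reduces to a per-token argmax followed by grouping BIO tags into spans. The following is a minimal sketch of that post-processing step; the label set, the logits, and the function names are made-up illustrations, not values or code from the checkpoint.

```python
from typing import List, Tuple

LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # assumed label order


def argmax_tags(logits: List[List[float]]) -> List[str]:
    """Pick the highest-scoring label independently for each token."""
    return [LABELS[max(range(len(row)), key=row.__getitem__)] for row in logits]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((etype, " ".join(tokens[start:i])))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # A stray I- tag with no open span is treated as a span start.
            start, etype = i, tag[2:]
    return spans


tokens = ["Angela", "Merkel", "visited", "New", "York"]
logits = [  # one row per token, one column per label (illustrative numbers)
    [0.1, 2.3, 0.2, 0.0, 0.1],
    [0.2, 0.1, 1.9, 0.0, 0.3],
    [3.0, 0.1, 0.1, 0.2, 0.1],
    [0.3, 0.0, 0.1, 2.1, 0.4],
    [0.2, 0.1, 0.0, 0.5, 1.8],
]
tags = argmax_tags(logits)
entities = bio_to_spans(tokens, tags)
```

Since each token is decoded independently, nothing prevents invalid sequences such as an `I-LOC` directly after `B-PER`; the sketch handles that case leniently by starting a new span.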

5

wikineural-multilingual-ner Model 49/100

via “multilingual-token-level-named-entity-recognition”

Token-classification model. 800,508 downloads.

Unique: Trained on the WikiNEuRal dataset, which uses a consistent entity annotation schema across 9 languages. Shared transformer embeddings, rather than language-specific fine-tuning, keep entity types consistent across multilingual corpora and enable zero-shot transfer to related languages.

vs others: Outperforms mBERT and XLM-RoBERTa baselines on the WikiNEuRal benchmark (F1 +3-7%) while serving all 9 languages from a single model, eliminating the language detection and model-switching overhead of language-specific NER pipelines.

6

roberta-large-ner-english Model 46/100

via “token-level named entity recognition with roberta embeddings”

Token-classification model. 315,178 downloads.

Unique: Uses RoBERTa-large (355M parameters) instead of smaller BERT-base variants, scoring about 4 F1 points higher on CoNLL-2003 (96.4% vs 92.2%) thanks to deeper contextual embeddings; trained specifically on English CoNLL-2003 rather than as a generic multilingual model, optimizing for precision on news-domain entities.

vs others: Outperforms spaCy's English NER model (92% F1) and matches state-of-the-art BERT-based NER on CoNLL-2003 while being freely available and easy to fine-tune via the Hugging Face Transformers API.
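
F1 figures like these are normally entity-level: a prediction counts only if both its boundaries and its type match the gold annotation. A small sketch of that computation follows, together with the gap between the two scores quoted above; the spans are invented for illustration and `span_f1` is a hypothetical helper.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 2, "PER"), (3, 5, "LOC"), (7, 8, "ORG")]
pred = [(0, 2, "PER"), (3, 5, "ORG"), (7, 8, "ORG")]  # right boundaries, one wrong type
precision, recall, f1 = span_f1(gold, pred)

# The two CoNLL-2003 scores quoted in this entry, compared two ways.
point_gap = round(96.4 - 92.2, 1)          # absolute difference in F1 points
relative_gain = (96.4 - 92.2) / 92.2       # relative improvement, a fraction
```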

7

span-marker-mbert-base-multinerd Model 46/100

via “multilingual named entity recognition with span-based token classification”

Token-classification model. 249,148 downloads.

Unique: Uses the SpanMarker architecture with an mBERT-base encoder, detecting entity boundaries and classifying entity types in a unified span-based framework rather than with traditional BIO tagging; trained on MultiNERD, which covers 15 entity types across 10 languages, providing broader entity coverage than single-language NER models.

vs others: Outperforms spaCy's multilingual models on fine-grained entity types and handles more languages natively; more accurate on entity boundaries than token-only classifiers, and more robust than rule-based or regex approaches.
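
The general span-based idea (enumerate candidate spans, then classify each one as a whole) can be sketched as below. This is not SpanMarker's implementation: the dictionary lookup stands in for a learned span classifier, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def enumerate_spans(n_tokens: int, max_len: int) -> List[Tuple[int, int]]:
    """All (start, end) pairs, end exclusive, covering at most max_len tokens."""
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i + 1, min(i + max_len, n_tokens) + 1)]


# Toy stand-in for a span classifier: maps span text to a type, or nothing.
TOY_LABELS: Dict[str, str] = {"New York": "LOC"}


def classify_span(text: str) -> Optional[str]:
    return TOY_LABELS.get(text)


tokens = ["She", "moved", "to", "New", "York"]
spans = enumerate_spans(len(tokens), max_len=2)
entities = []
for start, end in spans:
    label = classify_span(" ".join(tokens[start:end]))
    if label is not None:
        entities.append((start, end, label))
```

The number of candidates grows with sentence length times `max_len`, which is the usual cost of scoring spans instead of tokens.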

8

bert-base-multilingual-cased-ner-hrl Model 46/100

via “multilingual named entity recognition with token-level classification”

Token-classification model. 287,100 downloads.

Unique: A multilingual BERT-base backbone with a shared vocabulary, fine-tuned for NER on 10 high-resource languages, enables cross-lingual transfer without language-specific model variants. Cased tokenization preserves the capitalization signals that are critical for proper noun detection and that uncased alternatives discard.

vs others: Can outperform language-specific NER models on lower-resource languages through cross-lingual transfer in the shared embedding space, and a single checkpoint replaces the roughly 10 that separate English, German, French, and other per-language NER systems would need (90% fewer).

9

xlm-roberta-large-ner-hrl Model 46/100

via “multilingual named entity recognition with token-level classification”

Token-classification model. 460,384 downloads.

Unique: Fine-tuned for NER on 10 high-resource languages (the "hrl" in the model name) on top of XLM-RoBERTa-large. XLM-RoBERTa's cross-lingual embedding space may allow zero-shot transfer to languages absent from the fine-tuning data, including lower-resource ones. Many competing models (spaCy, Flair) are English-centric or require a separate model per language.

vs others: Matches or exceeds mBERT-based NER on high-resource languages and, because the underlying encoder was pretrained on about 100 languages, can be applied beyond its fine-tuning languages with a single model, reducing deployment complexity compared with maintaining separate models per language.

10

bert-base-turkish-cased-ner Model 45/100

via “turkish named entity recognition via token classification”

Token-classification model. 340,882 downloads.

Unique: Purpose-built for Turkish using a cased BERT-base architecture, which preserves Turkish case distinctions (e.g., dotted İ/i vs dotless I/ı) that matter for proper noun identification; fine-tuned on Turkish NER corpora rather than derived from a multilingual model, for higher precision on Turkish entity boundaries and types.

vs others: Outperforms multilingual BERT-base on Turkish NER by 3-5 F1 points thanks to Turkish-specific pretraining and fine-tuning, while staying smaller (~440MB) than larger Turkish language models or ensemble approaches.
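
Why casing needs care in Turkish can be seen with plain Python strings. The snippet below only illustrates the language-independent default case mapping; `turkish_lower` is a hypothetical helper, not part of the model or its tokenizer.

```python
# Python's str.lower() uses locale-independent Unicode mappings.
generic_lower_capital_i = "I".lower()        # gives "i", but Turkish expects dotless "\u0131"
generic_lower_dotted_i = "\u0130".lower()    # U+0130 (dotted capital I) becomes "i" + U+0307


def turkish_lower(text: str) -> str:
    """Lowercase after applying the two Turkish-specific mappings first."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


city_generic = "ISPARTA".lower()             # "isparta": dotted i, wrong for Turkish
city_turkish = turkish_lower("ISPARTA")      # starts with dotless "\u0131"
```

An uncased model that lowercases text generically would therefore merge or distort exactly the letters that distinguish some Turkish words, which is the signal a cased vocabulary keeps.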

11

distilbert-NER Model 44/100

via “token-level named entity recognition with distilled transformer inference”

Token-classification model. 350,107 downloads.

Unique: The distilled architecture (6 encoder layers instead of BERT-base's 12, trained by knowledge distillation from BERT-base) shrinks the model to 268MB and cuts inference latency by ~40% compared with BERT-base NER models, while retaining roughly 97% of BERT-base's performance on CoNLL-2003.

vs others: Smaller and faster than spaCy's transformer-based NER for CPU deployment, yet more accurate than rule-based or CRF-only approaches; the trade-off is that it is English-only and limited to the CoNLL-2003 entity types.
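
Knowledge distillation trains the smaller student to match the teacher's softened output distribution. A minimal sketch of that soft-target term for a single token follows, written as a temperature-scaled KL divergence; the temperature and logits are arbitrary assumptions, and the full DistilBERT training objective combines further terms not shown here.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


teacher_logits = [4.0, 1.0, 0.5]   # illustrative per-label scores for one token
student_logits = [2.5, 1.5, 0.2]
loss = distillation_loss(student_logits, teacher_logits)
loss_if_identical = distillation_loss(teacher_logits, teacher_logits)
```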

12

ner-english-fast Model 43/100

via “fast english named entity recognition via token classification”

Token-classification model. 419,623 downloads.

Unique: Flair's BiLSTM-CRF architecture with character-level embeddings uses no attention mechanisms and benefits from optimized batch processing, giving faster inference than BERT-based NER while keeping a competitive F1 score on CoNLL-2003 (about 93%). It is also far smaller than BERT-scale models (BERT-base has about 110M parameters, BERT-large about 340M).

vs others: Lower inference latency (10-50ms per sentence on CPU) and a smaller memory footprint than spaCy's transformer models or Hugging Face Transformers-based NER, making it suitable for real-time or edge deployment where BERT-scale models are prohibitive.

13

bert-base-NER-Russian Model 40/100

Token-classification model. 292,351 downloads.

Unique: Fine-tuned specifically for Russian on top of a multilingual BERT base, improving its handling of Russian syntax and semantics, which models trained primarily on English data often miss.

vs others: More accurate on Russian text than general multilingual models because it is fine-tuned on Russian datasets.

14

bert-portuguese-ner Model 40/100

via “token classification for portuguese text”

Token-classification model. 355,484 downloads.

Unique: Fine-tuned specifically for Portuguese on a large corpus of Portuguese text, which improves its handling of linguistic nuance and context.

vs others: More accurate on Portuguese NER than generic multilingual models because of its specialized training.

15

spacy Framework 31/100

via “named entity recognition with neural sequence labeling and rule-based matching”

Industrial-strength Natural Language Processing (NLP) in Python

Unique: Integrates a neural named entity recognizer (transition-based, over CNN or transformer token representations) with rule-based matching (Matcher/PhraseMatcher) in a single pipeline, so users can combine statistical and symbolic approaches. The EntityRuler component can override or augment neural predictions, enabling hybrid systems without custom code.

vs others: More flexible than pure neural NER (e.g., Hugging Face Transformers) because it allows rule-based augmentation; more accurate than pure rule-based systems because it uses pre-trained neural models. Since v3 it also offers transformer-based pipelines with GPU support, which are more accurate than the v2 models.
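
The override behaviour can be pictured as a merge in which rule matches take precedence over overlapping model predictions. The sketch below shows only that merge logic in plain Python; it does not call spaCy, and the function names and example entities are hypothetical.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start token, end token exclusive, label)


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the two token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_entities(model_ents: List[Entity], rule_ents: List[Entity]) -> List[Entity]:
    """Keep every rule match; keep model predictions only where no rule overlaps."""
    kept = [m for m in model_ents if not any(overlaps(m, r) for r in rule_ents)]
    return sorted(kept + list(rule_ents))


# Tokens: ["Surface", "Pro", "launched", "in", "New", "York"]
model_ents = [(0, 1, "ORG"), (4, 6, "GPE")]   # assumed model output, first span is wrong
rule_ents = [(0, 2, "PRODUCT")]               # assumed match from a product-name pattern
merged = merge_entities(model_ents, rule_ents)
```

Reversing the precedence (model first, rules filling the gaps) gives the "augment" behaviour instead of "override".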

16

nltk Repository 28/100

via “named entity recognition via chunking with tree-based output”

Natural Language Toolkit

Unique: Represents entities as nested tree structures rather than flat BIO-tagged sequences, enabling hierarchical entity relationships and visual tree-based analysis via the `.draw()` method. Uses a maximum entropy classifier trained on the ACE corpus, providing interpretable, feature-based entity recognition.

vs others: More transparent and educational than black-box neural NER models; tree-based output enables linguistic analysis and visualization; no external API calls or cloud dependencies required.

17

stanza Repository 27/100

via “named entity recognition with multi-token entity spans and language-specific models”

A Python NLP Library for Many Human Languages, by the Stanford NLP Group

Unique: Includes specialized biomedical/clinical NER models for English alongside general models for 60+ languages, with native multi-token entity span support. Most competitors either focus on general NER or require separate biomedical pipelines.

vs others: Biomedical models trained on clinical corpora outperform general models on medical text; unified API across general and specialized models reduces integration complexity vs using separate tools

18

Nous: Hermes 4 70B Model 26/100

via “entity-extraction-and-named-entity-recognition”

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

Unique: A 70B-parameter model that uses the surrounding context to disambiguate entity boundaries and types, rather than relying on gazetteer matching or shallow pattern recognition.

vs others: More accurate than spaCy NER on complex entity types; comparable to fine-tuned BERT models but with better generalization to unseen entity types.

19

Prime Intellect: INTELLECT-3 Model 26/100

via “entity-recognition-and-information-extraction”

INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (12B active) post-trained from GLM-4.5-Air-Base using supervised fine-tuning (SFT) followed by large-scale reinforcement learning (RL). It offers state-of-the-art performance for its size across math,...

Unique: RL post-training optimizes for entity boundary detection and type classification accuracy; uses sequence labeling patterns that preserve positional information for precise entity extraction

vs others: Recognizes entity boundaries and types more accurately than regex-based extraction while supporting custom entity types without explicit fine-tuning through prompt-based specification
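
Prompt-based specification of custom entity types usually comes down to two steps: describe the types in the prompt, then validate the structured reply. The sketch below shows both without calling any model; the prompt wording, the JSON reply format, and the canned reply are all assumptions for illustration.

```python
import json
from typing import List, Tuple


def build_prompt(text: str, entity_types: List[str]) -> str:
    """Ask for entities of caller-defined types as a JSON list."""
    return (
        f"Extract entities of these types: {', '.join(entity_types)}.\n"
        'Reply with a JSON list of objects with keys "text" and "type".\n'
        f"Text: {text}"
    )


def parse_entities(reply: str, entity_types: List[str]) -> List[Tuple[str, str]]:
    """Keep only well-formed items whose type was actually requested."""
    allowed = set(entity_types)
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [(item["text"], item["type"])
            for item in items
            if isinstance(item, dict) and "text" in item and item.get("type") in allowed]


types = ["DRUG", "COMPANY"]
prompt = build_prompt("Bayer introduced Aspirin in 1899.", types)

# Canned reply standing in for model output (hypothetical).
reply = ('[{"text": "Aspirin", "type": "DRUG"}, '
         '{"text": "Bayer", "type": "COMPANY"}, '
         '{"text": "1899", "type": "YEAR"}]')
entities = parse_entities(reply, types)
```

Filtering against the requested types guards against the model inventing labels, as with the `YEAR` item in the canned reply.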

20

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT) Model 21/100

via “named entity recognition with token-level tagging”

Unique: Applies token-level classification on top of bidirectional Transformer representations, so each token's tag prediction uses the full sentence context (both before and after the token), improving entity boundary and type disambiguation compared with unidirectional models or shallow sequence labeling.

vs others: Deep bidirectional context improves NER accuracy over unidirectional models (e.g., left-to-right language models) and over BiLSTM-CRF taggers, which combine two separately computed directional passes, because every layer lets each token condition on the full sentence; this helps most when entity boundaries and types are ambiguous.
