Text Classification with Document-Level Embeddings and Feed-Forward Networks

1

spaCy | Framework | 62/100

via “text classification with multi-label and multi-class support”

Industrial-strength NLP library for production use.

Unique: Integrates text classification directly into the pipeline, enabling classification to be composed with other NLP components (e.g., classify after NER). Supports both multi-class and multi-label scenarios with configurable thresholds, unlike many frameworks that default to single-label classification.

vs others: More integrated than scikit-learn classifiers; simpler than Hugging Face fine-tuning for small datasets; supports pipeline composition unlike standalone classifiers.
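
The multi-class versus multi-label distinction above can be sketched in a few lines. This is a minimal illustration that does not call spaCy: the `scores` dictionary only imitates the label-to-score mapping a text categorizer produces (spaCy exposes one as `doc.cats`), and the function names and the 0.5 default threshold are assumptions made for the example.

```python
from typing import Dict, List


def pick_multiclass(scores: Dict[str, float]) -> str:
    """Mutually exclusive labels: return the single highest-scoring label."""
    return max(scores, key=scores.get)


def pick_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Independent labels: return every label whose score reaches the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one document (not produced by a real model).
scores = {"sports": 0.72, "politics": 0.55, "tech": 0.10}

print(pick_multiclass(scores))
print(pick_multilabel(scores, threshold=0.5))
```

Raising the threshold trades recall for precision: with `threshold=0.9` the multi-label function returns no labels for this document, while the multi-class function always returns exactly one.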

2

Flair | Repository | 58/100

via “text classification with document-level embeddings and feed-forward networks”

PyTorch NLP framework with contextual embeddings.

Unique: Seamlessly integrates with Flair's embedding system to support any embedding type as input; includes native multi-label classification with automatic handling of label imbalance through weighted sampling; supports both single-task and multi-task learning where a classifier learns multiple classification tasks with shared embedding layers

vs others: Faster to train and deploy than transformer-based classifiers (BERT) with comparable accuracy on small-to-medium datasets; more flexible than scikit-learn classifiers by supporting deep learning and custom architectures; tighter integration with NLP preprocessing (tokenization, embedding) than generic PyTorch approaches
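
The general pattern named in this entry, a document-level embedding fed into a feed-forward classifier, can be sketched without any library. This is not Flair's implementation: the word vectors, the mean-pooling step, and the hand-set weights are all assumptions chosen to keep the example small, and a real model would learn both the embeddings and the weights.

```python
import math
from typing import Dict, List

# Hypothetical 2-dimensional word vectors; real embeddings are learned and far larger.
WORD_VECTORS: Dict[str, List[float]] = {
    "great": [1.0, 0.0],
    "film": [0.5, 0.5],
    "boring": [0.0, 1.0],
}


def document_embedding(tokens: List[str]) -> List[float]:
    """Mean-pool the vectors of known tokens into one fixed-size document vector."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0]  # no known tokens: fall back to a zero vector
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feed_forward(x: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One linear layer followed by softmax; one weight row per class."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(logits)


# Hand-set weights for illustration only: class 0 = positive, class 1 = negative.
WEIGHTS = [[2.0, -2.0], [-2.0, 2.0]]
BIASES = [0.0, 0.0]

probs = feed_forward(document_embedding(["great", "film"]), WEIGHTS, BIASES)
print(probs)
```

Because the classifier only sees a fixed-size vector, the pooling step can be replaced by any other document embedding of the same dimensionality without touching `feed_forward`.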

3

donut-base | Model | 42/100

via “visual-encoder-to-embedding-conversion”

Image-to-text model. 150,036 downloads.

Unique: Implements a document-specific visual encoder that preserves spatial layout information through patch-based embeddings, enabling the downstream decoder to maintain awareness of document structure and text positioning rather than treating the image as a generic visual input

vs others: More layout-aware than generic vision encoders (CLIP, ViT) because it's trained specifically on document images, and more efficient than pixel-level processing because it operates on patch embeddings rather than raw pixels

4

gensim | Repository | 31/100

via “doc2vec document embeddings (paragraph vector)”

Python framework for fast Vector Space Modelling

Unique: Implements Paragraph Vector (Doc2Vec) with both DM and DBOW variants, extending Word2Vec architecture with document ID tokens to learn document-level semantic representations through the same neural training objective

vs others: Simpler and faster to train than transformer-based document encoders; however, it produces non-contextual embeddings, and embedding an unseen document requires an iterative inference step rather than the single forward pass a BERT-style encoder needs.
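
The difference between the DM and DBOW variants is easiest to see in the training examples each one generates. The sketch below is not gensim code: it only enumerates (input tokens, target word) pairs, the function names are hypothetical, and the symmetric context window in the DM case is an assumption borrowed from CBOW-style Word2Vec.

```python
from typing import List, Tuple

Example = Tuple[List[str], str]  # (input tokens, word to predict)


def dbow_examples(doc_tag: str, words: List[str]) -> List[Example]:
    """DBOW: the document tag alone is used to predict each word in the document."""
    return [([doc_tag], word) for word in words]


def dm_examples(doc_tag: str, words: List[str], window: int = 1) -> List[Example]:
    """DM: the document tag plus surrounding context words predict the centre word."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        examples.append(([doc_tag] + left + right, target))
    return examples


words = ["cats", "chase", "mice"]
for inputs, target in dm_examples("DOC_0", words):
    print("DM  ", inputs, "->", target)
for inputs, target in dbow_examples("DOC_0", words):
    print("DBOW", inputs, "->", target)
```

In both variants the document tag behaves like an extra token whose vector is updated during training, which is why the resulting vector ends up summarising the whole document.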

5

flair | Repository | 27/100

via “text-classification-with-document-embeddings”

A very simple framework for state-of-the-art NLP

Unique: Flair's text classification decouples embedding computation from classification, allowing users to swap embedding sources (Flair contextual, BERT, GloVe, etc.) without changing the classifier code; the classifier is simply retrained on the new embeddings. This modular design enables rapid experimentation with different embedding strategies on the same classification task.

vs others: Flair's text classification is more flexible than spaCy's text categorizer (supports arbitrary embeddings) and simpler than HuggingFace transformers (no tokenizer configuration needed), while maintaining competitive accuracy through strong pre-trained embeddings.

6

colbert-ai | Repository | 27/100

via “token-level document encoding with contextual bert embeddings”

Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Unique: Uses token-level matrix representations instead of pooled single vectors, enabling MaxSim late-interaction matching where each query token independently compares against all document tokens — this preserves fine-grained semantic interactions lost in single-vector approaches like DPR

vs others: Achieves higher precision than single-vector dense retrievers (DPR, Sentence-BERT) while keeping latency under 100 ms through efficient MaxSim computation; sparse BM25, by contrast, is fast but matches on terms rather than meaning.
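
MaxSim late interaction, as described in this entry, reduces to a short scoring function: for each query token, take its best similarity against any document token, then sum those maxima. The sketch below is a plain-Python illustration rather than ColBERT's code; it assumes every token vector is already unit-normalised (so a dot product equals cosine similarity) and that the document has at least one token. The toy vectors and names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy unit-length token embeddings (real ones come from a BERT encoder).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match for each query token
doc_b = [[0.6, 0.8]]                          # a single token that partly matches both

print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
```

Here `doc_a` outranks `doc_b` because each query token finds its own best-matching document token, which is the fine-grained interaction a single pooled vector cannot express.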

7

OpenAI Cookbook | Repository | 24/100

via “classification, clustering, and semantic search patterns”

Examples and guides for using the OpenAI API.
