Full Text Search With Keyword Indexing And Filtering

1

LanceDB (Platform, 59/100)

via “hybrid search combining vector and full-text retrieval”

Serverless embedded vector DB — Lance format, multimodal, versioning, no server needed.

Unique: Integrates full-text and vector search at the storage layer using Lance's columnar format, avoiding separate indices and enabling single-pass retrieval; combines both modalities without requiring external search engines like Elasticsearch

vs others: Simpler than Elasticsearch + vector plugin because both search modes share the same columnar storage, but less mature than Pinecone's hybrid search in terms of tuning options and performance optimization
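As a rough illustration of what "combining vector and full-text retrieval" can mean in practice, the sketch below merges two ranked lists with reciprocal rank fusion. This is a generic fusion technique picked for illustration; the listing does not say LanceDB uses it. The function name, the constant `k=60`, and the sample document ids are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id for deterministic ties.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector query and a keyword query.
vector_hits = ["doc3", "doc1", "doc2"]
keyword_hits = ["doc1", "doc4", "doc3"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Rank-based fusion sidesteps the problem that vector distances and keyword scores live on different scales, which is one reason it is a common default for hybrid search.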

2

Glasp (Extension, 58/100)

via “full-text-search-across-highlights”

Social web highlighter with AI summarization.

Unique: Implements full-text search with relevance ranking and metadata filtering, indexing highlight text and source metadata to enable fast retrieval across large libraries. Uses a search backend (likely Elasticsearch) to support boolean operators and phrase matching in paid tiers.

vs others: More powerful than browser-based search (Ctrl+F) because it searches across all highlights and sources, not just the current page. More accessible than building a custom search index because search is built-in and requires no configuration.

3

Meilisearch (Repository, 58/100)

via “typo-tolerant full-text search with inverted indexes”

Lightning-fast search engine with vector search.

Unique: Uses word_pair_proximity_docids indexes to track word adjacency during indexing, enabling proximity-aware ranking without post-search filtering. Charabia tokenization handles typo tolerance at index time rather than query time, avoiding expensive edit-distance calculations on every search.

vs others: Faster than Elasticsearch for typo-tolerant search because proximity indexes are pre-computed at index time rather than calculated at query time; simpler to deploy than Solr because it's a single Rust binary with no JVM overhead.
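To make the idea of a precomputed word-pair proximity index concrete, here is a minimal sketch. It is not Meilisearch's implementation: the key layout `(left, right, distance)`, the whitespace tokenizer, and the `max_distance` window are assumptions chosen to keep the example short.

```python
from collections import defaultdict


def build_pair_proximity_index(docs, max_distance=3):
    """Map (left_word, right_word, distance) -> set of doc ids, built at index time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i, left in enumerate(words):
            # Only record pairs that fall within the proximity window.
            for j in range(i + 1, min(i + 1 + max_distance, len(words))):
                index[(left, words[j], j - i)].add(doc_id)
    return index


def docs_with_pair(index, left, right, max_distance=3):
    """Return {doc_id: smallest distance} for docs where `left` precedes `right`."""
    best = {}
    for distance in range(1, max_distance + 1):
        for doc_id in index.get((left, right, distance), ()):
            best.setdefault(doc_id, distance)  # first hit is the closest
    return best


docs = {
    1: "fast search engine",
    2: "fast typo tolerant search",
    3: "search is fast",
}
index = build_pair_proximity_index(docs)
matches = docs_with_pair(index, "fast", "search")
print(matches)
```

At query time the ranking step only needs dictionary lookups, since the distances were computed while indexing. The trade-off is a larger index, which is why the window is bounded.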

4

llama_index (MCP Server, 57/100)

via “document-level metadata filtering and structured querying”

LlamaIndex is the leading document agent and OCR platform

Unique: Provides integrated metadata filtering across all retrieval strategies with a unified query language for combining semantic search and structured constraints. Unlike LangChain's metadata filtering (which is retriever-specific), LlamaIndex's filtering works consistently across vector, keyword, and graph retrieval.

vs others: Enables consistent metadata filtering across all retrieval types with a unified query interface, whereas LangChain requires separate filtering logic per retriever type.

5

Turbopuffer (Product, 55/100)

via “bm25 full-text search with metadata filtering”

Low-cost vector database — pay-per-query, S3-backed, up to 10x cheaper at scale.

Unique: Integrates BM25 full-text search as a first-class capability alongside vector search within the same API, enabling hybrid search queries that combine both ranking signals without requiring separate search infrastructure or post-processing to merge results

vs others: Simpler than maintaining separate Elasticsearch/Meilisearch instances for keyword search because full-text and vector search are unified in a single API with shared namespace isolation and S3 storage
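For readers unfamiliar with BM25, the following is a minimal sketch of the scoring function over an in-memory list of documents. It illustrates the ranking formula only and says nothing about how Turbopuffer implements it; the whitespace tokenizer, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the sample documents are assumptions. It also assumes a non-empty document list.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents need more hits to score as high.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "cheap vector database",
    "vector search on object storage",
    "full text search with bm25 ranking",
]
scores = bm25_scores("bm25 search", docs)
print(scores)
```

The third document matches both query terms, and "bm25" is rarer than "search", so it should outrank the second; the first matches nothing and scores zero.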

6

RediSearch (MCP Server, 55/100)

via “full-text search with boolean operators and phrase matching”

A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.

Unique: Uses a trie-based term dictionary with incremental indexing via Redis keyspace notifications (src/redis_index.c), enabling real-time index updates without batch reindexing, unlike traditional search engines that require explicit commit/refresh cycles

vs others: Faster than Elasticsearch for sub-million-document workloads because it avoids network round-trips and leverages Redis' in-memory architecture; simpler operational model than Solr with no separate JVM process
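A trie-based term dictionary with incremental updates can be sketched in a few lines. This is a toy, pure-Python illustration of the data structure, not RediSearch's C implementation; the class names, the whitespace tokenizer, and the choice to store document ids directly on trie nodes are assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.doc_ids = set()  # non-empty only where a complete term ends


class TermTrie:
    """Term dictionary that can be updated one document at a time."""

    def __init__(self):
        self.root = TrieNode()

    def add_document(self, doc_id, text):
        # Each new document only touches the paths of its own terms,
        # so there is no batch rebuild step.
        for term in text.lower().split():
            node = self.root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.doc_ids.add(doc_id)

    def prefix_search(self, prefix):
        """Return ids of documents containing any term that starts with `prefix`."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.doc_ids
            stack.extend(current.children.values())
        return found


trie = TermTrie()
trie.add_document(1, "redis search")
trie.add_document(2, "redisearch module")
before = trie.prefix_search("mod")
trie.add_document(3, "modular design")  # visible to the next query immediately
after = trie.prefix_search("mod")
print(before, after)
```

Because terms sharing a prefix share a path, prefix queries fall out of the structure for free, and an insert is immediately searchable without a separate commit or refresh.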

7

lancedb (Repository, 48/100)

via “full-text-search-with-bm25-ranking”

Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

Unique: Integrates BM25 full-text search directly into the Lance storage layer rather than as a separate index type, allowing hybrid vector+FTS queries to execute in a single pass without materializing intermediate result sets. Shared Rust core ensures FTS and vector indexes are co-located and updated atomically.

vs others: Simpler deployment than Elasticsearch-backed hybrid search because FTS is embedded; faster than Milvus + external FTS because no network round-trips between vector and text search systems.

8

server (Repository, 47/100)

via “full-text search indexing and query execution”

MariaDB server is a community developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry.

Unique: Implements FTS via auxiliary tables (FTS_*_INDEX_*) that store the inverted index separately from the main table, enabling incremental updates without modifying the main table structure. Supports both boolean and natural language search modes with configurable stop words and minimum word length.

vs others: Simpler than Elasticsearch (no distributed indexing, no real-time updates) but faster for small-to-medium datasets; more integrated than external search engines but less feature-rich
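The combination of an inverted index, stop words, a minimum word length, and a boolean search mode can be illustrated with a small in-memory sketch. It is not how MariaDB stores its auxiliary tables; the stop-word list, the length threshold of 3, and the class and method names are assumptions, and only the `+term` (required) and `-term` (excluded) operators are modelled.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "is"}  # illustrative list
MIN_WORD_LEN = 3  # illustrative threshold


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words and short words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= MIN_WORD_LEN and w not in STOP_WORDS]


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text):
        # Adding a document only appends to posting lists; existing rows are untouched.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def boolean_search(self, query):
        """'+term' must match, '-term' must not match, bare terms are optional."""
        required, excluded, optional = [], [], []
        for raw in query.split():
            if raw.startswith("+"):
                required.extend(tokenize(raw))
            elif raw.startswith("-"):
                excluded.extend(tokenize(raw))
            else:
                optional.extend(tokenize(raw))
        if required:
            result = set(self.postings.get(required[0], set()))
            for term in required[1:]:
                result &= self.postings.get(term, set())
        else:
            result = set()
            for term in optional:
                result |= self.postings.get(term, set())
        for term in excluded:
            result -= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "MariaDB is a fork of MySQL")
index.add(2, "MySQL full-text search")
index.add(3, "The search engine")
print(index.boolean_search("+mysql -fork"))
print(index.boolean_search("search"))
```

Note that a query consisting only of stop words, such as "is", matches nothing, because those terms were never indexed.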

9

qdrant (Platform, 44/100)

via “payload-based filtering with multiple field index types”

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Unique: Integrates field indexing directly into segment architecture with automatic index type selection based on field cardinality and query patterns, enabling filters to be applied during HNSW traversal rather than post-search, reducing candidates evaluated by 50-90% for selective filters

vs others: More efficient than post-filtering because index-aware pruning happens during graph traversal, whereas alternatives like Elasticsearch require two-phase search (filter then rank) or separate index lookups

10

llm-app (Template, 44/100)

via “hybrid vector and keyword indexing with efficient similarity search”

Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳Docker-friendly.⚡Always in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.

Unique: Implements hybrid search through a unified query interface that abstracts over multiple index types, allowing dynamic selection of retrieval strategy (pure vector, pure keyword, or combined) at query time without re-indexing. Supports metadata filtering as a first-class retrieval primitive alongside similarity scoring.

vs others: More flexible than vector-only systems (Pinecone, Weaviate) for exact matching use cases; simpler than building separate keyword and vector pipelines. Pathway's configuration-driven approach enables switching retrieval strategies without code changes.

11

OSS AI agent that indexes and searches the Epstein files (Agent, 43/100)

via “full-text document indexing with semantic embeddings”

Hi HN, I built an open-source AI agent that has already indexed and can search the entire Epstein files, roughly 100M words of publicly released documents. The goal was simple: make a large, messy corpus of PDFs and text files immediately searchable in a precise way, without relying on keyword search

Unique: Combines full-text and semantic search in a single index specifically optimized for investigative document corpora, likely using chunk-aware retrieval that preserves document context and metadata lineage

vs others: More comprehensive than keyword-only search (e.g., Elasticsearch) and faster than pure semantic search because hybrid approach filters with keywords before expensive vector similarity

12

infinity (Product, 39/100)

via “sparse-vector-bm25-full-text-search”

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.

Unique: Integrates BM25 ranking directly into the database engine alongside vector search, enabling single-query hybrid retrieval without separate Elasticsearch/Solr instances; uses C++20 modules for compile-time inverted index structure optimization.

vs others: More integrated than Elasticsearch + Pinecone stacks because both search types share transaction semantics and metadata; faster than Milvus for text-heavy workloads due to native BM25 implementation vs. plugin-based approaches.

13

Chroma (MCP Server, 38/100)

via “full-text search with bm25 ranking”

Embeddings, vector search, document storage, and full-text search with the open-source AI application database.

Unique: Chroma integrates BM25 search directly into the same collection API as vector search, allowing developers to query both modalities from a single interface without switching between systems or managing separate indices

vs others: More lightweight than Elasticsearch for simple keyword search while maintaining compatibility with semantic search in the same codebase, reducing operational complexity for small-to-medium applications

14

oceanbase (Product, 37/100)

via “full-text search indexing and query execution”

The Fastest Distributed Database for Transactional, Analytical, and AI Workloads.

Unique: Implements full-text indexing as a native storage engine feature rather than a separate service, allowing full-text predicates to be pushed down into the query optimizer and executed alongside other filters

vs others: Faster than Elasticsearch for small-to-medium datasets because indexes are co-located with data; simpler than Lucene because it integrates directly with SQL

15

CenterPoint Connect (MCP Server, 36/100)

via “search and filter functionality”

Manage properties, companies, employees, invoices, materials, and more from CenterPoint Connect. Search, filter, and update records, generate invoices and purchase orders, log time, and track productions, services, tasks, and warranties. Streamline construction and property operations by automating

Unique: Employs a hybrid indexing system that combines full-text search with structured queries, which is less common in basic record management systems.

vs others: Faster and more flexible than traditional database search methods due to its dual indexing approach.

16

pdf-reader (MCP Server, 35/100)

via “keyword search within pdfs”

Read entire PDFs or specific pages on demand. Search documents for keywords and jump to relevant passages. Retrieve metadata to quickly understand document properties.

Unique: Integrates a custom indexing engine that allows for real-time search results as the user types, enhancing user experience over traditional search methods.

vs others: Faster and more responsive than static search implementations because it indexes text dynamically.

17

taladb (Repository, 34/100)

via “multi-field full-text search with configurable tokenization”

Local-first document and vector database for React, React Native, and Node.js

Unique: Provides configurable tokenization and field-specific boosting in a local full-text search engine, whereas the browser's built-in find-in-page (Ctrl+F) offers neither relevance ranking nor field weighting.

vs others: Eliminates Elasticsearch dependency for basic full-text search with simpler API, though with lower performance on very large corpora (>1M documents)

18

@kb-labs/mind-engine (Framework, 34/100)

via “semantic search with metadata filtering”

Mind engine adapter for KB Labs Mind (RAG, embeddings, vector store integration).

Unique: Combines vector similarity search with structured metadata filtering through a unified query interface that abstracts backend-specific filter syntax, enabling consistent filtering behavior across different vector stores

vs others: More integrated than manually combining vector search with separate metadata queries because it handles filter translation and result ranking in a single operation

19

@convex-dev/rag (Repository, 34/100)

via “metadata filtering and hybrid search (semantic + keyword)”

A rag component for Convex.

Unique: Performs metadata filtering within Convex's query engine before similarity computation, reducing the number of documents to score and enabling efficient combination of structured filtering with semantic ranking in a single database query

vs others: More integrated than Elasticsearch hybrid search (no separate index), but less flexible than Pinecone's metadata filtering for complex boolean queries on high-cardinality fields
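The benefit of filtering on metadata before computing similarity can be shown with a brute-force sketch. This is plain Python, not the Convex component's API; the record layout (`id`, `vector`, `meta`), the exact-match filter semantics, and the function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, records, filters, top_k=2):
    """Apply exact-match metadata filters first, then rank the survivors."""
    # Step 1: cheap structured filter shrinks the candidate set.
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in filters.items())
    ]
    # Step 2: similarity is computed only for documents that passed the filter.
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["id"] for r in ranked[:top_k]]


records = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]
english = filtered_search([1.0, 0.0], records, {"lang": "en"})
german = filtered_search([1.0, 0.0], records, {"lang": "de"})
print(english, german)
```

Filtering first also guarantees that `top_k` results all satisfy the filter, whereas filtering after a top-k similarity search can return fewer results than requested.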

20

LLM App (Framework, 32/100)

via “document indexing and full-text search with keyword matching”

Open-source Python library to build real-time LLM-enabled data pipeline.

Unique: Maintains both vector and keyword indices within Pathway's reactive pipeline, enabling hybrid search without separate indexing systems. Index updates propagate reactively when source documents change.

vs others: More efficient than separate vector and keyword search systems because both indices are maintained in one pipeline; more flexible than single-strategy search because it supports multiple retrieval approaches.
