FTS5-Based Full-Text Search Knowledge Base with BM25 Ranking

1

llama_index | MCP Server | 55/100

via “hybrid retrieval with bm25 keyword search and semantic reranking”

LlamaIndex is the leading document agent and OCR platform

Unique: Combines vector search, BM25 keyword matching, and optional semantic reranking with configurable fusion algorithms and support for multiple reranker backends. Unlike LangChain's retriever composition (which chains retrievers sequentially), LlamaIndex's hybrid retrieval merges results with configurable fusion.

vs others: Provides integrated hybrid retrieval with automatic result fusion and optional reranking, whereas LangChain requires manual retriever composition and result merging.
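
One common way to merge a keyword result list with a vector result list is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that idea, not LlamaIndex's API: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample document IDs are all assumptions made for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs from a BM25 retriever and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.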

2

Turbopuffer | Product | 54/100

via “bm25 full-text search with metadata filtering”

Low-cost vector database — pay-per-query, S3-backed, up to 10x cheaper at scale.

Unique: Integrates BM25 full-text search as a first-class capability alongside vector search within the same API, enabling hybrid search queries that combine both ranking signals without requiring separate search infrastructure or post-processing to merge results

vs others: Simpler than maintaining separate Elasticsearch/Meilisearch instances for keyword search because full-text and vector search are unified in a single API with shared namespace isolation and S3 storage

3

RediSearch | MCP Server | 53/100

via “scoring and ranking with bm25 and custom weights”

A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.

Unique: Implements BM25 scoring with field-level weights specified at index creation, enabling domain-specific relevance tuning without custom scoring logic; integrates scoring into query execution to compute scores during result collection rather than post-processing

vs others: More efficient than Elasticsearch's custom scoring because BM25 is computed in-process without script execution; simpler than learning Elasticsearch's scoring DSL because field weights are declarative
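
To make the idea of field-level weights concrete, here is a minimal sketch of BM25 scored per field and then combined with declarative weights. It is a textbook formulation written for illustration and may differ from RediSearch's internal scorer; the helper names, the `k1`/`b` defaults, and the two-document corpus are assumptions.

```python
import math
from collections import Counter

def bm25_field_score(query_terms, field_tokens, all_field_docs, k1=1.2, b=0.75):
    """Textbook BM25 score of one field of one document."""
    n_docs = len(all_field_docs)
    avg_len = sum(len(d) for d in all_field_docs) / n_docs
    if avg_len == 0:
        return 0.0
    tf = Counter(field_tokens)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in all_field_docs if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(field_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

def weighted_score(query, doc, corpus, weights):
    """Sum per-field BM25 scores, each multiplied by its field weight."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in weights.items():
        field_docs = [d[field].lower().split() for d in corpus]
        tokens = doc[field].lower().split()
        total += weight * bm25_field_score(terms, tokens, field_docs)
    return total

# Hypothetical corpus: one title match versus one body-only match.
corpus = [
    {"title": "redis search", "body": "an in memory data store"},
    {"title": "data stores", "body": "redis search adds full text search to redis"},
]

def rank(query, weights):
    return sorted(corpus, key=lambda d: weighted_score(query, d, corpus, weights),
                  reverse=True)

boosted = rank("redis search", {"title": 5.0, "body": 1.0})
flat = rank("redis search", {"title": 1.0, "body": 1.0})
```

With equal weights the body-heavy document wins; boosting `title` flips the order, which is the kind of relevance tuning that declarative field weights allow.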

4

WeKnora | Repository | 51/100

via “hybrid retrieval with semantic and keyword search fusion”

Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.

Unique: Decouples semantic and keyword retrieval into independent pipelines with pluggable reranking, allowing fine-grained control over fusion strategy per knowledge base. Supports multiple reranking backends (BM25, cross-encoder models) without requiring model retraining.

vs others: More flexible than pure semantic search (handles domain jargon better) and more intelligent than keyword-only search (understands intent), with configurable reranking that adapts to domain-specific precision/recall tradeoffs.

5

all-MiniLM-L6-v2 | Model | 50/100

via “semantic-text-search-with-ranking”

feature-extraction model. 32,39,437 downloads.

Unique: Combines embedding-based retrieval with similarity ranking to enable semantic search without keyword matching — the distilled BERT model is optimized for semantic similarity, making search results more relevant than BM25 for intent-based queries

vs others: More accurate than BM25 keyword search for semantic relevance; faster than cross-encoder reranking because it uses pre-computed embeddings; simpler than learning-to-rank approaches because it requires no training data

6

context-mode | MCP Server | 49/100

via “fts5-full-text-search-knowledge-base-with-bm25-ranking”

Context window optimization for AI coding agents. Sandboxes tool output, 98% reduction. 14 platforms

Unique: Uses SQLite FTS5 with BM25 ranking for local, persistent full-text search over code and tool output. Integrates with session continuity to partition knowledge by session, enabling multi-session knowledge reuse without context pollution. Achieves 99% reduction in retrieved data size through snippet truncation.

vs others: Faster and more context-efficient than vector-based RAG (no embedding API calls, no semantic similarity overhead) for lexical code search, and avoids external dependencies (Elasticsearch, Pinecone) by using embedded SQLite.
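
The pattern described above (an embedded FTS5 index, BM25 ordering, per-session partitioning, and truncated snippets) can be sketched with Python's built-in `sqlite3` module. This is an illustrative example, not context-mode's actual schema: the table name `kb`, the `session_id` column, the sample rows, and the `search` helper are assumptions, and it requires a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# session_id is stored but not tokenized; content is full-text indexed.
conn.execute("CREATE VIRTUAL TABLE kb USING fts5(session_id UNINDEXED, content)")

rows = [
    ("s1", "retry the http request after a timeout error"),
    ("s1", "parse the config file and validate the timeout value in seconds before use"),
    ("s2", "timeout handling in the websocket client"),
    ("s1", "render the sidebar component"),
]
conn.executemany("INSERT INTO kb (session_id, content) VALUES (?, ?)", rows)

def search(query, session_id, limit=5):
    """Return (excerpt, score) pairs for one session, best match first."""
    sql = """
        SELECT snippet(kb, 1, '[', ']', '...', 8), bm25(kb)
        FROM kb
        WHERE kb MATCH ? AND session_id = ?
        ORDER BY bm25(kb)
        LIMIT ?
    """
    return conn.execute(sql, (query, session_id, limit)).fetchall()

results = search("timeout", "s1")
for excerpt, score in results:
    print(excerpt, score)
```

FTS5's `bm25()` returns lower values for better matches, so an ascending `ORDER BY` puts the most relevant rows first. The `snippet()` call caps each excerpt at 8 tokens, which is one way to keep retrieved text small.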

7

FastGPT | Platform | 49/100

via “rag-based knowledge base retrieval with semantic search and hybrid ranking”

FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive s

Unique: Combines semantic search with BM25 keyword matching and optional re-ranking in a single retrieval pipeline, with automatic chunk management and hierarchical dataset organization. Integrates directly into workflow nodes for seamless context injection into LLM prompts.

vs others: More integrated than standalone RAG libraries (LangChain, LlamaIndex) because retrieval is a first-class workflow node with built-in chunk management, re-ranking, and source attribution rather than a library you compose yourself.

8

pg-aiguide | MCP Server | 48/100

via “keyword-bm25-postgres-documentation-search”

MCP server and Claude plugin for Postgres skills and documentation. Helps AI coding tools generate better PostgreSQL code.

Unique: Leverages PostgreSQL's native tsvector full-text search with BM25 ranking for keyword search, eliminating dependency on external search services or embedding APIs. Integrates seamlessly with the same documentation corpus as semantic search, allowing hybrid search strategies. BM25 ranking is computed in-database, avoiding network latency.

vs others: Faster and cheaper than semantic search for exact feature name queries because it uses native PostgreSQL full-text search without embedding API calls; more precise than semantic search when terminology is known, because BM25 rewards exact term matches.

9

lancedb | Repository | 47/100

via “full-text-search-with-bm25-ranking”

Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

Unique: Integrates BM25 full-text search directly into the Lance storage layer rather than as a separate index type, allowing hybrid vector+FTS queries to execute in a single pass without materializing intermediate result sets. Shared Rust core ensures FTS and vector indexes are co-located and updated atomically.

vs others: Simpler deployment than Elasticsearch-backed hybrid search because FTS is embedded; faster than Milvus + external FTS because no network round-trips between vector and text search systems.

10

weaviate | Platform | 43/100

via “hybrid search combining vector similarity with bm25 keyword ranking and structured filtering”

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Unique: Uses delta-merger pattern (inverted/delta_merger.go) for incremental BM25 index updates, avoiding full index rebuilds on each write. Implements Traverser/Explorer query execution pattern that parallelizes vector and keyword index lookups, then applies structured filtering on merged candidates rather than sequentially.

vs others: More efficient than Elasticsearch for vector+keyword fusion because it avoids separate vector plugin overhead; better than Pinecone's metadata filtering because BM25 integration is native rather than post-hoc filtering.

11

infinity | Product | 39/100

via “sparse-vector-bm25-full-text-search”

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.

Unique: Integrates BM25 ranking directly into the database engine alongside vector search, enabling single-query hybrid retrieval without separate Elasticsearch/Solr instances; uses C++20 modules for compile-time inverted index structure optimization.

vs others: More integrated than Elasticsearch + Pinecone stacks because both search types share transaction semantics and metadata; faster than Milvus for text-heavy workloads due to native BM25 implementation vs. plugin-based approaches.

12

vectra | Repository | 37/100

via “bm25 full-text search with hybrid ranking”

A lightweight, file-backed vector database for Node.js and browsers with Pinecone-compatible filtering and hybrid BM25 search.

Unique: Combines BM25 and vector similarity in a single ranking framework with configurable weighting, avoiding the need for separate lexical and semantic search pipelines. Implements BM25 from scratch rather than wrapping an external library.

vs others: Simpler than Elasticsearch for hybrid search but lacks advanced features like phrase queries, stemming, and distributed indexing. Better integrated with vector search than bolting BM25 onto a pure vector database.
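
A single ranking with configurable weighting between lexical and semantic scores can be sketched as a normalized linear blend. The example below is a generic illustration in Python rather than Vectra's Node.js API; `min_max`, `hybrid_rank`, the `alpha` parameter, and the sample scores are assumptions.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(bm25_scores, vector_scores, alpha=0.5):
    """Blend normalized scores: alpha weights semantic, (1 - alpha) lexical."""
    lexical = min_max(bm25_scores)
    semantic = min_max(vector_scores)
    doc_ids = set(lexical) | set(semantic)
    combined = {
        doc_id: alpha * semantic.get(doc_id, 0.0)
        + (1 - alpha) * lexical.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest blended score first; ties broken by ID for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))

# Hypothetical raw scores from the two retrievers.
bm25_scores = {"a": 7.2, "b": 3.1, "c": 0.4}
vector_scores = {"a": 0.2, "b": 0.9, "c": 0.85}

balanced = hybrid_rank(bm25_scores, vector_scores, alpha=0.5)
lexical_only = hybrid_rank(bm25_scores, vector_scores, alpha=0.0)
semantic_only = hybrid_rank(bm25_scores, vector_scores, alpha=1.0)
```

Normalizing before blending matters because raw BM25 scores are unbounded while cosine similarities are not; without it, one signal would dominate regardless of `alpha`.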

13

onyx | Product | 37/100

via “semantic search with hybrid bm25 and embedding-based ranking”

Open Source AI Platform - AI Chat with advanced features that works with every LLM

Unique: Combines Vespa's native BM25 ranking with semantic similarity scoring in a single query, with configurable weighting and optional LLM-based re-ranking. Supports per-assistant search strategy configuration without re-indexing, enabling teams to optimize for precision vs. recall per use case.

vs others: More accurate than BM25-only search because it captures semantic meaning; more efficient than pure semantic search because BM25 filtering reduces embedding computation overhead. More flexible than fixed-weight hybrid search because weights are configurable per-assistant.

14

context-mode | Product | 36/100

via “fts5-based full-text search knowledge base with bm25 ranking”

Context window optimization for AI coding agents. Sandboxes tool output, 98% reduction. 14 platforms

Unique: Implements SQLite FTS5 with BM25 ranking as a lightweight, persistent knowledge base that survives session resets and context compaction. Unlike vector-based RAG systems, it requires no embedding model or external vector database, making it zero-dependency and suitable for offline-first agents.

vs others: Faster and simpler than vector RAG for keyword-heavy queries (code search, API docs) because it avoids embedding latency, and persists across sessions without external state management, but lacks semantic understanding compared to embedding-based retrieval.

15

oceanbase | Product | 36/100

via “full-text search indexing and query execution”

The Fastest Distributed Database for Transactional, Analytical, and AI Workloads.

Unique: Implements full-text indexing as a native storage engine feature rather than a separate service, allowing full-text predicates to be pushed down into the query optimizer and executed alongside other filters

vs others: Faster than Elasticsearch for small-to-medium datasets because indexes are co-located with data; simpler than Lucene because it integrates directly with SQL

16

Chroma | MCP Server | 32/100

via “full-text search with bm25 ranking”

Embeddings, vector search, document storage, and full-text search with the open-source AI application database

Unique: Chroma integrates BM25 search directly into the same collection API as vector search, allowing developers to query both modalities from a single interface without switching between systems or managing separate indices

vs others: More lightweight than Elasticsearch for simple keyword search while maintaining compatibility with semantic search in the same codebase, reducing operational complexity for small-to-medium applications

17

MCPProxy | MCP Server | 32/100

via “bm25-based intelligent tool discovery across federated mcp servers”

Open-source local app that enables access to multiple MCP servers and thousands of tools with intelligent discovery via MCP protocol, runs servers in isolated environments, and features automatic quarantine protection against malicious tools.

Unique: Uses Bleve-based BM25 indexing with on-demand tool discovery rather than static schema loading, achieving 99% token reduction. Implements lazy tool loading pattern where agents request tools by search query instead of receiving full catalog upfront.

vs others: Reduces token overhead by 99% compared to loading all tool schemas directly, and outperforms naive filtering by using relevance ranking instead of simple string matching.

18

alcove | MCP Server | 31/100

via “bm25 ranked document retrieval”

MCP server that gives AI coding agents on-demand access to private project docs. BM25 ranked search, multi-project support, one setup for any MCP-compatible agent (Claude Code, Cursor, Codex, Gemini CLI, and more).

Unique: Utilizes the BM25 algorithm specifically optimized for private documentation retrieval, enhancing relevance scoring over traditional keyword searches.

vs others: More efficient than standard keyword search engines for project documentation due to its relevance-focused scoring.

19

Meilisearch | MCP Server | 28/100

via “hybrid search combining full-text and semantic ranking”

Interact with and query Meilisearch (full-text and semantic search API)

Unique: Orchestrates parallel full-text and semantic search execution through MCP, with configurable fusion algorithms that blend BM25 and vector similarity scores. Abstracts ranking complexity from agents while exposing tuning parameters.

vs others: More flexible than Elasticsearch's hybrid search (which requires custom scoring scripts), simpler than implementing custom fusion logic, and faster than sequential full-text-then-semantic search due to parallel execution

20

Agentset | Repository | 28/100

via “semantic-search-with-hybrid-reranking”

An open-source platform for building and evaluating RAG and agentic applications. [#opensource](https://github.com/agentset-ai/agentset)

Unique: Combines vector search with BM25 keyword matching and applies reranking in a single pipeline, rather than treating semantic and keyword search as separate paths. Supports multimodal retrieval (images, tables, graphs) alongside text, enabling cross-format document understanding.

vs others: Outperforms pure vector search (Pinecone alone) and pure keyword search (Elasticsearch) by combining both with learned reranking, achieving higher precision on hybrid queries; faster than building custom hybrid pipelines because reranking is built-in.
