codebase-memory-mcp
MCP Server · Free
High-performance code intelligence MCP server. Indexes codebases into a persistent knowledge graph — an average repo in milliseconds. 66 languages, sub-ms queries, 99% fewer tokens. Single static binary, zero dependencies.
Capabilities (15 decomposed)
multi-language ast parsing and entity extraction with tree-sitter
Medium confidence
Parses source code in 66 languages using tree-sitter grammar bindings (vendored C components) to extract structural entities: function/method definitions, class hierarchies, variable declarations, imports, and type annotations. The parsing engine operates as the first pass in a 7-pass indexing pipeline, converting raw source text into an intermediate AST representation that feeds downstream semantic analysis. Uses tree-sitter's incremental parsing to avoid re-parsing unchanged file regions during incremental reindexing.
Uses vendored tree-sitter C bindings compiled into a single static binary, enabling 66-language support without external dependencies or grammar downloads. Integrates incremental parsing to avoid re-parsing unchanged regions during content-hash-based reindexing, achieving ~4× faster incremental updates than full-scan approaches.
Supports 66 languages in a single binary with zero external dependencies, whereas LSP-based approaches require per-language server installations, and regex-based tools are limited to 5-10 languages with poor structural accuracy.
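The first pass is implemented in C against vendored tree-sitter grammars. As an illustrative sketch of that kind of structural extraction (not the actual implementation), Python's built-in `ast` module can pull the same entity kinds out of Python source; the sample class and functions below are made up:

```python
import ast

source = """
class Greeter:
    def greet(self, name):
        return f"hello {name}"

def main():
    print(Greeter().greet("world"))
"""

# Walk the syntax tree and record (kind, name, line) for each definition,
# roughly what Pass 1 of the indexing pipeline would store as graph nodes.
entities = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.ClassDef):
        entities.append(("class", node.name, node.lineno))
    elif isinstance(node, ast.FunctionDef):
        entities.append(("function", node.name, node.lineno))

print(entities)
```

Tree-sitter does the same job language-agnostically (and incrementally), which is what makes a single 66-language pass possible.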
persistent sqlite knowledge graph with cypher query engine
Medium confidence
Builds and maintains a queryable knowledge graph stored in SQLite WAL mode at ~/.cache/codebase-memory-mcp/codebase-memory.db. The graph schema models code entities (functions, classes, modules) as nodes and relationships (calls, inheritance, imports, type references) as edges. Exposes a Cypher query engine (src/store/store.c) for graph traversal, enabling sub-millisecond queries for structural patterns like 'find all callers of function X' or 'trace inheritance chain for class Y'. Supports incremental updates via content-hash-based change detection — only modified files trigger re-parsing and graph updates.
Implements a Cypher query engine in C within a single static binary, achieving sub-millisecond query latency on graphs with thousands of nodes. Uses content-hash-based incremental indexing to detect file changes and update only affected graph regions, enabling ~4× faster re-indexing than full-scan approaches. Stores graph in SQLite WAL mode for ACID compliance and concurrent read access.
Delivers sub-millisecond Cypher queries on local graphs without network latency, whereas cloud-based code intelligence services (GitHub Copilot, Tabnine) incur 100-500ms round-trip latency and require sending code to external servers.
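The listing does not publish the actual schema, but a graph-in-SQLite design typically reduces relationships to rows in an edge table, so a "find all callers" query becomes one indexed lookup. A hypothetical miniature (table layout, relationship names, and function names are all assumptions):

```python
import sqlite3

# Hypothetical edge-table schema: one row per relationship.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT)")
con.execute("CREATE INDEX idx_rel_dst ON edges (rel, dst)")
con.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("main",       "CALLS", "parse_file"),
    ("index_repo", "CALLS", "parse_file"),
    ("parse_file", "CALLS", "read_bytes"),
])

# Equivalent in spirit to the Cypher query:
#   MATCH (c)-[:CALLS]->(f {name: "parse_file"}) RETURN c
callers = sorted(row[0] for row in con.execute(
    "SELECT src FROM edges WHERE rel = 'CALLS' AND dst = ?", ("parse_file",)))
print(callers)
```

Because the lookup hits a local index rather than a network service, sub-millisecond latency on small-to-medium graphs is plausible.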
community detection and architectural clustering
Medium confidence
Performs community detection on the code graph to identify clusters of related entities (functions, classes, modules) that form logical architectural components. The indexing pipeline (Pass 6) uses graph clustering algorithms to group entities based on call frequency, shared dependencies, and module boundaries. Results are stored in the graph as 'BELONGS_TO_COMMUNITY' relationships, queryable via tools like 'find_communities' and 'find_community_members'. Useful for understanding codebase architecture, identifying tightly coupled components, and visualizing system structure.
Uses graph clustering algorithms on the call graph to automatically identify architectural components without manual configuration or domain knowledge. Results are stored in the graph for efficient querying and visualization.
Automatic community detection requires no manual configuration or domain knowledge, whereas manual architecture documentation is often outdated. Faster and more objective than manual architectural analysis.
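The listing does not name the clustering algorithm, so as a crude stand-in, connected components over an undirected view of the call graph already illustrates how BELONGS_TO_COMMUNITY groupings can fall out of graph structure alone (module names below are invented):

```python
def communities(edges):
    """Group nodes into connected components via union-find.
    A deliberately simple stand-in for real community detection."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

clusters = communities([("auth", "db"), ("db", "pool"), ("ui", "render")])
print(clusters)
```

Production community detection (e.g. modularity-based methods) additionally weights edges by call frequency and shared dependencies, as the description mentions.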
test coverage mapping and test-to-code linking
Medium confidence
Identifies test functions and links them to the code they test by analyzing test file naming conventions, test decorators, and assertion patterns. The indexing pipeline (Pass 7) detects test functions (e.g., functions starting with 'test_', methods in classes ending with 'Test', functions decorated with @test or @pytest.mark) and attempts to link them to the functions they test based on naming patterns and call graph analysis. Results are stored in the graph as 'TESTS' relationships, queryable via tools like 'find_tests_for_function' and 'find_tested_functions'.
Automatically links test functions to code under test using naming patterns and call graph analysis, without requiring explicit test annotations or coverage instrumentation. Works across multiple testing frameworks (pytest, unittest, Jest, Go testing, etc.) in a single indexing pass.
Automatic test linking requires no instrumentation or coverage tools, whereas coverage tools (pytest-cov, Istanbul) require test execution and only measure line coverage. Faster than manual test discovery and works for untested code.
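The naming-convention half of this heuristic is easy to picture. A minimal sketch (call-graph analysis omitted; the function names are made up):

```python
def link_tests_by_name(functions):
    """Link test_<name> functions to <name> when <name> is also defined,
    mirroring the naming-pattern part of the TESTS-relationship heuristic."""
    defined = set(functions)
    links = {}
    for fn in functions:
        target = fn[len("test_"):]
        if fn.startswith("test_") and target in defined:
            links[fn] = target
    return links

links = link_tests_by_name(["parse_file", "test_parse_file", "test_orphan"])
print(links)
```

Note that `test_orphan` links to nothing; the description says call-graph analysis is used as a second signal for such cases.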
file content access and code snippet retrieval
Medium confidence
Provides direct access to source code files and code snippets via tools like 'get_file_content' and 'get_code_snippet'. Supports retrieving entire files or specific line ranges, with optional syntax highlighting and context expansion. Useful for AI agents that need to read actual code after identifying relevant functions via graph queries. Integrates with graph queries to provide seamless navigation from structural queries (find_callers) to actual code inspection.
Provides direct file access integrated with graph queries, enabling seamless navigation from structural queries (find_callers) to actual code inspection. Supports line-range retrieval and context expansion for efficient code reading.
Integrated file access eliminates separate file reading steps and enables efficient context expansion, whereas separate file reading tools require manual path construction and context management.
configuration file and dependency link detection
Medium confidence
Detects references to configuration files, environment variables, and external dependencies by analyzing code patterns, imports, and config file references. The indexing pipeline (Pass 5) identifies config file paths (e.g., 'config.yaml', 'settings.json'), environment variable references (e.g., 'os.getenv("DATABASE_URL")'), and external dependencies (e.g., 'import requests', 'require("express")') and links them to the code that references them. Results are stored in the graph as 'REFERENCES_CONFIG', 'USES_ENV_VAR', and 'DEPENDS_ON' relationships.
Automatically detects configuration file, environment variable, and dependency references using pattern matching and AST analysis, linking them to code locations in the graph. Works across multiple languages and frameworks without requiring explicit annotations.
Automatic detection of config and dependency references requires no manual configuration, whereas dependency analysis tools (npm audit, pip-audit) only check for known vulnerabilities and don't link to code locations. Faster than manual dependency tracking.
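A rough analogue of the pattern-matching side of Pass 5 (the real pass also uses AST analysis; the regexes and sample source here are assumptions, not the server's actual patterns):

```python
import re

# Spot os.getenv("...") references and plain import statements.
ENV_RE = re.compile(r'os\.getenv\(\s*["\']([A-Z0-9_]+)["\']')
IMPORT_RE = re.compile(r'^\s*import\s+(\w+)', re.MULTILINE)

code = '''
import requests
url = os.getenv("API_URL")
token = os.getenv("API_TOKEN", "")
'''

env_vars = ENV_RE.findall(code)   # candidate USES_ENV_VAR edges
deps = IMPORT_RE.findall(code)    # candidate DEPENDS_ON edges
print(env_vars, deps)
```

Each match would become an edge from the referencing code location to a config/env/dependency node in the graph.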
polyglot codebase indexing with language-specific semantics
Medium confidence
Indexes codebases containing multiple programming languages (Python, Go, TypeScript, Rust, Java, C++, C#, Kotlin, Lua, Haskell, OCaml, Swift, Dart, MATLAB, Lean 4, Wolfram, and 48 more) in a single unified indexing pass. Each language is parsed using language-specific tree-sitter grammars, and semantic analysis (call resolution, type inference, HTTP route detection) is adapted to each language's semantics. Results are stored in a unified graph that enables cross-language queries (e.g., 'find all Python functions that call Go functions').
Indexes 66 languages in a single unified graph with language-specific semantic analysis, enabling cross-language queries without separate per-language tools. Each language's semantics (Python type hints, Go explicit types, TypeScript annotations) are respected in a unified indexing pipeline.
Single unified indexing pass for 66 languages eliminates the need for per-language tool setup, whereas LSP-based approaches require separate server configuration for each language. Cross-language queries are impossible with language-specific tools.
7-pass semantic indexing pipeline with call resolution and type inference
Medium confidence
Executes a multi-stage indexing pipeline (src/pipeline/pipeline.c) that progressively enriches the graph: Pass 1 extracts structure (definitions, imports), Pass 2 resolves calls to their definitions, Pass 3 infers types and inheritance, Pass 4 detects HTTP links and routes, Pass 5 identifies config file references, Pass 6 performs community detection (clustering related entities), Pass 7 indexes test coverage. Each pass operates on the graph built by previous passes, enabling sophisticated analyses like 'find all functions that handle HTTP POST requests' or 'identify dead code by tracing reachability from entry points'. Type inference uses language-specific heuristics (e.g., Python type hints, Go explicit types, TypeScript annotations) to build a best-effort type map.
Implements a 7-pass pipeline that progressively enriches the graph with semantic information (calls, types, HTTP routes, communities, tests) in a single indexing run. Each pass operates on the graph state from previous passes, enabling sophisticated cross-cutting analyses without re-parsing. Uses language-specific heuristics for call resolution and type inference, adapting to each language's semantics (Python type hints, Go explicit types, TypeScript annotations).
Provides call resolution and type inference in a single indexing pass without requiring LSP servers or language-specific analysis tools, whereas LSP-based approaches require per-language server setup and multiple round-trips for semantic information.
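The pass-over-a-shared-graph idea can be sketched in a few lines. This is not the real pipeline (which has 7 passes and lives in src/pipeline/pipeline.c); the file contents and pass bodies below are invented to show how each pass consumes the state built by earlier ones:

```python
def pass_structure(files, graph):
    # Pass 1 analogue: register every definition as a node.
    for info in files.values():
        graph["nodes"].update(info["defs"])

def pass_calls(files, graph):
    # Pass 2 analogue: keep only call edges whose endpoints were
    # registered by the previous pass (unresolved calls are dropped).
    for info in files.values():
        for src, dst in info["calls"]:
            if src in graph["nodes"] and dst in graph["nodes"]:
                graph["edges"].add((src, "CALLS", dst))

def run_pipeline(files, passes):
    graph = {"nodes": set(), "edges": set()}
    for p in passes:  # each pass enriches the graph built so far
        p(files, graph)
    return graph

files = {"app.py": {"defs": ["main", "helper"],
                    "calls": [("main", "helper"), ("main", "undefined_fn")]}}
graph = run_pipeline(files, [pass_structure, pass_calls])
print(graph["edges"])
```

The ordering matters: call resolution can only filter against nodes that structure extraction has already produced, which is why the passes form a pipeline rather than independent scans.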
incremental reindexing with content-hash change detection
Medium confidence
Detects file changes using content hashing (comparing SHA-256 hashes of file contents) and re-parses only modified files during incremental reindexing. The file watcher (src/pipeline/pipeline_incremental.c) polls the filesystem with adaptive intervals (5-60 seconds) and triggers incremental re-indexing when changes are detected. This approach avoids full-codebase re-parsing on every change, achieving ~4× faster reindexing than scanning all files. The graph is updated in-place, preserving query performance for unchanged portions of the codebase.
Uses content-hash-based change detection (SHA-256 comparison) instead of filesystem watchers or timestamps, enabling reliable detection of actual code changes without false positives from build artifacts or temporary files. Adaptive polling intervals (5-60s) balance freshness with CPU overhead. Achieves ~4× faster reindexing than full-scan approaches by re-parsing only modified files.
Content-hash detection is more reliable than filesystem timestamps (which can be unreliable across network mounts) and more efficient than full-codebase re-parsing, whereas LSP-based approaches require per-language server integration and may miss cross-language dependencies.
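The core of content-hash change detection is a hash comparison per file, as described above. A minimal sketch (the helper name and in-memory file map are assumptions for illustration):

```python
import hashlib

def changed_files(contents, prev_hashes):
    """Compare SHA-256 content hashes against the previous snapshot.
    `contents` maps path -> raw bytes; returns (changed paths, new hashes)."""
    changed, hashes = [], {}
    for path, data in sorted(contents.items()):
        digest = hashlib.sha256(data).hexdigest()
        hashes[path] = digest
        if prev_hashes.get(path) != digest:
            changed.append(path)  # new or modified: re-parse only this file
    return changed, hashes

prev = {"a.py": hashlib.sha256(b"def f(): pass\n").hexdigest()}
changed, _ = changed_files({"a.py": b"def f(): pass\n",
                            "b.py": b"def g(): pass\n"}, prev)
print(changed)
```

A `touch` that rewrites identical bytes produces the same digest and triggers no re-parse, which is the reliability advantage over timestamp-based detection.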
mcp tool exposure with stdio transport and cli fallback
Medium confidence
Exposes 14 code intelligence tools via the Model Context Protocol (MCP) specification, communicating with MCP clients (Claude Code, Cursor, Windsurf, Gemini CLI, VS Code, Zed) over stdio transport. The MCP server (src/mcp/mcp.c) implements a single-threaded event loop that parses incoming JSON-RPC requests, routes them to tool handlers, and returns structured results. Each tool maps to a specific graph query or code access operation (e.g., 'find_callers', 'find_callees', 'get_file_content'). Additionally, exposes all tools via CLI mode (codebase-memory-mcp cli <tool_name>) for scripting and testing without an MCP client.
Implements MCP server in C with a single-threaded event loop using yyjson for fast JSON parsing, enabling low-latency tool calls from MCP clients. Dual-mode exposure (MCP + CLI) allows integration with AI agents and scripting without requiring separate adapters. Single static binary with zero dependencies simplifies deployment to any MCP-compatible client.
Native MCP integration eliminates the need for custom plugins or adapters, whereas REST API approaches require additional HTTP server infrastructure and introduce network latency. CLI mode enables scripting without MCP client setup, whereas LSP-based approaches require language-specific server configuration.
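MCP tool invocation is JSON-RPC 2.0 over stdio, using the spec's `tools/call` method. The shape of a request a client would write to the server's stdin is sketched below; the argument key `"function"` is a guess, so consult the server's published tool schema for the real parameter names:

```python
import json

# A tools/call request as defined by the MCP specification (JSON-RPC 2.0).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "find_callers",                    # tool from this server
        "arguments": {"function": "parse_file"},   # key is an assumption
    },
}
frame = json.dumps(request)   # one serialized message on the stdio stream
decoded = json.loads(frame)
print(decoded["params"]["name"])
```

The CLI mode described above wraps the same tool handlers, so scripts can skip the JSON-RPC framing entirely.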
graph visualization and interactive exploration ui
Medium confidence
Provides a web-based graph visualization UI (docs/index.html) that renders the indexed knowledge graph as an interactive node-link diagram. Users can click nodes to expand relationships, search for entities by name, and explore call graphs, inheritance hierarchies, and dependency chains visually. The UI queries the SQLite graph via the MCP tools, enabling real-time exploration without re-indexing. Useful for understanding codebase structure, identifying architectural patterns, and communicating code organization to team members.
Provides a lightweight web-based graph visualization that queries the local SQLite graph via MCP tools, enabling interactive exploration without external services or graph databases. Renders call graphs, inheritance hierarchies, and dependency chains in a single unified interface.
Local graph visualization eliminates dependency on cloud-based visualization services (which require uploading code) and provides instant rendering without network latency, whereas GitHub's dependency graph requires cloud hosting and Graphviz-based tools require manual graph generation.
project-level code intelligence queries (find_callers, find_callees, trace_calls)
Medium confidence
Exposes graph query tools that answer project-level structural questions by traversing the knowledge graph. 'find_callers' returns all functions/methods that call a given function; 'find_callees' returns all functions called by a given function; 'trace_calls' returns the full call path from a source function to a target function. These queries operate on the pre-built graph, returning results in milliseconds without re-parsing. Supports filtering by language, module, or file path to narrow results in large codebases.
Executes call graph queries on the pre-built SQLite graph in sub-millisecond time, returning precise structural results without re-parsing or file I/O. Supports filtering by language, module, and file path to narrow results in polyglot codebases. Handles name collisions by returning all matching functions with their locations.
Sub-millisecond call graph queries are 100-1000× faster than grep-based approaches and more accurate than LSP-based tools, which require per-language server setup. Handles polyglot codebases in a single query, whereas language-specific tools require multiple queries.
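Of the three tools, trace_calls is the least obvious: it is a path search over CALLS edges. A breadth-first sketch that returns a shortest call path (edge data and function names are made up; the server's actual traversal strategy is not documented here):

```python
from collections import deque

def trace_calls(edges, src, dst):
    """Shortest call path from src to dst over directed CALLS edges (BFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # dst is not reachable from src

path = trace_calls([("main", "load"), ("load", "parse"), ("parse", "emit")],
                   "main", "emit")
print(path)
```

find_callers and find_callees are the degenerate cases: a single-hop traversal inbound or outbound from one node.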
http route and handler mapping with pattern detection
Medium confidence
Detects HTTP routes and their handler functions by analyzing route definitions (decorators, function calls, config files) across frameworks (Express, FastAPI, Django, Spring, etc.). The indexing pipeline (Pass 4) uses pattern matching and AST analysis to identify route declarations (e.g., @app.route('/api/users'), router.get('/users/:id')) and link them to their handler functions. Supports extracting route parameters, HTTP methods, and middleware chains. Results are stored in the graph and queryable via tools like 'find_routes' and 'find_route_handlers'.
Uses AST analysis and pattern matching to detect HTTP routes across multiple frameworks (Express, FastAPI, Django, Spring, etc.) in a single indexing pass, without requiring framework-specific plugins. Links routes to handler functions in the graph, enabling queries like 'find all handlers for POST /api/users' without manual mapping.
Framework-agnostic route detection works across polyglot codebases without per-framework setup, whereas framework-specific tools (Swagger/OpenAPI generators) require explicit annotations and don't work for undocumented APIs. Faster than manual API exploration or grep-based searching.
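For one framework style, the decorator-pattern half of route detection looks roughly like this (the regex and sample source are assumptions; the real pass uses AST analysis and covers many frameworks):

```python
import re

# Match a Flask-style @app.route decorator and capture the path plus the
# name of the handler function defined immediately after it.
ROUTE_RE = re.compile(
    r"@app\.route\(\s*['\"]([^'\"]+)['\"][^)]*\)\s*def\s+(\w+)")

src = """@app.route('/api/users', methods=['POST'])
def create_user(request):
    pass
"""

routes = ROUTE_RE.findall(src)  # [(path, handler_name), ...]
print(routes)
```

Each (path, handler) pair becomes a route node linked to its handler function in the graph, which is what makes queries like "find all handlers for POST /api/users" answerable.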
type hierarchy and inheritance chain resolution
Medium confidence
Resolves type relationships and inheritance hierarchies by analyzing class definitions, interface implementations, and type annotations. The indexing pipeline (Pass 3) uses language-specific heuristics to identify parent classes, implemented interfaces, and type parameters. Results are stored in the graph as 'INHERITS_FROM' and 'IMPLEMENTS' relationships, queryable via tools like 'find_subclasses', 'find_implementations', and 'trace_inheritance'. Supports both single and multiple inheritance, generic types, and mixins (where applicable).
Resolves type hierarchies across 66 languages using language-specific heuristics (Python type hints, Go explicit types, TypeScript annotations, Java class declarations), storing results in the graph for sub-millisecond queries. Handles single and multiple inheritance, generic types, and mixins without requiring external type checkers.
Language-agnostic type hierarchy resolution works across polyglot codebases in a single query, whereas LSP-based approaches require per-language server setup and TypeScript-specific tools only work for TypeScript. Faster than manual inheritance chain exploration.
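Once INHERITS_FROM edges are in the graph, trace_inheritance reduces to walking parent pointers. A sketch limited to single inheritance for brevity (class names invented; the indexer also models interfaces and mixins):

```python
def trace_inheritance(parent_of, cls):
    """Walk INHERITS_FROM edges from a class up to the root."""
    chain = [cls]
    while cls in parent_of:
        cls = parent_of[cls]
        chain.append(cls)
    return chain

chain = trace_inheritance({"JsonStore": "Store", "Store": "object"}, "JsonStore")
print(chain)
```

find_subclasses is the reverse traversal: follow INHERITS_FROM edges inbound instead of outbound.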
dead code and reachability analysis
Medium confidence
Identifies unreachable code by performing reachability analysis from entry points (main functions, exported APIs, test entry points). The indexing pipeline uses graph traversal to mark all functions reachable from entry points, identifying functions with no incoming edges as potentially dead code. Supports filtering by module, file path, or function type (private vs. public) to reduce false positives. Results are queryable via tools like 'find_dead_code' and 'find_unreachable_functions'.
Performs reachability analysis on the pre-built call graph by traversing from identified entry points, marking all reachable functions and identifying unreachable ones. Supports filtering by module and function type to reduce false positives from exported APIs and test fixtures.
Graph-based reachability analysis is more accurate than regex-based dead code detection and faster than manual code review. Handles polyglot codebases in a single analysis, whereas language-specific linters require per-language setup.
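The reachability step described above can be sketched as a graph traversal from the entry points, with everything unvisited reported as potentially dead (function names below are made up for illustration):

```python
def find_dead_code(edges, entry_points, all_functions):
    """Mark everything reachable from entry points over CALLS edges;
    whatever remains is reported as potentially dead."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    reachable, stack = set(entry_points), list(entry_points)
    while stack:
        fn = stack.pop()
        for callee in adj.get(fn, []):
            if callee not in reachable:
                reachable.add(callee)
                stack.append(callee)
    return sorted(set(all_functions) - reachable)

dead = find_dead_code(
    edges=[("main", "serve"), ("serve", "render"), ("legacy", "render")],
    entry_points=["main"],
    all_functions=["main", "serve", "render", "legacy", "unused_util"],
)
print(dead)
```

Note that `legacy` is flagged even though it calls reachable code; only reachability *from entry points* counts, which is exactly why entry-point selection (exported APIs, test entry points) drives the false-positive rate.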
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with codebase-memory-mcp, ranked by overlap. Discovered automatically through the match graph.
code-review-graph
Local knowledge graph for Claude Code. Builds a persistent map of your codebase so Claude reads only what matters — 6.8× fewer tokens on reviews and up to 49× on daily coding tasks.
CodeGraphContext
An MCP server plus a CLI tool that indexes local code into a graph database to provide context to AI assistants.
Scaffold
Scaffold is a Retrieval-Augmented Generation (RAG) system designed for structural understanding of large codebases. It transforms your source code into a living knowledge graph, allowing for precise, context-aware interactions that go far beyond simple file retrieval.
code-index-mcp
A Model Context Protocol (MCP) server that helps large language models index, search, and analyze code repositories with minimal setup
Repo Map
🐧 🪟 🍎 - An MCP server (and command-line tool) to provide a dynamic map of chat-related files from the repository with their function prototypes and related files in order of relevance. Based on the "Repo Map" functionality in Aider.chat
Sourcerer
MCP for semantic code search & navigation that reduces token waste
Best For
- ✓AI agents analyzing polyglot codebases (Python + Go + TypeScript stacks)
- ✓Teams using Claude Code, Cursor, or Windsurf that need language-agnostic code intelligence
- ✓Developers building code analysis tools that require structural understanding across 66+ languages
- ✓AI agents that need repeated structural queries on the same codebase
- ✓Teams using Claude Code or Cursor that want sub-millisecond response times for code intelligence
- ✓Developers building impact analysis tools (what breaks if I change this function?)
- ✓Architects and tech leads understanding codebase organization
- ✓Teams performing large-scale refactoring and needing to understand component boundaries
Known Limitations
- ⚠Tree-sitter grammars may have edge cases with non-standard or legacy syntax variants
- ⚠Parsing latency scales with file size; very large files (>100KB) may add milliseconds per file
- ⚠Type inference is best-effort and language-dependent — dynamically typed languages (Python, JavaScript) have less precise type information than statically typed ones
- ⚠Graph schema is optimized for structural queries; semantic queries (what does this code do?) still require LLM analysis
- ⚠SQLite WAL mode adds ~5-10MB overhead per indexed codebase; very large graphs (>1GB) may experience slower traversal
- ⚠Cypher query engine is read-only; no mutation support for dynamic graph updates during agent execution
Repository Details
Last commit: Apr 18, 2026