Danswer (Onyx)
Framework · Free
Enterprise AI assistant across company docs.
Capabilities (15 decomposed)
multi-source document indexing with unified embedding pipeline
Medium confidence: Danswer ingests documents from heterogeneous sources (Slack, Google Drive, Confluence, GitHub, etc.) through connector-based adapters that normalize documents into a unified schema, then processes them through a configurable embedding pipeline (supporting multiple embedding models) and stores vectors in a pluggable vector database backend. The architecture uses a document chunking strategy with metadata preservation to maintain source attribution and access control boundaries across all indexed content.
Uses a connector-adapter pattern where each source (Slack, Confluence, GitHub) has a dedicated connector that normalizes documents into a unified schema before embedding, enabling source-specific metadata preservation and incremental sync without re-embedding the entire corpus. This differs from monolithic indexing approaches that treat all sources identically.
More flexible than Pinecone or Weaviate alone because connectors handle source-specific logic (Slack thread reconstruction, Confluence hierarchy preservation) before embedding, and more maintainable than building custom ETL pipelines for each knowledge source.
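The connector-adapter idea can be sketched as follows. This is a minimal illustration, not Danswer's actual API: the class and field names (`NormalizedDocument`, `Connector.load`) are invented for the example, and the Slack client is replaced by a plain list of message dicts.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class NormalizedDocument:
    # Unified schema every connector emits, regardless of source.
    doc_id: str
    source: str            # e.g. "slack", "confluence", "github"
    title: str
    text: str
    metadata: dict = field(default_factory=dict)  # source-specific extras

class Connector(ABC):
    @abstractmethod
    def load(self) -> list:
        """Fetch raw items from the source and normalize them."""

class SlackConnector(Connector):
    def __init__(self, messages: list):
        self.messages = messages  # stand-in for a Slack API client

    def load(self) -> list:
        # Source-specific logic: reconstruct threads by grouping
        # replies under their parent message before embedding.
        threads = {}
        for m in self.messages:
            threads.setdefault(m.get("thread_ts", m["ts"]), []).append(m["text"])
        return [
            NormalizedDocument(
                doc_id=f"slack-{ts}",
                source="slack",
                title=texts[0][:50],
                text="\n".join(texts),
                metadata={"thread_ts": ts},
            )
            for ts, texts in threads.items()
        ]
```

Because every connector emits the same `NormalizedDocument`, the embedding pipeline downstream never needs to know which source a document came from.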
semantic search with access control enforcement
Medium confidence: Danswer executes semantic search queries by embedding the user's question, retrieving similar document chunks from the vector database, and filtering results based on the user's document-level access permissions (derived from source system ACLs like Slack workspace membership or Confluence space permissions). The search pipeline ranks results by vector similarity and applies source-specific permission checks before returning chunks to the user, ensuring no unauthorized content leaks.
Enforces source-system ACLs at query time rather than pre-filtering indexed documents, allowing the same document corpus to serve users with different permissions without maintaining separate indices. Permission checks are applied after vector retrieval, reducing the need for complex permission-aware vector queries.
More secure than naive RAG systems that ignore source permissions, and more flexible than pre-filtering documents at index time because it adapts to permission changes without reindexing.
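The query-time enforcement described above amounts to a post-retrieval filter. A minimal sketch, with invented names (`Chunk`, `filter_by_acl`) and an `allowed` set standing in for ACL data synced from the source system:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str
    allowed: "set | None"   # user ids synced from the source ACL; None = public

def filter_by_acl(chunks: list, user_id: str) -> list:
    # Runs AFTER vector retrieval: the shared index is queried once,
    # then chunks the user cannot see are dropped. A permission change
    # in the source system takes effect on the next ACL sync, with no
    # reindexing of the vector store.
    return [c for c in chunks if c.allowed is None or user_id in c.allowed]
```

The trade-off is that retrieval must over-fetch slightly, since some of the top-k results may be filtered out for a given user.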
pluggable vector database backend with multi-provider support
Medium confidence: Danswer abstracts the vector database layer through a pluggable backend interface, supporting multiple vector database providers (Postgres with pgvector, Qdrant, Weaviate, Pinecone). The system stores embeddings, document metadata, and chunk information in the chosen backend, and implements a consistent query interface across all backends. Users can switch backends without re-embedding documents if the vector format is compatible.
Implements a consistent query interface across multiple vector database backends (Postgres, Qdrant, Weaviate, Pinecone), allowing users to switch backends without application code changes. The abstraction layer handles backend-specific query syntax and result formatting.
More flexible than single-backend systems because it supports multiple vector databases, and more portable than tightly coupled implementations because switching backends doesn't require re-embedding.
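The backend abstraction can be sketched as an interface plus interchangeable implementations. The names here (`VectorBackend`, `InMemoryBackend`) are illustrative, not Danswer's; a real pgvector or Qdrant backend would implement the same two methods against its own client:

```python
import math
from abc import ABC, abstractmethod

class VectorBackend(ABC):
    @abstractmethod
    def upsert(self, doc_id: str, vector: list, metadata: dict) -> None: ...
    @abstractmethod
    def query(self, vector: list, top_k: int) -> list: ...

class InMemoryBackend(VectorBackend):
    # Stand-in for pgvector/Qdrant/Weaviate: same interface, different store.
    def __init__(self):
        self._rows = {}  # doc_id -> (vector, metadata)

    def upsert(self, doc_id, vector, metadata):
        self._rows[doc_id] = (vector, metadata)

    def query(self, vector, top_k):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        scored = [(doc_id, cos(vector, v)) for doc_id, (v, _) in self._rows.items()]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

Application code holds only a `VectorBackend` reference, so swapping stores is a configuration change rather than a code change.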
llm provider abstraction with multi-model support
Medium confidence: Danswer abstracts the LLM layer through a provider interface, supporting multiple LLM providers (OpenAI, Anthropic, local models via Ollama/vLLM, Azure OpenAI). Users can configure which LLM to use for chat and answer generation, and can switch providers without changing application code. The system handles provider-specific API formats, token counting, and error handling transparently.
Implements a consistent interface across multiple LLM providers (OpenAI, Anthropic, local models), handling provider-specific API formats and token counting transparently. This allows users to switch LLMs without application code changes.
More flexible than single-provider systems because it supports multiple LLMs, and more cost-effective than always using expensive models because it allows switching to cheaper alternatives.
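The provider abstraction follows the same pattern as the vector backend: one interface, a registry keyed by configuration. A minimal sketch with invented names (`LLMProvider`, `FakeProvider`, `PROVIDERS`); a real adapter would wrap the OpenAI, Anthropic, or Ollama client here:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class FakeProvider(LLMProvider):
    # Stand-in for a real client. An actual adapter would translate
    # `prompt` into that provider's request format and normalize the
    # response (and errors, and token accounting) back to a string.
    def complete(self, prompt: str) -> str:
        return f"answer({len(prompt)} prompt chars)"

PROVIDERS = {"fake": FakeProvider}  # registry keyed by a config value

def generate(provider_name: str, question: str, context: str) -> str:
    provider = PROVIDERS[provider_name]()  # swap providers via config only
    return provider.complete(f"Context:\n{context}\n\nQuestion: {question}")
```

Switching from an expensive hosted model to a cheaper local one then means changing `provider_name` in configuration, not touching the call sites.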
answer generation with source attribution and citation
Medium confidence: Danswer generates answers to user queries by passing retrieved document chunks to an LLM along with a system prompt that instructs the model to cite sources. The system extracts citations from the LLM response and links them back to the original documents, providing users with verifiable sources for each claim. The citation format is configurable (inline citations, footnotes, etc.) and can be customized per deployment.
Implements citation extraction from LLM responses and links citations back to source documents, providing verifiable sources for each claim. The system uses the LLM's instruction-following capability to enforce citation format rather than post-processing responses.
More verifiable than generic chatbots that don't cite sources, and more transparent than systems that hide source documents because users can immediately verify claims.
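Citation extraction of this kind can be sketched in a few lines. This assumes (as an illustration, not Danswer's actual format) that the system prompt told the model to cite as `[n]`, where `n` is the 1-based index of the chunk in the prompt:

```python
import re

def extract_citations(answer: str, sources: list) -> dict:
    # Find inline markers like [1] in the model's answer and map them
    # back to the retrieved documents, skipping out-of-range indices
    # (models occasionally cite chunks that were never provided).
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n: sources[n - 1] for n in sorted(cited) if 1 <= n <= len(sources)}
```

The returned mapping is what a UI would render as clickable source links under the answer.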
user authentication and role-based access control
Medium confidence: Danswer implements user authentication (via OIDC, SAML, or local credentials) and role-based access control (RBAC) to restrict who can access the system and what they can do. Users are assigned roles (admin, user, viewer) that determine their permissions (e.g., admins can manage connectors, users can search and chat, viewers can only read). The system integrates with source system identities (Slack user IDs, Confluence accounts) to enforce document-level access control.
Integrates with source system identities (Slack user IDs, Confluence accounts) to enforce document-level access control, allowing the same document corpus to serve users with different permissions. User identity is mapped across systems to ensure consistent access control.
More secure than systems without authentication, and more flexible than simple role-based systems because it integrates with source system permissions for fine-grained access control.
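The RBAC layer reduces to a role-to-permission table. A minimal sketch using the roles named above (the permission strings are illustrative):

```python
ROLE_PERMISSIONS = {
    "admin":  {"manage_connectors", "search", "chat", "read"},
    "user":   {"search", "chat", "read"},
    "viewer": {"read"},
}

def can(role: str, action: str) -> bool:
    # Role gates what a user may DO in the app; document-level access
    # (which chunks they may SEE) is enforced separately via the
    # source-system ACLs synced by each connector.
    return action in ROLE_PERMISSIONS.get(role, set())
```

Keeping the two layers separate is what lets a "user"-role account still see only the documents its mapped source identities permit.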
web interface with search and chat ui
Medium confidence: Danswer provides a web interface (built with React) that allows users to search documents and chat with the AI assistant. The interface includes a search bar for semantic search, a chat panel for multi-turn conversations, and a sidebar showing indexed sources and recent searches. The UI displays search results with source attribution, allows users to click through to source documents, and provides conversation history management.
Provides a unified web interface for both semantic search and conversational chat, allowing users to switch between search and chat modes without context switching. The interface displays source attribution and allows users to navigate to original documents.
More integrated than separate search and chat tools, and more customizable than SaaS solutions because it's open-source and self-hosted.
conversational rag with multi-turn context management
Medium confidence: Danswer implements a conversational chat interface where each user message is embedded and used to retrieve relevant document chunks, which are then passed to an LLM (OpenAI, Anthropic, or local model) along with conversation history to generate contextual responses. The system maintains a conversation thread with full message history, allowing follow-up questions to reference previous context, and implements a sliding-window context strategy to manage token limits while preserving conversation coherence.
Implements conversation threading with explicit context windows where each turn retrieves fresh documents based on the current user message, then augments the LLM prompt with both retrieved chunks and conversation history. This allows the system to handle topic shifts gracefully while maintaining coherence within a conversation thread.
More conversational than stateless RAG systems (like simple vector search), and more document-grounded than generic chatbots because every response is anchored to retrieved source material.
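A sliding-window context strategy like the one described can be sketched as follows. This is an illustration under a crude assumption: word count stands in for a real tokenizer's token count.

```python
def sliding_window(history: list, token_budget: int) -> list:
    # Keep the most recent turns that fit within the budget; older
    # turns are dropped first, so follow-up questions retain the
    # nearest context while the prompt stays under the model's limit.
    kept, used = [], 0
    for turn in reversed(history):
        cost = len(turn.split())       # proxy for token count
        if used + cost > token_budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))
```

Each chat turn would then build its prompt from `sliding_window(history, budget)` plus the freshly retrieved chunks for the current message.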
slack integration with workspace-aware permissions
Medium confidence: Danswer provides a Slack bot that indexes Slack messages and threads from specified channels, syncs Slack workspace membership to enforce channel-level access control, and allows users to query indexed Slack content directly from Slack via slash commands or mentions. The integration maintains a mapping between Slack user IDs and channel memberships, ensuring that search results respect channel privacy (users only see messages from channels they're members of).
Implements Slack workspace membership sync as a permission layer, allowing the same message corpus to be searched by different users with different channel access levels. The bot uses Slack's conversations.members API to maintain a real-time mapping of user-to-channel membership, enforcing privacy at query time.
More privacy-aware than generic Slack search tools because it respects channel membership, and more integrated than external search tools because queries happen within Slack without context switching.
confluence connector with space and page-level hierarchy preservation
Medium confidence: Danswer's Confluence connector crawls Confluence spaces and pages, preserving the page hierarchy (parent-child relationships) and space-level access controls. The connector extracts page content, metadata (author, creation date, last modified), and space permissions, then chunks pages while maintaining hierarchy context so that search results can reference the full document path (e.g., 'Space > Parent Page > Child Page'). The connector supports incremental sync to avoid re-indexing unchanged pages.
Preserves Confluence page hierarchy as metadata during chunking, allowing search results to include the full document path and enabling users to navigate back to the original page. The connector uses Confluence's page tree API to reconstruct hierarchy rather than flattening all pages into a single corpus.
More hierarchy-aware than generic document indexers that flatten all pages, and more permission-respecting than simple Confluence search because it enforces space-level access control at query time.
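Reconstructing the breadcrumb path from parent-child relationships is a small recursive walk. A sketch with an invented representation: `pages` maps each page id to a `(title, parent_id)` pair, with `None` marking the space root:

```python
def build_paths(pages: dict) -> dict:
    # pages: page_id -> (title, parent_id or None). Walking the parent
    # chain yields the "Space > Parent Page > Child Page" breadcrumb,
    # which is stored as chunk metadata instead of flattening the tree.
    def path(page_id):
        title, parent = pages[page_id]
        return title if parent is None else f"{path(parent)} > {title}"
    return {pid: path(pid) for pid in pages}
```

Search results can then show `paths[page_id]` next to each hit, letting users navigate back to the original page in context.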
github connector with code and documentation indexing
Medium confidence: Danswer's GitHub connector indexes both code files and documentation (README, wiki pages) from specified repositories, extracting file content, commit history, and branch information. The connector supports filtering by file type (e.g., only index .py and .md files) and can index multiple repositories across organizations. It preserves file paths and repository metadata so that search results can link back to the original file in GitHub.
Indexes both code and documentation from the same repositories, allowing natural language queries to surface relevant code examples alongside documentation. The connector preserves file paths and repository context, enabling users to navigate directly to source files.
More comprehensive than code-only search tools because it includes documentation, and more discoverable than GitHub's native search because it uses semantic similarity rather than keyword matching.
google drive connector with folder hierarchy and shared file support
Medium confidence: Danswer's Google Drive connector indexes files from specified Google Drive folders, supporting both personal and shared drives. The connector extracts file content (from Google Docs, Sheets, PDFs, etc.), preserves folder hierarchy, and syncs sharing permissions to enforce access control. It handles Google Workspace file formats natively and can index files shared with the user's service account.
Syncs Google Drive sharing permissions to enforce access control at query time, allowing the same file corpus to be searched by different users with different sharing levels. The connector uses Google Drive's permissions API to maintain a real-time mapping of user-to-file access.
More permission-aware than generic document indexers, and more integrated than external search tools because it respects Google Drive's native sharing model.
jira connector with issue and comment indexing
Medium confidence: Danswer's Jira connector indexes Jira issues and comments from specified projects, extracting issue content (title, description, comments), metadata (assignee, status, priority, labels), and project-level permissions. The connector supports filtering by issue type or status and can index issues across multiple Jira instances. It preserves issue relationships (parent-child, linked issues) and allows search results to reference the full issue context.
Indexes both issue descriptions and comments, allowing natural language queries to surface relevant issues alongside discussion context. The connector preserves issue metadata (status, priority, assignee) in search results for quick triage.
More discoverable than Jira's native search because it uses semantic similarity, and more context-rich than keyword search because it includes full comment threads.
custom document upload with metadata extraction
Medium confidence: Danswer allows users to upload documents directly (PDF, DOCX, TXT, Markdown) through the web interface or API, automatically extracting text content and metadata (filename, upload date, uploader). The system chunks uploaded documents using configurable strategies and indexes them into the vector database. Uploaded documents can be tagged with custom metadata for filtering and organization.
Provides a simple web interface for document upload without requiring connector setup, making it accessible to non-technical users. Uploaded documents are immediately indexed and searchable without additional configuration.
More user-friendly than connector-based indexing for ad-hoc documents, and more flexible than pre-built connectors because it supports any document type.
configurable chunking strategies with semantic preservation
Medium confidence: Danswer implements multiple document chunking strategies (fixed-size, semantic, recursive) that can be configured per document type. The system supports chunk overlap to preserve context across boundaries, and implements code-aware chunking for programming languages that respects function and class boundaries. Chunking strategies are applied during indexing and can be adjusted without re-indexing if the vector database supports it.
Supports code-aware chunking that respects function and class boundaries, preserving semantic structure in code documents. This differs from naive fixed-size chunking that may split functions or classes across chunks.
More semantically aware than fixed-size chunking, and more flexible than single-strategy systems because it allows per-document-type configuration.
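The baseline strategy, fixed-size chunking with overlap, can be sketched in a few lines (an illustration of the general technique, not Danswer's implementation):

```python
def chunk_fixed(text: str, size: int, overlap: int) -> list:
    # Each chunk repeats the tail of the previous one, so context that
    # straddles a boundary survives intact in at least one chunk.
    # Semantic or code-aware strategies would instead pick split points
    # at sentence, function, or class boundaries.
    assert 0 <= overlap < size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

With `size=4, overlap=2`, the string `"abcdefghij"` yields chunks starting at offsets 0, 2, 4, 6, 8, each sharing two characters with its neighbor.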
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Danswer (Onyx), ranked by overlap. Discovered automatically through the match graph.
VpunaAiSearch
Connect to [Vpuna AI Search Service](https://aisearch.vpuna.com), a developer-first platform for semantic search, summarization, and contextual chat. Each project dynamically exposes its own Remote HTTP MCP server, enabling real-time context injection from structured and unstructured data.
orama
🌌 A complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.
all-MiniLM-L12-v2
Sentence-similarity model. 2,825,304 downloads.
@memberjunction/ai-vectordb
MemberJunction: AI Vector Database Module
txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
LlamaIndex
Transform enterprise data into powerful LLM applications...
Best For
- ✓Enterprise teams with fragmented knowledge across Slack, Confluence, Google Drive, and GitHub
- ✓Organizations needing document-level access control enforcement during search
- ✓Teams wanting to self-host and control embedding model selection
- ✓Enterprises with strict data governance requiring permission enforcement at query time
- ✓Teams using Danswer across multiple Slack workspaces or Confluence instances with different access levels
- ✓Organizations in regulated industries (healthcare, finance) needing audit trails of who searched what
- ✓Organizations with existing vector database infrastructure they want to reuse
- ✓Teams wanting to self-host all components (Postgres + pgvector)
Known Limitations
- ⚠Connector availability limited to pre-built integrations (Slack, Confluence, GitHub, Google Drive, Jira, etc.) — custom sources require writing new connector code
- ⚠Embedding pipeline is sequential — processing large document volumes (100k+ docs) can take hours depending on chunk size and model
- ⚠Vector database backend must be separately provisioned (Postgres with pgvector, Qdrant, Weaviate) — no embedded option
- ⚠Metadata preservation depends on source connector implementation — some sources may lose nested context
- ⚠Permission enforcement depends on connector-provided ACL data — if a source connector doesn't sync permissions, all documents from that source are treated as accessible to all users
- ⚠Permission checks add latency (~50-200ms per query depending on number of retrieved chunks and permission lookups)
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Open-source enterprise AI assistant that connects to company documents and tools. Danswer provides RAG-powered search and chat across Slack, Google Drive, Confluence, GitHub with access controls.