Rebuff
FrameworkFreeSelf-hardening prompt injection detector with multi-layer defense.
Capabilities12 decomposed
multi-layered heuristic prompt injection detection
Medium confidenceAnalyzes incoming prompts using fast, pattern-based rules to detect common prompt injection attack signatures (keywords, structural patterns, encoding tricks). Operates as the first defense layer before LLM-based detection, using configurable keyword lists and regex-based pattern matching to identify malicious intent without requiring model inference. Returns a heuristic score that can be compared against a configurable threshold to block suspicious inputs.
Implements defense-in-depth as first layer with configurable keyword and pattern registries, allowing teams to customize detection rules without retraining models. Uses strategy pattern to enable/disable heuristic tactics independently from other detection layers.
Faster than LLM-only detection (no inference latency) and more transparent than black-box ML approaches, but less semantically sophisticated than LLM-based detection alone
llm-based semantic prompt injection detection
Medium confidenceDelegates prompt injection detection to a dedicated language model that analyzes user input semantically to identify malicious intent, jailbreak attempts, and instruction-override attacks. The SDK abstracts the LLM backend (OpenAI, Anthropic, local models via Ollama) and returns a detection score based on the model's confidence in identifying an attack. This layer captures sophisticated, context-aware attacks that simple heuristics miss.
Abstracts LLM provider selection via strategy pattern, supporting OpenAI, Anthropic, and local Ollama models with unified interface. Configurable thresholds per provider allow tuning sensitivity based on model capabilities and false-positive tolerance.
More semantically accurate than heuristics but slower; unlike static rule-based systems, adapts to new attack patterns without code changes, though still vulnerable to adversarial prompts targeting the detection model itself
incident logging and attack pattern learning loop
Medium confidenceProvides APIs to log detected attacks (especially canary token leaks) to the vector database, enabling the system to learn from incidents and improve future detection. When isCanaryWordLeaked() detects a leak, the application can call logAttack() to store the attack input and metadata, which gets embedded and added to the vector database. This creates a feedback loop where each incident improves detection of similar future attacks.
Implements closed-loop learning: detected attacks (especially canary token leaks) are automatically logged to vector database, improving future detection without manual curation. Metadata logging enables forensic analysis and trend tracking.
Enables continuous improvement of detection over time, unlike static rule-based or pre-trained model approaches; requires operational discipline to sanitize sensitive data before logging
per-tactic detection scoring and explainability
Medium confidenceReturns detailed detection results that include individual scores from each enabled tactic (heuristic score, LLM confidence, vector similarity score) alongside the final detection decision. This enables developers to understand which tactic flagged an input and why, supporting debugging, threshold tuning, and explainability to stakeholders. Detection results include metadata like matched attack patterns from vector database or heuristic rules triggered.
Returns granular per-tactic scores and metadata (matched attack patterns, heuristic rules triggered) enabling developers to understand detection decisions at multiple levels of detail. Supports both high-level flagged boolean and detailed scoring for debugging.
More transparent than black-box detection systems; enables threshold tuning and debugging unavailable in opaque approaches, though requires application-level handling of detailed results
vector similarity matching against known attack patterns
Medium confidenceStores embeddings of previously detected or known prompt injection attacks in a vector database (Pinecone, Supabase, or custom backends), then compares incoming prompts against this corpus using semantic similarity. When a user input's embedding exceeds a similarity threshold to known attacks, the system flags it as a potential injection. This layer learns from past incidents and enables zero-shot detection of attack variants.
Implements pluggable vector database backends (Pinecone, Supabase, custom) via abstraction layer, enabling teams to choose storage based on compliance, latency, and cost requirements. Stores attack metadata alongside embeddings for incident correlation and forensics.
Learns from organizational incident history without retraining, unlike static heuristics; more scalable than maintaining curated rule lists, but requires active management of attack corpus and periodic re-embedding as threat landscape evolves
canary token injection and leak detection
Medium confidenceInserts randomly generated, unique canary tokens into system prompts before sending to the LLM, then monitors the model's response to detect if those tokens appear in the output. If a canary token leaks, it indicates the model has exposed its system instructions, revealing a successful prompt injection. The SDK provides addCanaryWord() to inject tokens and isCanaryWordLeaked() to check responses, enabling post-hoc detection of instruction leakage.
Generates cryptographically random, unique canary tokens per request and provides explicit APIs (addCanaryWord, isCanaryWordLeaked) for application-level integration. Enables closed-loop learning: detected leaks can be automatically logged to vector database to improve future detection.
Detects successful attacks that bypass all preventive layers; unlike purely preventive approaches, provides forensic evidence of instruction exposure and enables continuous improvement through incident-driven learning
configurable multi-tactic detection strategy with threshold tuning
Medium confidenceImplements strategy pattern to compose heuristic, LLM-based, and vector database detection tactics into a unified detection pipeline. Each tactic has an independent, configurable threshold that determines sensitivity. The SDK allows enabling/disabling tactics, adjusting thresholds per tactic, and combining scores across tactics to make a final detection decision. This architecture enables teams to tune detection sensitivity for their specific risk tolerance and false-positive budget.
Uses strategy pattern to decouple detection tactics from orchestration logic, enabling runtime composition and threshold tuning without code changes. Each tactic is independently testable and can be swapped for custom implementations.
More flexible than single-method detection (heuristics-only or LLM-only); allows cost-latency-accuracy tradeoffs unavailable in monolithic approaches, though requires operational discipline to tune thresholds correctly
python sdk with synchronous and asynchronous detection apis
Medium confidenceProvides Python bindings for Rebuff detection with both sync (detect_injection) and async (async detect_injection) methods, enabling integration into synchronous Flask/Django applications and async FastAPI/Starlette services. The SDK abstracts backend configuration (LLM provider, vector database, heuristic rules) via environment variables or constructor parameters, reducing boilerplate and enabling environment-specific configuration.
Provides both sync and async APIs with unified interface, enabling drop-in integration into existing Python frameworks. Configuration abstraction via environment variables and constructor parameters allows same code to run across dev/staging/prod with different backends.
More Pythonic than REST API calls; async support enables non-blocking detection in high-throughput services, unlike synchronous-only SDKs
javascript/typescript sdk with browser and node.js support
Medium confidenceProvides TypeScript-first SDK for JavaScript environments (Node.js, Deno, browsers) with full type safety and ESM/CommonJS module support. Implements the same multi-tactic detection strategy as Python SDK but optimized for JavaScript async/await patterns. Includes built-in support for configuring LLM providers and vector databases via constructor options or environment variables.
Provides TypeScript-first API with full type definitions for all detection results and configuration objects. Supports both Node.js and browser environments with appropriate backend selection (heuristics-only in browser, full tactics in Node.js).
Type-safe alternative to REST API calls; browser support enables client-side validation without backend round-trips, though limited to heuristics and vector search in browser context
pluggable vector database backend abstraction
Medium confidenceAbstracts vector database implementation behind a unified interface, supporting Pinecone, Supabase, Weaviate, Milvus, and custom backends. The SDK accepts a vector database configuration object at initialization and delegates all embedding storage/retrieval to the chosen backend. This enables teams to switch vector databases without code changes and implement custom backends for compliance or performance requirements.
Implements backend abstraction via interface-based design, allowing teams to implement custom vector database backends by conforming to a simple contract. Supports major vendors (Pinecone, Supabase, Weaviate) out-of-the-box with minimal configuration.
More flexible than vendor lock-in to a single vector database; enables cost optimization and compliance-driven backend selection without application code changes
interactive web playground for detection testing and tuning
Medium confidenceProvides a web-based UI (hosted at rebuff.ai or self-hosted) where developers can test prompt injection detection in real-time, adjust tactic thresholds, and visualize per-tactic detection scores. The playground connects to a backend API that runs the full detection pipeline and returns detailed results, enabling rapid iteration on threshold tuning and detection strategy without writing code.
Provides real-time visualization of per-tactic detection scores with interactive threshold adjustment, enabling non-developers to understand and tune detection behavior. Playground API abstracts backend complexity, allowing teams to test detection without SDK integration.
More accessible than CLI-based testing; enables rapid iteration on threshold tuning without code deployment, though less suitable for production-scale testing than programmatic APIs
self-hosted deployment with environment-based configuration
Medium confidenceEnables self-hosting of Rebuff server (detection backend and playground UI) via Docker, Kubernetes, or direct binary deployment. Configuration is entirely environment-variable-driven (LLM provider, vector database, heuristic rules), enabling teams to deploy to private infrastructure without code changes. Supports Netlify Functions, AWS Lambda, and traditional server deployments.
Provides Docker images and Kubernetes manifests for self-hosted deployment with zero code changes required. Environment-variable-driven configuration enables same deployment artifact to run across dev/staging/prod with different backends.
Enables data sovereignty and compliance-driven deployment unavailable in SaaS-only solutions; requires operational overhead but provides full control over infrastructure and data
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifactssharing capabilities
Artifacts that share capabilities with Rebuff, ranked by overlap. Discovered automatically through the match graph.
LLM Guard
Open-source LLM input/output security scanner toolkit.
Llama Guard 3
Meta's safety classifier for LLM content moderation.
Prompt Guard
Meta's prompt injection and jailbreak detection classifier.
Giskard
AI testing for quality, safety, compliance — vulnerability scanning, bias/toxicity detection.
llm-guard
A TypeScript library for validating and securing LLM prompts
Lakera
AI's ultimate shield: real-time threat detection, privacy,...
Best For
- ✓teams building latency-sensitive LLM applications
- ✓developers deploying on edge or resource-constrained environments
- ✓security teams needing explainable, rule-based detection
- ✓applications requiring high detection accuracy over latency
- ✓teams with budget for LLM API calls or local model hosting
- ✓security-critical applications where false negatives are costly
- ✓teams running production LLM applications with incident response processes
- ✓security teams wanting to build domain-specific attack intelligence
Known Limitations
- ⚠Cannot detect sophisticated attacks that don't match known patterns or keywords
- ⚠Requires manual maintenance of heuristic rules as new attack vectors emerge
- ⚠High false-positive rate on legitimate inputs containing injection-like keywords in context
- ⚠Adds 200-500ms latency per detection due to LLM inference
- ⚠Requires API credentials or local model deployment (increases operational complexity)
- ⚠LLM-based detection can be adversarially attacked with carefully crafted prompts
Requirements
Input / Output
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Open-source self-hardening prompt injection detector that uses multi-layered defense including heuristic analysis, LLM-based detection, vector similarity matching against known attacks, and canary token injection for leak detection.
Categories
Alternatives to Rebuff
Local knowledge graph for Claude Code. Builds a persistent map of your codebase so Claude reads only what matters — 6.8× fewer tokens on reviews and up to 49× on daily coding tasks.
Compare →The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Compare →Are you the builder of Rebuff?
Claim this artifact to get a verified badge, access match analytics, see which intents users search for, and manage your listing.
Get the weekly brief
New tools, rising stars, and what's actually worth your time. No spam.
Data Sources
Looking for something else?
Search →