declarative guardrail policy definition with yaml/json schemas
Enables developers to define safety policies, content filters, and validation rules using declarative YAML or JSON configuration files rather than imperative code. The framework parses these schemas at runtime and compiles them into executable guardrail chains that intercept and validate LLM inputs/outputs before they reach users or downstream systems. Supports conditional logic, regex patterns, semantic matching, and custom validator functions within a unified policy language.
Unique: Uses a declarative YAML/JSON schema approach for guardrail definition rather than imperative code, enabling non-developers to modify safety policies and providing version-controllable policy artifacts separate from application code
vs alternatives: More accessible than hand-coded validation logic and more flexible than hard-coded safety checks, allowing policy iteration without code deployment cycles
multi-stage input/output validation pipeline with semantic and syntactic checks
Implements a composable pipeline architecture that chains multiple validation stages (pre-processing, semantic analysis, syntactic checks, custom validators) to sanitize and validate both user inputs and LLM outputs. Each stage can apply different validation strategies: regex-based pattern matching, semantic similarity scoring against prohibited content vectors, PII detection, token-level analysis, and custom JavaScript functions. Stages execute sequentially with early exit on failure, and results include detailed violation metadata for logging and user feedback.
Unique: Combines syntactic (regex/pattern-based), semantic (embedding-based similarity), and custom validator stages in a single composable pipeline with early-exit optimization and detailed violation metadata, rather than applying single-layer validation
vs alternatives: More comprehensive than simple regex filtering and faster than full semantic re-ranking because it short-circuits on early validation failures rather than evaluating all stages
audit logging and compliance reporting with violation tracking
Automatically logs all guardrail violations with detailed metadata (timestamp, user ID, violation type, severity, enforcement action, conversation context) to enable compliance auditing and threat analysis. Supports structured logging to external systems (databases, logging services) and generates compliance reports summarizing violation patterns, enforcement actions, and policy effectiveness. Includes PII-safe logging that redacts sensitive information from logs while maintaining audit trail integrity.
Unique: Integrates comprehensive audit logging directly into the guardrail pipeline with PII-safe redaction and structured export for compliance reporting, rather than requiring manual logging implementation
vs alternatives: More complete than application-level logging because it captures guardrail-specific metadata and provides compliance-ready reporting, though requires external logging infrastructure for production deployments
typescript-first type-safe guardrail configuration and validation
Provides TypeScript interfaces and type definitions for guardrail configuration, enabling compile-time validation of policy definitions and IDE autocomplete for configuration options. Supports both YAML/JSON configuration files (with TypeScript schema validation) and programmatic configuration using TypeScript objects. Type safety extends to custom validator functions, ensuring they conform to expected signatures and receive properly typed context objects.
Unique: Provides full TypeScript type definitions for guardrail configuration and custom validators, enabling compile-time validation and IDE support rather than runtime-only validation
vs alternatives: Better developer experience than YAML-only configuration because of IDE autocomplete and compile-time error detection, though requires TypeScript knowledge and adds build-time overhead
framework-agnostic middleware integration for express, next.js, and other node.js servers
Provides middleware adapters for popular Node.js frameworks (Express, Next.js, Fastify, etc.) that integrate guardrails into request/response pipelines. Middleware intercepts requests before they reach route handlers, applies guardrails to user input, and intercepts responses to validate LLM output before sending to clients. Supports both synchronous and asynchronous middleware patterns and integrates with framework-specific error handling and logging.
Unique: Provides framework-specific middleware adapters that integrate guardrails into request/response pipelines with minimal application changes, rather than requiring manual integration at each endpoint
vs alternatives: Easier to integrate into existing applications than manual guardrail calls at each endpoint, though adds latency to all requests and may be too late for some attack vectors
prompt injection attack detection via structural analysis
Detects prompt injection attempts by analyzing input structure, token patterns, and semantic anomalies that indicate attempts to override system instructions or manipulate model behavior. Uses techniques including delimiter detection (looking for common injection markers like 'ignore previous instructions'), instruction-like pattern recognition, and comparison against baseline input distributions. Can be configured with custom injection patterns and severity thresholds, and provides detailed reports on detected injection vectors.
Unique: Uses structural and pattern-based analysis to detect injection attempts rather than relying solely on semantic similarity, enabling detection of novel injection vectors and providing detailed attack vector identification
vs alternatives: Faster and more interpretable than semantic-only detection because it identifies specific injection patterns and markers, though less robust against sophisticated paraphrased attacks than ensemble approaches
content moderation with semantic similarity scoring against prohibited topic vectors
Implements semantic content moderation by embedding user inputs and LLM outputs, then computing cosine similarity against pre-built vectors representing prohibited topics (violence, hate speech, sexual content, etc.). Uses OpenAI embeddings or custom embedding models to generate vector representations, compares against a configurable library of harmful content vectors, and returns similarity scores with configurable thresholds for blocking. Supports category-specific thresholds and allows whitelisting of legitimate uses of sensitive topics.
Unique: Uses embedding-based semantic similarity scoring against prohibited topic vectors rather than keyword lists or regex patterns, enabling detection of paraphrased harmful content and supporting category-specific thresholds
vs alternatives: More semantically aware than regex-based filtering and faster than full LLM re-evaluation, but slower and more expensive than keyword matching while being less robust than ensemble approaches combining multiple detection methods
structured output validation with schema enforcement
Validates LLM outputs against JSON schemas or TypeScript interfaces to ensure responses conform to expected structure, data types, and constraints. Parses LLM text output, attempts to extract JSON, validates against provided schema using JSON Schema validators, and returns structured validation results with detailed error messages indicating which fields failed validation. Supports nested schemas, array validation, enum constraints, and custom validation functions for business logic (e.g., 'price must be positive').
Unique: Integrates schema validation as a guardrail stage in the output pipeline, enabling automatic rejection of malformed LLM outputs and providing structured error feedback for retry logic
vs alternatives: More reliable than manual JSON parsing and provides better error messages than try-catch blocks, though doesn't guarantee semantic correctness and requires LLM cooperation in output format
+5 more capabilities