Guidance
Framework · Free
Microsoft's language for efficient LLM control flow.
Capabilities (15 decomposed)
grammar-constrained text generation with token healing
Medium confidence: Generates text from LLMs while enforcing constraints defined as an AST of GrammarNode subclasses (LiteralNode, RegexNode, SelectNode, JsonNode). Uses a token healing mechanism that operates at the text level rather than the token level to correctly handle text boundaries, preventing invalid token sequences at constraint edges. The TokenParser and ByteParser engines integrate constraints directly into the generation loop, ensuring every token respects the grammar before being produced.
Implements token healing at the text level (not token level) with an immutable GrammarNode AST architecture, allowing constraints to be composed and reused across programs while maintaining correct behavior at token boundaries. The TokenParser/ByteParser dual-engine design handles both token-level and byte-level constraints without requiring external validation passes.
More efficient than post-generation validation (no retry loops) and more flexible than simple prompt engineering, because constraints are enforced during generation rather than after, reducing wasted tokens and guaranteeing format compliance on the first attempt.
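The token-healing idea above can be illustrated with a toy greedy tokenizer: before constrained generation starts, the engine backs the prompt up over its last token so the constraint can re-complete that text, avoiding an awkward forced boundary. This is a minimal, stdlib-only sketch; the vocabulary and `heal` helper are hypothetical, not Guidance's implementation.

```python
# Toy text-level token healing. Real BPE tokenizers make boundary
# mismatches much worse, which is exactly what healing addresses.
VOCAB = ["http", "://", ":", "example"]

def tokenize(text):
    """Greedy longest-match tokenization over the toy vocab."""
    tokens = []
    while text:
        match = max((t for t in VOCAB if text.startswith(t)), key=len)
        tokens.append(match)
        text = text[len(match):]
    return tokens

def heal(prompt):
    """Back up over the last token so constrained generation can
    re-complete its text instead of starting at a bad boundary."""
    tokens = tokenize(prompt)
    healed_prompt = "".join(tokens[:-1])
    forced_prefix = tokens[-1]   # the constraint must regenerate this text
    return healed_prompt, forced_prefix

healed, prefix = heal("http:")
print(healed, prefix)  # the model may now emit '://' as a single token
```

After healing, a prompt ending in `http:` becomes `http` plus a forced `:` prefix, so the engine is free to choose the merged `://` token that the raw boundary would have ruled out.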
stateful execution with interleaved control flow and generation
Medium confidence: Maintains model state through immutable lm objects that accumulate generated text, captured variables, and execution context across multiple generation steps. The @guidance decorator transforms Python functions into programs that interleave traditional control flow (conditionals, loops, function calls) with constrained text generation, executing them in a unified stateful context. Each step in the program updates the lm state object, which carries forward to subsequent steps, enabling dynamic decision-making based on previous generations.
Uses immutable lm state objects that accumulate text and captures across decorated function boundaries, enabling Python control flow (if/else, for loops, function calls) to be seamlessly interleaved with generation. The @guidance decorator acts as a compiler that transforms Python functions into stateful generation programs without requiring explicit state threading.
More expressive than simple prompt templates because it allows arbitrary Python logic to drive generation decisions, and more maintainable than hand-rolled state management because the decorator handles state threading automatically across function boundaries.
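The immutable-state accumulation described above can be sketched with a frozen dataclass whose append operations return fresh objects, so control-flow branches can fork state cheaply. This is a library-agnostic toy; the `LM` class and `capture` method are hypothetical, not Guidance's actual API.

```python
# Toy immutable lm-style state: every append returns a NEW object,
# so earlier states survive for branching and inspection.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class LM:
    text: str = ""
    captures: dict = field(default_factory=dict)

    def __add__(self, chunk):
        # Appending never mutates; it forks a new state.
        return LM(self.text + chunk, dict(self.captures))

    def capture(self, name, value):
        # A named capture lands in state alongside the text.
        return LM(self.text + value, {**self.captures, name: value})

lm0 = LM()
lm1 = lm0 + "Q: 2+2? A: "
lm2 = lm1.capture("answer", "4")
print(lm0.text, "|", lm2.text, "|", lm2.captures)
```

Because `lm0` and `lm1` are untouched by later steps, ordinary Python `if`/`for` logic can branch on intermediate states without any explicit state threading.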
ebnf grammar definition and composition
Medium confidence: Allows developers to define reusable grammar rules using Extended Backus-Naur Form (EBNF) syntax, which are compiled into GrammarNode ASTs. Rules can reference other rules, enabling composition of complex grammars from simpler components. The EBNF parser (guidance/library/_ebnf.py) converts textual grammar definitions into executable constraints. Rules are stored in a grammar registry and can be reused across multiple Guidance programs.
Provides EBNF syntax for defining grammars that are compiled into GrammarNode ASTs, enabling developers to express complex constraints using a standard formal notation. Rules are composable and reusable across programs via a grammar registry.
More expressive and maintainable than nested Python grammar objects because EBNF is a standard notation, and more flexible than hardcoded format strings because rules can be parameterized and composed.
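The rule-composition idea can be shown with a tiny finite grammar expanded into its full language: one rule references another, and the composed result falls out mechanically. The notation here is ad hoc (rules as dicts of alternative sequences), not Guidance's actual EBNF surface syntax.

```python
# Toy expansion of a finite EBNF-style grammar, illustrating how
# composed rules define the set of strings a constraint will allow.
from itertools import product

RULES = {
    "greeting": [["hello", " ", "name"], ["hi", " ", "name"]],
    "name":     [["alice"], ["bob"]],
}

def expand(symbol):
    """Recursively expand a rule (or pass through a literal)."""
    if symbol not in RULES:          # terminal literal
        return [symbol]
    out = []
    for seq in RULES[symbol]:        # each alternative is a sequence
        for parts in product(*(expand(s) for s in seq)):
            out.append("".join(parts))
    return out

print(sorted(expand("greeting")))
```

Real EBNF grammars are of course usually infinite (recursion, repetition), so engines compile them to parsers rather than enumerating strings; the enumeration here just makes composition visible.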
token-level and byte-level parsing with dual-engine architecture
Medium confidence: Implements two parsing engines (TokenParser and ByteParser) that operate at different levels of abstraction. TokenParser works at the token level, validating that generated tokens conform to grammar constraints. ByteParser operates at the byte level, handling sub-token constraints and ensuring correct behavior at character boundaries. The dual-engine design allows constraints to be expressed at the appropriate level of abstraction while maintaining correctness across token boundaries.
Implements a dual-engine architecture (TokenParser and ByteParser) that operates at both token and byte levels, enabling constraints to be enforced at the appropriate abstraction level while maintaining correctness at boundaries. Token healing is implemented through careful coordination between engines.
More efficient than purely byte-level parsing because token-level constraints are faster, and more correct than purely token-level parsing because byte-level constraints handle edge cases at token boundaries.
llama.cpp and transformers local model inference
Medium confidence: Provides native integration with local LLM inference engines (llama.cpp via llama-cpp-python, and Hugging Face Transformers). Enables running Guidance programs against locally-hosted models without cloud API dependencies. Supports model quantization, GPU acceleration, and batch processing. The local model backend handles tokenization, context management, and generation scheduling directly within the Python process.
Provides native integration with llama.cpp (via llama-cpp-python) and Transformers, enabling local inference with full Guidance constraint support. Handles tokenization, context management, and generation scheduling within the Python process without external service dependencies.
More cost-effective than cloud APIs for high-volume inference and more privacy-preserving because data never leaves the local machine, though with higher infrastructure requirements.
openai, azure openai, and vertexai remote api integration
Medium confidence: Provides unified integration with remote LLM APIs (OpenAI, Azure OpenAI, Google VertexAI) through a common backend interface. Handles API authentication, request formatting, token counting, and response parsing. Supports streaming and non-streaming modes. The remote backend abstracts differences between API protocols while maintaining Guidance's constraint semantics.
Provides unified backend abstraction for OpenAI, Azure OpenAI, and VertexAI APIs, normalizing differences in authentication, request formatting, and response parsing. Maintains Guidance's constraint semantics across different API protocols.
More convenient than direct API client usage because Guidance handles constraint enforcement and state management, and more flexible than provider-specific SDKs because the same code works across multiple providers.
capture and variable extraction from constrained generation
Medium confidence: Automatically extracts and stores named captures from constrained generation into the lm state object. Supports capturing from regex groups, selected options, JSON fields, and literal text. Captured variables are accessible in subsequent generation steps and control flow branches. The capture mechanism enables dynamic decision-making based on what the model generated in previous steps.
Automatically extracts named captures from constrained generation (regex groups, JSON fields, selected options) and stores them in the lm state for use in subsequent steps. Enables dynamic workflows where each step uses outputs from previous steps.
More integrated than post-generation parsing because captures are extracted during generation, and more flexible than hardcoded extraction logic because capture names can be defined in constraints.
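The capture mechanism can be sketched with regex named groups feeding an accumulating state dict, with a later step branching on an earlier capture. The pattern, the fake model output, and the state shape are all illustrative, not Guidance's internals.

```python
# Toy capture extraction: named groups from a constrained region land
# in shared state, and later steps branch on them.
import re

state = {}
pattern = re.compile(r"name=(?P<name>\w+); age=(?P<age>\d+)")

generated = "name=ada; age=36"      # pretend this came from the model
m = pattern.fullmatch(generated)
state.update(m.groupdict())         # captures flow into state

# A later step uses the earlier capture to decide what to say next:
followup = f"Is {state['name']} an adult? " + (
    "yes" if int(state["age"]) >= 18 else "no"
)
print(state, followup)
```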
multi-backend model abstraction with unified api
Medium confidence: Provides a unified interface for executing Guidance programs across heterogeneous LLM backends (local: LlamaCpp, Transformers; remote: OpenAI, Azure OpenAI, VertexAI) without changing program code. The model abstraction layer (guidance/models/_base) defines a common interface that each backend implements, handling differences in tokenization, API protocols, and inference engines. Programs written against the abstract model interface automatically work with any backend by swapping the model initialization parameter.
Implements a backend abstraction layer (guidance/models/_base/_model.py) that normalizes differences between local inference engines (LlamaCpp, Transformers) and remote APIs (OpenAI, Azure, VertexAI) through a common interface, enabling the same Guidance program to execute unchanged across any backend. Uses dependency injection to swap backends at initialization time.
More flexible than LangChain's model abstraction because it preserves Guidance's constraint semantics across backends, and more comprehensive than raw API clients because it handles tokenization normalization and state management automatically.
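The backend-swapping pattern described above reduces to a small interface plus dependency injection. Class and method names below are hypothetical stand-ins, not the real API in guidance/models/_base.

```python
# Minimal backend abstraction: program code depends only on the
# Backend interface; concrete engines are injected at init time.
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class LocalBackend(Backend):
    def generate(self, prompt):
        return f"[local] completion of {prompt!r}"

class RemoteBackend(Backend):
    def generate(self, prompt):
        return f"[remote] completion of {prompt!r}"

def run_program(backend: Backend, prompt: str) -> str:
    # The program never names a concrete backend.
    return backend.generate(prompt)

print(run_program(LocalBackend(), "hi"))
print(run_program(RemoteBackend(), "hi"))
```

Swapping `LocalBackend()` for `RemoteBackend()` changes the engine without touching `run_program`, which is the property the listing attributes to Guidance's model abstraction.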
json schema-constrained generation with automatic validation
Medium confidence: Generates valid JSON output that conforms to a provided schema using the JsonNode grammar constraint. The schema is converted into a grammar that guides token generation to produce only valid JSON matching the schema's structure, types, and constraints. This eliminates the need for post-generation parsing, validation, or retry loops: the output is guaranteed to be valid JSON on the first attempt. Supports nested objects, arrays, enums, and type constraints (string, number, boolean, null).
Converts JSON schemas into grammar constraints (JsonNode) that guide generation token-by-token, guaranteeing valid JSON output without post-processing. Unlike post-hoc validation approaches, the schema is enforced during generation, preventing invalid tokens from being produced in the first place.
More efficient than JSON repair libraries (no retry loops or parsing errors) and more reliable than prompt-based JSON generation because the schema is enforced at the token level, not just in the prompt.
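The schema-becomes-grammar idea can be shown for a tiny flat schema compiled into a regex constraint; any output that the constraint admits parses without repair. Real JSON-schema-to-grammar compilation (Guidance's JsonNode) additionally handles nesting, arrays, enums, and strings; this sketch covers only integer and boolean properties.

```python
# Toy compilation of a flat JSON schema into a regex constraint.
import json
import re

TYPE_PATTERNS = {"integer": r"-?\d+", "boolean": r"(?:true|false)"}

def schema_to_regex(schema):
    parts = [f'"{k}":{TYPE_PATTERNS[v["type"]]}'
             for k, v in schema["properties"].items()]
    return re.compile(r"\{" + ",".join(parts) + r"\}")

schema = {"type": "object",
          "properties": {"age": {"type": "integer"},
                         "active": {"type": "boolean"}}}
constraint = schema_to_regex(schema)

out = '{"age":36,"active":true}'
assert constraint.fullmatch(out)   # conforms by construction...
print(json.loads(out))             # ...so it parses with no repair step
```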
tool calling and function invocation with schema-based routing
Medium confidence: Enables the model to call external functions or tools by defining a schema of available tools and their parameters, then using constrained generation to produce valid tool-calling syntax. The model generates structured tool calls (function name + arguments) that conform to the schema, which are then executed by the framework and whose results are fed back into the generation context. Supports multiple tool definitions, parameter validation, and result integration into subsequent generation steps.
Uses grammar constraints to enforce valid tool-calling syntax, ensuring the model produces well-formed function calls that match the schema before execution. Tool results are automatically integrated back into the lm state, enabling multi-step agentic loops without manual state threading.
More reliable than prompt-based tool calling because the schema is enforced during generation (preventing malformed calls), and more integrated than external tool-calling libraries because tool results flow directly into subsequent generation steps via the lm state.
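The execute-and-feed-back loop can be sketched as schema-routed dispatch: a structured call is parsed, validated against a tool registry, executed, and its result appended to the context for the next step. The tool names and call format here are illustrative only.

```python
# Toy schema-routed tool dispatch with result fed back into context.
import json

TOOLS = {"add": lambda a, b: a + b,
         "upper": lambda s: s.upper()}

def dispatch(call_json, context):
    call = json.loads(call_json)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:                       # registry validation
        raise ValueError(f"unknown tool: {name}")
    result = TOOLS[name](**args)
    # The result flows back into the generation context.
    return context + f"\nTool {name} -> {result}"

ctx = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}', "calc:")
print(ctx)
```

In the real framework the call text is itself produced under a grammar constraint, so the `json.loads` step cannot fail; here we simply assume well-formed input.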
chat role and template management with structured conversations
Medium confidence: Provides abstractions for managing multi-turn conversations with distinct roles (user, assistant, system) and chat templates that format messages according to model-specific conventions. The framework handles role switching, message formatting, and context accumulation across turns without requiring manual string concatenation. Chat templates are model-aware and automatically adapt to different model families (e.g., Llama's ChatML format vs. OpenAI's message format).
Abstracts chat template formatting through model-aware template definitions, automatically adapting message formatting to different model families (ChatML, Alpaca, OpenAI format) without requiring code changes. Role switching and context accumulation are handled transparently by the framework.
More maintainable than manual role tag concatenation because templates are centralized and model-aware, and more flexible than hardcoded format strings because templates can be swapped at initialization time.
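Model-aware templating amounts to rendering one message list through per-family format functions. The template strings below are simplified placeholders; real templates ship with each model's configuration.

```python
# Toy model-aware chat templating: one message list, many wire formats.
TEMPLATES = {
    "chatml": lambda role, content: f"<|im_start|>{role}\n{content}<|im_end|>\n",
    "plain":  lambda role, content: f"{role.upper()}: {content}\n",
}

def render(messages, family):
    fmt = TEMPLATES[family]
    return "".join(fmt(m["role"], m["content"]) for m in messages)

msgs = [{"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Hi"}]
print(render(msgs, "chatml"))
print(render(msgs, "plain"))
```

Swapping `family` at initialization changes the serialization without touching the conversation logic, which is the maintainability claim made above.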
regex-based generation with pattern matching
Medium confidence: Constrains text generation to match regular expressions using the RegexNode grammar constraint. The model generates text token-by-token while respecting the regex pattern, ensuring output matches the specified pattern without post-generation validation. Supports complex regex patterns including character classes, quantifiers, alternation, and lookahead/lookbehind assertions. Captured groups from the regex can be extracted and stored in the lm state for later use.
Converts regex patterns into grammar constraints (RegexNode) that guide token-by-token generation, ensuring output matches the pattern without post-processing. Uses the regex engine to validate token sequences in real-time during generation.
More efficient than regex validation after generation because invalid tokens are prevented from being produced, and more flexible than hardcoded format strings because arbitrary regex patterns can be used.
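The "invalid tokens are never produced" claim can be demonstrated for one fixed pattern, `\d{3}-\d{4}`: at each position only viable characters are offered to the (fake) model, so the final string matches by construction. Deriving the position template from an arbitrary compiled regex is what a real engine does; here it is written out by hand.

```python
# Toy character-level regex constraint for the fixed pattern \d{3}-\d{4}.
import re
import string

TEMPLATE = "NNN-NNNN"   # hand-derived position template for \d{3}-\d{4}

def allowed_chars(pos):
    """Characters that keep the output a viable prefix at `pos`."""
    return string.digits if TEMPLATE[pos] == "N" else TEMPLATE[pos]

def generate(pick):
    """`pick` plays the model's role, choosing among allowed chars."""
    out = ""
    for pos in range(len(TEMPLATE)):
        out += pick(allowed_chars(pos))
    return out

phone = generate(lambda chars: chars[-1])  # fake model: highest char wins
print(phone, bool(re.fullmatch(r"\d{3}-\d{4}", phone)))
```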
selection and branching with constrained choice generation
Medium confidence: Constrains generation to choose from a predefined set of options using the SelectNode grammar constraint. The model generates text that matches exactly one of the provided options, preventing hallucination of alternatives. Supports both string literals and nested grammar rules as options. The selected option is captured in the lm state for conditional branching in subsequent steps.
Implements SelectNode as a grammar constraint that forces the model to choose from exactly one option, preventing hallucination of alternatives. The selected option is automatically captured in the lm state for use in conditional branching.
More reliable than prompt-based selection because the constraint is enforced during generation, and more efficient than post-generation filtering because invalid choices are never produced.
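Constrained selection works by masking: at each step, only tokens that keep the partial output a prefix of some allowed option survive. This toy decoder shows the mechanism with a greedy stand-in for the model; the vocabulary and scoring function are hypothetical.

```python
# Toy select-style constrained choice via prefix masking.
OPTIONS = ["positive", "negative", "neutral"]
TOKENS = ["pos", "neg", "neu", "itive", "ative", "tral", "xyz"]

def allowed(prefix):
    """Tokens that extend `prefix` toward at least one option."""
    return [t for t in TOKENS
            if any(o.startswith(prefix + t) for o in OPTIONS)]

def constrained_choice(score):
    """Greedy decode under the mask; `score` plays the model's role."""
    out = ""
    while out not in OPTIONS:
        out += max(allowed(out), key=score)
    return out

# A fake preference for tokens containing 'n' steers toward a 'ne*' option,
# but the mask guarantees the result is always one of OPTIONS.
print(constrained_choice(lambda t: t.count("n")))
```

Note that the nonsense token `xyz` is never reachable: the mask removes it before the model's preferences are even consulted, which is why invalid choices cost no retries.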
notebook integration and interactive visualization
Medium confidence: Provides Jupyter widget integration for visualizing Guidance program execution, token generation, and constraint satisfaction in real time. Widgets display the current lm state, generated text, captured variables, and the grammar constraints being applied. Enables interactive debugging and exploration of how constraints affect generation at each step. Supports both inline visualization and detailed inspection of execution traces.
Integrates Jupyter widgets to provide real-time visualization of constraint application, token generation, and lm state evolution during program execution. Enables interactive exploration of how grammar constraints affect generation decisions.
More informative than text-based logging because it visualizes constraint satisfaction and state changes graphically, and more interactive than static traces because users can inspect state at any point during execution.
caching and stateless execution modes
Medium confidence: Supports both stateful (default) and stateless execution modes, with optional caching of generation results. In stateless mode, each Guidance program invocation is independent, with no accumulated state between calls. Caching stores results of previous generations to avoid recomputation when the same prompt and constraints are used again. The cache key is derived from the prompt, constraints, and model parameters, enabling efficient reuse across multiple invocations.
Provides both stateful (default) and stateless execution modes with optional result caching, allowing developers to choose between accumulated context (for multi-turn reasoning) and independent invocations (for distributed/serverless deployments). Cache keys are automatically derived from prompts, constraints, and model parameters.
More flexible than frameworks that enforce a single execution model because both stateful and stateless modes are supported, and more efficient than naive caching because cache keys account for constraint and model parameter variations.
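The cache-key derivation described above can be sketched by hashing a canonical serialization of prompt, constraint, and model parameters; the exact recipe here is illustrative, not Guidance's.

```python
# Toy cache key: identical (prompt, constraint, params) hit the cache;
# any parameter change produces a different key and forces a miss.
import hashlib
import json

def cache_key(prompt, constraint, params):
    # Canonical JSON (sorted keys) makes the key order-insensitive.
    payload = json.dumps(
        {"prompt": prompt, "constraint": constraint, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("Q: 2+2?", r"\d+", {"temp": 0.0, "model": "m"})
k2 = cache_key("Q: 2+2?", r"\d+", {"model": "m", "temp": 0.0})
k3 = cache_key("Q: 2+2?", r"\d+", {"model": "m", "temp": 0.7})
print(k1 == k2, k1 == k3)
```

Sorting keys before hashing is what makes "cache keys account for constraint and model parameter variations" robust: semantically identical parameter dicts collide on the same key regardless of insertion order.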
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Guidance, ranked by overlap. Discovered automatically through the match graph.
Outlines
Structured text generation — guarantees LLM outputs match JSON schemas or grammars.
llama-cpp-python
Python bindings for the llama.cpp library
llama.cpp
Inference of Meta's LLaMA model (and others) in pure C/C++.
Best For
- ✓ developers building structured output pipelines (JSON APIs, form filling)
- ✓ teams implementing deterministic LLM workflows requiring format guarantees
- ✓ builders of domain-specific language models with strict syntax requirements
- ✓ developers building agentic workflows with dynamic decision trees
- ✓ teams implementing chain-of-thought reasoning with intermediate validation
- ✓ builders of complex prompting systems that require conditional branching based on model outputs
- ✓ developers building domain-specific languages or format validators
- ✓ teams maintaining shared grammar libraries across projects
Known Limitations
- ⚠ Grammar constraints add computational overhead during generation; complex grammars may reduce throughput by 20-40%
- ⚠ Token healing requires text-level processing, which can introduce latency at constraint boundaries
- ⚠ Deeply nested or recursive grammar definitions may cause memory overhead in the AST representation
- ⚠ Some edge cases with multi-byte UTF-8 characters at constraint boundaries require careful grammar design
- ⚠ Stateful execution requires maintaining lm objects in memory; large accumulated contexts can consume significant RAM
- ⚠ The @guidance decorator adds Python function call overhead (~5-10ms per decorated function invocation)
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Microsoft's efficient language for controlling LLMs that interleaves generation, prompting, and logical control into a single continuous flow, enabling constrained generation, JSON output, and tool use with token efficiency.