llm (Simon Willison)
CLI Tool · Free. CLI for LLMs: multi-provider, conversation history, templates, embeddings, plugin ecosystem.
Capabilities (13 decomposed)
provider-agnostic model abstraction with unified interface
Medium confidence: Implements a dual sync/async base class hierarchy (Model, AsyncModel, KeyModel, AsyncKeyModel) defined in llm/models.py that abstracts away provider-specific details. Any model—whether OpenAI, Anthropic, local, or plugin-provided—inherits from these base classes and implements execute(), while callers use the shared prompt() interface, so identical code works across all providers without conditional logic or provider detection.
Uses inheritance-based polymorphism with separate sync/async class hierarchies (Model vs AsyncModel) rather than wrapper patterns, enabling native async/await support without callback hell. Plugin system auto-discovers and registers models via entry points, eliminating manual provider registration.
More flexible than LangChain's BaseLLM because it supports both sync and async natively without wrapping, and simpler than the Anthropic SDK because basic operations need no provider-specific imports.
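A minimal sketch of the unified interface; the Anthropic model alias assumes the llm-anthropic plugin and API keys are configured:

```python
import llm

# Same calling convention regardless of provider
model = llm.get_model("gpt-4o-mini")          # OpenAI, built in
print(model.prompt("Two names for a pet pelican").text())

# Swapping providers changes only the model ID, not the code
claude = llm.get_model("claude-3.5-haiku")    # via the llm-anthropic plugin
print(claude.prompt("Two names for a pet pelican").text())
```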
persistent conversation history with sqlite logging
Medium confidence: Automatically logs all model interactions to a local SQLite database (logs.db) with full conversation state, including prompts, responses, model metadata, tokens used, and timestamps. The Conversation class in llm/models.py maintains multi-turn dialogue state and can be serialized/deserialized from the database, enabling conversation resumption, audit trails, and historical analysis without external services.
Uses SQLite as the primary persistence layer rather than in-memory caches or external services, making conversation history available offline and queryable via SQL. Conversation class encapsulates both state and serialization, allowing seamless round-tripping between Python objects and database records.
Simpler and more portable than LangChain's memory implementations because it doesn't require Redis or external databases, and more transparent than hosted conversation storage because you own the raw data and can query it with plain SQL.
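A short sketch of multi-turn state; each call below is also logged automatically to logs.db, the same database the `llm logs` command reads:

```python
import llm

model = llm.get_model("gpt-4o-mini")
conversation = model.conversation()   # holds multi-turn state

# Each prompt/response pair extends the conversation and is
# written to the SQLite log automatically.
print(conversation.prompt("Name one tree species").text())
print(conversation.prompt("Give me two facts about it").text())
```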
python api for programmatic llm access
Medium confidence: Exposes a Python library interface (the llm module) that lets developers interact with models programmatically without the CLI. llm.get_model() retrieves a model, model.prompt() runs single-turn prompts, and model.conversation() supports multi-turn interactions. The API supports both sync and async patterns, enabling integration into web frameworks, scripts, and applications. Responses are returned as Response objects with methods for accessing text, JSON, and usage statistics.
Provides both sync and async APIs at the same level of abstraction, allowing developers to choose based on their use case without learning two different libraries. Response objects provide multiple accessors (text(), json(), usage()) that abstract away provider-specific response formats.
Simpler than OpenAI's SDK because it abstracts away provider-specific details, and broader than Anthropic's SDK because one interface covers multiple providers.
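A sketch of the two call styles side by side:

```python
import asyncio
import llm

# Sync
model = llm.get_model("gpt-4o-mini")
response = model.prompt("Say hello")
print(response.text())
print(response.usage())   # token counts for the call

# Async: same model ID, awaitable interface
async def main():
    amodel = llm.get_async_model("gpt-4o-mini")
    aresponse = await amodel.prompt("Say hello")
    print(await aresponse.text())

asyncio.run(main())
```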
batch embedding and cost estimation
Medium confidence: Supports generating embeddings for large batches of text via the embed_batch() method on EmbeddingModel, which is more efficient than calling embed() repeatedly. Token usage is recorded in the logs, from which costs can be computed against published model pricing. Batch operations minimize API calls and reduce costs, which is particularly useful when processing large document corpora.
Batch operations are defined at the EmbeddingModel level, allowing providers to implement efficient batch APIs (e.g., OpenAI's batched embedding requests) without changing the caller's code. Logged token counts let developers make informed decisions about batch size and model choice.
More efficient than calling embed() in a loop because it batches API calls, and more transparent than cloud provider dashboards because usage data is available programmatically.
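A batching sketch; the "3-small" alias for OpenAI's text-embedding-3-small is an assumption about the local setup (`llm embed-models` lists what is available):

```python
import llm

model = llm.get_embedding_model("3-small")

texts = ["first document", "second document", "third document"]
# One batched request where the provider supports it, instead of
# three separate embed() calls
vectors = list(model.embed_batch(texts))
print(len(vectors), "vectors of dimension", len(vectors[0]))
```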
model capability introspection and feature detection
Medium confidence: Provides attributes to query model capabilities at runtime, such as whether a model supports tool calling (supports_tools), structured output (supports_schema), streaming (can_stream), or particular attachment MIME types (attachment_types). Applications can adapt behavior based on these flags without hardcoding provider-specific logic, enabling graceful degradation when features are unavailable.
Capability information lives on the Model class itself, so feature detection needs no external configuration or registry lookups.
More flexible than hardcoding capabilities because they can be queried at runtime, and more reliable than trying features and catching exceptions because capabilities are known upfront.
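A feature-detection sketch; the attribute names reflect recent llm releases, and exact availability varies by version and plugin:

```python
import llm

model = llm.get_model("gpt-4o-mini")

print(model.can_stream)         # streaming support
print(model.supports_schema)    # structured (JSON schema) output
print(model.supports_tools)     # function calling / tool use
print(model.attachment_types)   # accepted MIME types, e.g. image/png
```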
plugin-based model and tool discovery with entry points
Medium confidence: Implements a plugin system built on Pluggy: installed packages expose an entry point in the llm group, and llm's plugin manager discovers and loads them at startup. Plugins contribute models, embedding models, commands, and other extensions via hook implementations such as register_models() and register_commands(), without modifying core code; custom models subclass the Model base classes.
Builds on Pluggy and Python's standard entry-points mechanism rather than a custom plugin loader, making plugins installable via pip and discoverable by any tool that reads entry points. The Model base classes require minimal boilerplate, lowering the barrier to contribution.
More lightweight than LangChain's integration system because it relies on standard Python packaging rather than custom registries, and more discoverable because plugins are ordinary packages visible to the Python environment (llm plugins lists them).
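A minimal plugin sketch following the documented hook pattern; the module and model names are illustrative:

```python
# my_plugin.py
# pyproject.toml needs an entry point in the "llm" group:
#   [project.entry-points.llm]
#   my_plugin = "my_plugin"
import llm


class EchoModel(llm.Model):
    model_id = "echo"

    def execute(self, prompt, stream, response, conversation):
        # A toy model: yield the response text in chunks
        yield "You said: "
        yield prompt.prompt


@llm.hookimpl
def register_models(register):
    register(EchoModel())
```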
tool execution and function calling with schema validation
Medium confidence: Enables models to invoke Python functions passed as tools: plain functions whose signatures, type hints, and docstrings are introspected into tool schemas, or groups of related methods on an llm.Toolbox subclass. The schemas are translated into the function-calling formats of providers such as OpenAI and Anthropic. When a model requests a tool call, llm executes the corresponding Python function and feeds the result back to the model, enabling multi-step reasoning and external actions.
Derives tool schemas from function signatures rather than requiring hand-written JSON, so the same function works across model providers without rewriting; each provider plugin maps the common tool description onto its own function-calling format.
Simpler than LangChain's tool system because it doesn't require wrapping functions in Tool objects with separate schema definitions, and more portable than Anthropic's native tool_use because it works across providers via the plugin abstraction.
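A tool-calling sketch, assuming an llm release with tool support (0.26 or later):

```python
import llm

def multiply(a: int, b: int) -> int:
    """Multiply two integers."""   # docstring becomes the tool description
    return a * b

model = llm.get_model("gpt-4o-mini")
# chain() keeps re-prompting the model with tool results until it
# stops requesting tool calls
response = model.chain("What is 1337 * 42?", tools=[multiply])
print(response.text())
```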
structured output generation with json schema enforcement
Medium confidence: Supports constrained generation where models must return JSON matching a provided schema. The prompt() method accepts a schema parameter (a JSON schema dict or a Pydantic model class), and the model's output is JSON text that can be parsed with json.loads(response.text()). Providers with native structured-output modes (e.g., OpenAI) enforce the schema at the API level; support varies by provider. This enables reliable extraction of structured data (e.g., entities, classifications) from unstructured inputs.
Decouples schema definition from model invocation, allowing the same schema to be reused across models and providers; the schema parameter provides one interface that abstracts away provider-specific JSON-mode implementations.
More flexible than Anthropic's native structured output because it works across providers via plugins, and simpler than LangChain's output parsers because it doesn't require custom parser classes for each schema.
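A structured-output sketch using a plain JSON schema dict:

```python
import json
import llm

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Invent a dog", schema=schema)
dog = json.loads(response.text())   # output is JSON text matching the schema
print(dog["name"], dog["age"])
```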
multi-modal input handling with attachments and fragments
Medium confidence: Supports attaching images, audio, and documents to prompts via the prompt() method's attachments parameter, which takes llm.Attachment objects referencing a file path, a URL, or raw bytes. (Fragments are a separate llm feature: reusable pieces of prompt text stored de-duplicated in the log database.) Attachments are serialized and logged to the SQLite database, so conversation history preserves media references. Supported attachment types vary by model (e.g., OpenAI models accept images and audio, Anthropic models accept images and PDFs).
Uses a single Attachment abstraction for media, so the same prompt() call handles images, audio, and documents without conditional logic; each model declares the MIME types it accepts, and attachments persisted to the conversation log keep multi-modal history queryable and reproducible.
More unified than OpenAI's API because it abstracts away provider-specific attachment formats, and more persistent than Anthropic's approach because attachments are logged to the database for future reference.
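An attachments sketch; the file path is hypothetical:

```python
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe this image in one sentence",
    attachments=[
        llm.Attachment(path="photo.jpg"),   # local file
        # llm.Attachment(url="https://example.com/photo.png"),  # or a URL
    ],
)
print(response.text())
```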
embedding generation and semantic search with vector storage
Medium confidence: Provides an EmbeddingModel base class for generating vector embeddings from text. The embedding system stores vectors in a separate SQLite database (embeddings.db), organized into named collections with associated metadata, enabling semantic search and similarity operations. Plugins can provide embedding models (e.g., OpenAI's text-embedding-3-small, local models via Ollama). The embed() and embed_batch() methods support both single and bulk embedding generation.
Separates embedding storage from conversation logs (embeddings.db vs logs.db), allowing independent scaling and querying of embeddings. EmbeddingModel abstraction enables swapping embedding providers without changing application code, and batch operations optimize cost for bulk embedding generation.
More integrated than using OpenAI's API directly because it provides a unified interface across embedding models and handles storage, and simpler than LangChain's embedding system because it doesn't require external vector databases for basic use cases.
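A semantic-search sketch using llm's Collection class with sqlite-utils; the collection name, documents, and "3-small" model alias are illustrative:

```python
import sqlite_utils
import llm

db = sqlite_utils.Database("embeddings.db")
collection = llm.Collection("articles", db, model_id="3-small")

# Store embeddings for (id, text) pairs in SQLite
collection.embed_multi([
    ("doc-1", "SQLite is a small, fast, self-contained database engine"),
    ("doc-2", "Pelicans are large water birds with throat pouches"),
])

# Embed the query and rank stored vectors by similarity
for entry in collection.similar("embedded databases", number=1):
    print(entry.id, entry.score)
```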
interactive cli chat with streaming responses
Medium confidence: Provides an interactive chat interface via the llm chat command that maintains conversation state, handles multi-turn interactions, and streams model responses to the terminal in real time. The CLI uses the Conversation class to manage history, and responses are displayed incrementally as tokens arrive, improving perceived latency and allowing early interruption.
Streams responses incrementally through the Response iterator so output appears without waiting for completion, and integrates conversation persistence directly into the CLI so history is saved automatically without explicit commands.
More responsive than ChatGPT's web interface for power users because responses stream straight into the terminal, and more portable than a hosted console because it is a local CLI that works with remote and local models alike.
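The same streaming behavior is available from Python, where iterating a Response yields chunks as they arrive:

```python
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Write a limerick about SQLite")

for chunk in response:          # tokens print as they arrive
    print(chunk, end="", flush=True)
print()
```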
prompt templating with variable substitution and reusability
Medium confidence: Supports defining reusable prompt templates with variable placeholders that can be instantiated with different values. Templates are stored as YAML files in llm's templates directory (plugins can register additional template loaders), and can include system prompts, default models, and schemas. The template system uses simple string substitution ($variable, with $input receiving piped input) to inject values at runtime, enabling prompt reuse across contexts without code duplication.
Templates are first-class citizens in the plugin system, allowing teams to distribute and share prompt templates as packages. Templates can include not just text but also system prompts, tools, and schemas, making them more powerful than simple string templates.
Simpler than LangChain's prompt templates because it doesn't require a full templating engine, and more discoverable than storing prompts in code because templates are stored as files and registered via entry points.
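A template sketch; the YAML keys (system, prompt) and $-substitution follow llm's documented template format, while the file name and save location are illustrative (`llm templates path` prints the real directory):

```python
from pathlib import Path

# Contents of a template file; $count is a parameter and $input
# receives piped stdin at runtime.
template = """\
system: Summarize the text in $count bullet points
prompt: $input
"""
Path("summarize.yaml").write_text(template)

# CLI usage once the file is in the templates directory:
#   cat article.txt | llm -t summarize -p count 3
```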
model aliasing and configuration management
Medium confidence: Allows users to define aliases for models (e.g., 'fast' -> 'gpt-4o-mini'), set a default model, and in recent releases store per-model default options (such as temperature). Configuration lives in plain files (such as aliases.json) in llm's user directory, whose platform-dependent location can be overridden with the LLM_USER_PATH environment variable. Users can customize model behavior without modifying code; aliases are loaded when the CLI starts.
Configuration is stored in user-friendly files (not code) and loaded at startup, allowing non-technical users to customize model behavior. Aliases enable switching between models without changing prompts or code, supporting A/B testing and gradual migration between providers.
More user-friendly than environment variables because configuration is discoverable and editable in files, and more flexible than hardcoded defaults because aliases can be changed without redeploying code.
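An aliasing sketch, assuming a recent llm release that exposes set_alias() in the Python API (the CLI equivalent is `llm aliases set`):

```python
import llm

# Persists to aliases.json in llm's user directory
llm.set_alias("fast", "gpt-4o-mini")

model = llm.get_model("fast")   # aliases resolve transparently
print(model.model_id)           # -> gpt-4o-mini
```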
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with llm (Simon Willison), ranked by overlap. Discovered automatically through the match graph.
LLM
A CLI utility and Python library for interacting with Large Language Models, remote and local. [#opensource](https://github.com/simonw/llm)
llm
CLI tool for interacting with LLMs.
gptme
Personal AI assistant in terminal — code execution, file manipulation, web browsing, self-correcting.
chatbox
Powerful AI Client
Lobe Chat
Modern ChatGPT UI framework — 100+ providers, multimodal, plugins, RAG, Vercel deploy.
autogen
A programming framework for agentic AI
Best For
- ✓ developers building multi-provider LLM applications
- ✓ teams wanting to avoid vendor lock-in to a single model provider
- ✓ plugin developers extending llm with custom models
- ✓ applications requiring audit trails and compliance logging
- ✓ interactive CLI tools and chatbots with session management
- ✓ teams analyzing LLM usage patterns and optimizing costs
- ✓ Python developers building LLM-powered applications
- ✓ teams integrating LLMs into existing Python codebases
Known Limitations
- ⚠ Abstractions add minimal overhead but require all models to implement the full interface even if some methods are no-ops
- ⚠ Provider-specific features (e.g., vision capabilities, function-calling schemas) must be normalized to a common interface, potentially losing nuanced control
- ⚠ Async/sync duality requires maintaining two code paths, increasing maintenance burden
- ⚠ SQLite is single-writer, so high-concurrency scenarios (many simultaneous conversations) may experience lock contention
- ⚠ Database grows unbounded without manual pruning; no built-in retention policies or archival
- ⚠ Logging happens synchronously by default, adding latency to each LLM call (typically <10ms for local SQLite writes)
About
CLI tool and Python library for interacting with LLMs. Supports OpenAI, Anthropic, local models via plugins. Features conversation history, templates, embeddings, and a plugin ecosystem. By the creator of Datasette.