llm (Simon Willison)
CLI Tool · Free. CLI for LLMs: multi-provider, conversation history, templates, embeddings, plugin ecosystem.
Capabilities (13 decomposed)
provider-agnostic model abstraction with unified interface
Medium confidence: Implements a dual sync/async base class hierarchy (Model, AsyncModel, KeyModel, AsyncKeyModel) defined in llm/models.py that abstracts away provider-specific details. Any model—whether OpenAI, Anthropic, local, or plugin-provided—inherits from these base classes and implements execute(), while callers use the shared prompt() interface, so identical code works across all providers without conditional logic or provider detection.
Uses inheritance-based polymorphism with separate sync/async class hierarchies (Model vs AsyncModel) rather than wrapper patterns, enabling native async/await support without callback hell. Plugin system auto-discovers and registers models via entry points, eliminating manual provider registration.
More flexible than LangChain's BaseLLM because it supports both sync and async natively without wrapping, and simpler than the Anthropic SDK because basic operations need no provider-specific imports.
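A minimal sketch of the unified interface; the Anthropic model alias assumes the llm-anthropic plugin and API keys are configured:

```python
import llm

# Same calling convention regardless of provider
model = llm.get_model("gpt-4o-mini")          # OpenAI, built in
print(model.prompt("Two names for a pet pelican").text())

# Swapping providers changes only the model ID, not the code
claude = llm.get_model("claude-3.5-haiku")    # via the llm-anthropic plugin
print(claude.prompt("Two names for a pet pelican").text())
```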
persistent conversation history with sqlite logging
Medium confidence: Automatically logs all model interactions to a local SQLite database (logs.db) with full conversation state, including prompts, responses, model metadata, tokens used, and timestamps. The Conversation class in llm/models.py maintains multi-turn dialogue state and can be serialized/deserialized from the database, enabling conversation resumption, audit trails, and historical analysis without external services.
Uses SQLite as the primary persistence layer rather than in-memory caches or external services, making conversation history available offline and queryable via SQL. Conversation class encapsulates both state and serialization, allowing seamless round-tripping between Python objects and database records.
Simpler and more portable than LangChain's memory implementations because it doesn't require Redis or external databases, and more transparent than hosted conversation storage because you own the raw data and can query it with plain SQL.
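A short sketch of multi-turn state; each call below is also logged automatically to logs.db, the same database the `llm logs` command reads:

```python
import llm

model = llm.get_model("gpt-4o-mini")
conversation = model.conversation()   # holds multi-turn state

# Each prompt/response pair extends the conversation and is
# written to the SQLite log automatically.
print(conversation.prompt("Name one tree species").text())
print(conversation.prompt("Give me two facts about it").text())
```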
python api for programmatic llm access
Medium confidence: Exposes a Python library interface (the llm module) that lets developers interact with models programmatically without the CLI. llm.get_model() retrieves a model, model.prompt() runs single-turn prompts, and model.conversation() supports multi-turn interactions. The API supports both sync and async patterns, enabling integration into web frameworks, scripts, and applications. Responses are returned as Response objects with methods for accessing text, JSON, and usage statistics.
Provides both sync and async APIs at the same level of abstraction, allowing developers to choose based on their use case without learning two different libraries. Response objects provide multiple accessors (text(), json(), usage()) that abstract away provider-specific response formats.
Simpler than OpenAI's SDK because it abstracts away provider-specific details, and broader than Anthropic's SDK because one interface covers multiple providers.
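A sketch of the two call styles side by side:

```python
import asyncio
import llm

# Sync
model = llm.get_model("gpt-4o-mini")
response = model.prompt("Say hello")
print(response.text())
print(response.usage())   # token counts for the call

# Async: same model ID, awaitable interface
async def main():
    amodel = llm.get_async_model("gpt-4o-mini")
    aresponse = await amodel.prompt("Say hello")
    print(await aresponse.text())

asyncio.run(main())
```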
batch embedding and cost estimation
Medium confidence: Supports generating embeddings for large batches of text via the embed_batch() method on EmbeddingModel, which is more efficient than calling embed() repeatedly. Token usage is recorded in the logs, from which costs can be computed against published model pricing. Batch operations minimize API calls and reduce costs, which is particularly useful when processing large document corpora.
Batch operations are defined at the EmbeddingModel level, allowing providers to implement efficient batch APIs (e.g., OpenAI's batched embedding requests) without changing the caller's code. Logged token counts let developers make informed decisions about batch size and model choice.
More efficient than calling embed() in a loop because it batches API calls, and more transparent than cloud provider dashboards because usage data is available programmatically.
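A batching sketch; the "3-small" alias for OpenAI's text-embedding-3-small is an assumption about the local setup (`llm embed-models` lists what is available):

```python
import llm

model = llm.get_embedding_model("3-small")

texts = ["first document", "second document", "third document"]
# One batched request where the provider supports it, instead of
# three separate embed() calls
vectors = list(model.embed_batch(texts))
print(len(vectors), "vectors of dimension", len(vectors[0]))
```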
model capability introspection and feature detection
Medium confidence: Provides attributes to query model capabilities at runtime, such as whether a model supports tool calling (supports_tools), structured output (supports_schema), streaming (can_stream), or particular attachment MIME types (attachment_types). Applications can adapt behavior based on these flags without hardcoding provider-specific logic, enabling graceful degradation when features are unavailable.
Capability information lives on the Model class itself, so feature detection needs no external configuration or registry lookups.
More flexible than hardcoding capabilities because they can be queried at runtime, and more reliable than trying features and catching exceptions because capabilities are known upfront.
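A feature-detection sketch; the attribute names reflect recent llm releases, and exact availability varies by version and plugin:

```python
import llm

model = llm.get_model("gpt-4o-mini")

print(model.can_stream)         # streaming support
print(model.supports_schema)    # structured (JSON schema) output
print(model.supports_tools)     # function calling / tool use
print(model.attachment_types)   # accepted MIME types, e.g. image/png
```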
plugin-based model and tool discovery with entry points
Medium confidence: Implements a plugin system built on Pluggy: installed packages expose an entry point in the llm group, and llm's plugin manager discovers and loads them at startup. Plugins contribute models, embedding models, commands, and other extensions via hook implementations such as register_models() and register_commands(), without modifying core code; custom models subclass the Model base classes.
Builds on Pluggy and Python's standard entry-points mechanism rather than a custom plugin loader, making plugins installable via pip and discoverable by any tool that reads entry points. The Model base classes require minimal boilerplate, lowering the barrier to contribution.
More lightweight than LangChain's integration system because it relies on standard Python packaging rather than custom registries, and more discoverable because plugins are ordinary packages visible to the Python environment (llm plugins lists them).
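A minimal plugin sketch following the documented hook pattern; the module and model names are illustrative:

```python
# my_plugin.py
# pyproject.toml needs an entry point in the "llm" group:
#   [project.entry-points.llm]
#   my_plugin = "my_plugin"
import llm


class EchoModel(llm.Model):
    model_id = "echo"

    def execute(self, prompt, stream, response, conversation):
        # A toy model: yield the response text in chunks
        yield "You said: "
        yield prompt.prompt


@llm.hookimpl
def register_models(register):
    register(EchoModel())
```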
tool execution and function calling with schema validation
Medium confidence: Enables models to invoke Python functions passed as tools: plain functions whose signatures, type hints, and docstrings are introspected into tool schemas, or groups of related methods on an llm.Toolbox subclass. The schemas are translated into the function-calling formats of providers such as OpenAI and Anthropic. When a model requests a tool call, llm executes the corresponding Python function and feeds the result back to the model, enabling multi-step reasoning and external actions.
Derives tool schemas from function signatures rather than requiring hand-written JSON, so the same function works across model providers without rewriting; each provider plugin maps the common tool description onto its own function-calling format.
Simpler than LangChain's tool system because it doesn't require wrapping functions in Tool objects with separate schema definitions, and more portable than Anthropic's native tool_use because it works across providers via the plugin abstraction.
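A tool-calling sketch, assuming an llm release with tool support (0.26 or later):

```python
import llm

def multiply(a: int, b: int) -> int:
    """Multiply two integers."""   # docstring becomes the tool description
    return a * b

model = llm.get_model("gpt-4o-mini")
# chain() keeps re-prompting the model with tool results until it
# stops requesting tool calls
response = model.chain("What is 1337 * 42?", tools=[multiply])
print(response.text())
```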
structured output generation with json schema enforcement
Medium confidence: Supports constrained generation where models must return JSON matching a provided schema. The prompt() method accepts a schema parameter (a JSON schema dict or a Pydantic model class), and the model's output is JSON text that can be parsed with json.loads(response.text()). Providers with native structured-output modes (e.g., OpenAI) enforce the schema at the API level; support varies by provider. This enables reliable extraction of structured data (e.g., entities, classifications) from unstructured inputs.
Decouples schema definition from model invocation, allowing the same schema to be reused across models and providers; the schema parameter provides one interface that abstracts away provider-specific JSON-mode implementations.
More flexible than Anthropic's native structured output because it works across providers via plugins, and simpler than LangChain's output parsers because it doesn't require custom parser classes for each schema.
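A structured-output sketch using a plain JSON schema dict:

```python
import json
import llm

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Invent a dog", schema=schema)
dog = json.loads(response.text())   # output is JSON text matching the schema
print(dog["name"], dog["age"])
```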
multi-modal input handling with attachments and fragments
Medium confidence: Supports attaching images, audio, and documents to prompts via the prompt() method's attachments parameter, which takes llm.Attachment objects referencing a file path, a URL, or raw bytes. (Fragments are a separate llm feature: reusable pieces of prompt text stored de-duplicated in the log database.) Attachments are serialized and logged to the SQLite database, so conversation history preserves media references. Supported attachment types vary by model (e.g., OpenAI models accept images and audio, Anthropic models accept images and PDFs).
Uses a single Attachment abstraction for media, so the same prompt() call handles images, audio, and documents without conditional logic; each model declares the MIME types it accepts, and attachments persisted to the conversation log keep multi-modal history queryable and reproducible.
More unified than OpenAI's API because it abstracts away provider-specific attachment formats, and more persistent than Anthropic's approach because attachments are logged to the database for future reference.
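An attachments sketch; the file path is hypothetical:

```python
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe this image in one sentence",
    attachments=[
        llm.Attachment(path="photo.jpg"),   # local file
        # llm.Attachment(url="https://example.com/photo.png"),  # or a URL
    ],
)
print(response.text())
```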
embedding generation and semantic search with vector storage
Medium confidence: Provides an EmbeddingModel base class for generating vector embeddings from text. The embedding system stores vectors in a separate SQLite database (embeddings.db), organized into named collections with associated metadata, enabling semantic search and similarity operations. Plugins can provide embedding models (e.g., OpenAI's text-embedding-3-small, local models via Ollama). The embed() and embed_batch() methods support both single and bulk embedding generation.
Separates embedding storage from conversation logs (embeddings.db vs logs.db), allowing independent scaling and querying of embeddings. EmbeddingModel abstraction enables swapping embedding providers without changing application code, and batch operations optimize cost for bulk embedding generation.
More integrated than using OpenAI's API directly because it provides a unified interface across embedding models and handles storage, and simpler than LangChain's embedding system because it doesn't require external vector databases for basic use cases.
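A semantic-search sketch using llm's Collection class with sqlite-utils; the collection name, documents, and "3-small" model alias are illustrative:

```python
import sqlite_utils
import llm

db = sqlite_utils.Database("embeddings.db")
collection = llm.Collection("articles", db, model_id="3-small")

# Store embeddings for (id, text) pairs in SQLite
collection.embed_multi([
    ("doc-1", "SQLite is a small, fast, self-contained database engine"),
    ("doc-2", "Pelicans are large water birds with throat pouches"),
])

# Embed the query and rank stored vectors by similarity
for entry in collection.similar("embedded databases", number=1):
    print(entry.id, entry.score)
```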
interactive cli chat with streaming responses
Medium confidence: Provides an interactive chat interface via the llm chat command that maintains conversation state, handles multi-turn interactions, and streams model responses to the terminal in real time. The CLI uses the Conversation class to manage history, and responses are displayed incrementally as tokens arrive, improving perceived latency and allowing early interruption.
Streams responses incrementally through the Response iterator so output appears without waiting for completion, and integrates conversation persistence directly into the CLI so history is saved automatically without explicit commands.
More responsive than ChatGPT's web interface for power users because responses stream straight into the terminal, and more portable than a hosted console because it is a local CLI that works with remote and local models alike.
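The same streaming behavior is available from Python, where iterating a Response yields chunks as they arrive:

```python
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Write a limerick about SQLite")

for chunk in response:          # tokens print as they arrive
    print(chunk, end="", flush=True)
print()
```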
prompt templating with variable substitution and reusability
Medium confidence: Supports defining reusable prompt templates with variable placeholders that can be instantiated with different values. Templates are stored as YAML files in llm's templates directory (plugins can register additional template loaders), and can include system prompts, default models, and schemas. The template system uses simple string substitution ($variable, with $input receiving piped input) to inject values at runtime, enabling prompt reuse across contexts without code duplication.
Templates are first-class citizens in the plugin system, allowing teams to distribute and share prompt templates as packages. Templates can include not just text but also system prompts, tools, and schemas, making them more powerful than simple string templates.
Simpler than LangChain's prompt templates because it doesn't require a full templating engine, and more discoverable than storing prompts in code because templates are stored as files and registered via entry points.
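A template sketch; the YAML keys (system, prompt) and $-substitution follow llm's documented template format, while the file name and save location are illustrative (`llm templates path` prints the real directory):

```python
from pathlib import Path

# Contents of a template file; $count is a parameter and $input
# receives piped stdin at runtime.
template = """\
system: Summarize the text in $count bullet points
prompt: $input
"""
Path("summarize.yaml").write_text(template)

# CLI usage once the file is in the templates directory:
#   cat article.txt | llm -t summarize -p count 3
```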
model aliasing and configuration management
Medium confidence: Allows users to define aliases for models (e.g., 'fast' -> 'gpt-4o-mini'), set a default model, and in recent releases store per-model default options (such as temperature). Configuration lives in plain files (such as aliases.json) in llm's user directory, whose platform-dependent location can be overridden with the LLM_USER_PATH environment variable. Users can customize model behavior without modifying code; aliases are loaded when the CLI starts.
Configuration is stored in user-friendly files (not code) and loaded at startup, allowing non-technical users to customize model behavior. Aliases enable switching between models without changing prompts or code, supporting A/B testing and gradual migration between providers.
More user-friendly than environment variables because configuration is discoverable and editable in files, and more flexible than hardcoded defaults because aliases can be changed without redeploying code.
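An aliasing sketch, assuming a recent llm release that exposes set_alias() in the Python API (the CLI equivalent is `llm aliases set`):

```python
import llm

# Persists to aliases.json in llm's user directory
llm.set_alias("fast", "gpt-4o-mini")

model = llm.get_model("fast")   # aliases resolve transparently
print(model.model_id)           # -> gpt-4o-mini
```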
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with llm (Simon Willison), ranked by overlap. Discovered automatically through the match graph.
LLM
A CLI utility and Python library for interacting with Large Language Models, remote and local. [#opensource](https://github.com/simonw/llm)
llm
CLI tool for interacting with LLMs.
gptme
Personal AI assistant in terminal — code execution, file manipulation, web browsing, self-correcting.
chatbox
Powerful AI Client
Lobe Chat
Modern ChatGPT UI framework — 100+ providers, multimodal, plugins, RAG, Vercel deploy.
autogen
A programming framework for agentic AI
Best For
- ✓ developers building multi-provider LLM applications
- ✓ teams wanting to avoid vendor lock-in to a single model provider
- ✓ plugin developers extending llm with custom models
- ✓ applications requiring audit trails and compliance logging
- ✓ interactive CLI tools and chatbots with session management
- ✓ teams analyzing LLM usage patterns and optimizing costs
- ✓ Python developers building LLM-powered applications
- ✓ teams integrating LLMs into existing Python codebases
Known Limitations
- ⚠ Abstractions add minimal overhead but require all models to implement the full interface even if some methods are no-ops
- ⚠ Provider-specific features (e.g., vision capabilities, function-calling schemas) must be normalized to a common interface, potentially losing nuanced control
- ⚠ Async/sync duality requires maintaining two code paths, increasing maintenance burden
- ⚠ SQLite is single-writer, so high-concurrency scenarios (many simultaneous conversations) may experience lock contention
- ⚠ Database grows unbounded without manual pruning; no built-in retention policies or archival
- ⚠ Logging happens synchronously by default, adding latency to each LLM call (typically <10ms for local SQLite writes)
About
CLI tool and Python library for interacting with LLMs. Supports OpenAI, Anthropic, local models via plugins. Features conversation history, templates, embeddings, and a plugin ecosystem. By the creator of Datasette.