multi-role agent orchestration with software company simulation
MetaGPT assigns distinct LLM-powered roles (Product Manager, Architect, Engineer, QA) to collaborate as a simulated software company. Each role executes domain-specific actions sequentially, with message passing between roles enabling task decomposition and workflow coordination. The framework uses a Role base class with action queues and memory systems to maintain role-specific context across multi-turn interactions, simulating realistic software development workflows where roles depend on outputs from upstream roles.
Unique: Uses a Role-Action-Message architecture where roles are stateful agents with persistent memory, action queues, and message-based communication. Unlike simple function-calling agents, each role maintains its own context and can iterate on tasks. The framework includes pre-built roles (Engineer, ProductManager, Architect, QA) with domain-specific prompts and ActionNode definitions that structure outputs for downstream consumption.
vs alternatives: Differs from AutoGPT/BabyAGI by providing explicit role specialization and structured workflows rather than generic task decomposition, enabling more predictable multi-agent collaboration patterns similar to real software teams.
actionnode-based structured output generation with dynamic validation
ActionNode is a declarative system for defining LLM output schemas with automatic prompt generation, parsing, and validation. Each ActionNode specifies expected output fields with types, descriptions, and validation rules. MetaGPT generates prompts that guide the LLM to produce structured outputs (JSON, code, markdown), then parses and validates responses against the schema. If validation fails, the system can trigger automatic revision loops where the LLM corrects its output based on validation errors.
Unique: Implements a declarative schema system where output structure is defined once and reused for prompt generation, parsing, and validation. Uses Pydantic models to define schemas, automatically generates prompts that teach the LLM the expected format, and includes a revision system that feeds validation errors back to the LLM for self-correction. This is more sophisticated than simple regex parsing or JSON extraction.
vs alternatives: More robust than manual prompt engineering + regex parsing because it couples schema definition with validation and automatic retry logic, reducing the need for brittle post-processing code.
mock llm and response caching for testing and development
MetaGPT includes a MockLLM class that simulates LLM responses for testing without making actual API calls. The system also implements response caching where real LLM responses are cached and replayed in subsequent runs. This enables fast iteration during development and reproducible testing. Cache is stored in JSON files and can be versioned with git.
Unique: Provides both MockLLM for simulated responses and response caching for real LLM calls. Caches are stored in JSON files that can be version-controlled, enabling reproducible tests. The system can switch between mock and real LLMs without code changes.
vs alternatives: More comprehensive than simple mocking because it combines mock responses with real response caching, enabling both fast development and reproducible testing.
context serialization and recovery for workflow persistence
MetaGPT supports serializing the entire execution context (roles, messages, artifacts, configuration) to enable workflow resumption from checkpoints. The Context class manages runtime state and can be serialized to JSON or other formats. This enables long-running workflows to be paused and resumed, or migrated across systems. Context recovery reconstructs the full agent state including memory and message history.
Unique: Serializes the entire execution context including roles, messages, artifacts, and configuration, enabling complete workflow recovery. Context snapshots can be stored and recovered, supporting both pause-resume and cross-system migration.
vs alternatives: More comprehensive than simple state saving because it captures the full execution context including message history and agent memory, not just final outputs.
function calling with schema-based tool integration across multiple llm providers
MetaGPT implements a schema-based function calling system where tools are defined with Pydantic models or JSON schemas, and the framework translates these to provider-specific function calling formats (OpenAI, Anthropic, etc.). The system handles function call parsing, validation, and execution. Tools can be registered globally or per-role, and the framework manages the function calling loop (LLM calls function → execute → return result → LLM continues).
Unique: Implements a provider-agnostic function calling system where tools are defined once using Pydantic schemas and automatically translated to each provider's format. The framework handles the function calling loop and manages provider-specific quirks (e.g., OpenAI's tool_choice parameter, Anthropic's tool_use blocks).
vs alternatives: More robust than manual function calling because it abstracts provider differences and includes automatic validation and error handling, reducing the need for provider-specific code.
multi-modal capabilities with image input and vision model support
MetaGPT supports multi-modal inputs including images and vision models. Agents can process images, extract information, and generate descriptions or code based on visual content. The framework integrates vision capabilities with the standard LLM provider system, enabling agents to analyze screenshots, diagrams, or other visual artifacts. Vision model responses are integrated into the message stream and can be used by downstream agents.
Unique: Integrates vision model support into the standard LLM provider system, enabling agents to process images alongside text. Vision responses are treated as regular messages and can be consumed by downstream agents, enabling workflows that combine visual and textual reasoning.
vs alternatives: More integrated than separate vision APIs because vision capabilities are built into the agent framework, enabling seamless multi-modal workflows without additional orchestration.
projectrepo-based artifact management with git integration
ProjectRepo is a file system abstraction that manages code artifacts, design documents, and project metadata with automatic git integration. It provides methods to write files, commit changes, and maintain project structure. The system tracks file modifications, enables incremental development by reading previous outputs, and integrates with git for version control. Artifacts are organized by type (code, docs, tests) and can be retrieved for downstream processing or review.
Unique: Provides a high-level abstraction over git operations (write, commit, read) that agents can use without directly invoking git commands. Maintains a mapping of file types to directories and enables agents to query the project structure. Includes methods for reading previous artifacts to support incremental development where agents build on prior outputs.
vs alternatives: Simpler than agents directly calling git CLI because it abstracts away git complexity and provides semantic methods (write_code, write_doc) that are easier for LLMs to use correctly.
llm provider abstraction with multi-provider support and token management
MetaGPT implements a BaseLLM abstract class with concrete implementations for OpenAI, Anthropic, Azure, AWS Bedrock, and OpenAI-compatible providers (Ollama, vLLM). The system includes a provider registry that routes requests to the appropriate LLM backend based on configuration. Token counting and cost tracking are built-in, with support for streaming responses and function calling across different provider APIs. Configuration is centralized and can be overridden per-request.
Unique: Implements a provider registry pattern where each LLM provider (OpenAI, Anthropic, Bedrock, etc.) is a concrete implementation of BaseLLM. The framework handles provider-specific API differences transparently, including function calling schema translation and streaming response handling. Token counting is integrated per-provider with cost calculation.
vs alternatives: More comprehensive than LiteLLM because it includes token counting, cost tracking, and streaming support natively, plus tight integration with the multi-agent framework for role-specific provider selection.
+6 more capabilities