designing-real-world-ai-agents-workshop
MCP Server · Free
Hands-on workshop: Build a multi-agent AI system from scratch — Deep Research Agent + Writing Workflow served as MCP servers. Includes code, slides, and video.
Capabilities (12 decomposed)
Gemini-grounded iterative research with Google Search integration
Medium confidence: Executes multi-turn research workflows using the Google Gemini API with built-in Google Search grounding to retrieve factual, up-to-date information. The Deep Research Agent (src/research/server.py) implements a tool-use pattern in which Gemini invokes search tools iteratively, refines queries based on intermediate results, and persists findings to a structured research.md file. Supports YouTube transcript extraction when URLs are provided, enabling multi-modal source integration.
Uses Gemini's native Google Search grounding (not external RAG) combined with tool-use agents for iterative query refinement, reducing hallucination risk while maintaining real-time information access. YouTube transcript extraction is built-in, enabling multi-modal research without separate API calls.
Faster and more accurate than RAG-based research systems because it queries live search results directly rather than relying on static embeddings, and cheaper than multi-step LLM chains because grounding is native to Gemini's API.
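A minimal sketch of a single grounded Gemini call using the google-genai SDK; the model name, prompt, and citation handling are illustrative assumptions, not code from src/research/server.py:

```python
# Sketch only: one grounded search call; the workshop's agent refines
# queries across multiple such calls and persists results to research.md.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model; configure per your setup
    contents="Summarize recent developments in the MCP specification.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(response.text)
# Grounding metadata carries the web sources behind the answer.
meta = response.candidates[0].grounding_metadata
for chunk in meta.grounding_chunks or []:
    print(chunk.web.title, chunk.web.uri)
```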
MCP-based multi-agent orchestration with decoupled server architecture
Medium confidence: Implements a two-server MCP architecture (Deep Research Agent + LinkedIn Writer Agent) using the FastMCP framework, where each server exposes tools, resources, and prompts independently and communicates through the standardized MCP protocol. The architecture decouples research and writing concerns, allowing each agent to be developed, tested, and scaled independently while maintaining a unified interface. Configuration is managed via .mcp.json and environment variables, enabling runtime server discovery and tool registration.
Uses FastMCP framework to expose agents as standardized MCP servers rather than monolithic functions, enabling true decoupling where each agent (research, writing) has its own process, configuration, and tool registry. This pattern allows IDE integration (Claude Code, Cursor) without custom client code.
More modular and testable than LangChain agent chains because each agent is independently deployable and has explicit tool/resource contracts, and more flexible than REST-based agent APIs because MCP provides native IDE integration without custom UI.
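A minimal sketch of one of the two servers using FastMCP; the tool and resource names here are illustrative, not the repo's actual contracts:

```python
# Sketch of a decoupled MCP server in the style of the Deep Research Agent.
from fastmcp import FastMCP

mcp = FastMCP("deep-research")

@mcp.tool()
def research_topic(topic: str, max_iterations: int = 3) -> str:
    """Run iterative grounded research and return a markdown summary."""
    ...  # search loop and persistence would live here

@mcp.resource("research://findings")
def findings() -> str:
    """Expose the persisted research.md as an MCP resource."""
    return open("research.md", encoding="utf-8").read()

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; registered via .mcp.json
```

The writing agent would run as a second, independent process with its own tool registry, which is what allows each server to be tested and scaled on its own.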
configuration management with environment variables and Pydantic Settings
Medium confidence: Centralizes configuration using Pydantic Settings models (src/research/config/, src/writing/config/) that load from environment variables and .env files, enabling environment-specific configuration without code changes. Configuration includes API keys, model parameters, evaluation thresholds, and server endpoints. Pydantic validation ensures type safety and provides helpful error messages for missing or invalid configuration.
Uses Pydantic Settings for type-safe, validated configuration with automatic environment variable loading. Configuration is centralized in dedicated config modules (src/research/config/, src/writing/config/), making it easy to add new configuration options without modifying agent code.
More robust than manual environment variable parsing because Pydantic validates types and provides helpful error messages, and more maintainable than hardcoded configuration because all settings are in one place.
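A sketch of what such a settings model can look like; the field names, env prefix, and defaults are assumptions rather than the repo's actual schema:

```python
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class ResearchSettings(BaseSettings):
    """Loads from the environment and .env, in the style of src/research/config/."""
    model_config = SettingsConfigDict(env_file=".env", env_prefix="RESEARCH_")

    gemini_api_key: str = Field(description="Google Gemini API key")  # required
    model_name: str = "gemini-2.5-flash"
    max_search_iterations: int = 3
    evaluation_threshold: float = 0.8

# Raises a descriptive ValidationError if gemini_api_key is unset or mistyped.
settings = ResearchSettings()
```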
structured research persistence and markdown-based knowledge representation
Medium confidence: Persists research findings to a structured markdown file (research.md) that serves as the knowledge base for the writing agent. The markdown format enables human readability while maintaining machine-parseable structure (headings, lists, citations). Research findings include source citations, timestamps, and iterative search history, creating an auditable record of how conclusions were reached. The writing agent reads this markdown to generate content, ensuring factual grounding.
Uses markdown as the primary knowledge representation format, enabling both machine parsing (for writing agent) and human inspection (for manual review). Includes source citations and search history, creating an auditable record of research methodology.
More transparent than vector databases because research is human-readable and manually editable, and more flexible than structured databases because markdown can accommodate unstructured notes and citations.
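A sketch of appending one finding to research.md; the section layout is an assumption, not the repo's actual format:

```python
from datetime import datetime, timezone
from pathlib import Path

def append_finding(path: Path, query: str, summary: str, sources: list[str]) -> None:
    """Append a timestamped, cited finding as a new markdown section."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [f"## {query}", f"_Searched: {stamp}_", "", summary, "", "### Sources"]
    lines += [f"- {url}" for url in sources]
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n\n")

append_finding(
    Path("research.md"),
    query="MCP adoption in IDEs",
    summary="Claude Code and Cursor both support MCP servers natively.",
    sources=["https://modelcontextprotocol.io"],
)
```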
evaluator-optimizer loop for iterative content refinement
Medium confidence: Implements a multi-iteration content generation and evaluation pattern in the LinkedIn Writer Agent (src/writing/server.py) where an LLM generates initial content, an evaluator (LLM-as-judge) scores it against quality criteria, and an optimizer refines it based on feedback. The loop continues until quality thresholds are met or max iterations are reached. Uses Opik for tracing and LLM-based evaluation metrics, enabling observable, measurable content quality improvement without human-in-the-loop.
Combines LLM-as-judge evaluation with iterative optimization in a closed loop, using Opik for full observability of each refinement cycle. Unlike simple prompt engineering, this pattern measures quality objectively and refines based on measurable feedback, not heuristics.
More reliable than single-pass LLM generation because it validates and refines output against explicit criteria, and more transparent than black-box content APIs because every iteration is traced and evaluated metrics are visible.
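Stripped of tracing and prompt details, the control flow reduces to a short loop. In this sketch, generate, evaluate, and refine are hypothetical stand-ins for the LLM calls in src/writing/server.py, and the threshold is an assumed value:

```python
QUALITY_THRESHOLD = 0.8  # assumed; the repo reads thresholds from configuration
MAX_ITERATIONS = 3

def write_post(research: str, profile: dict) -> str:
    draft = generate(research, profile)             # hypothetical: initial LLM draft
    for _ in range(MAX_ITERATIONS):
        score, feedback = evaluate(draft, profile)  # hypothetical: LLM-as-judge scoring
        if score >= QUALITY_THRESHOLD:
            break
        draft = refine(draft, feedback, profile)    # hypothetical: optimizer pass
    return draft
```

The cap on iterations bounds cost, while the explicit threshold makes "good enough" a measurable property rather than a vibe.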
AI image generation with Gemini Imagen integration
Medium confidence: Integrates Google Gemini's Imagen model for AI-generated images within the writing workflow, enabling automatic image creation to accompany generated LinkedIn posts. The image generation is triggered based on post content and writing profiles, with generated images persisted to the dataset directory. Supports prompt engineering for image generation based on post themes and audience preferences.
Integrates Imagen directly into the writing workflow as a native step, not a separate tool — image generation is triggered automatically based on post content and writing profiles, enabling end-to-end content creation without manual image selection.
More integrated than using external image APIs (DALL-E, Midjourney) because it's part of the same Gemini API ecosystem and can reference post content directly, and faster than manual image selection because generation is automated and parallelizable.
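A sketch of an Imagen call through the same google-genai SDK; the model id, prompt, and output path are assumptions:

```python
from google import genai
from google.genai import types

client = genai.Client()

result = client.models.generate_images(
    model="imagen-3.0-generate-002",  # assumed model id
    prompt="Flat illustration of two AI agents exchanging research notes",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Persist alongside the generated post, mirroring the dataset directory layout.
result.generated_images[0].image.save("datasets/post_image.png")
```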
dataset-driven evaluation with LLM-as-judge metrics
Medium confidence: Implements a structured dataset system (datasets/ directory) with batch evaluation scripts that process multiple content samples through the writing workflow and score them using LLM-as-judge metrics via Opik. The evaluation system measures quality across dimensions (clarity, engagement, relevance) and aggregates results for statistical analysis. Supports dataset versioning and comparison across model versions or writing profiles.
Combines structured dataset management with Opik-based LLM-as-judge evaluation, enabling systematic quality measurement across multiple samples with full traceability. Unlike ad-hoc evaluation, this pattern produces reproducible, comparable metrics across writing profiles and model versions.
More rigorous than manual spot-checking because it evaluates entire datasets systematically, and more transparent than black-box quality scores because each evaluation is traced in Opik with full iteration history visible.
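A sketch of a batch run through Opik's evaluate API; the dataset name, task wrapper, and metric choice are assumptions, and write_post is the hypothetical pipeline entry point sketched earlier:

```python
import opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

client = opik.Opik()
dataset = client.get_dataset(name="linkedin-posts-v1")  # assumed dataset name

def evaluation_task(item: dict) -> dict:
    # Run one sample through the writing workflow; keys map to metric inputs.
    post = write_post(item["research"], item["profile"])
    return {"input": item["research"], "output": post}

evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[Hallucination()],  # illustrative pick among Opik's judge metrics
)
```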
MCP tool and resource definition with schema-based routing
Medium confidence: Defines MCP tools and resources using FastMCP decorators (@mcp.tool, @mcp.resource) with JSON schema validation, enabling type-safe tool invocation and automatic schema generation. The research and writing servers expose distinct tool sets (search, research persistence, content generation, evaluation) with Pydantic-based input/output validation. MCP routers (src/research/routers/, src/writing/routers/) map tool invocations to application logic, decoupling tool definitions from implementation.
Uses FastMCP decorators with Pydantic models to automatically generate MCP tool schemas, eliminating manual JSON schema writing. Router pattern (src/research/routers/, src/writing/routers/) decouples tool definitions from implementation, enabling easy tool addition without modifying server core.
More maintainable than hand-written JSON schemas because Pydantic models are single source of truth, and more discoverable than REST APIs because MCP clients can introspect tool schemas at runtime without documentation.
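A sketch of a schema-backed tool with a router hand-off; the model fields and the handler name are illustrative:

```python
from fastmcp import FastMCP
from pydantic import BaseModel, Field

mcp = FastMCP("linkedin-writer")

class GeneratePostInput(BaseModel):
    topic: str = Field(description="Topic distilled from research.md")
    profile: str = Field(default="technical", description="Writing profile name")

@mcp.tool()
def generate_post(params: GeneratePostInput) -> str:
    """FastMCP derives the JSON schema from the Pydantic model, so MCP
    clients can introspect and validate calls at runtime."""
    # Hypothetical application-layer handler in the style of src/writing/routers/.
    return route_generate_post(params)
```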
prompt template system with writing profiles and context injection
Medium confidence: Implements a prompt template system (src/writing/profiles/) where writing profiles define tone, style, audience, and quality criteria as structured data, and prompt templates inject these profiles into system/user messages. The system uses Jinja2-style templating (or similar) to dynamically construct prompts based on profile attributes and research content. Profiles are versioned and can be A/B tested to measure impact on content quality.
Separates writing profiles (data) from prompt templates (logic), enabling non-technical users to create new writing styles by editing profile files without touching prompt code. Profiles are versioned and A/B testable, making it easy to measure impact of style changes on content quality.
More flexible than hard-coded prompts because profiles can be changed without code deployment, and more systematic than ad-hoc prompt engineering because profiles are versioned and evaluated quantitatively.
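A sketch of profile-to-prompt injection with Jinja2; the profile fields and template text are assumptions:

```python
from jinja2 import Template

# A writing profile is plain data, editable without touching prompt code.
profile = {
    "tone": "direct and practical",
    "audience": "ML engineers",
    "quality_criteria": ["clear hook", "one concrete example", "no hype"],
}

template = Template(
    "Write a LinkedIn post for {{ audience }} in a {{ tone }} tone.\n"
    "Quality criteria:\n"
    "{% for c in quality_criteria %}- {{ c }}\n{% endfor %}"
    "Source notes:\n{{ research }}"
)

prompt = template.render(
    **profile,
    research=open("research.md", encoding="utf-8").read(),
)
```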
end-to-end workflow orchestration from research to published content
Medium confidence: Orchestrates a complete workflow (src/research/server.py → src/writing/server.py) where research findings are automatically fed into the writing agent, which generates, evaluates, and refines content, then generates accompanying images. The workflow is exposed as a high-level skill (.claude/skills/research-and-write/SKILL.md) that can be invoked from Claude Code or Cursor with a single topic input. Workflow state is persisted to the filesystem (research.md, generated posts, images), enabling resumption and inspection at any stage.
Exposes the entire research-to-content pipeline as a single Claude Code skill, enabling non-technical users to run complex multi-agent workflows without understanding MCP or agent architecture. Filesystem-based state persistence allows inspection and manual intervention at any stage.
More complete than individual agent tools because it handles the full pipeline (research + writing + evaluation + images), and more accessible than custom orchestration code because it's exposed as a Claude Code skill with natural language invocation.
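Stripped of MCP plumbing, the pipeline is a sequence of checkpointed steps. In this sketch, research_topic, write_post, and generate_image are hypothetical wrappers around the corresponding agent tools, and the output layout is assumed:

```python
from pathlib import Path

def run_pipeline(topic: str, out_dir: Path = Path("output")) -> str:
    out_dir.mkdir(exist_ok=True)

    research = research_topic(topic)                  # Deep Research Agent
    (out_dir / "research.md").write_text(research)    # checkpoint: inspect or edit

    profile = {"name": "technical"}                   # assumed profile shape
    post = write_post(research, profile)              # evaluator-optimizer loop
    (out_dir / "post.md").write_text(post)            # checkpoint: review draft

    generate_image(post, out_dir / "post_image.png")  # Imagen step
    return post
```

Because every stage writes to disk, a failed or unsatisfying run can be resumed or manually corrected at the last good checkpoint.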
observability and tracing with Opik integration
Medium confidence: Integrates Opik for end-to-end tracing of agent workflows, capturing every LLM call, tool invocation, and evaluation metric. Opik traces are automatically generated for research iterations, content generation cycles, and evaluation steps, with links persisted in output metadata. The system enables post-hoc analysis of agent behavior, debugging of failed workflows, and measurement of cost/latency across workflow stages.
Provides native Opik integration throughout the codebase, automatically capturing traces for research iterations, content generation, and evaluation without manual instrumentation. Opik traces include LLM-as-judge evaluation metrics, enabling measurement of content quality alongside cost and latency.
More comprehensive than print-based debugging because it captures full trace context (model, parameters, latency, cost), and more actionable than generic LLM monitoring because it includes domain-specific metrics (evaluation scores, iteration counts).
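Opik's track decorator is the core primitive; this sketch shows the pattern on a hypothetical function:

```python
from opik import track

@track
def refine_draft(draft: str, feedback: str) -> str:
    """Each call becomes a span in the workflow trace; nested LLM calls
    made inside are recorded as child spans with model, parameters,
    latency, and token cost attached."""
    ...  # hypothetical refinement logic
```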
workflow test scripts and batch processing automation
Medium confidence: Provides Python scripts (scripts/test_research_workflow.py, batch dataset processing scripts) that automate end-to-end testing and evaluation of the multi-agent system. Scripts handle dataset loading, workflow invocation, result collection, and metric aggregation. Uses GNU Make (Makefile) for task orchestration, enabling developers to run complex workflows with simple commands (e.g., `make test-research`, `make evaluate-dataset`).
Combines Python scripts with Makefile-based task orchestration, enabling both programmatic control (for CI/CD) and simple command-line invocation (for developers). Scripts handle full workflow automation including dataset loading, result collection, and metric aggregation.
More accessible than custom Python orchestration because Make commands are simple and discoverable, and more flexible than hardcoded test suites because scripts are parameterized for different datasets and profiles.
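A sketch of a parameterized batch runner in the style of scripts/test_research_workflow.py; the flags, dataset shape, and output path are assumptions, and run_pipeline is the hypothetical entry point sketched above:

```python
import argparse
import json
from pathlib import Path

def main() -> None:
    parser = argparse.ArgumentParser(description="Run the workflow over a dataset")
    parser.add_argument("--dataset", type=Path, default=Path("datasets/topics.json"))
    parser.add_argument("--out", type=Path, default=Path("results.json"))
    args = parser.parse_args()

    results = []
    for item in json.loads(args.dataset.read_text()):
        post = run_pipeline(item["topic"])  # hypothetical pipeline entry point
        results.append({"topic": item["topic"], "post": post})
    args.out.write_text(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()
```

A Makefile target can then wrap this so a single command such as `make evaluate-dataset` runs the whole batch.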
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with designing-real-world-ai-agents-workshop, ranked by overlap. Discovered automatically through the match graph.
Google Gemini API
Google's multimodal API — Gemini 2.5 Pro/Flash, 1M context, video understanding, grounding.
gemini-mcp-tool
MCP server that enables AI assistants to interact with Google Gemini CLI, leveraging Gemini's massive token window for large file analysis and codebase understanding
DeepView MCP
Enables IDEs like Cursor and Windsurf to analyze large codebases using Gemini's 1M context window.
Gemini 2.5 Pro
Google's most capable model with 1M context and native thinking.
ai.google.dev
https://gemini.google.com/ · Free/Paid
gemini-flow
rUv's Claude-Flow ported to the new Gemini CLI, transforming it into an autonomous AI development team.
Best For
- ✓ Teams building content pipelines requiring factual accuracy (journalism, technical writing, marketing)
- ✓ Developers implementing agentic systems that need grounded search without external RAG infrastructure
- ✓ Organizations automating research-to-content workflows at scale
- ✓ Teams building complex agentic workflows with multiple specialized agents
- ✓ Developers migrating from monolithic LLM applications to modular, composable architectures
- ✓ Organizations standardizing on MCP for AI tool integration across multiple products
- ✓ Teams deploying agents across multiple environments (dev, staging, prod)
- ✓ Developers managing sensitive configuration (API keys) securely
Known Limitations
- ⚠ Requires a Google Gemini API key with Google Search grounding enabled — not compatible with the free tier
- ⚠ YouTube transcript extraction is limited to publicly available transcripts; no support for age-restricted or private videos
- ⚠ Research depth is ultimately bounded by the Gemini context window; very large research topics may require chunking across calls
- ⚠ No built-in deduplication of search results across iterations — may retrieve redundant information
- ⚠ MCP protocol overhead adds ~50-100ms per tool invocation due to serialization and IPC
- ⚠ Debugging multi-server workflows requires tracing across process boundaries — standard debuggers are insufficient
Repository Details
Last commit: Apr 21, 2026