Browser MCP
MCP Server, Free (by UI-TARS)
A fast, lightweight MCP server that empowers LLMs with browser automation via Puppeteer’s structured accessibility data, featuring optional vision mode for complex visual understanding and flexible, cross-platform configuration.
Capabilities (12 decomposed)
accessibility tree-based browser element targeting
Medium confidence
Extracts and structures DOM elements via Puppeteer's accessibility tree API, converting browser UI into a machine-readable format that LLMs can reason about without pixel-level analysis. This approach parses semantic HTML structure, ARIA attributes, and computed accessibility properties into a hierarchical JSON representation, enabling precise element identification and interaction planning without vision processing overhead.
Uses Puppeteer's native accessibility tree extraction rather than screenshot-based vision or regex DOM parsing, providing semantic-aware element identification that preserves ARIA relationships and computed accessibility properties in a structured format suitable for LLM reasoning
Faster and cheaper than vision-based browser agents (no VLM calls) while more reliable than regex/CSS selector approaches on dynamic or complex UIs, as it leverages browser-native accessibility APIs that understand semantic intent
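As a rough illustration of how a tree of accessibility nodes can become a flat list of targets that an LLM refers to by index, here is a minimal TypeScript sketch. The `AXNode` shape is a simplified stand-in for the nodes returned by Puppeteer's `page.accessibility.snapshot()`, and `listTargets`, `ElementTarget`, and the set of interactive roles are hypothetical names chosen for this example, not the server's actual API.

```typescript
// Simplified stand-in for a node in an accessibility snapshot (assumption).
interface AXNode {
  role: string;
  name?: string;
  children?: AXNode[];
}

interface ElementTarget {
  index: number; // handle the LLM can cite, e.g. "click element 2"
  role: string;
  name: string;
}

// Roles treated as interactive; this set is illustrative, not exhaustive.
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Depth-first walk that keeps only interactive nodes, in document order.
function listTargets(root: AXNode): ElementTarget[] {
  const found: ElementTarget[] = [];
  const visit = (node: AXNode): void => {
    if (INTERACTIVE_ROLES.has(node.role)) {
      found.push({ index: found.length, role: node.role, name: node.name ?? "" });
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return found;
}

// Hand-written sample tree for a sign-in page.
const sampleTree: AXNode = {
  role: "RootWebArea",
  name: "Sign in",
  children: [
    { role: "heading", name: "Sign in" },
    { role: "textbox", name: "Email" },
    { role: "textbox", name: "Password" },
    { role: "button", name: "Continue" },
  ],
};

const loginTargets = listTargets(sampleTree);
console.log(JSON.stringify(loginTargets));
```

The heading is dropped because it is not interactive, so the model only sees the three elements it can act on.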
optional vision-augmented element understanding
Medium confidence
Integrates optional vision model processing (VLM) for scenarios where accessibility tree data is insufficient, allowing the MCP server to fall back to screenshot analysis for complex visual layouts, custom components, or visual-only interactions. The architecture supports pluggable VLM providers (OpenAI Vision, local models) that receive cropped element screenshots and accessibility context together, enabling hybrid reasoning that combines structural and visual understanding.
Implements vision as an optional augmentation layer rather than primary mechanism, combining accessibility tree data with VLM analysis to provide both structural and visual context, reducing unnecessary vision calls while maintaining fallback capability for complex UIs
More efficient than pure vision-based agents (uses accessibility tree first) while more capable than text-only agents on visual UIs; supports multiple VLM providers rather than being locked to a single vision API
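The "accessibility tree first, vision only when needed" idea can be pictured as a small decision function. The sketch below is an assumed heuristic for illustration only: `chooseStrategy`, `ElementSummary`, and the list of visual-only roles are hypothetical and do not describe how the server actually decides.

```typescript
// Minimal description of a candidate element (assumed shape).
interface ElementSummary {
  role: string;
  name: string;
}

type TargetingStrategy = "accessibility" | "vision";

// Roles that usually expose little structure to text-only reasoning (assumption).
const VISUAL_ONLY_ROLES = new Set(["canvas", "img", "graphics-document"]);

// Prefer the accessibility tree; use vision only if it is enabled and the
// element is unlabeled or inherently visual.
function chooseStrategy(element: ElementSummary, visionEnabled: boolean): TargetingStrategy {
  if (!visionEnabled) return "accessibility";
  const unlabeled = element.name.trim() === "";
  const visualOnly = VISUAL_ONLY_ROLES.has(element.role);
  return unlabeled || visualOnly ? "vision" : "accessibility";
}

console.log(chooseStrategy({ role: "button", name: "Save" }, true));
console.log(chooseStrategy({ role: "canvas", name: "" }, true));
console.log(chooseStrategy({ role: "canvas", name: "" }, false));
```

Keeping the check cheap and local means a VLM call is only paid for when the structural data cannot answer the question.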
cookie and storage management across sessions
Medium confidence
Manages browser cookies, localStorage, sessionStorage, and IndexedDB across automation sessions, enabling state persistence across page navigations and session resumption. The implementation provides APIs to read, write, and clear storage, supporting cookie serialization for session export/import, enabling multi-step workflows that require maintaining authentication state or user preferences across multiple pages.
Provides unified storage management API covering cookies, localStorage, and sessionStorage with serialization support for session export/import, enabling checkpoint-based workflow resumption and multi-session state persistence beyond simple cookie handling
More comprehensive than basic cookie management; supports multiple storage types; enables session export/import for resilience vs stateless automation approaches
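To make the session export/import idea concrete, the following sketch serializes cookies and localStorage entries to JSON and drops expired cookies on import. The `StoredCookie` and `SessionSnapshot` shapes and both function names are assumptions for this example; `expires` is taken to be Unix seconds with `-1` meaning a session cookie.

```typescript
// Assumed cookie shape; expires is Unix seconds, -1 marks a session cookie.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  expires: number;
}

interface SessionSnapshot {
  cookies: StoredCookie[];
  localStorage: Record<string, string>;
}

// Serialize the captured state so it can be written to disk or passed along.
function exportSession(snapshot: SessionSnapshot): string {
  return JSON.stringify(snapshot);
}

// Restore a snapshot, discarding cookies that expired before nowSeconds.
function importSession(serialized: string, nowSeconds: number): SessionSnapshot {
  const parsed = JSON.parse(serialized) as SessionSnapshot;
  return {
    cookies: parsed.cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds),
    localStorage: { ...parsed.localStorage },
  };
}

const checkpoint = exportSession({
  cookies: [
    { name: "sid", value: "abc123", domain: "example.com", expires: 2000 },
    { name: "promo", value: "spring", domain: "example.com", expires: 500 },
    { name: "csrf", value: "xyz", domain: "example.com", expires: -1 },
  ],
  localStorage: { theme: "dark" },
});

const restored = importSession(checkpoint, 1000);
console.log(restored.cookies.map((c) => c.name).join(","));
```

Passing the current time in as a parameter keeps the import step deterministic, which helps when a workflow is resumed from a checkpoint in tests.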
configurable mcp server deployment and transport
Medium confidence
Deploys the Browser MCP server with flexible transport options (stdio, HTTP, SSE) and configuration management, supporting both local and remote deployment scenarios. The architecture uses environment variables and configuration files for flexible setup, enabling deployment as a standalone service, embedded in larger agent systems, or as a Docker container, with support for multiple concurrent client connections and graceful shutdown.
Implements flexible MCP server deployment with multiple transport options and environment-based configuration, enabling both embedded and standalone deployment scenarios without code changes, supporting Docker containerization and remote deployment patterns
More flexible deployment than single-transport MCP servers; supports both local and remote scenarios; configuration-driven approach enables environment-specific setup without code modification
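Environment-driven configuration of this kind might be parsed along the following lines. This is a sketch under stated assumptions: the variable names `BROWSER_MCP_TRANSPORT`, `BROWSER_MCP_PORT`, and `BROWSER_MCP_HEADLESS`, the defaults, and `loadConfig` itself are hypothetical and are not documented settings of the server.

```typescript
type TransportKind = "stdio" | "http" | "sse";

interface ServerConfig {
  transport: TransportKind;
  port: number;
  headless: boolean;
}

const SUPPORTED_TRANSPORTS: readonly TransportKind[] = ["stdio", "http", "sse"];

// Build a validated config from an environment-like map.
// All variable names and defaults here are hypothetical.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const rawTransport = env["BROWSER_MCP_TRANSPORT"] ?? "stdio";
  const transport = SUPPORTED_TRANSPORTS.find((t) => t === rawTransport);
  if (transport === undefined) {
    throw new Error(`Unsupported transport: ${rawTransport}`);
  }

  const rawPort = env["BROWSER_MCP_PORT"] ?? "3000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${rawPort}`);
  }

  // Headless unless explicitly disabled.
  const headless = env["BROWSER_MCP_HEADLESS"] !== "false";
  return { transport, port, headless };
}

console.log(JSON.stringify(loadConfig({})));
console.log(JSON.stringify(loadConfig({ BROWSER_MCP_TRANSPORT: "http", BROWSER_MCP_PORT: "8080" })));
```

Taking the environment as an argument, rather than reading a global, lets the same function serve local runs, containers, and tests without code changes.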
mcp-compliant tool schema registration and function calling
Medium confidence
Implements the Model Context Protocol (MCP) server specification, exposing browser automation capabilities as standardized MCP tools with JSON schema definitions. The server registers tools like 'click', 'type', 'navigate', 'extract_text' with formal input/output schemas, allowing any MCP-compatible LLM client to discover, validate, and invoke browser actions through the standard MCP tool-calling interface without custom integration code.
Implements full MCP server specification for browser tools, providing schema-validated tool discovery and invocation rather than custom API endpoints, enabling seamless integration with any MCP-aware LLM client without protocol translation
Standards-based approach vs proprietary APIs; enables tool reuse across multiple LLM platforms (Claude, GPT, local models) without reimplementation, and provides automatic schema validation that REST APIs require custom middleware for
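An MCP tool is advertised with a name, a description, and a JSON Schema for its input. The sketch below shows what a definition for the `type` tool named above could look like, plus a deliberately tiny argument check. The argument names `selector` and `text` and the `validateArgs` helper are assumptions for illustration; a real server would normally rely on a full JSON Schema validator.

```typescript
// Subset of JSON Schema that this sketch understands.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: "string" | "number" | "boolean"; description?: string }>;
    required: string[];
  };
}

// Hypothetical schema for the 'type' tool; argument names are assumed.
const typeTool: ToolDefinition = {
  name: "type",
  description: "Type text into an input element",
  inputSchema: {
    type: "object",
    properties: {
      selector: { type: "string", description: "Element to type into" },
      text: { type: "string", description: "Text to enter" },
    },
    required: ["selector", "text"],
  },
};

// Return a list of human-readable problems; an empty list means the call is valid.
function validateArgs(tool: ToolDefinition, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of tool.inputSchema.required) {
    if (!(key in args)) errors.push(`missing required argument: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const property = tool.inputSchema.properties[key];
    if (property === undefined) {
      errors.push(`unknown argument: ${key}`);
    } else if (typeof value !== property.type) {
      errors.push(`argument ${key} must be of type ${property.type}`);
    }
  }
  return errors;
}

console.log(validateArgs(typeTool, { selector: "#email" }));
```

Because the schema travels with the tool, a client can reject a malformed call before it ever reaches the browser.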
cross-platform browser session management via puppeteer
Medium confidence
Manages browser lifecycle and session state through Puppeteer's high-level API, handling browser launch, page creation, context isolation, and graceful shutdown across Windows, macOS, and Linux. The architecture maintains a pool of browser contexts with independent cookies, storage, and network interception, allowing multiple concurrent automation sessions with isolated state while reusing a single browser process for efficiency.
Leverages Puppeteer's context API for true session isolation rather than simple page management, enabling concurrent multi-session automation with independent cookies/storage while maintaining a single browser process for resource efficiency
More efficient than spawning separate browser processes per session; provides better isolation than shared-page approaches; cross-platform without custom OS-specific code unlike Selenium or raw browser APIs
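One way to picture a pool of isolated contexts keyed by session is the sketch below. It is not the server's implementation: `SessionPool`, `ContextLike`, and `FakeContext` are hypothetical, and the factory passed to the pool stands in for whatever call creates a real browser context.

```typescript
// Anything that can be closed; stands in for a browser context (assumption).
interface ContextLike {
  close(): Promise<void>;
}

class SessionPool<C extends ContextLike> {
  // Promises are cached so concurrent requests for one session share a context.
  private readonly sessions = new Map<string, Promise<C>>();
  private readonly createContext: () => Promise<C>;

  constructor(createContext: () => Promise<C>) {
    this.createContext = createContext;
  }

  // Return the session's context, creating it on first use.
  acquire(sessionId: string): Promise<C> {
    let pending = this.sessions.get(sessionId);
    if (pending === undefined) {
      pending = this.createContext();
      this.sessions.set(sessionId, pending);
    }
    return pending;
  }

  // Forget the session and close its context if it exists.
  async release(sessionId: string): Promise<void> {
    const pending = this.sessions.get(sessionId);
    if (pending === undefined) return;
    this.sessions.delete(sessionId);
    await (await pending).close();
  }

  get size(): number {
    return this.sessions.size;
  }
}

// In-memory fake used only to demonstrate the pool.
class FakeContext implements ContextLike {
  closed = false;
  async close(): Promise<void> {
    this.closed = true;
  }
}

const demoPool = new SessionPool(async () => new FakeContext());
demoPool
  .acquire("alice")
  .then(() => demoPool.release("alice"))
  .then(() => console.log("open sessions:", demoPool.size));
```

In this sketch a failed context creation stays cached under its session id; a production pool would likely evict it and retry.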
structured dom extraction and content parsing
Medium confidence
Extracts and parses page content into structured formats (JSON, markdown, plain text) by traversing the DOM and accessibility tree, capturing text content, form fields, links, and metadata while preserving semantic relationships. The parser handles nested structures, tables, lists, and form hierarchies, outputting clean structured data suitable for LLM analysis without requiring vision processing or manual HTML parsing.
Combines accessibility tree parsing with DOM traversal to extract both semantic structure and content, preserving form relationships and element hierarchy rather than flattening to plain text, enabling LLMs to reason about page organization
Preserves semantic structure better than regex/string parsing; faster than vision-based extraction; more reliable than CSS selector-based approaches on dynamic content
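The step from extracted structure to LLM-friendly text can be sketched as a small renderer. The `ContentBlock` union and `toMarkdown` are hypothetical and cover only a few block kinds; they show the idea of keeping headings, links, and lists distinct rather than flattening everything to plain text.

```typescript
// A few block kinds an extractor might emit (assumed, not exhaustive).
type ContentBlock =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string }
  | { kind: "link"; text: string; href: string }
  | { kind: "list"; items: string[] };

// Render one block as markdown.
function renderBlock(block: ContentBlock): string {
  switch (block.kind) {
    case "heading":
      return `${"#".repeat(block.level)} ${block.text}`;
    case "paragraph":
      return block.text;
    case "link":
      return `[${block.text}](${block.href})`;
    case "list":
      return block.items.map((item) => `- ${item}`).join("\n");
  }
}

// Blocks are separated by a blank line, as in ordinary markdown.
function toMarkdown(blocks: ContentBlock[]): string {
  return blocks.map(renderBlock).join("\n\n");
}

const pageMarkdown = toMarkdown([
  { kind: "heading", level: 1, text: "Login" },
  { kind: "list", items: ["Email", "Password"] },
  { kind: "link", text: "Forgot password?", href: "/reset" },
]);
console.log(pageMarkdown);
```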
interactive element action execution (click, type, scroll, submit)
Medium confidence
Executes user-like interactions on page elements through Puppeteer's high-level action APIs, including clicking, typing text, scrolling, form submission, and keyboard navigation. The implementation handles element visibility verification, scroll-into-view automation, focus management, and retry logic for flaky interactions, ensuring reliable action execution even on dynamically-rendered or partially-visible elements.
Implements robust action execution with automatic visibility verification, scroll-into-view, and retry logic rather than naive element interaction, handling edge cases like overlays, dynamic rendering, and flaky network conditions that raw Puppeteer APIs don't address
More reliable than basic Puppeteer click/type due to built-in visibility checks and retry logic; more human-like than direct DOM manipulation; handles dynamic content better than static selector-based approaches
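The retry logic mentioned above can be sketched as a generic wrapper around any asynchronous action. `withRetry` and its parameters are hypothetical names for this example; the wrapped action stands in for a click or type call that may fail while an element is still rendering.

```typescript
// Run an async action up to maxAttempts times, pausing delayMs between tries.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Simulated flaky click: fails twice, then succeeds.
let demoAttempts = 0;
const flakyClick = async (): Promise<string> => {
  demoAttempts++;
  if (demoAttempts < 3) throw new Error("element not visible yet");
  return "clicked";
};

withRetry(flakyClick, 5, 10).then((result) => console.log(result, "after", demoAttempts, "attempts"));
```

A fixed delay keeps the sketch short; exponential backoff or a visibility check before each attempt would be natural refinements.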
page navigation and wait strategy orchestration
Medium confidence
Manages page navigation with configurable wait strategies (waitForNavigation, waitForSelector, waitForFunction, waitForTimeout) to handle dynamic content loading, SPA routing, and asynchronous rendering. The implementation chains wait conditions intelligently, detecting when navigation is complete vs when content is still loading, and provides timeout management to prevent indefinite hangs on slow or broken pages.
Implements multi-condition wait orchestration combining network idle detection, DOM readiness, and custom selectors rather than single-condition waits, enabling reliable automation of complex SPAs and async-heavy sites where traditional navigation events are unreliable
More sophisticated than basic waitForNavigation; handles SPAs better than traditional Selenium waits; provides configurable strategies vs hardcoded timeouts in simpler automation tools
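Multi-condition waiting with a hard timeout can be sketched as a polling loop that only returns once every condition holds. `waitForAll` and `WaitCondition` are hypothetical; in practice the conditions would wrap checks such as "network is idle" or "selector is present", which are simulated here with plain functions.

```typescript
// A check that reports whether one readiness condition currently holds.
type WaitCondition = () => boolean | Promise<boolean>;

// Poll until all conditions are true, or reject once timeoutMs has elapsed.
async function waitForAll(
  conditions: WaitCondition[],
  timeoutMs: number,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const results = await Promise.all(conditions.map((check) => check()));
    if (results.every(Boolean)) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for page readiness`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated page: the DOM is ready at once, content appears on the third poll.
let demoPolls = 0;
const domReady: WaitCondition = () => true;
const contentLoaded: WaitCondition = () => ++demoPolls >= 3;

waitForAll([domReady, contentLoaded], 1000, 10).then(() =>
  console.log("page ready after", demoPolls, "polls"),
);
```

The timeout bounds the total wait, so a slow or broken page produces an error instead of an indefinite hang.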
network request interception and response mocking
Medium confidence
Intercepts and modifies HTTP requests/responses at the network layer using Puppeteer's request interception API, enabling request blocking, response mocking, header injection, and request modification. This capability allows automation to bypass external dependencies, mock API responses, inject authentication headers, or block tracking scripts without modifying page code, useful for testing, performance optimization, and handling external service failures.
Provides network-layer request/response manipulation through Puppeteer's interception API rather than application-level mocking, enabling transparent request modification without page code changes and supporting complex scenarios like header injection and request blocking
More transparent than application-level mocking; enables testing without modifying target code; more comprehensive than simple request blocking available in basic automation tools
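The block, mock, or pass-through choice can be separated from the browser as a pure decision function, sketched below. The rule format, `decideInterception`, and the example URLs are assumptions; in a Puppeteer request handler the three outcomes would presumably map onto the request's `abort()`, `respond()`, and `continue()` methods.

```typescript
// What to do with an intercepted request.
type InterceptDecision =
  | { action: "abort" }
  | { action: "respond"; status: number; body: string }
  | { action: "continue"; headers: Record<string, string> };

// A rule pairs a URL pattern with the decision to apply (hypothetical format).
interface InterceptRule {
  pattern: RegExp;
  decision: InterceptDecision;
}

// First matching rule wins; unmatched requests continue with injected headers.
function decideInterception(
  url: string,
  rules: InterceptRule[],
  extraHeaders: Record<string, string>,
): InterceptDecision {
  const match = rules.find((rule) => rule.pattern.test(url));
  return match ? match.decision : { action: "continue", headers: { ...extraHeaders } };
}

const demoRules: InterceptRule[] = [
  { pattern: /analytics\.example\.com/, decision: { action: "abort" } },
  {
    pattern: /\/api\/profile$/,
    decision: { action: "respond", status: 200, body: '{"name":"Test User"}' },
  },
];

const authHeaders = { Authorization: "Bearer test-token" };
console.log(decideInterception("https://analytics.example.com/t.js", demoRules, authHeaders).action);
console.log(decideInterception("https://app.example.com/api/profile", demoRules, authHeaders).action);
console.log(decideInterception("https://app.example.com/home", demoRules, authHeaders).action);
```

Keeping the rules as data makes it easy to swap a different rule set in per test or per environment.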
screenshot capture and visual state recording
Medium confidence
Captures full-page or element-specific screenshots in multiple formats (PNG, JPEG) with configurable quality, scaling, and viewport settings. The implementation supports full-page scrolling screenshots, element bounding box capture, and viewport-relative screenshots, enabling visual state recording for debugging, verification, or vision model input without requiring external screenshot tools.
Integrates screenshot capture as a native MCP tool with configurable formats and element-specific clipping, enabling vision models to receive targeted visual input rather than full-page screenshots, reducing token consumption and improving analysis focus
Native integration vs external screenshot tools; supports element-specific clipping for vision model efficiency; full-page capture capability beyond viewport limitations of basic screenshot tools
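Element-specific clipping comes down to computing a rectangle around the element's bounding box. The sketch below pads the box and clamps it to the page, producing an `{ x, y, width, height }` region of the kind Puppeteer's screenshot `clip` option accepts. `paddedClip` and `ClipRegion` are hypothetical names, and the padding value is an arbitrary choice.

```typescript
// Rectangle in page coordinates, in CSS pixels.
interface ClipRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Expand a bounding box by padding on every side, clamped to the page bounds.
function paddedClip(
  box: ClipRegion,
  padding: number,
  pageWidth: number,
  pageHeight: number,
): ClipRegion {
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  const right = Math.min(pageWidth, box.x + box.width + padding);
  const bottom = Math.min(pageHeight, box.y + box.height + padding);
  return { x, y, width: right - x, height: bottom - y };
}

// A button near the top-left corner of a 1280x720 page.
const buttonClip = paddedClip({ x: 5, y: 10, width: 100, height: 20 }, 8, 1280, 720);
console.log(JSON.stringify(buttonClip));
```

A little padding gives a vision model some surrounding context, while clamping avoids requesting pixels outside the page.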
javascript execution and page state evaluation
Medium confidence
Executes arbitrary JavaScript in the page context using Puppeteer's evaluateOnNewDocument and evaluate APIs, enabling custom logic execution, state inspection, and DOM manipulation. The implementation handles serialization of return values, error propagation, and context isolation, allowing automation to run custom scripts for complex state queries, form validation, or page-specific logic without relying on accessibility tree or DOM selectors.
Exposes Puppeteer's evaluate API as an MCP tool, allowing LLM agents to execute arbitrary JavaScript for state inspection and custom logic without requiring pre-built selectors or accessibility tree parsing, enabling adaptation to novel page structures
More flexible than selector-based approaches for complex state queries; enables custom logic execution without modifying page code; more powerful than static DOM parsing for dynamic or computed values
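Serializing return values and propagating errors can be sketched as a wrapper that always yields a plain result object. `runAndSerialize` and `EvalOutcome` are hypothetical; the function argument stands in for a script that would run in the page, and the wrapper shows only the result handling, not the page execution itself.

```typescript
// Outcome that is always safe to send back to a client as JSON.
type EvalOutcome =
  | { ok: true; value: string }
  | { ok: false; error: string };

// Run a script, JSON-encode its result, and turn any failure into data.
function runAndSerialize(script: () => unknown): EvalOutcome {
  try {
    const result = script();
    // JSON.stringify yields undefined for undefined and functions at runtime.
    const encoded = JSON.stringify(result) as string | undefined;
    return { ok: true, value: encoded === undefined ? "undefined" : encoded };
  } catch (error) {
    return { ok: false, error: error instanceof Error ? error.message : String(error) };
  }
}

console.log(runAndSerialize(() => ({ items: 3, total: 59.97 })));
console.log(runAndSerialize(() => undefined));
console.log(runAndSerialize(() => {
  throw new Error("cart element not found");
}));
```

Values that cannot be JSON-encoded, such as circular structures, also end up in the error branch because `JSON.stringify` throws on them.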
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Browser MCP, ranked by overlap. Discovered automatically through the match graph.
Playwright MCP Server
Automate browsers and run web tests via Playwright MCP.
mobile-mcp
Model Context Protocol Server for Mobile Automation and Scraping (iOS, Android, Emulators, Simulators and Real Devices)
Peekaboo
A macOS-only MCP server that enables AI agents to capture screenshots of applications, or the entire system.
Notte
Notte is the fastest, most reliable Browser Using Agents framework
chrome-devtools-mcp
Chrome DevTools for coding agents
playwright-mcp
Playwright MCP server
Best For
- ✓LLM agent builders automating web applications with deterministic UI structures
- ✓Teams building accessibility-first automation where semantic HTML is reliable
- ✓Developers optimizing for speed over visual complexity handling
- ✓Teams automating legacy web applications with minimal semantic markup
- ✓Builders handling visual design-heavy interfaces (design tools, image editors, custom dashboards)
- ✓Projects where accuracy on complex UIs justifies the latency/cost of vision processing
- ✓LLM agents automating multi-step workflows requiring authentication
- ✓Teams testing applications with complex session management
Known Limitations
- ⚠Cannot handle visual-only UI elements (canvas, SVG graphics, custom-drawn components) without vision fallback
- ⚠Accessibility tree may be incomplete or malformed on poorly-structured websites
- ⚠Dynamic content loaded after initial page render may not be captured without explicit wait strategies
- ⚠Requires well-formed HTML with proper ARIA labels for optimal element identification
- ⚠Vision processing adds 500ms-2s latency per element analysis depending on VLM provider
- ⚠Requires API credentials for external VLM providers or local model infrastructure
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.