claude vision api-optimized screenshot capture with automatic tiling
Captures full-page website screenshots and automatically tiles them into 1072x1072 pixel chunks (about 1.15 megapixels each) using Sharp image processing, optimizing for the Claude Vision API's token efficiency and visual processing constraints. The system caps viewport dimensions at 1072x1072 so that each tile fits within the vision model's optimal input size without external resizing or post-processing.
Unique: Implements automatic tiling specifically calibrated to Claude Vision API's 1.15 megapixel optimal input size, using Sharp for efficient image chunking rather than generic screenshot tools that require manual post-processing. The 1072x1072 constraint is baked into the viewport configuration itself, not applied after capture.
vs alternatives: Playwright and Puppeteer screenshot methods capture at whatever resolution the caller requests and leave tiling to external code. This tool builds the Claude Vision sizing into the capture pipeline, eliminating that post-processing step and keeping token usage consistent.
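The tiling arithmetic described above can be sketched as follows. This is a minimal illustration rather than the tool's actual code: `TileRect` and `planTiles` are hypothetical names, and cropping the edge tiles (instead of padding them) is an assumption. Only the 1072 px limit comes from the description.

```typescript
// Hypothetical sketch: only the 1072 px tile limit comes from the description.
const TILE_SIZE = 1072;

interface TileRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Split a page of pageWidth x pageHeight pixels into row-major tiles no larger
// than tileSize x tileSize. Edge tiles are cropped, not padded (an assumption).
function planTiles(pageWidth: number, pageHeight: number, tileSize: number = TILE_SIZE): TileRect[] {
  if (pageWidth <= 0 || pageHeight <= 0 || tileSize <= 0) {
    throw new RangeError("dimensions must be positive");
  }
  const tiles: TileRect[] = [];
  for (let top = 0; top < pageHeight; top += tileSize) {
    for (let left = 0; left < pageWidth; left += tileSize) {
      tiles.push({
        left,
        top,
        width: Math.min(tileSize, pageWidth - left),
        height: Math.min(tileSize, pageHeight - top),
      });
    }
  }
  return tiles;
}

// 1072 * 1072 = 1,149,184 pixels, i.e. roughly 1.15 megapixels per full tile.
const megapixels = (TILE_SIZE * TILE_SIZE) / 1000000;
const tiles = planTiles(1072, 3000);
console.log(megapixels.toFixed(2), tiles.length); // prints "1.15 3"
```

A 1072x3000 page yields three tiles here, the last one only 856 px tall.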
configurable wait strategies for dynamic content stabilization
Implements multiple wait strategies (networkIdle, domContentLoaded, custom JavaScript conditions) to ensure dynamic content has fully loaded before capture, with configurable timeouts and retry logic. The system injects JavaScript probes to detect application-specific readiness conditions (e.g., React hydration, data fetch completion) rather than relying solely on browser network events.
Unique: Combines multiple wait strategies (networkIdle, domContentLoaded, custom JavaScript probes) with retry logic and timeout handling, allowing detection of application-specific readiness states via injected JavaScript rather than generic browser events. The architecture supports both framework-agnostic network-based waits and framework-aware custom conditions.
vs alternatives: Goes beyond relying on Puppeteer's waitForNavigation alone (which resolves on page lifecycle and network-idle events) by allowing custom JavaScript conditions to be injected for framework-specific readiness detection, making it suitable for modern SPAs that don't follow traditional page load patterns.
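A custom readiness wait with a timeout and retries might look roughly like the sketch below. `waitForCondition`, `WaitOptions`, and the default values are hypothetical, and the probe is a stand-in. In a real browser the probe would evaluate a condition inside the page (for example through Puppeteer's `page.evaluate`).

```typescript
// Hypothetical helper: names and option values are illustrative only.
interface WaitOptions {
  timeoutMs: number;  // give up on one attempt after this long
  intervalMs: number; // delay between probe evaluations
  retries: number;    // extra attempts after the first one times out
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll `probe` until it returns true. Each attempt has its own deadline;
// once every attempt has timed out, the returned promise rejects.
async function waitForCondition(
  probe: () => boolean | Promise<boolean>,
  { timeoutMs, intervalMs, retries }: WaitOptions,
): Promise<number> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
      if (await probe()) return attempt; // index of the attempt that succeeded
      await sleep(intervalMs);
    }
  }
  throw new Error(`condition not met after ${retries + 1} attempts`);
}

// Stand-in probe that reports "ready" on its third evaluation.
let polls = 0;
const fakeReadinessProbe = (): boolean => ++polls >= 3;

waitForCondition(fakeReadinessProbe, { timeoutMs: 1000, intervalMs: 10, retries: 1 })
  .then((attempt) => console.log(`ready on attempt ${attempt} after ${polls} polls`));
```

Keeping the probe as a plain function makes the same loop usable for network-based and framework-aware conditions alike.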
sharp-based image processing and tiling pipeline
Uses the Sharp image processing library to efficiently tile full-page screenshots into 1072x1072 chunks, handling image format conversion, compression, and metadata extraction. The tiling pipeline processes captured PNG images through Sharp's streaming API, splitting large images into overlapping or non-overlapping tiles based on configuration, and returning tile metadata with coordinate information.
Unique: Uses the Sharp library for efficient tiling, relying on its streaming APIs to minimize memory overhead. The tiling pipeline is optimized for the specific 1072x1072 constraint, avoiding generic image resizing or cropping overhead.
vs alternatives: More efficient than canvas-based tiling or shelling out to ImageMagick: Sharp runs in-process through native bindings (libvips) with streaming support, enabling fast tiling of large images without excessive memory consumption or process spawning.
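The overlapping versus non-overlapping choice comes down to the stride between tile origins. The sketch below computes vertical tile offsets and coordinate metadata under stated assumptions: `tileOffsets`, `describeTiles`, and the `overlap` value are hypothetical, the page is assumed to be exactly one tile wide, and the last tile is aligned to the bottom edge so every tile stays full size. Each resulting rectangle could then be passed to Sharp's `extract({ left, top, width, height })`.

```typescript
// Hypothetical helpers: the overlap setting and bottom-alignment are assumptions.
const TILE = 1072;

// Start offsets of tiles along one axis. With overlap = 0 the stride equals
// the tile size; a positive overlap shortens the stride.
function tileOffsets(length: number, tileSize: number, overlap: number): number[] {
  if (overlap < 0 || overlap >= tileSize) {
    throw new RangeError("overlap must be in [0, tileSize)");
  }
  if (length <= tileSize) return [0];
  const stride = tileSize - overlap;
  const offsets: number[] = [];
  for (let start = 0; start + tileSize < length; start += stride) {
    offsets.push(start);
  }
  // Align the final tile to the end so it is full size too.
  offsets.push(length - tileSize);
  return offsets;
}

interface TileMeta {
  index: number;
  left: number;
  top: number;
  width: number;
  height: number;
}

// Coordinate metadata for a page that is exactly one tile wide.
function describeTiles(pageHeight: number, overlap: number): TileMeta[] {
  return tileOffsets(pageHeight, TILE, overlap).map((top, index) => ({
    index,
    left: 0,
    top,
    width: TILE,
    height: Math.min(TILE, pageHeight),
  }));
}

console.log(JSON.stringify(tileOffsets(3000, TILE, 72))); // prints [0,1000,1928]
console.log(describeTiles(3000, 0).length); // prints 3
```

Bottom-aligning the last tile trades a little extra overlap for uniform tile dimensions. Cropping the last tile instead is an equally valid choice.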
headless browser lifecycle management with auto-restart and signal handling
Manages Chromium browser process lifecycle with automatic restart on crash, graceful shutdown on signals (SIGTERM, SIGINT), and connection pooling to reuse browser instances across multiple screenshot operations. The system implements a serve-restart wrapper that monitors the main MCP server process and automatically restarts it if it crashes, maintaining availability for long-running AI agent workflows.
Unique: Implements a two-tier process architecture (serve-restart wrapper + main MCP server) that monitors and auto-restarts the screenshot service on crash, combined with graceful signal handling for clean shutdown. This pattern is distinct from simple browser pooling — it ensures the entire service remains available even if the underlying browser process crashes.
vs alternatives: Unlike Puppeteer or Playwright used directly (which require manual crash handling), this tool wraps the entire screenshot service with automatic restart logic, making it suitable for production AI agent deployments where availability is critical.
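The restart wrapper can be thought of as a small supervision loop. The sketch below is an assumption-laden illustration, not the actual serve-restart code: `supervise`, its options, and the exponential backoff are hypothetical. In a real wrapper `runOnce` would spawn the MCP server as a child process and resolve with its exit code, and `shouldStop` would be flipped by SIGTERM/SIGINT handlers.

```typescript
// Hypothetical supervisor: names, limits, and backoff values are illustrative.
interface SuperviseOptions {
  maxRestarts: number;       // stop after this many restarts
  baseDelayMs: number;       // first backoff delay; doubles after each crash
  shouldStop: () => boolean; // true once a shutdown signal has been received
}

const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run `runOnce` (which resolves with an exit code) until it exits cleanly,
// a shutdown is requested, or the restart budget is spent.
async function supervise(
  runOnce: () => Promise<number>,
  { maxRestarts, baseDelayMs, shouldStop }: SuperviseOptions,
): Promise<{ restarts: number; lastExitCode: number }> {
  let restarts = 0;
  for (;;) {
    const code = await runOnce().catch(() => 1); // a thrown error counts as a crash
    if (code === 0 || shouldStop() || restarts >= maxRestarts) {
      return { restarts, lastExitCode: code };
    }
    await delay(baseDelayMs * Math.pow(2, restarts));
    restarts++;
  }
}

// Stand-in server that crashes twice and then exits cleanly.
const exitCodes = [1, 1, 0];
const fakeServer = (): Promise<number> => Promise.resolve(exitCodes.shift() ?? 0);

supervise(fakeServer, { maxRestarts: 3, baseDelayMs: 1, shouldStop: () => false })
  .then((outcome) => console.log(JSON.stringify(outcome)));
```

Checking `shouldStop` before restarting is what keeps a graceful shutdown from being mistaken for a crash.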
screencast recording with adaptive frame rates and webp animation
Records time-series screenshots of page interactions as WebP animations with adaptive frame rate selection based on content change detection. The system captures PNG frames at configurable intervals, deduplicates identical frames to reduce file size, and encodes the sequence into WebP animations using Sharp, enabling efficient video-like capture of dynamic page behavior without full video codec overhead.
Unique: Combines adaptive frame rate capture with pixel-level deduplication and WebP animation encoding, allowing efficient time-series recording of page state changes. The system injects JavaScript to detect content changes and adjust frame capture intervals dynamically, reducing redundant frames while maintaining visual fidelity.
vs alternatives: Lighter than full video recording (no video codec overhead) and smarter than fixed-interval frame capture: deduplication reduces file size by a stated 30-50% for static content, which suits AI vision analysis of page interactions without excessive token consumption.
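Frame deduplication can be illustrated with a byte-level comparison of consecutive frames. This is a sketch under assumptions: `Frame` and `dedupeFrames` are hypothetical names, frames are compared as raw byte arrays, and a dropped duplicate adds its delay to the frame it repeats so playback timing is preserved.

```typescript
// Hypothetical types: the real pipeline may compare frames differently.
interface Frame {
  pixels: Uint8Array; // frame bytes (raw pixels or an encoded image)
  delayMs: number;    // how long the frame stays on screen
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Collapse each run of identical consecutive frames into a single frame whose
// delay is the sum of the run. Non-adjacent repeats are kept.
function dedupeFrames(frames: Frame[]): Frame[] {
  const out: Frame[] = [];
  for (const frame of frames) {
    const last = out[out.length - 1];
    if (last !== undefined && sameBytes(last.pixels, frame.pixels)) {
      last.delayMs += frame.delayMs;
    } else {
      out.push({ pixels: frame.pixels, delayMs: frame.delayMs });
    }
  }
  return out;
}

const a = new Uint8Array([1, 2, 3]);
const b = new Uint8Array([9, 9, 9]);
const recorded: Frame[] = [a, a, b, b, b, a].map((pixels) => ({ pixels, delayMs: 100 }));
const compact = dedupeFrames(recorded);
console.log(JSON.stringify(compact.map((f) => f.delayMs))); // prints [200,300,100]
```

The per-frame delays produced this way map naturally onto the per-frame durations of an animated WebP.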
javascript console message capture with execution context
Captures console output (log, error, warn, info) during page execution with full execution context, including message content, severity level, and timestamp. The system injects a JavaScript listener that intercepts console methods and collects messages over a specified duration, returning structured JSON with all captured messages for analysis by AI models.
Unique: Implements JavaScript injection-based console interception that captures all console method calls with structured metadata (level, timestamp, message), providing a machine-readable log of page execution behavior. This is distinct from browser DevTools protocol logging, which requires additional parsing.
vs alternatives: More accessible than raw CDP (Chrome DevTools Protocol) console logging, this approach provides structured JSON output directly suitable for AI analysis without requiring additional parsing or protocol handling.
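The interception approach amounts to wrapping each console method so that calls are recorded before being forwarded. The following sketch shows the idea on a console-like object. `interceptConsole`, `ConsoleLike`, and the message shape are hypothetical, and how the real tool injects its listener into the page is not shown.

```typescript
// Hypothetical sketch of console interception on a console-like object.
type Level = "log" | "info" | "warn" | "error";

interface ConsoleMessage {
  level: Level;
  text: string;
  timestamp: number; // milliseconds since the Unix epoch
}

type ConsoleMethod = (...args: unknown[]) => void;
type ConsoleLike = Record<Level, ConsoleMethod>;

// Replace each method with a wrapper that records the call and then forwards
// it. Returns the message buffer and a function that restores the originals.
function interceptConsole(target: ConsoleLike): { messages: ConsoleMessage[]; restore: () => void } {
  const levels: Level[] = ["log", "info", "warn", "error"];
  const messages: ConsoleMessage[] = [];
  const originals: Array<[Level, ConsoleMethod]> = [];
  for (const level of levels) {
    const original = target[level];
    originals.push([level, original]);
    target[level] = (...args: unknown[]): void => {
      messages.push({ level, text: args.map(String).join(" "), timestamp: Date.now() });
      original.apply(target, args);
    };
  }
  const restore = (): void => {
    originals.forEach(([level, original]) => {
      target[level] = original;
    });
  };
  return { messages, restore };
}

// Stand-in for the page's console so the demo does not print the messages.
const noop = (): void => {};
const pageConsole: ConsoleLike = { log: noop, info: noop, warn: noop, error: noop };

const capture = interceptConsole(pageConsole);
pageConsole.warn("slow request", 1200);
pageConsole.error("fetch failed");
capture.restore();
pageConsole.log("not captured");

console.log(JSON.stringify(capture.messages.map((m) => [m.level, m.text])));
// prints [["warn","slow request 1200"],["error","fetch failed"]]
```

Because the buffer holds plain objects, it can be serialized directly with `JSON.stringify` and handed to a model.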
mcp protocol integration with stdio json-rpc transport
Exposes screenshot and screencast capabilities as MCP tools via stdio-based JSON-RPC transport, enabling integration with Claude Code, VS Code, Cursor, and JetBrains IDEs. The system implements the Model Context Protocol specification, serializing tool requests/responses as JSON-RPC messages over stdin/stdout, allowing AI assistants to invoke screenshot operations as native tools.
Unique: Implements full Model Context Protocol compliance with stdio JSON-RPC transport, exposing screenshot operations as native MCP tools that Claude and other AI assistants can invoke directly. The architecture includes proper tool schema definition, error handling, and response serialization.
vs alternatives: Unlike a REST API or direct library integration, MCP integration lets Claude and other AI assistants treat screenshot capture as a first-class tool with proper schema validation and error handling, enabling more reliable AI-driven web automation.
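On the wire, a tool invocation is a JSON-RPC 2.0 request with method `tools/call`, written as a single line to the server's stdin, and the result comes back as a list of content items. The sketch below builds and reads such messages. The tool name `screenshot` and its `url` argument are placeholders rather than the server's real schema, and the helper names are hypothetical.

```typescript
// Message shapes follow JSON-RPC 2.0 and the MCP tools/call convention.
// The tool name and arguments used in the demo are hypothetical.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: { content: ContentItem[]; isError?: boolean };
  error?: { code: number; message: string };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// stdio transport: one JSON message per line, no embedded newlines.
function encodeMessage(message: JsonRpcRequest | JsonRpcResponse): string {
  return JSON.stringify(message) + "\n";
}

// Collect the base64 payloads of all image items in a tool result.
function imageData(response: JsonRpcResponse): string[] {
  if (response.error) {
    throw new Error(`JSON-RPC error ${response.error.code}: ${response.error.message}`);
  }
  const images: string[] = [];
  for (const item of response.result?.content ?? []) {
    if (item.type === "image") images.push(item.data);
  }
  return images;
}

const request = buildToolCall(1, "screenshot", { url: "https://example.com" });
const wireLine = encodeMessage(request);

// A response such as the server might send back (payload shortened to a placeholder).
const response: JsonRpcResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: { content: [{ type: "image", data: "<base64 PNG>", mimeType: "image/png" }] },
};

console.log(wireLine.trim());
console.log(imageData(response).length); // prints 1
```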
cli binary interface with direct command-line screenshot execution
Provides a command-line interface (bin/mcp-screenshot-website.js) for direct screenshot capture without MCP server overhead, enabling scripting, testing, and manual screenshot operations. The CLI accepts URL, viewport, wait strategy, and output format parameters, executing the screenshot capture engine directly and returning results as files or base64-encoded output.
Unique: Provides a lightweight CLI entry point that bypasses MCP server overhead for one-off screenshot operations, using the same underlying screenshot engine as the MCP server but with direct process invocation and file-based output.
vs alternatives: Avoids running a full MCP server for single screenshot operations, which makes this CLI well suited to scripting and testing, at the cost of the concurrency and performance a long-running server provides.