video transcript extraction with platform-specific parsing
Extracts full transcripts from YouTube, TikTok, Instagram, and Twitter videos by integrating with the Supadata API, which handles platform-specific authentication, caption retrieval, and text normalization. The MCP server exposes this as the supadata_transcript tool and can run either locally over stdio or at the edge on Cloudflare Workers, with built-in exponential backoff retries for rate-limited (HTTP 429) responses.
Unique: Directly integrates Supadata's proprietary multi-platform video parsing (YouTube, TikTok, Instagram, Twitter) into MCP protocol, avoiding the need for separate platform-specific SDKs or scraping logic. Supports both local stdio and edge deployment via Cloudflare Workers with unified OAuth 2.0 authentication.
vs alternatives: Handles multiple video platforms (YouTube, TikTok, Instagram, Twitter) in a single tool without requiring separate API keys per platform, unlike building individual integrations with each platform's API.
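As a rough illustration of how a client might invoke the transcript tool, the sketch below builds the JSON-RPC 2.0 message that MCP uses for tool calls (the `tools/call` method). The `url` argument name and the placeholder video URL are assumptions made for illustration; the tool's published input schema is the authority on parameter names.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: the tool accepts a "url" argument; VIDEO_ID is a placeholder.
request = build_tool_call(
    "supadata_transcript",
    {"url": "https://www.youtube.com/watch?v=VIDEO_ID"},
)

# Over the stdio transport, each message is written as one line of JSON.
line = json.dumps(request)
```

The same builder would work for any of the other tools by changing the tool name and arguments, since the platform-specific handling happens behind the Supadata API.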
video metadata and structured extraction with ai enrichment
Retrieves metadata (title, duration, channel info, upload date) and performs AI-powered structured data extraction from video content via supadata_metadata and supadata_extract tools. The extraction uses the Supadata API's LLM-based parsing to convert unstructured video content into schema-compliant JSON, with configurable output schemas passed as tool parameters.
Unique: Pairs metadata retrieval (supadata_metadata) with LLM-powered, schema-based extraction (supadata_extract) in one server, allowing developers to define custom output schemas and have the Supadata API map video content to those schemas without writing custom parsing logic.
vs alternatives: Avoids the need to build separate metadata scrapers and custom LLM prompts for extraction — the Supadata API handles both in a unified, schema-aware manner with built-in retry logic.
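To make the idea of a configurable output schema concrete, here is a minimal sketch of a JSON Schema style definition and a small client-side check that an extraction result matches it. The schema contents (`RECIPE_SCHEMA`), the helper `schema_errors`, and the shape of the result are all hypothetical; the checker only looks at top-level keys and is not a full JSON Schema validator.

```python
# Hypothetical output schema a caller might pass to supadata_extract.
RECIPE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "ingredients": {"type": "array"},
        "prep_minutes": {"type": "number"},
    },
    "required": ["title", "ingredients"],
}

_PY_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def _matches(value, type_name: str) -> bool:
    """Check one value against a basic JSON Schema type name."""
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, _PY_TYPES[type_name])


def schema_errors(data: dict, schema: dict) -> list[str]:
    """Return a list of problems for a flat object schema (top-level keys only)."""
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in data and not _matches(data[key], spec["type"]):
            errors.append(f"wrong type for {key}: expected {spec['type']}")
    return errors
```

A check like this is optional, but it gives an agent a cheap way to detect an incomplete extraction before passing the JSON downstream.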
github actions ci/cd pipeline with automated testing and deployment
Includes GitHub Actions workflows that automate testing, building, and deployment of the Supadata MCP server. The workflows run the test suite (src/index.test.ts), build Docker images, and deploy to container registries or cloud platforms. This enables continuous integration and deployment without manual intervention.
Unique: Ships ready-to-use GitHub Actions workflows wired to the test suite and the Docker build, so adopters do not have to write their own CI/CD pipeline for the Supadata MCP server.
vs alternatives: Avoids the need to set up custom CI/CD pipelines — the provided GitHub Actions workflows handle testing, building, and deployment automatically on every commit.
smithery mcp registry integration for tool discovery
Integrates with the Smithery MCP registry, allowing the Supadata MCP server to be discovered and installed via the Smithery package manager. This enables developers to install Supadata tools via a single command without manually cloning the repository or managing dependencies.
Unique: Registers the Supadata MCP server with the Smithery MCP registry, enabling one-command installation via a centralized package manager. Developers can discover and install Supadata tools without manual setup.
vs alternatives: Simpler than manual installation or cloning the repository — Smithery provides a centralized registry for MCP server discovery and installation.
single-page web scraping with markdown normalization
Scrapes a single web page and returns content as normalized Markdown via the supadata_scrape tool. The tool handles HTML parsing, content extraction, and Markdown conversion server-side, returning clean, LLM-friendly text without requiring client-side DOM manipulation or HTML parsing libraries. Integrates with the Supadata API's web scraping engine, which abstracts away JavaScript rendering and dynamic content challenges.
Unique: Returns Markdown-normalized output optimized for LLM consumption, abstracting away HTML parsing and JavaScript rendering complexity. The server-side processing means clients don't need Puppeteer, Cheerio, or other scraping libraries — just pass a URL.
vs alternatives: Simpler than building custom Puppeteer/Cheerio scrapers and returns LLM-friendly Markdown instead of raw HTML, reducing downstream parsing work in agent pipelines.
site-wide url discovery and mapping
Discovers the URLs on a website via the supadata_map tool, which uses the Supadata API's crawler to follow internal links and returns the list of URLs it finds. The tool is designed for reconnaissance before batch crawling: developers can inspect site topology without fetching full page content. The crawler respects robots.txt.
Unique: Provides URL discovery as a separate tool from content scraping, allowing developers to decouple site reconnaissance from data extraction. This enables smarter crawling strategies where agents can decide which URLs to fetch based on the map.
vs alternatives: Avoids the need to build custom site crawlers or use generic web crawlers — the Supadata API handles site structure discovery with built-in respect for robots.txt and site conventions.
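One way to use a URL map before crawling is to filter it down to the pages worth fetching. The sketch below assumes the map has already been retrieved as a plain list of URL strings (the actual response shape is not specified here); `select_urls` and its parameters are hypothetical names.

```python
from urllib.parse import urlsplit


def select_urls(mapped_urls, host, path_prefix="/", limit=50):
    """Keep same-host URLs under path_prefix, deduplicated, in original order."""
    seen, selected = set(), []
    for url in mapped_urls:
        parts = urlsplit(url)
        if parts.netloc != host or not parts.path.startswith(path_prefix):
            continue
        # Treat "/docs/a" and "/docs/a/" as the same page.
        key = (parts.netloc, parts.path.rstrip("/") or "/", parts.query)
        if key in seen:
            continue
        seen.add(key)
        selected.append(url)
        if len(selected) >= limit:
            break
    return selected
```

The filtered list can then be handed to a scrape or crawl step, which keeps the number of fetched pages proportional to what the agent actually needs.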
asynchronous batch web crawling with job polling
Crawls multiple URLs asynchronously via the supadata_crawl tool, which queues a batch job and returns a job ID. Developers then poll the job status using supadata_check_*_status tools with exponential backoff retry logic. The server manages the async job lifecycle, storing results server-side and returning them when complete. This pattern decouples request submission from result retrieval, enabling high-volume crawling without blocking.
Unique: Implements job-based async crawling with built-in polling infrastructure (supadata_check_*_status tools), allowing agents to submit large crawls and check progress without blocking. The server manages job lifecycle and result storage, abstracting away distributed task complexity.
vs alternatives: Simpler than building custom job queues or using external task runners — the MCP server handles job submission, polling, and result retrieval with exponential backoff built-in.
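The submit-then-poll pattern can be sketched as a small loop. This is an illustration only: `check_status` stands in for whatever function calls the relevant supadata_check_*_status tool, and the status names in `TERMINAL` as well as the delay defaults are assumptions.

```python
import time

TERMINAL = {"completed", "failed"}  # assumed status names


def wait_for_job(check_status, job_id, initial_delay=1.0, max_delay=30.0,
                 max_polls=20, sleep=time.sleep):
    """Poll check_status(job_id) until a terminal status or max_polls is reached.

    The wait between polls doubles each time, capped at max_delay.
    """
    delay = initial_delay
    for attempt in range(max_polls):
        result = check_status(job_id)
        if result.get("status") in TERMINAL:
            return result
        if attempt < max_polls - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.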
job status polling with exponential backoff retry
Provides supadata_check_*_status tools that poll the status of asynchronous jobs (transcripts, crawls, extractions). Retries use exponential backoff and are controlled by the SUPADATA_RETRY_MAX_ATTEMPTS and SUPADATA_RETRY_INITIAL_DELAY environment variables, so transient failures and rate limits (429 errors) are handled automatically without client-side retry logic.
Unique: Centralizes retry logic and exponential backoff in the MCP server itself, configured via environment variables (SUPADATA_RETRY_MAX_ATTEMPTS, SUPADATA_RETRY_INITIAL_DELAY), so clients don't need to implement their own retry loops. Handles 429 rate-limit errors transparently.
vs alternatives: Eliminates the need for client-side retry logic — the server handles backoff and transient failures automatically, reducing boilerplate in agent code.
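For readers unfamiliar with the technique, the following sketch shows what environment-configured exponential backoff can look like. It is not the server's actual implementation: the default values, the assumption that SUPADATA_RETRY_INITIAL_DELAY is in milliseconds, and the `RateLimited` exception are all hypothetical.

```python
import os
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller-supplied function on HTTP 429."""


def retry_settings(env=os.environ):
    """Read retry configuration; defaults and the ms unit are assumptions."""
    max_attempts = max(1, int(env.get("SUPADATA_RETRY_MAX_ATTEMPTS", "3")))
    initial_delay_ms = int(env.get("SUPADATA_RETRY_INITIAL_DELAY", "1000"))
    return max_attempts, initial_delay_ms / 1000.0


def call_with_backoff(send, max_attempts, initial_delay, sleep=time.sleep):
    """Call send(); on RateLimited, wait initial_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(initial_delay * 2 ** attempt)
```

With an initial delay of 0.5 seconds, the waits between attempts would be 0.5 s, 1 s, 2 s, and so on, until the attempt limit is reached.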