robin
AI-Powered Dark Web OSINT Tool
Capabilities (11 decomposed)
llm-powered query refinement for dark web search optimization
Medium confidence. Transforms raw user investigation queries into optimized search terms by routing them through a pluggable multi-provider LLM layer (OpenAI, Anthropic, Google, Ollama). The system uses prompt engineering to expand queries with domain-specific dark web terminology, synonyms, and alternative phrasings that improve hit rates across heterogeneous dark web search engines. Implementation delegates to llm.refine_query(), which constructs a system prompt contextualizing the dark web domain, then streams the LLM response to generate semantically richer search queries.
Integrates domain-specific prompt engineering for dark web terminology expansion rather than generic query expansion; supports four LLM providers via unified abstraction layer (llm_utils.get_llm()) enabling provider switching without code changes, and contextualizes refinement within OSINT investigation workflows rather than generic search
Outperforms generic query expansion tools (e.g., Elasticsearch query DSL) by leveraging LLM semantic understanding of dark web marketplace conventions, payment tracking terminology, and threat actor naming patterns specific to OSINT investigations
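The refinement step described above can be sketched as follows. This is a minimal illustration, not Robin's actual implementation: the `llm` parameter stands in for any provider client exposing a hypothetical `complete(system, user)` method, and `FakeLLM` exists only so the sketch runs without API keys.

```python
# Hypothetical sketch of LLM-backed query refinement (names are illustrative).
SYSTEM_PROMPT = (
    "You are an OSINT assistant. Expand the user's investigation query with "
    "dark-web-specific terminology, synonyms, and alternative phrasings. "
    "Return one refined query per line."
)

def refine_query(raw_query: str, llm) -> list[str]:
    """Expand a raw investigation query into optimized search terms."""
    response = llm.complete(SYSTEM_PROMPT, raw_query)
    # One refined query per line; drop blanks and duplicates, keep order.
    seen, refined = set(), []
    for line in response.splitlines():
        q = line.strip()
        if q and q.lower() not in seen:
            seen.add(q.lower())
            refined.append(q)
    return refined

class FakeLLM:
    """Stand-in provider so the sketch runs offline."""
    def complete(self, system: str, user: str) -> str:
        return f"{user}\n{user} marketplace\n{user} forum leak\n"

queries = refine_query("ransomware payments", FakeLLM())
```

Deduplicating case-insensitively while preserving order keeps the original query first, which matters if downstream stages weight earlier queries more heavily.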
multi-engine concurrent dark web search with result aggregation
Medium confidence. Queries multiple dark web search engines (Torch, Ahmia, Candle, etc.) concurrently using a thread-pooled orchestration pattern implemented in search.py:get_search_results(). Each search engine query is wrapped in a timeout-protected thread to prevent hanging on slow .onion sites; results are aggregated into a unified list of URLs and titles. The system handles search engine-specific response formats through adapter patterns, normalizing heterogeneous HTML/JSON responses into a common data structure for downstream LLM filtering.
Implements thread-pooled concurrent search across heterogeneous dark web search engines with timeout protection and adapter-based response normalization, rather than sequential queries or single-engine reliance; integrates Tor SOCKS5 proxy routing at the HTTP client level to ensure anonymity across all search engine queries
Faster than sequential dark web search tools by parallelizing queries across 4+ engines simultaneously; more comprehensive than single-engine tools (e.g., Torch-only searches) by aggregating results across multiple indices with different indexing patterns and coverage
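A minimal sketch of the thread-pooled fan-out with timeout protection, assuming adapters that already normalize responses to a common shape. The engine callables here are stubs; the real code presumably wraps Torch, Ahmia, etc. with Tor-routed HTTP clients.

```python
# Sketch of concurrent multi-engine search with timeout-protected aggregation.
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_search_results(query, engines, timeout=15):
    """Fan a query out to every engine; tolerate slow or failing engines."""
    results = []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {pool.submit(fn, query): name for name, fn in engines.items()}
        for future in as_completed(futures, timeout=timeout * len(engines)):
            try:
                # Each adapter returns a list of {"url", "title"} dicts.
                results.extend(future.result(timeout=timeout))
            except Exception:
                pass  # An offline engine causes partial loss, not total failure.
    return results

# Stub adapters already normalized to the common result shape.
engines = {
    "torch": lambda q: [{"url": "http://example1.onion", "title": q}],
    "ahmia": lambda q: [{"url": "http://example2.onion", "title": q}],
}
hits = get_search_results("stolen credentials", engines)
```

Swallowing per-engine exceptions is a deliberate trade-off: it accepts the partial-result-loss limitation noted below in exchange for never blocking the whole investigation on one dead index.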
configuration management via environment variables and config files
Medium confidence. Manages Robin configuration through a two-tier system: environment variables for sensitive credentials (API keys, Tor proxy address) and YAML/JSON config files for operational settings (model selection, timeout values, search engine whitelist). The system reads environment variables first (highest priority), then falls back to config file values, then uses hardcoded defaults. Configuration is loaded at startup in main.py and passed through the investigation pipeline. This approach enables secure credential management (via environment variables in Docker/Kubernetes) while allowing flexible operational configuration (via config files for different investigation types).
Implements two-tier configuration (environment variables + config files) with environment variable priority, enabling secure credential management while allowing flexible operational configuration; supports multiple config file formats (YAML, JSON) for flexibility
More secure than hardcoded credentials by using environment variables; more flexible than single-tier configuration by supporting both sensitive (credentials) and operational (parameters) settings; more portable than system-specific config locations by supporting multiple formats
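The priority order described above (environment variable, then config file, then hardcoded default) can be sketched in a few lines. The `ROBIN_` prefix and the default values are assumptions for illustration, not Robin's actual names.

```python
# Sketch of two-tier config resolution: env var -> config file -> default.
import os

DEFAULTS = {"model": "gpt-4o-mini", "timeout": 30, "tor_proxy": "127.0.0.1:9050"}

def get_setting(key, file_config, env_prefix="ROBIN_"):
    """Resolve a setting with environment variables taking highest priority."""
    env_value = os.environ.get(env_prefix + key.upper())  # prefix is illustrative
    if env_value is not None:
        return env_value
    if key in file_config:
        return file_config[key]
    return DEFAULTS[key]

file_config = {"timeout": 60}            # e.g. parsed from YAML/JSON at startup
os.environ["ROBIN_MODEL"] = "claude-3-haiku"

model = get_setting("model", file_config)      # env var wins
timeout = get_setting("timeout", file_config)  # config file wins
proxy = get_setting("tor_proxy", file_config)  # hardcoded default wins
```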
llm-based intelligent result filtering with relevance scoring
Medium confidence. Filters dark web search results using LLM-powered relevance scoring implemented in llm.py:filter_results(). The system constructs a prompt containing the original investigation query and candidate search results, then uses the LLM to score each result's relevance to the investigation objective. Results are ranked by LLM-assigned relevance scores and filtered to retain only high-confidence matches, reducing noise from off-topic .onion pages. This approach captures semantic relevance beyond keyword matching — e.g., identifying a marketplace listing as relevant to 'ransomware payment tracking' even if it doesn't contain the exact phrase.
Uses LLM semantic understanding to score relevance rather than keyword matching or TF-IDF, enabling detection of conceptually related pages that don't contain exact query terms; integrates with the multi-provider LLM abstraction to allow filtering with different models and comparing their scoring patterns
More semantically accurate than regex/keyword-based filtering (e.g., grep-based result filtering) because it understands synonyms and contextual relevance; faster than manual review but slower than simple keyword filtering, trading latency for recall/precision improvements
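One plausible shape for LLM relevance scoring is to number the candidates in the prompt and ask for a JSON score map. The prompt wording, threshold, and `llm` interface below are assumptions; `FakeLLM` hardcodes the JSON a real model would return.

```python
# Sketch of LLM relevance scoring over candidate results (illustrative API).
import json

def filter_results(query, results, llm, threshold=0.7):
    """Keep only results the LLM scores as relevant to the query."""
    numbered = "\n".join(f"{i}: {r['title']}" for i, r in enumerate(results))
    prompt = (
        f"Investigation query: {query}\n"
        f"Candidate results:\n{numbered}\n"
        "Return JSON mapping each index to a relevance score in [0, 1]."
    )
    scores = json.loads(llm.complete(prompt))
    return [r for i, r in enumerate(results) if scores.get(str(i), 0) >= threshold]

class FakeLLM:
    def complete(self, prompt):
        return '{"0": 0.9, "1": 0.2}'  # stands in for a real model's output

results = [
    {"title": "Marketplace escrow wallet addresses", "url": "http://a.onion"},
    {"title": "Unrelated hosting ad", "url": "http://b.onion"},
]
kept = filter_results("ransomware payment tracking", results, FakeLLM())
```

Scoring by index rather than by URL keeps the prompt short and makes the model's output trivially joinable back to the original result objects.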
tor-routed anonymous content scraping from .onion sites
Medium confidence. Extracts HTML content from dark web .onion sites by routing HTTP requests through a Tor SOCKS5 proxy (127.0.0.1:9050) implemented in scrape.py:scrape_multiple(). The system uses a thread-pooled architecture to scrape multiple URLs concurrently with per-request timeout protection (default 30 seconds) to prevent hanging on slow or offline sites. Responses are parsed with BeautifulSoup to extract text content, and failures (connection timeouts, 404s, Tor circuit failures) are gracefully handled with fallback retry logic. The implementation maintains request anonymity by routing all HTTP traffic through Tor and rotating user agents to avoid fingerprinting.
Implements thread-pooled concurrent scraping with per-request timeout protection and Tor SOCKS5 proxy routing at the HTTP client level, ensuring anonymity across all requests; integrates graceful failure handling with retry logic rather than blocking on slow/offline sites, enabling large-scale scraping without manual intervention
Faster than sequential scraping by parallelizing requests across 5-10 threads; more reliable than naive Tor scraping by implementing timeout protection and retry logic; more anonymous than direct HTTP scraping by routing all traffic through Tor and rotating user agents
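The concurrent-scrape-with-retry pattern can be sketched dependency-free. Here `fetch` is a stub standing in for a Tor-routed HTTP GET (the real code uses requests through a SOCKS5 proxy and parses HTML with BeautifulSoup); all function names are illustrative.

```python
# Sketch of thread-pooled scraping with per-URL timeout and one retry.
from concurrent.futures import ThreadPoolExecutor

def scrape_one(url, fetch, timeout=30, retries=1):
    """Fetch a single URL, retrying once on failure; None means gave up."""
    for _ in range(retries + 1):
        try:
            return fetch(url, timeout=timeout)
        except Exception:
            continue
    return None

def scrape_multiple(urls, fetch, max_workers=5):
    """Scrape URLs concurrently, keeping only the url -> text pairs that succeeded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = pool.map(lambda u: scrape_one(u, fetch), urls)
        return {u: t for u, t in zip(urls, texts) if t is not None}

attempts = {}
def fake_fetch(url, timeout):
    """Offline stub: the flaky URL fails once, then succeeds on retry."""
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("flaky.onion") and attempts[url] == 1:
        raise TimeoutError  # simulate a slow Tor circuit on the first attempt
    return f"<html>{url}</html>"

pages = scrape_multiple(["http://a.onion", "http://flaky.onion"], fake_fetch)
```

Returning `None` for exhausted URLs (instead of raising) is what lets a large batch finish even when some .onion sites are permanently down.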
structured osint report generation from raw dark web content
Medium confidence. Synthesizes raw scraped content, search results, and metadata into structured intelligence reports using LLM-powered summarization implemented in llm.py:generate_summary(). The system constructs a prompt containing the investigation query, filtered search results, and scraped page content, then uses the LLM to extract key findings, identify threat indicators (IOCs), and organize information into a structured report with sections like 'Threat Overview', 'Key Findings', 'Indicators of Compromise', and 'Recommendations'. The report is formatted as JSON or markdown for downstream consumption by SIEM systems, threat intelligence platforms, or human analysts.
Implements LLM-powered synthesis of heterogeneous dark web content (marketplace listings, forum posts, leaked data) into structured OSINT reports with explicit IOC extraction, rather than simple text summarization; integrates with the multi-provider LLM abstraction to allow report generation with different models and comparing output quality
More actionable than generic summarization tools because it extracts structured IOCs and threat indicators; faster than manual report writing by automating synthesis of 20+ pages into a structured format; more flexible than template-based reporting by using LLM to adapt report structure to investigation context
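The synthesis step can be sketched as prompt assembly followed by light output shaping. The section names match those described above; the prompt text, truncation length, and `llm` callable are illustrative assumptions.

```python
# Sketch of report synthesis from query + filtered results + scraped pages.
def generate_summary(query, results, pages, llm):
    """Build a structured markdown report from raw investigation material."""
    # Truncate each page so the prompt stays within context limits.
    evidence = "\n\n".join(f"URL: {u}\n{text[:2000]}" for u, text in pages.items())
    prompt = (
        f"Investigation query: {query}\n"
        f"Relevant results: {[r['url'] for r in results]}\n"
        f"Scraped content:\n{evidence}\n"
        "Write sections: Threat Overview, Key Findings, "
        "Indicators of Compromise, Recommendations."
    )
    body = llm.complete(prompt)
    return f"# OSINT Report: {query}\n\n{body}"

class FakeLLM:
    def complete(self, prompt):
        return "## Threat Overview\n...\n## Key Findings\n...\n"

report = generate_summary(
    "ransomware payment tracking",
    [{"url": "http://a.onion"}],
    {"http://a.onion": "wallet bc1q..."},
    FakeLLM(),
)
```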
multi-provider llm abstraction with unified interface
Medium confidence. Provides a pluggable abstraction layer for multiple LLM providers (OpenAI, Anthropic, Google, Ollama) implemented in llm_utils.py:get_llm(). The system uses a factory pattern to instantiate the appropriate LLM client based on environment variables or configuration, enabling seamless provider switching without modifying downstream code. Each provider is wrapped with a consistent interface supporting streaming responses, token counting, and error handling. Configuration is managed through environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) and a config file, allowing users to specify model selection, temperature, and max tokens per provider.
Implements a unified factory pattern abstraction across four distinct LLM providers (OpenAI, Anthropic, Google, Ollama) with consistent interface for streaming, error handling, and configuration, rather than provider-specific client code scattered throughout the codebase; enables on-premises execution via Ollama while maintaining API compatibility with cloud providers
More flexible than provider-locked tools (e.g., OpenAI-only OSINT tools) by supporting multiple providers; more maintainable than conditional provider logic throughout codebase by centralizing provider instantiation; enables cost optimization by allowing provider switching based on query complexity
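The factory pattern behind a `get_llm()`-style helper reduces to a registry mapping provider names to client classes that share one interface. The stub classes and the `LLM_PROVIDER` variable name below are placeholders, not Robin's real clients.

```python
# Sketch of a provider factory with a unified complete() interface.
import os

class OpenAIClient:
    def __init__(self, model): self.model = model
    def complete(self, prompt): ...  # would call the OpenAI API

class OllamaClient:
    def __init__(self, model): self.model = model
    def complete(self, prompt): ...  # would call a local Ollama server

PROVIDERS = {"openai": OpenAIClient, "ollama": OllamaClient}

def get_llm(provider=None, model=None):
    """Instantiate the configured provider; downstream code never branches."""
    provider = provider or os.environ.get("LLM_PROVIDER", "openai")
    model = model or os.environ.get("LLM_MODEL", "default-model")
    try:
        return PROVIDERS[provider](model)
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {provider}")

llm = get_llm("ollama", "llama3")
```

Because every stage of the pipeline receives the same interface, swapping a cloud provider for on-premises Ollama is a one-variable change rather than a code edit.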
six-stage investigation pipeline orchestration
Medium confidence. Orchestrates a complete dark web OSINT investigation workflow through a six-stage pipeline implemented in main.py:cli(). The pipeline sequentially executes: (1) LLM initialization, (2) query refinement, (3) multi-engine search, (4) result filtering, (5) content scraping, and (6) report generation. Each stage is implemented as a modular function with clear input/output contracts, enabling easy insertion of custom stages or modification of existing ones. The orchestration layer handles error propagation, logging, and progress reporting across stages, with optional checkpointing to resume interrupted investigations.
Implements a six-stage investigation pipeline with clear modular boundaries and unified orchestration in main.py, enabling easy extension and customization; integrates all Robin capabilities (query refinement, search, filtering, scraping, synthesis) into a cohesive workflow rather than exposing individual functions
More comprehensive than single-purpose tools (e.g., search-only or scrape-only tools) by automating the entire investigation workflow; more maintainable than monolithic scripts by decomposing the pipeline into modular stages with clear contracts
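The six stages above can be sketched as named functions composed over a shared context dict, which serves as the input/output contract. The stage bodies here are one-line stubs; in the real tool each would call the modules described in the preceding capabilities.

```python
# Sketch of a staged pipeline with a shared context dict as the contract.
def run_pipeline(query, stages):
    """Execute stages in order; each reads from and writes to the context."""
    ctx = {"query": query}
    for name, stage in stages:
        ctx = stage(ctx)
        ctx.setdefault("log", []).append(name)  # progress reporting hook
    return ctx

stages = [
    ("init_llm",     lambda c: {**c, "llm": "stub-llm"}),
    ("refine_query", lambda c: {**c, "refined": [c["query"] + " marketplace"]}),
    ("search",       lambda c: {**c, "results": ["http://a.onion"]}),
    ("filter",       lambda c: {**c, "results": c["results"][:1]}),
    ("scrape",       lambda c: {**c, "pages": {u: "<html>" for u in c["results"]}}),
    ("report",       lambda c: {**c, "report": f"Report for {c['query']}"}),
]
ctx = run_pipeline("ransomware payments", stages)
```

Representing stages as (name, function) pairs is what makes insertion of a custom stage a one-line list edit, and the accumulated log is a natural place to hang checkpointing.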
dual-mode interface: cli and streamlit web ui
Medium confidence. Exposes the investigation pipeline through two distinct interfaces: command-line (main.py:cli()) for automation and scripting, and Streamlit web UI (ui.py) for interactive exploration. The CLI mode accepts query and model arguments, executes the investigation pipeline, and outputs results to stdout or files. The Streamlit UI provides a web dashboard with form inputs for query/model selection, real-time progress updates, and interactive result visualization. Both interfaces share the same underlying pipeline implementation, ensuring consistency while accommodating different user workflows (batch automation vs. interactive investigation).
Provides dual-mode interface (CLI + Streamlit web UI) with shared underlying pipeline implementation, enabling both automation and interactive workflows from a single codebase; Streamlit UI offers real-time progress updates and interactive result visualization rather than static output
More accessible than CLI-only tools by providing a web UI for non-technical users; more flexible than web-only tools by supporting command-line automation and scripting; maintains consistency across interfaces by sharing the same pipeline implementation
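The dual-mode pattern boils down to one shared entry function called from both interfaces. The sketch below shows the CLI side with argparse; the Streamlit side would call the same `investigate()` from its form handler. All names and signatures are illustrative, not Robin's actual ones.

```python
# Sketch of a shared pipeline entry point behind two interfaces.
import argparse

def investigate(query: str, model: str) -> str:
    """Shared entry point used by both the CLI and the web UI."""
    return f"[{model}] report for: {query}"  # stands in for the real pipeline

def cli(argv=None):
    parser = argparse.ArgumentParser(description="Dark web OSINT investigation")
    parser.add_argument("--query", required=True)
    parser.add_argument("--model", default="gpt-4o-mini")
    args = parser.parse_args(argv)
    return investigate(args.query, args.model)

# A Streamlit UI would call investigate(st.text_input(...), st.selectbox(...))
output = cli(["--query", "stolen credentials", "--model", "llama3"])
```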
tor socks5 proxy integration for anonymous network access
Medium confidence. Routes all HTTP traffic (search queries, content scraping) through a Tor SOCKS5 proxy running on 127.0.0.1:9050, implemented at the HTTP client level using the requests library with PySocks support. The system configures the Tor proxy globally for all outbound requests, ensuring that search engine queries and .onion site scraping are anonymized. Tor circuit failures are handled with retry logic, and user agents are rotated to avoid fingerprinting. The implementation assumes a local Tor service is running (typically via Docker or system package) and does not manage Tor lifecycle.
Integrates Tor SOCKS5 proxy routing at the HTTP client level (requests library with PySocks) rather than system-level proxy configuration, enabling fine-grained control over which requests are routed through Tor; implements user agent rotation and retry logic to improve reliability on Tor network
More reliable than system-level Tor proxy configuration by handling Tor-specific failures (circuit failures, exit node blocking) with retry logic; more flexible than VPN-based anonymity by enabling per-request circuit rotation and exit node selection
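A dependency-free sketch of the client-level routing: the proxies mapping a requests session would use, plus user-agent rotation and retry. The `fetch` callable stands in for `requests.get`; in the real setup, passing this `proxies` dict to requests (with PySocks installed) routes traffic through Tor, and the `socks5h` scheme ensures .onion DNS resolution also happens inside Tor.

```python
# Sketch of Tor SOCKS5 routing config with UA rotation and retry.
import itertools

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h: resolve DNS through Tor
    "https": "socks5h://127.0.0.1:9050",
}

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
]

def next_headers(ua_cycle=itertools.cycle(USER_AGENTS)):
    """Rotate user agents between requests to reduce fingerprinting."""
    return {"User-Agent": next(ua_cycle)}

def with_retries(fetch, url, attempts=3):
    """Retry a Tor-routed fetch, tolerating transient circuit failures."""
    last = None
    for _ in range(attempts):
        try:
            return fetch(url, headers=next_headers(), proxies=TOR_PROXIES)
        except ConnectionError as exc:
            last = exc
    raise last

seen = []
def fake_fetch(url, headers, proxies):
    """Offline stub: first attempt simulates a Tor circuit failure."""
    seen.append(headers["User-Agent"])
    if len(seen) < 2:
        raise ConnectionError("circuit failure")
    return "ok"

result = with_retries(fake_fetch, "http://example.onion")
```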
docker containerization with tor service bundling
Medium confidence. Packages Robin and its dependencies (Python, requests, BeautifulSoup, Streamlit) into a Docker image with an integrated Tor service, enabling single-command deployment without manual dependency installation. The Dockerfile installs Tor, configures the SOCKS5 proxy on 127.0.0.1:9050, and starts both Tor and Robin services on container startup. Environment variables for LLM provider credentials are passed at runtime, allowing users to deploy the container without modifying the image. The Docker Compose configuration (if provided) orchestrates the Robin container with optional additional services (e.g., Redis for caching, PostgreSQL for result storage).
Bundles Tor service directly into Docker image rather than requiring external Tor service, simplifying deployment; uses environment variable injection for LLM credentials, enabling credential management without image rebuilds
Simpler deployment than manual installation by bundling all dependencies; more portable than system-specific packages by using Docker; enables cloud deployment without infrastructure setup
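An illustrative Dockerfile for this bundling pattern (not the project's actual file) might look like the following; the base image, file names, and startup command are assumptions.

```dockerfile
# Illustrative sketch: bundle Tor alongside the app in one image.
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends tor \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Start Tor in the background (SOCKS5 on 127.0.0.1:9050), then the app.
# LLM credentials (OPENAI_API_KEY, etc.) are injected at `docker run -e` time.
CMD ["sh", "-c", "tor & exec python main.py"]
```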
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with robin, ranked by overlap. Discovered automatically through the match graph.
Web Search MCP
A server that provides local, full web search, summaries, and page extraction for use with local LLMs.
firecrawl-mcp-server
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
Tavily MCP Server
AI-optimized web search and content extraction via Tavily MCP.
SearXNG
Privacy-respecting metasearch — 70+ engines, no tracking, self-hosted, JSON API for AI agents.
duckduckgo-mcp-server
A Model Context Protocol (MCP) server that provides web search capabilities through DuckDuckGo, with additional features for content fetching and parsing.
local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
Best For
- ✓ threat intelligence analysts automating repetitive query expansion
- ✓ law enforcement investigators scaling dark web searches across multiple jurisdictions
- ✓ security researchers tracking credential exposure campaigns with evolving naming conventions
- ✓ OSINT investigators needing comprehensive dark web coverage without manual multi-engine queries
- ✓ threat intelligence teams automating large-scale dark web monitoring across multiple search indices
- ✓ security researchers comparing search engine indexing patterns across the dark web
- ✓ DevOps teams deploying Robin to multiple environments with different configurations
- ✓ organizations with security requirements for credential management (no hardcoded keys)
Known Limitations
- ⚠ LLM-based refinement adds 2-5 seconds of latency per query due to the API round-trip
- ⚠ Query expansion quality depends on LLM model capability; smaller models (Ollama 7B) produce less sophisticated synonyms than GPT-4
- ⚠ No caching of refined queries — identical raw queries trigger redundant LLM calls if not deduplicated upstream
- ⚠ Search engine availability is unpredictable — individual .onion search engines may be offline or rate-limited, causing partial result loss
- ⚠ Concurrent requests to multiple search engines increase Tor exit node load and may trigger rate-limiting or IP bans
- ⚠ Result normalization is lossy — search engine-specific metadata (relevance scores, date indexed) is discarded during aggregation
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
Repository Details
Last commit: Mar 31, 2026