robin
AI-Powered Dark Web OSINT Tool
Capabilities (11 decomposed)
llm-powered query refinement for dark web search optimization
Medium confidence. Transforms raw user investigation queries into optimized search terms by routing them through a pluggable multi-provider LLM layer (OpenAI, Anthropic, Google, Ollama). The system uses prompt engineering to expand queries with domain-specific dark web terminology, synonyms, and alternative phrasings that improve hit rates across heterogeneous dark web search engines. Implementation delegates to llm.refine_query(), which constructs a system prompt contextualizing the dark web domain, then streams the LLM response to generate semantically richer search queries.
Integrates domain-specific prompt engineering for dark web terminology expansion rather than generic query expansion; supports four LLM providers via unified abstraction layer (llm_utils.get_llm()) enabling provider switching without code changes, and contextualizes refinement within OSINT investigation workflows rather than generic search
Outperforms generic query expansion tools (e.g., Elasticsearch query DSL) by leveraging LLM semantic understanding of dark web marketplace conventions, payment tracking terminology, and threat actor naming patterns specific to OSINT investigations
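The refinement step described above can be sketched as follows. This is a minimal illustration, not Robin's actual implementation: the `llm` parameter stands in for any provider client exposing a hypothetical `complete(system, user)` method, and `FakeLLM` exists only so the sketch runs without API keys.

```python
# Hypothetical sketch of LLM-backed query refinement (names are illustrative).
SYSTEM_PROMPT = (
    "You are an OSINT assistant. Expand the user's investigation query with "
    "dark-web-specific terminology, synonyms, and alternative phrasings. "
    "Return one refined query per line."
)

def refine_query(raw_query: str, llm) -> list[str]:
    """Expand a raw investigation query into optimized search terms."""
    response = llm.complete(SYSTEM_PROMPT, raw_query)
    # One refined query per line; drop blanks and duplicates, keep order.
    seen, refined = set(), []
    for line in response.splitlines():
        q = line.strip()
        if q and q.lower() not in seen:
            seen.add(q.lower())
            refined.append(q)
    return refined

class FakeLLM:
    """Stand-in provider so the sketch runs offline."""
    def complete(self, system: str, user: str) -> str:
        return f"{user}\n{user} marketplace\n{user} forum leak\n"

queries = refine_query("ransomware payments", FakeLLM())
```

Deduplicating case-insensitively while preserving order keeps the original query first, which matters if downstream stages weight earlier queries more heavily.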
multi-engine concurrent dark web search with result aggregation
Medium confidence. Queries multiple dark web search engines (Torch, Ahmia, Candle, etc.) concurrently using a thread-pooled orchestration pattern implemented in search.py:get_search_results(). Each search engine query is wrapped in a timeout-protected thread to prevent hanging on slow .onion sites; results are aggregated into a unified list of URLs and titles. The system handles search engine-specific response formats through adapter patterns, normalizing heterogeneous HTML/JSON responses into a common data structure for downstream LLM filtering.
Implements thread-pooled concurrent search across heterogeneous dark web search engines with timeout protection and adapter-based response normalization, rather than sequential queries or single-engine reliance; integrates Tor SOCKS5 proxy routing at the HTTP client level to ensure anonymity across all search engine queries
Faster than sequential dark web search tools by parallelizing queries across 4+ engines simultaneously; more comprehensive than single-engine tools (e.g., Torch-only searches) by aggregating results across multiple indices with different indexing patterns and coverage
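A minimal sketch of the thread-pooled fan-out with timeout protection, assuming adapters that already normalize responses to a common shape. The engine callables here are stubs; the real code presumably wraps Torch, Ahmia, etc. with Tor-routed HTTP clients.

```python
# Sketch of concurrent multi-engine search with timeout-protected aggregation.
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_search_results(query, engines, timeout=15):
    """Fan a query out to every engine; tolerate slow or failing engines."""
    results = []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {pool.submit(fn, query): name for name, fn in engines.items()}
        for future in as_completed(futures, timeout=timeout * len(engines)):
            try:
                # Each adapter returns a list of {"url", "title"} dicts.
                results.extend(future.result(timeout=timeout))
            except Exception:
                pass  # An offline engine causes partial loss, not total failure.
    return results

# Stub adapters already normalized to the common result shape.
engines = {
    "torch": lambda q: [{"url": "http://example1.onion", "title": q}],
    "ahmia": lambda q: [{"url": "http://example2.onion", "title": q}],
}
hits = get_search_results("stolen credentials", engines)
```

Swallowing per-engine exceptions is a deliberate trade-off: it accepts the partial-result-loss limitation noted below in exchange for never blocking the whole investigation on one dead index.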
configuration management via environment variables and config files
Medium confidence. Manages Robin configuration through a two-tier system: environment variables for sensitive credentials (API keys, Tor proxy address) and YAML/JSON config files for operational settings (model selection, timeout values, search engine whitelist). The system reads environment variables first (highest priority), then falls back to config file values, then uses hardcoded defaults. Configuration is loaded at startup in main.py and passed through the investigation pipeline. This approach enables secure credential management (via environment variables in Docker/Kubernetes) while allowing flexible operational configuration (via config files for different investigation types).
Implements two-tier configuration (environment variables + config files) with environment variable priority, enabling secure credential management while allowing flexible operational configuration; supports multiple config file formats (YAML, JSON) for flexibility
More secure than hardcoded credentials by using environment variables; more flexible than single-tier configuration by supporting both sensitive (credentials) and operational (parameters) settings; more portable than system-specific config locations by supporting multiple formats
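The priority order described above (environment variable, then config file, then hardcoded default) can be sketched in a few lines. The `ROBIN_` prefix and the default values are assumptions for illustration, not Robin's actual names.

```python
# Sketch of two-tier config resolution: env var -> config file -> default.
import os

DEFAULTS = {"model": "gpt-4o-mini", "timeout": 30, "tor_proxy": "127.0.0.1:9050"}

def get_setting(key, file_config, env_prefix="ROBIN_"):
    """Resolve a setting with environment variables taking highest priority."""
    env_value = os.environ.get(env_prefix + key.upper())  # prefix is illustrative
    if env_value is not None:
        return env_value
    if key in file_config:
        return file_config[key]
    return DEFAULTS[key]

file_config = {"timeout": 60}            # e.g. parsed from YAML/JSON at startup
os.environ["ROBIN_MODEL"] = "claude-3-haiku"

model = get_setting("model", file_config)      # env var wins
timeout = get_setting("timeout", file_config)  # config file wins
proxy = get_setting("tor_proxy", file_config)  # hardcoded default wins
```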
llm-based intelligent result filtering with relevance scoring
Medium confidence. Filters dark web search results using LLM-powered relevance scoring implemented in llm.py:filter_results(). The system constructs a prompt containing the original investigation query and candidate search results, then uses the LLM to score each result's relevance to the investigation objective. Results are ranked by LLM-assigned relevance scores and filtered to retain only high-confidence matches, reducing noise from off-topic .onion pages. This approach captures semantic relevance beyond keyword matching — e.g., identifying a marketplace listing as relevant to 'ransomware payment tracking' even if it doesn't contain the exact phrase.
Uses LLM semantic understanding to score relevance rather than keyword matching or TF-IDF, enabling detection of conceptually related pages that don't contain exact query terms; integrates with the multi-provider LLM abstraction to allow filtering with different models and comparing their scoring patterns
More semantically accurate than regex/keyword-based filtering (e.g., grep-based result filtering) because it understands synonyms and contextual relevance; faster than manual review but slower than simple keyword filtering, trading latency for recall/precision improvements
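One plausible shape for LLM relevance scoring is to number the candidates in the prompt and ask for a JSON score map. The prompt wording, threshold, and `llm` interface below are assumptions; `FakeLLM` hardcodes the JSON a real model would return.

```python
# Sketch of LLM relevance scoring over candidate results (illustrative API).
import json

def filter_results(query, results, llm, threshold=0.7):
    """Keep only results the LLM scores as relevant to the query."""
    numbered = "\n".join(f"{i}: {r['title']}" for i, r in enumerate(results))
    prompt = (
        f"Investigation query: {query}\n"
        f"Candidate results:\n{numbered}\n"
        "Return JSON mapping each index to a relevance score in [0, 1]."
    )
    scores = json.loads(llm.complete(prompt))
    return [r for i, r in enumerate(results) if scores.get(str(i), 0) >= threshold]

class FakeLLM:
    def complete(self, prompt):
        return '{"0": 0.9, "1": 0.2}'  # stands in for a real model's output

results = [
    {"title": "Marketplace escrow wallet addresses", "url": "http://a.onion"},
    {"title": "Unrelated hosting ad", "url": "http://b.onion"},
]
kept = filter_results("ransomware payment tracking", results, FakeLLM())
```

Scoring by index rather than by URL keeps the prompt short and makes the model's output trivially joinable back to the original result objects.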
tor-routed anonymous content scraping from .onion sites
Medium confidence. Extracts HTML content from dark web .onion sites by routing HTTP requests through a Tor SOCKS5 proxy (127.0.0.1:9050) implemented in scrape.py:scrape_multiple(). The system uses a thread-pooled architecture to scrape multiple URLs concurrently with per-request timeout protection (default 30 seconds) to prevent hanging on slow or offline sites. Responses are parsed with BeautifulSoup to extract text content, and failures (connection timeouts, 404s, Tor circuit failures) are gracefully handled with fallback retry logic. The implementation maintains request anonymity by routing all HTTP traffic through Tor and rotating user agents to avoid fingerprinting.
Implements thread-pooled concurrent scraping with per-request timeout protection and Tor SOCKS5 proxy routing at the HTTP client level, ensuring anonymity across all requests; integrates graceful failure handling with retry logic rather than blocking on slow/offline sites, enabling large-scale scraping without manual intervention
Faster than sequential scraping by parallelizing requests across 5-10 threads; more reliable than naive Tor scraping by implementing timeout protection and retry logic; more anonymous than direct HTTP scraping by routing all traffic through Tor and rotating user agents
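The concurrent-scrape-with-retry pattern can be sketched dependency-free. Here `fetch` is a stub standing in for a Tor-routed HTTP GET (the real code uses requests through a SOCKS5 proxy and parses HTML with BeautifulSoup); all function names are illustrative.

```python
# Sketch of thread-pooled scraping with per-URL timeout and one retry.
from concurrent.futures import ThreadPoolExecutor

def scrape_one(url, fetch, timeout=30, retries=1):
    """Fetch a single URL, retrying once on failure; None means gave up."""
    for _ in range(retries + 1):
        try:
            return fetch(url, timeout=timeout)
        except Exception:
            continue
    return None

def scrape_multiple(urls, fetch, max_workers=5):
    """Scrape URLs concurrently, keeping only the url -> text pairs that succeeded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = pool.map(lambda u: scrape_one(u, fetch), urls)
        return {u: t for u, t in zip(urls, texts) if t is not None}

attempts = {}
def fake_fetch(url, timeout):
    """Offline stub: the flaky URL fails once, then succeeds on retry."""
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("flaky.onion") and attempts[url] == 1:
        raise TimeoutError  # simulate a slow Tor circuit on the first attempt
    return f"<html>{url}</html>"

pages = scrape_multiple(["http://a.onion", "http://flaky.onion"], fake_fetch)
```

Returning `None` for exhausted URLs (instead of raising) is what lets a large batch finish even when some .onion sites are permanently down.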
structured osint report generation from raw dark web content
Medium confidence. Synthesizes raw scraped content, search results, and metadata into structured intelligence reports using LLM-powered summarization implemented in llm.py:generate_summary(). The system constructs a prompt containing the investigation query, filtered search results, and scraped page content, then uses the LLM to extract key findings, identify threat indicators (IOCs), and organize information into a structured report with sections like 'Threat Overview', 'Key Findings', 'Indicators of Compromise', and 'Recommendations'. The report is formatted as JSON or markdown for downstream consumption by SIEM systems, threat intelligence platforms, or human analysts.
Implements LLM-powered synthesis of heterogeneous dark web content (marketplace listings, forum posts, leaked data) into structured OSINT reports with explicit IOC extraction, rather than simple text summarization; integrates with the multi-provider LLM abstraction to allow report generation with different models and comparing output quality
More actionable than generic summarization tools because it extracts structured IOCs and threat indicators; faster than manual report writing by automating synthesis of 20+ pages into a structured format; more flexible than template-based reporting by using LLM to adapt report structure to investigation context
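The synthesis step can be sketched as prompt assembly followed by light output shaping. The section names match those described above; the prompt text, truncation length, and `llm` callable are illustrative assumptions.

```python
# Sketch of report synthesis from query + filtered results + scraped pages.
def generate_summary(query, results, pages, llm):
    """Build a structured markdown report from raw investigation material."""
    # Truncate each page so the prompt stays within context limits.
    evidence = "\n\n".join(f"URL: {u}\n{text[:2000]}" for u, text in pages.items())
    prompt = (
        f"Investigation query: {query}\n"
        f"Relevant results: {[r['url'] for r in results]}\n"
        f"Scraped content:\n{evidence}\n"
        "Write sections: Threat Overview, Key Findings, "
        "Indicators of Compromise, Recommendations."
    )
    body = llm.complete(prompt)
    return f"# OSINT Report: {query}\n\n{body}"

class FakeLLM:
    def complete(self, prompt):
        return "## Threat Overview\n...\n## Key Findings\n...\n"

report = generate_summary(
    "ransomware payment tracking",
    [{"url": "http://a.onion"}],
    {"http://a.onion": "wallet bc1q..."},
    FakeLLM(),
)
```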
multi-provider llm abstraction with unified interface
Medium confidence. Provides a pluggable abstraction layer for multiple LLM providers (OpenAI, Anthropic, Google, Ollama) implemented in llm_utils.py:get_llm(). The system uses a factory pattern to instantiate the appropriate LLM client based on environment variables or configuration, enabling seamless provider switching without modifying downstream code. Each provider is wrapped with a consistent interface supporting streaming responses, token counting, and error handling. Configuration is managed through environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) and a config file, allowing users to specify model selection, temperature, and max tokens per provider.
Implements a unified factory pattern abstraction across four distinct LLM providers (OpenAI, Anthropic, Google, Ollama) with consistent interface for streaming, error handling, and configuration, rather than provider-specific client code scattered throughout the codebase; enables on-premises execution via Ollama while maintaining API compatibility with cloud providers
More flexible than provider-locked tools (e.g., OpenAI-only OSINT tools) by supporting multiple providers; more maintainable than conditional provider logic throughout codebase by centralizing provider instantiation; enables cost optimization by allowing provider switching based on query complexity
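The factory pattern behind a `get_llm()`-style helper reduces to a registry mapping provider names to client classes that share one interface. The stub classes and the `LLM_PROVIDER` variable name below are placeholders, not Robin's real clients.

```python
# Sketch of a provider factory with a unified complete() interface.
import os

class OpenAIClient:
    def __init__(self, model): self.model = model
    def complete(self, prompt): ...  # would call the OpenAI API

class OllamaClient:
    def __init__(self, model): self.model = model
    def complete(self, prompt): ...  # would call a local Ollama server

PROVIDERS = {"openai": OpenAIClient, "ollama": OllamaClient}

def get_llm(provider=None, model=None):
    """Instantiate the configured provider; downstream code never branches."""
    provider = provider or os.environ.get("LLM_PROVIDER", "openai")
    model = model or os.environ.get("LLM_MODEL", "default-model")
    try:
        return PROVIDERS[provider](model)
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {provider}")

llm = get_llm("ollama", "llama3")
```

Because every stage of the pipeline receives the same interface, swapping a cloud provider for on-premises Ollama is a one-variable change rather than a code edit.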
six-stage investigation pipeline orchestration
Medium confidence. Orchestrates a complete dark web OSINT investigation workflow through a six-stage pipeline implemented in main.py:cli(). The pipeline sequentially executes: (1) LLM initialization, (2) query refinement, (3) multi-engine search, (4) result filtering, (5) content scraping, and (6) report generation. Each stage is implemented as a modular function with clear input/output contracts, enabling easy insertion of custom stages or modification of existing ones. The orchestration layer handles error propagation, logging, and progress reporting across stages, with optional checkpointing to resume interrupted investigations.
Implements a six-stage investigation pipeline with clear modular boundaries and unified orchestration in main.py, enabling easy extension and customization; integrates all Robin capabilities (query refinement, search, filtering, scraping, synthesis) into a cohesive workflow rather than exposing individual functions
More comprehensive than single-purpose tools (e.g., search-only or scrape-only tools) by automating the entire investigation workflow; more maintainable than monolithic scripts by decomposing the pipeline into modular stages with clear contracts
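The six stages above can be sketched as named functions composed over a shared context dict, which serves as the input/output contract. The stage bodies here are one-line stubs; in the real tool each would call the modules described in the preceding capabilities.

```python
# Sketch of a staged pipeline with a shared context dict as the contract.
def run_pipeline(query, stages):
    """Execute stages in order; each reads from and writes to the context."""
    ctx = {"query": query}
    for name, stage in stages:
        ctx = stage(ctx)
        ctx.setdefault("log", []).append(name)  # progress reporting hook
    return ctx

stages = [
    ("init_llm",     lambda c: {**c, "llm": "stub-llm"}),
    ("refine_query", lambda c: {**c, "refined": [c["query"] + " marketplace"]}),
    ("search",       lambda c: {**c, "results": ["http://a.onion"]}),
    ("filter",       lambda c: {**c, "results": c["results"][:1]}),
    ("scrape",       lambda c: {**c, "pages": {u: "<html>" for u in c["results"]}}),
    ("report",       lambda c: {**c, "report": f"Report for {c['query']}"}),
]
ctx = run_pipeline("ransomware payments", stages)
```

Representing stages as (name, function) pairs is what makes insertion of a custom stage a one-line list edit, and the accumulated log is a natural place to hang checkpointing.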
dual-mode interface: cli and streamlit web ui
Medium confidence. Exposes the investigation pipeline through two distinct interfaces: command-line (main.py:cli()) for automation and scripting, and Streamlit web UI (ui.py) for interactive exploration. The CLI mode accepts query and model arguments, executes the investigation pipeline, and outputs results to stdout or files. The Streamlit UI provides a web dashboard with form inputs for query/model selection, real-time progress updates, and interactive result visualization. Both interfaces share the same underlying pipeline implementation, ensuring consistency while accommodating different user workflows (batch automation vs. interactive investigation).
Provides dual-mode interface (CLI + Streamlit web UI) with shared underlying pipeline implementation, enabling both automation and interactive workflows from a single codebase; Streamlit UI offers real-time progress updates and interactive result visualization rather than static output
More accessible than CLI-only tools by providing a web UI for non-technical users; more flexible than web-only tools by supporting command-line automation and scripting; maintains consistency across interfaces by sharing the same pipeline implementation
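The dual-mode pattern boils down to one shared entry function called from both interfaces. The sketch below shows the CLI side with argparse; the Streamlit side would call the same `investigate()` from its form handler. All names and signatures are illustrative, not Robin's actual ones.

```python
# Sketch of a shared pipeline entry point behind two interfaces.
import argparse

def investigate(query: str, model: str) -> str:
    """Shared entry point used by both the CLI and the web UI."""
    return f"[{model}] report for: {query}"  # stands in for the real pipeline

def cli(argv=None):
    parser = argparse.ArgumentParser(description="Dark web OSINT investigation")
    parser.add_argument("--query", required=True)
    parser.add_argument("--model", default="gpt-4o-mini")
    args = parser.parse_args(argv)
    return investigate(args.query, args.model)

# A Streamlit UI would call investigate(st.text_input(...), st.selectbox(...))
output = cli(["--query", "stolen credentials", "--model", "llama3"])
```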
tor socks5 proxy integration for anonymous network access
Medium confidence. Routes all HTTP traffic (search queries, content scraping) through a Tor SOCKS5 proxy running on 127.0.0.1:9050, implemented at the HTTP client level using the requests library with PySocks support. The system configures the Tor proxy globally for all outbound requests, ensuring that search engine queries and .onion site scraping are anonymized. Tor circuit failures are handled with retry logic, and user agents are rotated to avoid fingerprinting. The implementation assumes a local Tor service is running (typically via Docker or system package) and does not manage Tor lifecycle.
Integrates Tor SOCKS5 proxy routing at the HTTP client level (requests library with PySocks) rather than system-level proxy configuration, enabling fine-grained control over which requests are routed through Tor; implements user agent rotation and retry logic to improve reliability on Tor network
More reliable than system-level Tor proxy configuration by handling Tor-specific failures (circuit failures, exit node blocking) with retry logic; more flexible than VPN-based anonymity by enabling per-request circuit rotation and exit node selection
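A dependency-free sketch of the client-level routing: the proxies mapping a requests session would use, plus user-agent rotation and retry. The `fetch` callable stands in for `requests.get`; in the real setup, passing this `proxies` dict to requests (with PySocks installed) routes traffic through Tor, and the `socks5h` scheme ensures .onion DNS resolution also happens inside Tor.

```python
# Sketch of Tor SOCKS5 routing config with UA rotation and retry.
import itertools

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h: resolve DNS through Tor
    "https": "socks5h://127.0.0.1:9050",
}

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
]

def next_headers(ua_cycle=itertools.cycle(USER_AGENTS)):
    """Rotate user agents between requests to reduce fingerprinting."""
    return {"User-Agent": next(ua_cycle)}

def with_retries(fetch, url, attempts=3):
    """Retry a Tor-routed fetch, tolerating transient circuit failures."""
    last = None
    for _ in range(attempts):
        try:
            return fetch(url, headers=next_headers(), proxies=TOR_PROXIES)
        except ConnectionError as exc:
            last = exc
    raise last

seen = []
def fake_fetch(url, headers, proxies):
    """Offline stub: first attempt simulates a Tor circuit failure."""
    seen.append(headers["User-Agent"])
    if len(seen) < 2:
        raise ConnectionError("circuit failure")
    return "ok"

result = with_retries(fake_fetch, "http://example.onion")
```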
docker containerization with tor service bundling
Medium confidence. Packages Robin and its dependencies (Python, requests, BeautifulSoup, Streamlit) into a Docker image with an integrated Tor service, enabling single-command deployment without manual dependency installation. The Dockerfile installs Tor, configures the SOCKS5 proxy on 127.0.0.1:9050, and starts both Tor and Robin services on container startup. Environment variables for LLM provider credentials are passed at runtime, allowing users to deploy the container without modifying the image. The Docker Compose configuration (if provided) orchestrates the Robin container with optional additional services (e.g., Redis for caching, PostgreSQL for result storage).
Bundles Tor service directly into Docker image rather than requiring external Tor service, simplifying deployment; uses environment variable injection for LLM credentials, enabling credential management without image rebuilds
Simpler deployment than manual installation by bundling all dependencies; more portable than system-specific packages by using Docker; enables cloud deployment without infrastructure setup
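An illustrative Dockerfile for this bundling pattern (not the project's actual file) might look like the following; the base image, file names, and startup command are assumptions.

```dockerfile
# Illustrative sketch: bundle Tor alongside the app in one image.
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends tor \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Start Tor in the background (SOCKS5 on 127.0.0.1:9050), then the app.
# LLM credentials (OPENAI_API_KEY, etc.) are injected at `docker run -e` time.
CMD ["sh", "-c", "tor & exec python main.py"]
```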
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with robin, ranked by overlap. Discovered automatically through the match graph.
Web Search MCP
A server that provides local, full web search, summaries, and page extraction for use with local LLMs.
firecrawl-mcp-server
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
Tavily MCP Server
AI-optimized web search and content extraction via Tavily MCP.
SearXNG
Privacy-respecting metasearch — 70+ engines, no tracking, self-hosted, JSON API for AI agents.
duckduckgo-mcp-server
A Model Context Protocol (MCP) server that provides web search capabilities through DuckDuckGo, with additional features for content fetching and parsing.
local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
Best For
- ✓ threat intelligence analysts automating repetitive query expansion
- ✓ law enforcement investigators scaling dark web searches across multiple jurisdictions
- ✓ security researchers tracking credential exposure campaigns with evolving naming conventions
- ✓ OSINT investigators needing comprehensive dark web coverage without manual multi-engine queries
- ✓ threat intelligence teams automating large-scale dark web monitoring across multiple search indices
- ✓ security researchers comparing search engine indexing patterns across the dark web
- ✓ DevOps teams deploying Robin to multiple environments with different configurations
- ✓ organizations with security requirements for credential management (no hardcoded keys)
Known Limitations
- ⚠ LLM-based refinement adds 2-5 seconds of latency per query due to the API round-trip
- ⚠ Query expansion quality depends on LLM model capability; smaller models (Ollama 7B) produce less sophisticated synonyms than GPT-4
- ⚠ No caching of refined queries — identical raw queries trigger redundant LLM calls if not deduplicated upstream
- ⚠ Search engine availability is unpredictable — individual .onion search engines may be offline or rate-limited, causing partial result loss
- ⚠ Concurrent requests to multiple search engines increase Tor exit node load and may trigger rate-limiting or IP bans
- ⚠ Result normalization is lossy — search engine-specific metadata (relevance scores, date indexed) is discarded during aggregation
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
Repository Details
Last commit: Mar 31, 2026