storm
Repository
Free
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
Capabilities
12 decomposed
perspective-guided multi-turn question generation for research
Medium confidence
Generates research questions through simulated conversations between a Wikipedia writer and topic expert LLM agents, where questions are grounded in perspective discovery from similar existing articles rather than direct prompting. The system surveys related Wikipedia articles to extract diverse viewpoints, then uses these perspectives to guide the question-asking process, ensuring comprehensive topic coverage from multiple angles. This two-agent conversational approach with perspective injection produces more structured and comprehensive research directions than naive question generation.
Uses perspective discovery from existing articles to guide question generation rather than direct LLM prompting, implemented as a two-agent conversation (Wikipedia writer + topic expert) that grounds questions in retrieved reference patterns. This contrasts with naive question generation that lacks structural guidance from domain knowledge organization.
Produces more comprehensive and well-organized research questions than single-prompt approaches because it learns perspective structure from authoritative sources rather than relying on LLM priors alone.
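The writer/expert loop described above can be sketched in a few lines. This is a minimal illustration, not STORM's actual code: the `LLM` callable type, `simulate_conversation`, `collect_questions`, and both stub models are hypothetical names introduced here, and real prompts would be far richer.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an "LLM" is any callable that maps a prompt string to text.
LLM = Callable[[str], str]


def simulate_conversation(topic: str, perspective: str, writer: LLM,
                          expert: LLM, max_turns: int = 3) -> List[Tuple[str, str]]:
    """Alternate writer questions and expert answers for one perspective."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        question = writer(
            f"Topic: {topic}\nPerspective: {perspective}\n{transcript}\n"
            "Ask the next question."
        )
        if not question.strip():
            break  # the writer has nothing more to ask
        answer = expert(f"Topic: {topic}\nQuestion: {question}")
        history.append((question, answer))
    return history


def collect_questions(topic: str, perspectives: List[str], writer: LLM,
                      expert: LLM, max_turns: int = 3) -> Dict[str, List[str]]:
    """Run one simulated conversation per perspective and keep the questions."""
    return {
        p: [q for q, _ in simulate_conversation(topic, p, writer, expert, max_turns)]
        for p in perspectives
    }


# Stub models so the sketch runs offline (hypothetical stand-ins for real LLMs).
def stub_writer(prompt: str) -> str:
    turn = prompt.count("Q: ") + 1
    perspective = prompt.split("Perspective: ")[1].splitlines()[0]
    return f"[{perspective}] question {turn}"


def stub_expert(prompt: str) -> str:
    return "stub answer"
```

Keeping one conversation per perspective means each line of questioning builds on its own history, which is what lets different perspectives drift toward different parts of the topic.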
hierarchical outline generation with citation anchoring
Medium confidence
Generates multi-level article outlines (sections, subsections, key points) using collected research references, where each outline node is anchored to specific retrieved sources. The system structures the outline hierarchically to match Wikipedia article conventions, then maps each outline element to supporting citations from the knowledge curation phase. This enables the subsequent writing stage to generate text with proper in-line citations by maintaining explicit outline-to-source mappings throughout the generation pipeline.
Maintains explicit outline-to-source mappings throughout generation, enabling downstream article writing to produce citations without additional retrieval. The outline generation phase explicitly anchors each structural element to supporting references from the knowledge curation phase, creating a citation-aware outline rather than a generic structure.
Makes citations available at write time for every outline element that has supporting references, because outline generation is citation-aware; generic outline generators may create structures that lack source support.
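One way to picture an outline-to-source mapping is a tree whose nodes carry source identifiers. The sketch below is an assumption-laden illustration: `OutlineNode`, `sources_by_heading`, and `unsupported_headings` are hypothetical names, not classes or functions from the STORM codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OutlineNode:
    """One heading in the outline, anchored to the sources that support it."""
    title: str
    source_ids: List[str] = field(default_factory=list)
    children: List["OutlineNode"] = field(default_factory=list)


def sources_by_heading(node: OutlineNode,
                       path: Tuple[str, ...] = ()) -> Dict[str, List[str]]:
    """Flatten the tree into {'Parent > Child': [source ids]}."""
    here = path + (node.title,)
    mapping = {" > ".join(here): list(node.source_ids)}
    for child in node.children:
        mapping.update(sources_by_heading(child, here))
    return mapping


def unsupported_headings(node: OutlineNode) -> List[str]:
    """Headings with no anchored source, i.e. gaps left by the research phase."""
    return [h for h, ids in sources_by_heading(node).items() if not ids]
```

A writer stage can then look up the sources for the heading it is drafting, and `unsupported_headings` surfaces sections that would need more research before writing.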
batch article generation with pipeline orchestration
Medium confidence
Orchestrates the complete STORM pipeline (knowledge curation → outline generation → article writing → polishing) for batch processing of multiple topics, implemented through STORMWikiRunner that manages state, error handling, and progress tracking across pipeline stages. The system executes each stage sequentially for each topic, maintaining intermediate results and enabling resumption from failure points. This orchestration layer abstracts pipeline complexity and enables users to generate article collections without managing individual stage invocations.
Implements STORMWikiRunner that orchestrates the complete multi-stage pipeline (knowledge curation → outline → article → polish) with state management and error handling, enabling batch article generation without manual stage invocation. The runner maintains intermediate results and enables resumption from failure points.
Simplifies batch article generation compared to manual stage invocation because the runner handles pipeline orchestration, state management, and error handling transparently.
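Resumption from a failure point can be illustrated with a generic checkpointing loop. This is a sketch under stated assumptions and does not reproduce STORMWikiRunner: `run_pipeline`, the stage functions, and the `checkpoint.json` file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Assumption: a stage is a (name, function) pair; the function receives the
# topic and the results of all earlier stages.
Stage = Tuple[str, Callable[[str, Dict[str, str]], str]]


def run_pipeline(topic: str, stages: List[Stage], workdir: Path) -> Dict[str, str]:
    """Run stages in order, saving each result so a rerun skips finished stages."""
    workdir.mkdir(parents=True, exist_ok=True)
    checkpoint = workdir / "checkpoint.json"
    results = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, fn in stages:
        if name in results:
            continue  # already completed in an earlier run
        results[name] = fn(topic, results)
        checkpoint.write_text(json.dumps(results))
    return results


# Demo: the article stage fails once, then succeeds on the second run.
attempts = {"research": 0, "article": 0}


def research(topic: str, prior: Dict[str, str]) -> str:
    attempts["research"] += 1
    return f"notes on {topic}"


def outline(topic: str, prior: Dict[str, str]) -> str:
    return f"outline from {prior['research']}"


def article(topic: str, prior: Dict[str, str]) -> str:
    attempts["article"] += 1
    if attempts["article"] == 1:
        raise RuntimeError("transient failure")
    return f"article from {prior['outline']}"


STAGES: List[Stage] = [("research", research), ("outline", outline), ("article", article)]

with tempfile.TemporaryDirectory() as tmp:
    try:
        run_pipeline("solar power", STAGES, Path(tmp))
    except RuntimeError:
        pass  # first run stops at the article stage
    final = run_pipeline("solar power", STAGES, Path(tmp))  # resumes from checkpoint
```

Writing the checkpoint after every stage is the simplest policy; it trades a little I/O for never repeating an expensive research or generation step.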
encoder-based semantic similarity for perspective discovery
Medium confidence
Uses sentence encoders (embeddings) to compute semantic similarity between research questions and existing article content, enabling the system to discover relevant perspectives from similar articles without explicit keyword matching. The encoder system converts text to dense vector representations, enabling efficient similarity search across large article collections. This semantic approach discovers perspectives that keyword-based methods would miss, improving the diversity and relevance of research questions.
Uses sentence encoders to compute semantic similarity for perspective discovery, enabling the system to find relevant perspectives from similar articles based on meaning rather than keywords. This semantic approach discovers diverse perspectives that keyword matching would miss.
Discovers more diverse and relevant perspectives than keyword-based methods because semantic similarity captures meaning-level relationships rather than surface-level term overlap.
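The similarity step itself is ordinary cosine ranking over embedding vectors. The sketch below uses hand-written three-dimensional vectors as stand-ins for real sentence embeddings; `cosine`, `top_k`, and `ARTICLE_VECS` are hypothetical names, and no particular encoder model is assumed.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          candidates: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Names of the k candidates most similar to the query vector."""
    ranked = sorted(candidates,
                    key=lambda name: cosine(query_vec, candidates[name]),
                    reverse=True)
    return ranked[:k]


# Toy vectors standing in for encoder output (illustrative values only).
ARTICLE_VECS = {
    "Wind power": [0.9, 0.1, 0.0],
    "Photovoltaics": [0.8, 0.3, 0.1],
    "Medieval poetry": [0.0, 0.1, 0.9],
}
```

With a real encoder, the only change is where the vectors come from; the ranking logic stays the same.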
internet-grounded long-form article generation with inline citations
Medium confidence
Generates full-length Wikipedia-style articles (2000+ words) by consuming hierarchical outlines and mapped citations, producing text with inline citations that reference specific retrieved sources. The system uses the outline structure to guide section-by-section generation, maintaining citation context from the outline-to-source mappings to ensure every claim references a specific source. This multi-stage approach (outline → section generation → citation insertion) produces coherent long-form content with proper attribution without requiring additional source retrieval during writing.
Generates long-form articles with inline citations by leveraging pre-computed outline-to-source mappings from the outline generation phase, eliminating the need for citation lookup during writing. The system maintains citation context throughout multi-section generation, enabling coherent long-form text with proper attribution without additional retrieval.
Produces properly cited long-form content more efficiently than retrieval-augmented generation approaches that re-fetch sources during writing, because citation mappings are pre-computed in the outline phase.
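Once each sentence is paired with its source, assembling the article with consistent citation numbers is mechanical. The sketch below shows one possible numbering scheme; `render_article`, the input shape, and the `[n]` marker style are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: each section is (heading, [(sentence, source_url), ...]).
Section = Tuple[str, List[Tuple[str, str]]]


def render_article(sections: List[Section]) -> Tuple[str, List[str]]:
    """Return article text with [n] markers and the ordered reference list."""
    ref_index: Dict[str, int] = {}
    lines: List[str] = []
    for heading, claims in sections:
        lines.append(f"# {heading}")
        body = []
        for sentence, url in claims:
            # Reuse the number if this source was already cited elsewhere.
            number = ref_index.setdefault(url, len(ref_index) + 1)
            body.append(f"{sentence} [{number}]")
        lines.append(" ".join(body))
    references = sorted(ref_index, key=ref_index.get)
    return "\n".join(lines), references
```

Numbering sources globally rather than per section keeps a source's marker stable when it supports claims in several sections.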
internet search integration with multi-source retrieval
Medium confidence
Integrates with internet search APIs (Bing, Google, or custom) to retrieve relevant sources for research questions, implementing a retrieval module that handles query expansion, result ranking, and content extraction. The system executes search queries derived from research questions, collects results with metadata (URLs, snippets, relevance scores), and extracts full-text content from retrieved pages. This retrieval layer feeds the knowledge curation phase with grounded source material, enabling all downstream stages to operate on internet-sourced information.
Implements a pluggable retrieval module that abstracts search provider (Bing, Google, custom) and handles full-text extraction from retrieved pages, enabling the knowledge curation pipeline to operate on rich source content rather than search snippets alone. The retrieval layer maintains source metadata throughout the pipeline for citation purposes.
Provides richer source material than snippet-only search because it extracts full-text content from retrieved pages, enabling more comprehensive knowledge curation and citation accuracy.
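A pluggable retrieval layer usually comes down to a small interface that every search backend implements. The following sketch assumes nothing about STORM's real retriever classes: `SearchResult`, `Retriever`, `InMemoryRetriever`, and `gather_sources` are hypothetical, and the in-memory backend replaces a real search API so the example runs offline.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class SearchResult:
    url: str
    title: str
    snippet: str


class Retriever(Protocol):
    """Anything with a matching search() method can serve as a backend."""
    def search(self, query: str, k: int) -> List[SearchResult]: ...


class InMemoryRetriever:
    """Toy backend: ranks pages by how many query terms appear in the snippet."""
    def __init__(self, pages: List[SearchResult]) -> None:
        self.pages = pages

    def search(self, query: str, k: int) -> List[SearchResult]:
        terms = query.lower().split()
        scored = [(sum(t in p.snippet.lower() for t in terms), p) for p in self.pages]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [p for score, p in ranked if score > 0][:k]


def gather_sources(questions: List[str], retriever: Retriever,
                   k: int = 2) -> List[SearchResult]:
    """Query the backend for each question and deduplicate results by URL."""
    seen: Dict[str, SearchResult] = {}
    for question in questions:
        for result in retriever.search(question, k):
            seen.setdefault(result.url, result)
    return list(seen.values())
```

Because `gather_sources` depends only on the `Retriever` protocol, swapping in a different provider means writing one new class rather than touching the curation logic.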
knowledge base construction with dynamic concept organization
Medium confidence
Builds and maintains a hierarchical knowledge base (mind map) that organizes collected information into a dynamic concept structure, implemented as the KnowledgeBase class that stores information as nested concepts with relationships. The system continuously reorganizes information as new sources are added, maintaining a shared conceptual space that reduces cognitive load during knowledge curation. This knowledge base serves as the source of truth for outline generation and article writing, enabling both automated and human-collaborative workflows to reference a consistent information structure.
Maintains a dynamic, reorganizable knowledge base that serves as a shared reference structure for both automated and human-collaborative workflows, implemented as a hierarchical concept map that evolves as new information is added. This contrasts with static information tables that don't reorganize or provide cognitive scaffolding for long research sessions.
Enables human-AI collaborative research more effectively than flat information tables because the hierarchical concept structure provides cognitive scaffolding and reduces information overload during extended curation sessions.
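A nested concept structure of this kind can be modeled as a tree in which each node holds the snippets filed under it. The sketch below covers only placement and display, not the reorganization step, and `ConceptNode` with its methods is a hypothetical stand-in rather than the KnowledgeBase class itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConceptNode:
    """One concept in the mind map, with its own snippets and sub-concepts."""
    name: str
    snippets: List[str] = field(default_factory=list)
    children: Dict[str, "ConceptNode"] = field(default_factory=dict)

    def insert(self, path: List[str], snippet: str) -> None:
        """File a snippet under the given concept path, creating nodes as needed."""
        node = self
        for name in path:
            node = node.children.setdefault(name, ConceptNode(name))
        node.snippets.append(snippet)

    def render(self, depth: int = 0) -> List[str]:
        """Indented view of the tree with a snippet count per concept."""
        lines = [f"{'  ' * depth}{self.name} ({len(self.snippets)})"]
        for child in self.children.values():
            lines.extend(child.render(depth + 1))
        return lines
```

In a fuller version, deciding the `path` for a new snippet would itself be a model call, and overly crowded nodes would be split into new sub-concepts.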
human-AI collaborative discourse with moderator coordination
Medium confidence
Implements a three-agent collaborative discourse protocol (Co-STORM) where human users, LLM expert agents, and a moderator agent participate in structured knowledge curation conversations. The moderator agent generates thought-provoking questions inspired by retrieved information not yet discussed, expert agents answer questions grounded in external sources and raise follow-up questions, and human users can observe passively or actively steer the conversation. The system maintains conversation history and the shared knowledge base, enabling the moderator to track discussed vs. undiscussed information and guide the discourse toward comprehensive coverage.
Implements a three-agent collaborative protocol with explicit moderator coordination that tracks discussed vs. undiscussed information and generates targeted follow-up questions, enabling human-AI research teams to maintain conversation coherence and comprehensive coverage. The moderator agent explicitly inspects the knowledge base to identify information gaps and guide the discourse.
Enables more comprehensive and coherent human-AI collaboration than simple chatbot interfaces because the moderator agent actively tracks coverage and generates targeted follow-up questions rather than passively responding to user input.
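The moderator's bookkeeping (what has been retrieved versus what has been discussed) can be sketched as a set difference. This is an illustrative simplification: the `Moderator` class below is hypothetical, it matches snippets by literal substring, and a real moderator would generate its question with an LLM rather than a template.

```python
from typing import List, Optional, Set


class Moderator:
    """Tracks which retrieved snippets the conversation has already covered."""

    def __init__(self, retrieved: List[str]) -> None:
        self.retrieved = list(retrieved)
        self.discussed: Set[str] = set()

    def record_turn(self, utterance: str) -> None:
        """Mark every retrieved snippet that appears in the utterance as discussed."""
        lowered = utterance.lower()
        for snippet in self.retrieved:
            if snippet.lower() in lowered:
                self.discussed.add(snippet)

    def undiscussed(self) -> List[str]:
        return [s for s in self.retrieved if s not in self.discussed]

    def next_question(self) -> Optional[str]:
        """Raise the first uncovered snippet, or None when coverage is complete."""
        pending = self.undiscussed()
        if not pending:
            return None
        return f"We have not yet discussed: '{pending[0]}'. How does it change the picture?"
```

Returning `None` when nothing is pending gives the surrounding loop a natural stopping condition.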
multi-provider language model abstraction with unified API
Medium confidence
Provides a unified language model interface (lm.py module) that abstracts multiple LLM providers (OpenAI, Anthropic, Ollama, local models) behind a common API, enabling seamless provider switching without pipeline code changes. The system handles provider-specific details (API authentication, request formatting, response parsing, token counting) and exposes standardized methods for completion, chat, and function calling. This abstraction layer enables users to swap providers based on cost, latency, or capability requirements without modifying the knowledge curation or article generation logic.
Provides a unified LLM interface that abstracts OpenAI, Anthropic, Ollama, and local models, enabling provider-agnostic pipeline code and seamless switching based on cost/latency/capability tradeoffs. The abstraction handles provider-specific details (authentication, request formatting, token counting) transparently.
Enables more flexible and cost-optimized deployments than single-provider systems because users can mix providers (e.g., GPT-4 for complex reasoning, Ollama for simple tasks) without code changes.
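Mixing providers per stage amounts to routing each stage name to an object that satisfies a shared interface. The sketch below is a generic illustration and does not mirror the contents of lm.py: `LanguageModel`, the two stub classes, `run_stage`, and `ROUTES` are hypothetical names.

```python
from typing import Dict, Protocol


class LanguageModel(Protocol):
    """Minimal common interface every provider wrapper would implement."""
    def complete(self, prompt: str) -> str: ...


class CheapStubModel:
    """Stand-in for a low-cost model (no network calls)."""
    def complete(self, prompt: str) -> str:
        return f"cheap:{prompt[:20]}"


class StrongStubModel:
    """Stand-in for a more capable, more expensive model."""
    def complete(self, prompt: str) -> str:
        return f"strong:{prompt[:20]}"


def run_stage(stage: str, prompt: str, routes: Dict[str, LanguageModel]) -> str:
    """Send the prompt to whichever model is configured for this stage."""
    return routes[stage].complete(prompt)


# Assumed stage names, chosen for illustration only.
ROUTES: Dict[str, LanguageModel] = {
    "question_asking": CheapStubModel(),
    "article_writing": StrongStubModel(),
}
```

Changing which provider handles a stage is then a one-line edit to the routing table, with no change to the code that calls `run_stage`.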
article polishing and fact-checking with iterative refinement
Medium confidence
Implements an optional polishing phase that refines generated articles through iterative LLM-based fact-checking and improvement, verifying claims against source material and improving clarity/coherence. The system re-examines article sections against their source citations, identifies unsupported claims or contradictions, and generates refined versions. This post-generation refinement improves article quality without requiring additional source retrieval, leveraging the citation mappings from earlier phases to validate factual accuracy.
Implements automated fact-checking by re-examining generated article claims against their source citations, identifying unsupported or contradictory statements without additional retrieval. The polishing phase leverages pre-computed citation mappings to validate factual accuracy efficiently.
Improves article quality more efficiently than manual editorial review because automated fact-checking identifies issues before human review, reducing editorial burden while maintaining accuracy.
Streamlit-based interactive research interface
Medium confidence
Provides a web-based frontend (Streamlit demo) that enables non-technical users to run STORM and Co-STORM pipelines through an interactive UI, handling topic input, progress visualization, and result display. The interface abstracts pipeline complexity, manages LLM configuration, and presents results in readable formats (formatted articles, conversation transcripts, knowledge base visualizations). This frontend enables researchers and content creators to use STORM without writing code, lowering the barrier to entry for knowledge curation workflows.
Provides a Streamlit-based web interface that abstracts STORM pipeline complexity for non-technical users, handling LLM configuration, progress visualization, and result formatting without requiring code. The interface enables interactive research workflows while maintaining access to underlying pipeline capabilities.
Lowers the barrier to entry for STORM usage compared to programmatic APIs because non-technical users can run full research pipelines through a web interface without writing code.
structured data extraction and information table construction
Medium confidence
Constructs structured InformationTable objects that organize collected research data (sources, snippets, metadata) into queryable tables with schema-aware operations, enabling downstream stages to access information programmatically. The system extracts and structures information from retrieved sources, maintaining relationships between sources, concepts, and claims. This structured representation enables outline generation and article writing to query information efficiently without re-parsing raw source text.
Constructs schema-aware InformationTable objects that organize research data with explicit source-to-information mappings, enabling efficient programmatic access during downstream stages. The structured representation maintains relationships between sources, concepts, and claims rather than storing raw text.
Enables more efficient information access during article generation than raw text storage because structured tables support indexed queries and maintain explicit source relationships.
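A source-to-information mapping can be as simple as a dictionary keyed by URL with a lookup method on top. The sketch below is a deliberately small stand-in: `InfoTable` and its methods are hypothetical, and keyword matching is used here only to keep the example dependency-free.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class InfoTable:
    """Keeps snippets grouped by source URL so provenance is never lost."""

    def __init__(self) -> None:
        self._snippets: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, url: str, snippet: str) -> None:
        """Store a snippet under its source, skipping exact duplicates."""
        if snippet not in self._snippets[url]:
            self._snippets[url].append(snippet)

    def query(self, keyword: str) -> List[Tuple[str, str]]:
        """Return (url, snippet) pairs whose snippet mentions the keyword."""
        needle = keyword.lower()
        return [
            (url, snippet)
            for url, items in self._snippets.items()
            for snippet in items
            if needle in snippet.lower()
        ]

    def sources(self) -> List[str]:
        return list(self._snippets)
```

Because every query result carries its URL, a writing stage can attach a citation to each snippet it uses without a second lookup.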
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts
sharing capabilities
Artifacts that share capabilities with storm, ranked by overlap. Discovered automatically through the match graph.
STORM
Stanford research agent that writes Wikipedia-quality articles.
Caktus
Revolutionize content creation and data analysis with AI-driven precision and...
Squibler
Transform writing with AI, from blank page to printed book,...
Quriosity
AI-powered tool for rapid, high-quality content creation and...
*data-to-paper*
is a framework for systematically navigating the power of AI to perform complete end-to-end
RapidTextAI
Write Advanced Articles using Multiple AI Models like GPT4, Gemini, Deepseek and Grok.
Best For
- ✓ researchers building automated knowledge synthesis systems
- ✓ teams generating long-form content at scale with citation requirements
- ✓ knowledge curation platforms needing multi-perspective coverage
- ✓ automated content generation systems requiring citation integrity
- ✓ knowledge bases building Wikipedia-like article collections
- ✓ research platforms needing structured knowledge organization
- ✓ knowledge base platforms generating article collections
- ✓ content creation teams producing multiple articles
Known Limitations
- ⚠ Perspective discovery requires access to similar existing articles (Wikipedia or equivalent), limiting effectiveness for novel/niche topics
- ⚠ Multi-turn conversation overhead adds latency compared to single-prompt question generation
- ⚠ Quality depends on availability of reference articles; sparse topic domains may yield limited perspectives
- ⚠ Conversation context window constraints may limit question depth for very broad topics
- ⚠ Outline quality depends entirely on research phase coverage; gaps in collected sources create outline gaps
- ⚠ Hierarchical depth is constrained by LLM context window and citation density
Repository Details
Last commit: Sep 30, 2025