Auto-claude-code-research-in-sleep
MCP Server · Free
ARIS ⚔️ (Auto-Research-In-Sleep): lightweight Markdown-only skills for autonomous ML research, covering cross-model review loops, idea discovery, and experiment automation. No framework and no lock-in; works with Claude Code, Codex, OpenClaw, or any LLM agent.
Capabilities: 12 decomposed
cross-model adversarial review loop with external llm verification
Medium confidence
Implements a two-model collaboration pattern where Claude Code executes research tasks (code generation, experiment design) while a separate external LLM (GPT-4, Claude, or another configurable backend) reviews outputs independently via MCP. The reviewer never sees the executor's reasoning, only final artifacts, forcing fresh evaluation and catching blind spots that single-model self-review misses. State is persisted across review cycles with checkpoint recovery.
Uses MCP-based model isolation to prevent single-model blind spots by forcing the reviewer to evaluate only final artifacts without access to executor reasoning. The design borrows the distinction between stochastic and adversarial bandits from ML theory: self-review resembles the stochastic setting, where feedback is noise the model can learn to predict, while an independent reviewer behaves like an adversary that actively probes weaknesses the executor did not anticipate. Most LLM research tools use self-review (Claude reviewing Claude); ARIS enforces architectural separation.
Aims to outperform single-model self-review (such as native Claude Code) by catching methodological flaws that a single model would rationalize away. The trade-off is roughly 2x inference cost per review cycle in exchange for more thoroughly vetted research artifacts.
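A minimal sketch of the review loop described above, assuming the executor and reviewer are plain callables. The names `execute`, `review`, and `Review`, the 0-10 score scale, and the threshold are hypothetical choices for illustration, not ARIS's actual interface. The only property it tries to show is that the reviewer receives the artifact and nothing else.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    score: float   # hypothetical 0-10 quality rating
    feedback: str  # critique passed back to the executor


def review_loop(
    execute: Callable[[str, str], str],
    review: Callable[[str], Review],
    task: str,
    threshold: float = 7.0,
    max_rounds: int = 4,
) -> Tuple[str, List[Review]]:
    """Alternate executor and reviewer until the score clears the threshold."""
    feedback, artifact, history = "", "", []
    for _ in range(max_rounds):
        artifact = execute(task, feedback)
        # The reviewer gets only the final artifact, never the executor's reasoning.
        verdict = review(artifact)
        history.append(verdict)
        if verdict.score >= threshold:
            break
        feedback = verdict.feedback
    return artifact, history
```

In a real setup the two callables would wrap different model backends; keeping them as injected functions makes the isolation boundary explicit and easy to stub in tests.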
autonomous idea discovery and novelty validation against literature
Medium confidence
Orchestrates a multi-step workflow that generates novel ML research ideas by querying integrated literature sources (Zotero, Obsidian, arXiv, Semantic Scholar) to identify gaps, then validates novelty by cross-referencing recent papers and running lightweight pilot experiments. The system maintains a research wiki that tracks idea genealogy, related work, and experiment outcomes. Novelty scoring combines semantic similarity (embedding-based) and citation analysis.
Combines multi-source literature aggregation (Zotero + Obsidian + arXiv + Semantic Scholar) with embedding-based novelty scoring and lightweight pilot experiments in a single automated workflow. The research wiki maintains idea genealogy and tracks which ideas led to papers, enabling meta-analysis of research productivity. Most tools do literature search OR idea generation; ARIS closes the loop with novelty validation and outcome tracking.
Faster than manual literature review + brainstorming because it parallelizes idea generation with novelty checking; more rigorous than pure LLM idea generation because it grounds ideas in actual recent papers and validates with experiments.
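The listing says novelty scoring combines embedding similarity with citation analysis but gives no formula. The sketch below is one possible combination, under stated assumptions: the `alpha` weight, the log-citation weighting, and the function names are all hypothetical, and embeddings are passed in as plain lists of floats rather than produced by any particular model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def novelty_score(
    idea: Sequence[float],
    papers: List[Tuple[Sequence[float], int]],  # (embedding, citation count)
    alpha: float = 0.8,
) -> float:
    """Return a score in [0, 1]; higher means further from known work."""
    if not papers:
        return 1.0
    sims = [max(cosine(idea, vec), 0.0) for vec, _ in papers]
    # Heavily cited neighbours count more toward "crowdedness" (assumed weighting).
    weights = [1.0 + math.log1p(cites) for _, cites in papers]
    crowded = sum(w * s for w, s in zip(weights, sims)) / sum(weights)
    raw = 1.0 - (alpha * max(sims) + (1.0 - alpha) * crowded)
    return min(1.0, max(0.0, raw))  # clamp away floating-point overshoot
```

The nearest neighbour dominates the score because a single near-duplicate paper is enough to undermine a novelty claim; the citation-weighted term only adjusts for how crowded the surrounding area is.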
integration with external research tools and data sources
Medium confidence
Provides adapters for popular research tools: Zotero (literature management), Obsidian (note-taking), Feishu/Lark (team notifications), arXiv/Semantic Scholar (paper discovery), and GPU infrastructure (SLURM, Kubernetes). Enables bidirectional sync (e.g., new papers in Zotero trigger idea discovery, paper acceptance triggers Feishu notification). Abstracts tool-specific APIs behind unified interfaces.
Provides unified adapters for popular research tools (Zotero, Obsidian, Feishu, arXiv, SLURM) with bidirectional sync. Enables workflows like 'new papers in Zotero trigger idea discovery' or 'paper acceptance triggers team notification'. Most research tools are isolated; ARIS integrates them into a cohesive ecosystem.
More integrated than point-to-point tool connections because it provides unified adapters and bidirectional sync; more flexible than monolithic research platforms because it works with existing tools researchers already use.
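One way to picture "unified adapters plus triggers" is a small interface that every tool adapter satisfies and an event bus that workflows subscribe to. Everything in this sketch is hypothetical: `LiteratureSource`, `InMemoryLibrary` (a stand-in for a Zotero adapter that makes no network calls), `EventBus`, and the `paper.added` event name are illustrative, not part of ARIS.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Protocol


class LiteratureSource(Protocol):
    """Hypothetical unified interface each tool adapter would satisfy."""
    def new_items(self) -> List[Dict[str, Any]]: ...


class InMemoryLibrary:
    """Stand-in for a Zotero adapter; a real one would call the tool's API."""
    def __init__(self) -> None:
        self._pending: List[Dict[str, Any]] = []

    def add(self, title: str) -> None:
        self._pending.append({"title": title})

    def new_items(self) -> List[Dict[str, Any]]:
        items, self._pending = self._pending, []
        return items


class EventBus:
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Dict[str, Any]], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: Dict[str, Any]) -> None:
        for handler in self._handlers[event]:
            handler(payload)


def poll(source: LiteratureSource, bus: EventBus) -> int:
    """Publish one 'paper.added' event per new item; return how many were seen."""
    items = source.new_items()
    for item in items:
        bus.emit("paper.added", item)
    return len(items)
```

A workflow such as idea discovery would register a handler with `bus.on("paper.added", ...)`, so adding a second source only requires another class with a `new_items` method.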
interactive mode with human-in-the-loop checkpoints
Medium confidence
Supports interactive execution where the system pauses at strategic checkpoints (after idea generation, after experiment results, before paper submission) and waits for human approval/feedback before proceeding. Enables researchers to review intermediate results, make manual adjustments, and guide the system toward desired outcomes. Supports both fully autonomous overnight mode and interactive mode.
Enables both fully autonomous overnight execution and interactive mode with human checkpoints at strategic points (idea approval, experiment selection, paper review). Supports flexible feedback mechanisms (approval, rejection, modifications). Most research tools are either fully autonomous or fully manual; ARIS bridges both modes.
More flexible than fully autonomous systems because it enables human oversight at critical decisions; more efficient than fully manual workflows because it automates routine tasks between checkpoints.
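The two modes can be sketched as a single gate function that either auto-approves or asks a human. The function name, the accepted replies, and the choice to treat free text as a requested modification are assumptions made for this example, not documented ARIS behaviour.

```python
from typing import Callable, Optional, Tuple


def checkpoint(
    stage: str,
    summary: str,
    interactive: bool,
    ask: Callable[[str], str] = input,
) -> Tuple[bool, Optional[str]]:
    """Return (approved, note). Autonomous mode approves without prompting."""
    if not interactive:
        return True, None
    reply = ask(f"[{stage}] {summary}\napprove / reject / or type changes: ").strip()
    if reply.lower() in {"", "y", "yes", "approve"}:
        return True, None
    if reply.lower() in {"n", "no", "reject"}:
        return False, None
    # Anything else is treated as a requested modification to apply before continuing.
    return True, reply
```

Passing `ask` as a parameter keeps the gate testable and lets the same call site be wired to a terminal prompt, a chat notification, or nothing at all for overnight runs.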
automated iterative experiment execution with ablation and result aggregation
Medium confidence
Manages end-to-end experiment lifecycle: Claude Code generates experiment code (training loops, hyperparameter sweeps, evaluation scripts), executes them on GPU infrastructure, collects results (metrics, logs, checkpoints), aggregates findings into structured reports, and feeds results back to the reviewer for quality assessment. Supports checkpoint recovery if experiments timeout or fail mid-run. Integrates with GPU resource budgeting to prevent runaway costs.
Implements a stateful experiment pipeline with checkpoint-based recovery, resource budgeting, and automatic result aggregation into publication-ready tables. The system tracks experiment genealogy (which ablations led to which results) and enables meta-analysis of hyperparameter sensitivity. Most experiment frameworks (Ray Tune, Weights & Biases) focus on distributed training; ARIS focuses on sequential ablation studies with human-in-the-loop review.
Simpler than Ray Tune for single-GPU ablation studies because it doesn't require distributed setup; more integrated than W&B because it auto-generates paper tables and feeds results directly to the reviewer for quality assessment.
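A sequential ablation study with result aggregation can be sketched in a few lines. The `train` callable, the grid format, and the `mean`/`std` column names below are assumptions for illustration; a real run would launch GPU jobs instead of calling a Python function.

```python
import itertools
import statistics
from typing import Any, Callable, Dict, List


def run_ablation(
    train: Callable[..., float],
    grid: Dict[str, List[Any]],
    seeds: List[int],
) -> List[Dict[str, Any]]:
    """Run every grid combination once per seed and report mean/std of the metric."""
    names = list(grid)
    rows = []
    for values in itertools.product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        scores = [train(seed=seed, **config) for seed in seeds]
        rows.append({
            **config,
            "mean": statistics.mean(scores),
            "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        })
    # Best configuration first, so the top row is the headline result.
    return sorted(rows, key=lambda r: r["mean"], reverse=True)
```

Reporting mean and standard deviation over seeds, rather than a single run, is what lets a reviewer judge whether a difference between ablations is more than noise.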
end-to-end paper generation with latex compilation and venue-specific formatting
Medium confidence
Orchestrates paper writing by generating LaTeX source code (sections, figures, tables, citations), compiling to PDF, detecting and fixing compilation errors, and formatting for target venues (NeurIPS, ICML, ICCV, etc.). Integrates experiment results directly into paper (auto-generates figure captions, embeds tables). Maintains LaTeX template library with venue-specific styles. Handles bibliography management via BibTeX.
Closes the loop from experiments to publication by auto-generating LaTeX, detecting and fixing compilation errors, and reformatting for multiple venues using a template library. The system embeds experiment results directly (auto-generated captions, tables) and maintains venue-specific formatting rules. Most paper-writing tools focus on content generation; ARIS handles the full LaTeX pipeline including compilation and error recovery.
Faster than manual LaTeX writing because it generates structure and embeds results automatically; more robust than raw Claude Code generation because it includes compilation error detection and venue-specific formatting rules.
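Embedding experiment results into a paper mostly comes down to rendering rows as LaTeX while escaping special characters, which is a common source of compilation errors. The sketch below assumes the `booktabs` package is loaded in the paper's preamble; the function names and the two-decimal formatting are illustrative choices, not ARIS's actual templates.

```python
from typing import Dict, List, Sequence

_ESCAPES = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def tex_escape(text: str) -> str:
    """Escape characters that would otherwise break LaTeX compilation."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def results_table(rows: List[Dict[str, object]], columns: Sequence[str], caption: str) -> str:
    """Render result rows as a booktabs table; floats are shown to two decimals."""
    def cell(value: object) -> str:
        return f"{value:.2f}" if isinstance(value, float) else tex_escape(str(value))

    lines = [
        r"\begin{table}[t]",
        r"\centering",
        rf"\caption{{{tex_escape(caption)}}}",
        r"\begin{tabular}{" + "l" * len(columns) + "}",
        r"\toprule",
        " & ".join(tex_escape(c) for c in columns) + r" \\",
        r"\midrule",
    ]
    lines += [" & ".join(cell(row[c]) for c in columns) + r" \\" for row in rows]
    lines += [r"\bottomrule", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)
```

Generating the table from the same rows the experiments produced avoids hand-copied numbers drifting out of sync with the results.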
rebuttal generation and reviewer concern parsing
Medium confidence
Parses reviewer comments (from PDF or text), extracts concerns and questions, maps them to experiment results or paper sections, generates targeted rebuttals, and formats responses according to venue guidelines. Uses semantic matching to link reviewer concerns to relevant experiments or citations. Maintains rebuttal templates for common objection types (novelty, experimental rigor, clarity).
Automates the rebuttal pipeline by parsing reviewer concerns, mapping them to experiments via semantic matching, and generating targeted responses. Maintains rebuttal templates for common objection types and formats for multiple venues. Most tools focus on paper writing; ARIS extends to the revision cycle with concern-to-experiment traceability.
Faster than manual rebuttal writing because it auto-generates structure and links concerns to experiments; more systematic than ad-hoc responses because it ensures all concerns are addressed and mapped to evidence.
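The concern-to-evidence mapping can be illustrated with a deliberately simple matcher. Note the substitution: the listing describes semantic (embedding-based) matching, whereas this sketch uses token-overlap (Jaccard) similarity so that it runs without a model. The function names, the evidence ids, and the `min_score` cutoff are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def map_concerns(
    concerns: List[str],
    evidence: Dict[str, str],  # evidence id -> short description
    min_score: float = 0.1,
) -> List[Tuple[str, Optional[str]]]:
    """Pair each concern with its best-matching evidence id, or None if nothing clears min_score."""
    mapped = []
    for concern in concerns:
        scored = [(jaccard(concern, text), ev_id) for ev_id, text in evidence.items()]
        score, ev_id = max(scored, default=(0.0, None))
        mapped.append((concern, ev_id if score >= min_score else None))
    return mapped
```

Concerns that map to `None` are the useful output here: they flag reviewer points with no supporting experiment or section yet, which is where a rebuttal needs new work rather than a pointer.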
research wiki and meta-optimization for idea-to-paper tracking
Medium confidence
Maintains a persistent research wiki (markdown-based) that tracks idea genealogy, related work, experiment outcomes, and paper status. Enables meta-analysis of research productivity (which ideas led to papers, which experiments were most valuable, which venues accept which paper types). Supports automated meta-optimization: analyzing past research cycles to improve future idea generation, experiment selection, and writing strategies.
Implements a persistent research wiki that tracks idea-to-paper lineage and enables meta-analysis of research productivity. The meta-optimizer analyzes past cycles to recommend improvements (e.g., 'ideas in domain X have 60% acceptance rate, focus there'). Most research tools focus on single cycles; ARIS enables cross-cycle learning and continuous improvement.
Enables long-term research optimization that single-cycle tools cannot provide; helps researchers identify high-ROI research directions based on historical data rather than intuition.
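A recommendation such as "ideas in domain X have a 60% acceptance rate" reduces to a grouped ratio over past wiki records. The record fields (`domain`, `status`) and the status values below are assumptions about how such a wiki might be structured, not the actual ARIS schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def acceptance_by_domain(
    records: Iterable[Mapping[str, str]],
    min_count: int = 1,
) -> Dict[str, float]:
    """Fraction of decided submissions accepted per domain; undecided ones are skipped."""
    accepted, decided = defaultdict(int), defaultdict(int)
    for rec in records:
        if rec["status"] not in {"accepted", "rejected"}:
            continue
        decided[rec["domain"]] += 1
        accepted[rec["domain"]] += rec["status"] == "accepted"
    # min_count filters out domains with too few decisions to say anything.
    return {d: accepted[d] / n for d, n in decided.items() if n >= min_count}
```

The `min_count` filter matters in practice: with only a handful of submissions per domain, a raw rate is mostly noise and should not steer future idea generation on its own.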
mcp server architecture with multi-provider llm support
Medium confidence
Implements a Model Context Protocol (MCP) server that abstracts LLM provider differences (OpenAI, Anthropic, Ollama, local models) behind a unified interface. Supports both executor (Claude Code) and reviewer (configurable backend) roles. Handles API key management, rate limiting, token budgeting, and fallback strategies. Enables mix-and-match of models (e.g., Claude executor + GPT-4 reviewer + Ollama local validator).
Abstracts LLM provider differences behind MCP protocol, enabling seamless switching between OpenAI, Anthropic, Ollama, and custom endpoints. Supports asymmetric model selection (fast executor + slow reviewer) with unified token budgeting and rate limiting. Most research tools lock into a single provider; ARIS enables provider-agnostic research automation.
More flexible than provider-specific tools because it supports any MCP-compatible model; more cost-effective than single-provider systems because it enables mixing cheap and expensive models based on task requirements.
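Fallback and token budgeting can be sketched independently of any provider SDK by treating each backend as a callable. The `Router` class, the `(text, tokens_used)` return convention, and `BudgetExceeded` are hypothetical; no real provider API is called here.

```python
from typing import Callable, List, Tuple


class BudgetExceeded(RuntimeError):
    pass


class Router:
    """Try providers in order, falling back on errors, under a shared token budget."""

    def __init__(
        self,
        providers: List[Tuple[str, Callable[[str], Tuple[str, int]]]],
        token_budget: int,
    ) -> None:
        self.providers = providers  # (name, call); call returns (text, tokens_used)
        self.remaining = token_budget

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (provider_name, text) from the first provider that succeeds."""
        if self.remaining <= 0:
            raise BudgetExceeded("token budget exhausted")
        errors = []
        for name, call in self.providers:
            try:
                text, used = call(prompt)
            except Exception as exc:  # e.g. rate limit or network failure
                errors.append(f"{name}: {exc}")
                continue
            self.remaining -= used
            return name, text
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the text keeps it visible which model actually answered, which matters when the executor and reviewer are meant to be different models.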
state persistence and checkpoint recovery for long-running workflows
Medium confidence
Implements a state management system that persists workflow state (current idea, experiment progress, paper draft, rebuttal status) to disk at regular intervals. Enables recovery from failures (network outages, GPU crashes, API rate limits) by resuming from the last checkpoint rather than restarting from scratch. Tracks state transitions and enables rollback to previous states if needed.
Implements fine-grained state checkpointing at each workflow stage (idea discovery, experiment execution, paper writing, rebuttal) with recovery and rollback capabilities. Tracks state transitions to enable analysis of which decisions led to success. Most research tools assume continuous execution; ARIS enables resilient overnight runs with graceful failure recovery.
More resilient than stateless tools because it recovers from mid-run failures without losing progress; more flexible than simple save/load because it enables rollback and state transition analysis.
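Checkpointing with rollback can be sketched as numbered JSON files written atomically. The file naming scheme (`ckpt_000000.json`), the `CheckpointStore` class, and the state fields are assumptions for illustration; the atomic step relies on `os.replace`, which swaps the file in a single operation on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


class CheckpointStore:
    """Numbered JSON checkpoints in one directory; the highest number is current."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _files(self) -> List[Path]:
        return sorted(self.root.glob("ckpt_*.json"))

    def save(self, state: Dict[str, Any]) -> Path:
        target = self.root / f"ckpt_{len(self._files()):06d}.json"
        # Write to a temp file first so a crash never leaves a half-written checkpoint.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
        os.replace(tmp, target)
        return target

    def load(self) -> Optional[Dict[str, Any]]:
        files = self._files()
        return json.loads(files[-1].read_text()) if files else None

    def rollback(self, steps: int = 1) -> Optional[Dict[str, Any]]:
        """Discard the newest `steps` checkpoints and return the one now current."""
        for path in self._files()[::-1][:steps]:
            path.unlink()
        return self.load()
```

On restart, a workflow would call `load()` and resume from the returned stage; keeping every checkpoint rather than overwriting one file is what makes rollback and later inspection of state transitions possible.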
skill-based workflow composition with markdown-only definitions
Medium confidence
Organizes research capabilities as discrete, composable 'skills' defined in markdown files (no code framework required). Each skill specifies inputs, outputs, dependencies, and execution logic. Skills are composed into workflows (idea discovery → experiment → paper writing → rebuttal) using a simple orchestration language. Enables non-technical researchers to customize workflows by editing markdown without touching code.
Defines research capabilities as markdown-only skills with no framework lock-in. Skills are composable, shareable, and customizable without code changes. This enables non-technical researchers to build custom research pipelines and share methodologies as markdown files. Most research frameworks require code; ARIS uses markdown for accessibility.
More accessible than code-based frameworks because non-technical researchers can customize workflows by editing markdown; more flexible than rigid pipelines because skills can be reordered and combined in different ways.
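To make "skills with dependencies composed into workflows" concrete, the sketch below parses a markdown file with a leading `---` metadata block and orders skills so dependencies run first. The metadata keys (`name`, `depends`) and the comma-separated format are hypothetical and may not match ARIS's real skill files; the ordering uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def parse_skill(markdown: str) -> Dict[str, object]:
    """Read `key: value` lines from a leading '---' block; the rest is the skill body."""
    _, header, body = markdown.split("---", 2)
    meta: Dict[str, object] = {"body": body.strip()}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # Normalise the (hypothetical) comma-separated dependency list.
    raw = str(meta.get("depends", ""))
    meta["depends"] = [d.strip() for d in raw.split(",") if d.strip()]
    return meta


def execution_order(skills: List[Dict[str, object]]) -> List[str]:
    """Order skill names so that every dependency runs before its dependents."""
    graph = {s["name"]: set(s["depends"]) for s in skills}
    return list(TopologicalSorter(graph).static_order())
```

Because the ordering is derived from the files themselves, reordering or swapping a stage means editing a `depends` line in markdown rather than changing orchestration code.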
resource budgeting and cost optimization for gpu experiments
Medium confidence
Tracks GPU hours, API costs, and compute budgets across experiments. Estimates experiment cost before execution (based on model size, dataset, hyperparameters) and prevents runaway spending. Supports cost-aware experiment selection (e.g., 'run only experiments under $10'). Provides cost-per-paper metrics and recommendations for cost optimization (e.g., 'use smaller model for ablations').
Implements cost-aware experiment orchestration with pre-execution cost estimation, budget enforcement, and cost-per-paper metrics. Enables cost-optimized experiment selection (greedy algorithm to maximize value within budget). Most research tools ignore costs; ARIS makes cost optimization a first-class concern.
Prevents budget overruns that plague research teams with shared GPU infrastructure; enables cost-aware experiment selection that maximizes research output within budget constraints.
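The greedy selection mentioned above can be sketched as ranking candidates by value per unit cost and taking whatever still fits. The `Experiment` fields and the `value` score are hypothetical; how ARIS estimates cost or value is not specified in this listing.

```python
from typing import List, NamedTuple


class Experiment(NamedTuple):
    name: str
    cost: float   # estimated dollars (or GPU-hours)
    value: float  # hypothetical expected-usefulness score


def select_within_budget(candidates: List[Experiment], budget: float) -> List[Experiment]:
    """Greedy pick by value per unit cost; fast, but not guaranteed optimal."""
    def ratio(exp: Experiment) -> float:
        return exp.value / exp.cost if exp.cost > 0 else float("inf")

    chosen, spent = [], 0.0
    for exp in sorted(candidates, key=ratio, reverse=True):
        if spent + exp.cost <= budget:
            chosen.append(exp)
            spent += exp.cost
    return chosen
```

This is the classic knapsack heuristic: it can leave budget unused or miss a better combination, so it suits quick triage of cheap ablations more than final planning of expensive runs.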
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts: sharing capabilities
Artifacts that share capabilities with Auto-claude-code-research-in-sleep, ranked by overlap. Discovered automatically through the match graph.
CS11-711 Advanced Natural Language Processing
in Large Language Models.
Autoblocks AI
Elevate AI product development with seamless testing, integration, and...
Gito
AI code reviewer for GitHub Actions or local use, compatible with any LLM and integrated with...
local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
Patronus AI
Enterprise LLM evaluation for hallucination and safety.
ReAct: Synergizing Reasoning and Acting in Language Models (ReAct)
* ⭐ 11/2022: [BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (BLOOM)](https://arxiv.org/abs/2211.05100)
Best For
- ✓ ML researchers automating multi-day research cycles
- ✓ Teams running overnight experiments with cross-model validation
- ✓ Researchers who distrust single-model self-review and want adversarial collaboration
- ✓ PhD students exploring research directions
- ✓ ML researchers doing rapid idea validation before committing to experiments
- ✓ Teams running continuous research pipelines where ideas feed into experiments
- ✓ Teams using Zotero, Obsidian, and Feishu for research management
- ✓ Researchers with existing literature databases who want to integrate with ARIS
Known Limitations
- ⚠ Requires two separate LLM API keys and incurs 2x inference costs per review cycle
- ⚠ Reviewer latency adds ~30-60s per cycle; not suitable for real-time interactive workflows
- ⚠ Cross-model disagreement resolution requires human intervention or meta-optimizer heuristics
- ⚠ No built-in consensus mechanism if reviewer and executor fundamentally disagree on approach
- ⚠ Novelty detection relies on embedding similarity and citation counts; cannot detect concurrent work submitted to arXiv in the last 48 hours
- ⚠ Pilot experiments are lightweight and may miss subtle failure modes that full-scale experiments would catch
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
Repository Details
Last commit: Apr 21, 2026