llm-vscode
Extension · Free · LLM-powered development for VS Code
Capabilities (8 decomposed)
Context-aware inline code completion with ghost-text UI
Medium confidence
Generates code suggestions in real time as developers type by sending the current file's prefix and suffix context (relative to the cursor position) to a configurable LLM backend (Hugging Face Inference API, Ollama, OpenAI-compatible, or TGI). The extension tokenizes the input with the tokenizers library to fit within the model's context window, constructs a fill-in-the-middle prompt with special tokens (start_token, end_token, middle_token), and renders completions as ghost-text overlays matching VS Code's native completion UI pattern. Supports multiple model backends without leaving the editor.
Supports 4 distinct backend types (Hugging Face Inference API, Ollama, OpenAI-compatible, TGI) with automatic context window fitting via tokenizers library, allowing developers to switch between cloud and local inference without reconfiguring the extension. Default model (bigcode/starcoder) is open-source, avoiding vendor lock-in.
Offers more backend flexibility than GitHub Copilot (cloud-only) and better local inference support than Tabnine (which primarily uses cloud), while remaining free for open-source models.
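As a rough illustration of the prompt construction, here is a minimal sketch of a fill-in-the-middle prompt assembled from the configured special tokens. The helper and the StarCoder-style token strings are assumptions for the example, not the extension's actual code:

```typescript
// Sketch: assembling a fill-in-the-middle (FIM) prompt from the text around
// the cursor. The token strings follow StarCoder's conventions; the extension
// reads them from its start_token / middle_token / end_token settings.
interface FimTokens {
  start: string;  // e.g. "<fim_prefix>"
  middle: string; // e.g. "<fim_middle>"
  end: string;    // e.g. "<fim_suffix>"
}

function buildFimPrompt(
  prefix: string, // file text before the cursor
  suffix: string, // file text after the cursor
  tokens: FimTokens
): string {
  // The model is asked to complete the "middle" between prefix and suffix.
  return `${tokens.start}${prefix}${tokens.end}${suffix}${tokens.middle}`;
}

// Usage with StarCoder-style special tokens:
const prompt = buildFimPrompt(
  "def add(a, b):\n    ",
  "\n\nprint(add(1, 2))",
  { start: "<fim_prefix>", middle: "<fim_middle>", end: "<fim_suffix>" }
);
```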
Code attribution checking via Bloom filter matching against The Stack dataset
Medium confidence
Detects whether generated code matches sequences in The Stack training dataset by performing a rapid first-pass Bloom filter lookup against a pre-built index, then optionally linking to stack.dataportraits.org for detailed attribution verification. The extension requires a minimum 50-character code sequence and sufficient surrounding context to perform matching. Triggered via the Cmd+Shift+A keyboard shortcut or the command palette. Uses probabilistic matching (Bloom filter) for speed, with acknowledged false positives.
Integrates Bloom filter-based probabilistic matching against The Stack dataset directly into the VS Code editor workflow, providing real-time attribution checking without requiring external tools or manual searches. Acknowledges false positives transparently and links to detailed verification.
Provides training data attribution checking that GitHub Copilot does not expose, and integrates it directly into the editor rather than requiring separate tools like the Stack search interface.
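To make the probabilistic first pass concrete, here is a self-contained sketch of a Bloom filter membership check. The hash scheme and index layout are illustrative assumptions, not the format actually used by stack.dataportraits.org:

```typescript
// Sketch: a first-pass Bloom filter membership check over a code sequence.
// A Bloom filter may return false positives but never false negatives,
// which is why a positive hit links out for detailed verification.
class BloomFilter {
  constructor(private bits: Uint8Array, private numHashes: number) {}

  // FNV-1a with a per-hash seed, reduced modulo the bit-array size.
  private hash(s: string, seed: number): number {
    let h = 0x811c9dc5 ^ seed;
    for (let i = 0; i < s.length; i++) {
      h ^= s.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return (h >>> 0) % (this.bits.length * 8);
  }

  mightContain(s: string): boolean {
    for (let seed = 0; seed < this.numHashes; seed++) {
      const bit = this.hash(s, seed);
      if ((this.bits[bit >> 3] & (1 << (bit & 7))) === 0) return false;
    }
    return true; // possibly present (subject to false positives)
  }
}

// The extension requires at least 50 characters of code before checking.
function checkAttribution(code: string, index: BloomFilter): boolean {
  if (code.length < 50) return false; // not enough context to match
  return index.mightContain(code);
}
```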
Multi-backend model switching with unified configuration
Medium confidence
Allows developers to select and switch between 4 different LLM backend types (Hugging Face Inference API, Ollama, OpenAI-compatible, Text Generation Inference) via VS Code settings without modifying code or restarting the extension. Each backend has configurable parameters: base URL, model ID, and custom request body JSON. The extension constructs HTTP POST requests with backend-specific URL patterns and forwards the configured requestBody to the selected endpoint. Supports automatic token counting to fit prompts within each model's context window.
Provides unified configuration for 4 distinct backend types with automatic context window fitting, allowing developers to switch between cloud (Hugging Face, OpenAI) and local inference (Ollama, TGI) without code changes. Default backend uses open-source StarCoder model, avoiding vendor lock-in.
Offers more backend flexibility than GitHub Copilot (cloud-only) and Tabnine (primarily cloud), while supporting both commercial APIs and fully local inference in a single extension.
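A sketch of what the backend dispatch might look like. The endpoint paths are the publicly documented ones for each service, but the config shape and field names here are assumptions rather than the extension's real types:

```typescript
// Sketch: routing a completion request to the configured backend type.
type Backend = "huggingface" | "ollama" | "openai" | "tgi";

interface BackendConfig {
  backend: Backend;
  baseUrl: string;  // e.g. "http://localhost:11434" for a local Ollama
  modelId: string;  // e.g. "bigcode/starcoder"
  requestBody: Record<string, unknown>; // forwarded verbatim from settings
}

function completionUrl(cfg: BackendConfig): string {
  switch (cfg.backend) {
    case "huggingface": return `${cfg.baseUrl}/models/${cfg.modelId}`;
    case "ollama":      return `${cfg.baseUrl}/api/generate`;
    case "openai":      return `${cfg.baseUrl}/v1/completions`;
    case "tgi":         return `${cfg.baseUrl}/generate`;
  }
}

async function requestCompletion(cfg: BackendConfig, prompt: string) {
  // The prompt field name differs by API: "inputs" for Hugging Face and TGI,
  // "prompt" for Ollama and OpenAI-style completion endpoints.
  const promptKey =
    cfg.backend === "huggingface" || cfg.backend === "tgi" ? "inputs" : "prompt";
  const res = await fetch(completionUrl(cfg), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...cfg.requestBody, [promptKey]: prompt }),
  });
  return res.json();
}
```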
Automatic context window fitting with tokenizer-based prompt truncation
Medium confidence
Automatically measures and fits the code completion prompt within each model's context window by using the tokenizers library to count tokens in the prefix and suffix around the cursor. If the combined prompt exceeds the model's maximum context length, the extension truncates the prefix and/or suffix to fit. This ensures requests succeed without manual context management by the developer. Token counting happens per request and adds computational overhead.
Uses tokenizers library for accurate token counting across multiple model types, automatically truncating context to fit within each backend's limits without requiring manual configuration or developer intervention.
Provides automatic context fitting that GitHub Copilot handles internally (opaque to users), while making it explicit and configurable for self-hosted backends like Ollama and TGI.
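The fitting logic might look like the sketch below. countTokens is a stand-in for the tokenizers library the extension actually uses; a crude characters-per-token heuristic keeps the example self-contained:

```typescript
// Stand-in token counter; a real implementation would use the model's
// tokenizer (via the tokenizers library) for exact counts.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough heuristic, not a real tokenizer
}

// Sketch: trim prefix and suffix until the prompt fits the context window.
function fitContext(
  prefix: string,
  suffix: string,
  maxTokens: number // model context window minus the generation budget
): { prefix: string; suffix: string } {
  // Trim the far ends first: the oldest prefix lines and the latest suffix
  // lines are least relevant to the cursor position.
  while (countTokens(prefix) + countTokens(suffix) > maxTokens) {
    if (countTokens(prefix) >= countTokens(suffix)) {
      const nl = prefix.indexOf("\n");
      prefix = nl >= 0 ? prefix.slice(nl + 1) : ""; // drop the first line
    } else {
      const nl = suffix.lastIndexOf("\n");
      suffix = nl >= 0 ? suffix.slice(0, nl) : ""; // drop the last line
    }
  }
  return { prefix, suffix };
}
```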
VS Code command palette and keyboard shortcut integration
Medium confidence
Exposes core extension functionality through VS Code's command palette (Cmd/Ctrl+Shift+P) and dedicated keyboard shortcuts. Documented commands include 'Llm: Login' for authentication and 'Llm: Code Attribution Check' (Cmd+Shift+A). The extension registers these commands with VS Code's command registry, making them discoverable and remappable. Additional commands exist but are not enumerated in the available documentation.
Integrates with VS Code's native command palette and keybinding system, allowing developers to discover and customize extension commands without leaving the editor. Supports remappable shortcuts (Cmd+Shift+A for attribution checks).
Provides standard VS Code integration patterns that match native editor workflows, unlike some extensions that rely on custom UI panels or external tools.
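For reference, command registration in a VS Code extension follows the standard pattern below. The command IDs and bodies are illustrative; the real identifiers and the default Cmd+Shift+A binding would live in the extension's package.json contributions:

```typescript
// Sketch: registering palette commands via VS Code's command registry.
import * as vscode from "vscode";

export function activate(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    // Shown in the palette as "Llm: Login" (title comes from package.json).
    vscode.commands.registerCommand("llm.login", async () => {
      await vscode.window.showInformationMessage("Llm: Login invoked");
    }),
    // Shown as "Llm: Code Attribution Check"; a keybinding contribution in
    // package.json would map this to Cmd+Shift+A, remappable by the user.
    vscode.commands.registerCommand("llm.attribution", async () => {
      await vscode.window.showInformationMessage(
        "Llm: Code Attribution Check invoked"
      );
    })
  );
}
```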
Hugging Face API token management with auto-detection and manual entry
Medium confidence
Manages Hugging Face API authentication by automatically detecting tokens from the huggingface-cli cache on disk (if huggingface-cli was previously configured) or accepting manual token entry via the 'Llm: Login' command. Tokens are stored in VS Code's secure credential storage (exact mechanism not specified in the documentation). The extension validates tokens before making API requests to the Hugging Face Inference API. Tokens can be obtained from hf.co/settings/token.
Automatically detects and reuses Hugging Face CLI tokens from disk cache, reducing friction for developers already using Hugging Face tools. Falls back to manual entry via 'Llm: Login' command if auto-detection fails.
Simpler authentication flow than GitHub Copilot (which requires GitHub OAuth) and more flexible than Tabnine (which requires account creation in extension UI).
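A sketch of the two-step flow, assuming the default huggingface-cli cache location (~/.cache/huggingface/token on Linux/macOS; the real path can vary by platform and HF_HOME) and VS Code's SecretStorage as the credential store:

```typescript
import * as vscode from "vscode";
import * as fs from "fs/promises";
import * as os from "os";
import * as path from "path";

// Auto-detection: reuse a token written by `huggingface-cli login`.
async function detectHfToken(): Promise<string | undefined> {
  const tokenPath = path.join(os.homedir(), ".cache", "huggingface", "token");
  try {
    const token = (await fs.readFile(tokenPath, "utf8")).trim();
    return token.length > 0 ? token : undefined;
  } catch {
    return undefined; // no cached token; fall back to manual entry
  }
}

// Manual fallback: prompt the user and keep the token in SecretStorage
// (one plausible "secure credential storage"; the docs don't specify).
async function loginManually(context: vscode.ExtensionContext): Promise<void> {
  const token = await vscode.window.showInputBox({
    prompt: "Hugging Face API token (from hf.co/settings/token)",
    password: true,
  });
  if (token) {
    await context.secrets.store("huggingface.token", token);
  }
}
```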
VS Code settings panel configuration with 'Llm' filter
Medium confidence
Exposes extension configuration through VS Code's standard settings UI (Cmd+, then filter for 'Llm'). Developers can configure the backend type, model ID, base URLs, request body parameters, and other options via a searchable settings panel. The full list of available configuration options is not enumerated in the documentation. Settings are persisted in VS Code's configuration store and applied immediately or after an extension reload.
Integrates with VS Code's native settings UI and search, allowing configuration through the standard editor settings panel rather than custom dialogs or JSON files.
Provides standard VS Code configuration patterns that match native editor workflows, unlike extensions with custom configuration dialogs or external configuration files.
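Reading such settings uses VS Code's standard configuration API, as in the sketch below. The keys are hypothetical, since the full option list is not enumerated in the documentation; filter for 'Llm' in the settings UI to see the real names:

```typescript
// Sketch: reading extension settings through the standard configuration API.
import * as vscode from "vscode";

function readLlmConfig() {
  const cfg = vscode.workspace.getConfiguration("llm"); // section name assumed
  return {
    backend: cfg.get<string>("backend", "huggingface"),
    modelId: cfg.get<string>("modelId", "bigcode/starcoder"),
    url: cfg.get<string | null>("url", null),
    requestBody: cfg.get<Record<string, unknown>>("requestBody", {}),
  };
}

// React to changes so edits in the settings panel apply without a reload.
vscode.workspace.onDidChangeConfiguration((e) => {
  if (e.affectsConfiguration("llm")) {
    const updated = readLlmConfig();
    console.log("llm settings changed", updated);
  }
});
```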
Inline code completion rendering with ghost-text UI pattern
Medium confidence
Renders generated code completions as ghost-text overlays in the editor, matching VS Code's native code completion UI pattern. The extension inserts completions at the cursor position when accepted (typically via the Tab key). Ghost text appears in a dimmed color to distinguish it from actual code. The rendering is handled by VS Code's InlineCompletionItemProvider API (or a similar completion API).
Uses VS Code's native InlineCompletionItemProvider API to render completions as ghost-text, providing a familiar UX that matches VS Code's built-in completion behavior without custom UI.
Matches VS Code's native completion UX more closely than GitHub Copilot's dropdown-based suggestions, and simpler than custom completion panels used by some extensions.
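A minimal sketch of the provider wiring, assuming the InlineCompletionItemProvider API; fetchCompletion stands in for the backend request shown earlier:

```typescript
// Sketch: an inline completion provider whose results VS Code renders as
// dimmed ghost text, accepted with Tab. VS Code owns the overlay rendering.
import * as vscode from "vscode";

// Stand-in for the HTTP request to the configured backend.
declare function fetchCompletion(prefix: string, suffix: string): Promise<string>;

const provider: vscode.InlineCompletionItemProvider = {
  async provideInlineCompletionItems(document, position) {
    // Prefix: everything before the cursor; suffix: everything after it.
    const prefix = document.getText(
      new vscode.Range(new vscode.Position(0, 0), position)
    );
    const suffix = document.getText(
      new vscode.Range(position, document.lineAt(document.lineCount - 1).range.end)
    );
    const text = await fetchCompletion(prefix, suffix);
    // Returned items appear as ghost text at the cursor position.
    return [
      new vscode.InlineCompletionItem(text, new vscode.Range(position, position)),
    ];
  },
};

vscode.languages.registerInlineCompletionItemProvider(
  { pattern: "**" }, // all files; a real extension might scope by language
  provider
);
```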
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with llm-vscode, ranked by overlap. Discovered automatically through the match graph.
Claude 4, DeepSeek R1, ChatGPT, Copilot, Cursor AI and Cline, AI Agents, AI Copilot, and Debugger, Code Assistants, Code Chat, Code Completion, Code Generator, Autocomplete, Codestral, Generative AI
Bugzi: Multi-Agent AI and Code Scanning. Your AI Partner for Development. Bugzi is a powerful AI assistant that seamlessly integrates into your VS Code workflow, designed to enhance productivity and streamline your entire development process.
Baidu Comate (文心快码)
Coding mate, Pair you create. Your AI Coding Assistant with Autocomplete & Chat for Java, Go, JS, Python & more
Ghostwriter
An AI-powered pair programmer by...
Supermaven
The fastest copilot.
Augment Code (Nightly)
Augment Code is the AI coding platform for VS Code, built for large, complex codebases. Powered by an industry-leading context engine, our Coding Agent understands your entire codebase — architecture, dependencies, and legacy code.
Lingma - Alibaba Cloud AI Coding Assistant
Type Less, Code More
Best For
- ✓Solo developers and small teams using open-source LLMs
- ✓Developers preferring local inference (Ollama) over cloud APIs
- ✓Teams evaluating Hugging Face models for code generation
- ✓Developers wanting cost-controlled completion via self-hosted TGI
- ✓Developers concerned about training data contamination and code provenance
- ✓Teams with strict IP policies requiring attribution verification
- ✓Open-source maintainers auditing generated code for licensing compliance
- ✓Researchers studying code generation model behavior and training data leakage
Known Limitations
- ⚠No multi-file context awareness — only current file prefix/suffix is sent to the model
- ⚠Context window automatically truncated to fit model limits, potentially losing surrounding code
- ⚠Network latency from HTTP requests to external backends (Inference API, Ollama) adds completion delay
- ⚠Free tier Hugging Face Inference API has rate limits; PRO plan recommended for production use
- ⚠No streaming response support documented — full completion must be generated before display
- ⚠Tokenization overhead via tokenizers library adds computational cost per completion request