Codex Local brings powerful language models directly into VS Code, running entirely on your machine. Write code without leaving your IDE. No API keys. No internet dependency after setup. Pure private intelligence.
No subscriptions. No per-token billing. Install once, use forever on hardware you control. Open-source inference engines keep software costs at zero.
Models run locally on your GPU or CPU. Code never leaves your machine. No telemetry, no model training on your data, no vendor lock-in.
No internet required after setup. Works on aircraft, on trains, in coffee shops with no Wi-Fi, and in intentionally disconnected dev environments. Code completions are always available.
Use CodeLlama for speed. Switch to Mistral for accuracy. Drop in newer models as they release. Choose the inference engine: llama.cpp, vLLM, or Ollama.
Familiar inline suggestions, multi-line completions, and a chat interface. The UX you expect, with the speed of local compute (sub-100ms completions on a supported GPU).
Download the extension, point it at your model server, start typing. No complex configuration. Works with Ollama, vLLM, or text-generation-webui out of the box.
Add Codex Local from the VS Code Marketplace. Configure your local model server endpoint (defaults to localhost:5000).
Use Ollama, vLLM, or any OpenAI-compatible inference API. We recommend CodeLlama 13B or Mistral 7B for the best balance of speed and quality.
Begin typing. Codex Local streams completions directly from your GPU. No round trip to cloud APIs. On a GPU, latency is in milliseconds, not seconds.
Ask refactoring questions or debug issues with an in-IDE chat panel backed by your local model. Context-aware, private, instantaneous.
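To make the steps above concrete, here is a minimal sketch of what talking to a local OpenAI-compatible server can look like from plain Python. It is not Codex Local's own code. It assumes the server exposes the OpenAI-style `/v1/completions` route with server-sent-event streaming at the default endpoint mentioned above; the model tag `codellama:13b` and all function names are placeholders chosen for illustration.

```python
import json
import urllib.request

# Assumption: the server speaks the OpenAI-style completions API.
# localhost:5000 is the default endpoint named above; adjust for your server.
BASE_URL = "http://localhost:5000"


def build_completion_request(prompt, model="codellama:13b",
                             base_url=BASE_URL, max_tokens=64):
    """Build (but do not send) a streaming completion request."""
    payload = {
        "model": model,          # placeholder tag; use whatever your server lists
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,          # ask for incremental chunks
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_completion_text(lines):
    """Yield text fragments from server-sent-event lines ("data: {...}")."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker used by OpenAI-style servers
        choices = json.loads(data).get("choices") or []
        if choices and choices[0].get("text"):
            yield choices[0]["text"]


def stream_completion(prompt, **kwargs):
    """Send the request and yield text as it arrives (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        yield from iter_completion_text(response)
```

With a model server running, `"".join(stream_completion("def add(a, b):"))` would collect the streamed suggestion. Request building and stream parsing are kept separate from the network call so each piece can be checked without a server.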
Installation (macOS / Linux with Ollama):
Estimated system requirements:
GPU-Accelerated (Fast)
NVIDIA RTX 3060+ (6GB VRAM) or equivalent. Completions in 50-100ms.
CPU-Only (Usable)
16GB RAM, modern CPU. Completions in 500ms-2s. Fine for typing speed.
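As a rough way to sanity-check these requirements against a model choice, the sketch below estimates the memory taken by the model weights alone: parameter count times bits per weight. This is a back-of-envelope approximation, not a measurement. It ignores the KV cache, activations, and runtime overhead, and it uses nominal parameter counts (7 and 13 billion); the function name is hypothetical.

```python
def estimate_weight_memory_gb(params_billions, bits_per_weight):
    """Rough size of the model weights alone, in gigabytes (10^9 bytes).

    Excludes KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return params_billions * bits_per_weight / 8


# Nominal parameter counts; 16-bit is unquantized half precision, 4-bit is a
# common quantization level for local inference.
for name, params in [("Mistral 7B", 7), ("CodeLlama 13B", 13)]:
    for bits in (16, 4):
        gb = estimate_weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: about {gb:.1f} GB of weights")
```

By this estimate a 4-bit 7B model (about 3.5 GB of weights) fits in 6GB of VRAM with room to spare, while a 4-bit 13B model (about 6.5 GB) would likely need more VRAM or partial offloading to system RAM.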
No network hop. Inference runs on your own GPU or CPU, so completions skip the round trip to a remote server. On a supported GPU they arrive in 50-100ms.
Your code is not sent to any cloud service. No vendor logs. No model training on your proprietary logic. Full audit trail: you control the data flow.
Airplane mode, tunnel, café without Wi-Fi, intentional offline work. Your AI coding assistant never goes down because it lives on your machine.
No surprise charges for heavy usage. Heavy refactoring session? Run 10,000 completions? No per-token cost. Buy hardware once, amortize over years.
Not locked into one vendor's model. CodeLlama, Mistral, neural-chat, or proprietary models you fine-tune. Swap whenever a better model ships.
If your AI provider raises prices, discontinues service, or changes terms, you still code. Your development workflow survives vendor dynamics.
Codex Local is open-source and free. Install from the VS Code Marketplace. Run models locally. Get completions with no subscription.