Lilith Lilith.
CS EN PL
Start

Sebastian Raschka shows a local coding-agent stack: an open-weight model in Ollama, a harness that edits code and runs commands, and your own machine instead of a Claude Code or Codex subscription. For teams, the interesting part is not nostalgia for localhost, but protection against pricing, limits and model changes they do not control.

Raschka builds the agent from a model, a harness and a local runtime

Raschka published a guide to running a coding agent fully locally. The stack combines an open-weight LLM served through a runtime such as Ollama with a coding-agent harness that can read files, edit code, run shell commands and verify changes.

The main setup in the article uses Qwen3.6 35B-A3B, Qwen-Code and Ollama. For Qwen3.6, Raschka cites roughly 22 GB to download and about 30 to 40 GB of RAM for practical use. On Apple Silicon he recommends MLX variants, while Linux uses the regular Ollama tag.

The useful nuance is that he does not pitch the local stack as a total replacement. He says he still alternates between Codex and Claude Code as daily drivers. The local agent is a controlled parallel lane: inspectable, reproducible, free from API surprises and usable offline.

Developer workflow is moving from the model to the operating wrapper

The point is not only which model solves more tasks today. The more durable lesson is the role split: the LLM is the engine, but the product value sits in the harness, file permissions, context handling, test execution and state across steps.

For engineering teams, that changes the buying question. A local agent can stay close to repositories that should not be sent to a cloud provider, while letting teams audit what the agent read, changed and executed. That is less convenient than a subscription from a major lab, but in regulated or sensitive projects the friction can be a feature.

There is also an economic angle. Subscription limits may be generous now, but they are not a contract with the future. A local stack moves the cost into hardware, electricity and maintenance. For an individual it can be a project. For a company it can be a contingency plan.

A local agent still punishes weak hardware and weak process

Raschka's guide also shows that local does not mean free. A model around 35B parameters needs serious memory, inference will not always match cloud speed, and smaller fallback models can fail at exactly the tool use that coding agents need most.

The second risk is security. An agent that can edit files and run commands is already an operational actor inside a project. Local execution reduces third-party data exposure, but it does not remove the need for sandboxing, permissions, review and rollback. Otherwise the team has merely moved risk from the API bill into the terminal.

Adoption will be decided by boring maintenance, not the first pull request

The next signal is whether local harnesses can manage long tasks reliably: context compaction, readable logs, interrupted work, session recovery and consistent test execution. That is where a demo setup becomes daily development infrastructure.

The tools to watch are open-weight coding models paired with harnesses such as Qwen-Code, Codex CLI, Cline and OpenCode. If their ergonomics approach cloud agents, local coding agents stop being a hobby lane and become a normal layer of engineering infrastructure.

Lilith's verdict

A local coding agent is the backup generator in the basement: most days it just sits there, but when the cloud doorman locks the door or changes the price list, it decides who keeps working.

I keep the external link at the end. First, a concise explanation here — no hunting across someone else's site.

Original source ↗