Async coding agents as research threads: fire a task, get a pull request back | Radar

Async coding agents are changing the rhythm of research work. Instead of sitting at an editor and switching between documentation, terminal and browser, you pose a question, let the agent work on a server, and turn to something else while it runs.

Willison fires 2-3 research projects a day and gets back pull requests

Simon Willison described a concrete workflow: agents like Claude Code, Codex Cloud, Google Jules and GitHub Copilot agent receive research tasks, work asynchronously on a server, and open a pull request against a dedicated GitHub repository when they finish. Willison estimates he kicks off 2-3 projects per day with minimal time investment.

Key setup details: the research lives in separate repositories (one public, one private), which reduces security risk. Agents have full network access so they can install dependencies and fetch data. A GitHub Workflow using GitHub Models automatically generates README summaries of new projects.
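To make the README step concrete, here is a minimal sketch of the indexing half of such a job. It is not Willison's actual workflow: the function names (first_paragraph, build_index) and the one-folder-per-project layout are assumptions for illustration, and where his setup asks GitHub Models for a summary, this sketch simply reuses the first paragraph of each project's README.

```python
from pathlib import Path


def first_paragraph(readme: Path) -> str:
    """Return the first non-heading paragraph of a README, joined onto one line."""
    lines: list[str] = []
    for raw in readme.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Blank lines and headings end a paragraph that has started;
        # before that they are skipped.
        if not line or line.startswith("#"):
            if lines:
                break
            continue
        lines.append(line)
    return " ".join(lines)


def build_index(repo_root: Path) -> str:
    """Build one index line per project folder that contains a README.md.

    Assumed (hypothetical) layout: every top-level directory is one project.
    """
    entries = []
    for project in sorted(p for p in repo_root.iterdir() if p.is_dir()):
        if project.name.startswith("."):
            continue  # skip .git, .github and similar
        readme = project / "README.md"
        if not readme.exists():
            continue
        # Stand-in for the model-generated summary described in the article.
        summary = first_paragraph(readme) or "(no summary yet)"
        entries.append(f"- {project.name}: {summary}")
    return "\n".join(entries)
```

Calling build_index(Path(".")) from the repository root returns the index text, which a scheduled or push-triggered job could then write into the top-level README.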

Concrete examples from his public simonw/research repository: a benchmark of seven Markdown libraries with generated charts, compiling a C extension to WebAssembly, ML-based tag suggestions via text classification, and running Python WebAssembly in Node.js.

Code as evidence, not just text

The point is not that the agent writes a nice description of a solution. The point is that runnable code is empirical proof of feasibility. Willison frames it precisely: code does not lie. If the agent wrote code that runs, you know it is possible.
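One way to act on "code that runs" when reviewing a PR is to execute the submitted script yourself and look at the exit code. The sketch below assumes the agent delivered a standalone Python script; RunResult and runs_cleanly are hypothetical names invented for this example, and the 60-second timeout is an arbitrary default.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    ok: bool                   # True only if the script exited with code 0
    exit_code: Optional[int]   # None if the run timed out
    output: str                # stdout followed by stderr


def runs_cleanly(script_path: str, timeout_s: float = 60.0) -> RunResult:
    """Run a Python script in a fresh interpreter and report whether it succeeded."""
    try:
        proc = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return RunResult(ok=False, exit_code=None, output="timed out")
    return RunResult(
        ok=proc.returncode == 0,
        exit_code=proc.returncode,
        output=proc.stdout + proc.stderr,
    )
```

A zero exit code only shows that the script executes. Whether it actually answers the research question still has to be judged by reading the code and its output.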

For research and spike tasks this is a meaningful shift. Classic exploration runs serially: you translate the question into code yourself, run it, iterate. With agents, several questions can be explored in parallel while you do other work, and each PR lets you decide what to actually integrate.

Proof of concept is the strength; production is a different discipline

An async agent is useful, but good results depend on writing clear briefs and knowing which tasks only need a proof of concept versus which need production-quality code. Spike and exploratory projects are a good fit; critical infrastructure with complex logic is not.

There is also a new working discipline: return to the PR, understand the result, and decide what to integrate and what to discard. If you merge PRs blindly, the agent has just produced a pile of diffs that nobody owns.

The real test is whether this scales to harder tasks

Willison shows a working workflow on exploratory projects where the agent can fail without serious consequences. The interesting question is how this holds up on multi-step research tasks or on projects where the agent must maintain context across many files.

Watch the security boundaries too: shared repositories, broad network permissions and unpredictable agent behavior on unexpected input are tolerable in an exploratory context but not in production.

Lilith's verdict

Willison shows that an agent does not have to write production code to be useful. It just needs to come back with a PR that tells you whether something is feasible or not. That shift from an editor loop to an async research thread may be a bigger change than it looks.