Lilith

From Radar

Radar · 2026-06-15

Uber puts a price tag on coding agents: $1,500 per tool each month

Uber is limiting monthly token spend to $1,500 per employee for each agentic coding tool, according to Bloomberg via Simon Willison. Coding agents are becoming a budget line item.

Radar · 2026-06-15

Simon Willison shows why an agent sandbox cannot be just another Python process

Simon Willison released the alpha package micropython-wasm and a Datasette Agent plugin that runs Python inside a WebAssembly sandbox. The important part is not the demo, but the boundary between a useful agent and code that can break its host application.

Radar · 2026-06-08

Apple puts Siri back in play through Gemini, but the proof is still waitlisted

Apple announced Siri AI and new Apple Intelligence features at WWDC 2026, while extending Private Cloud Compute to Google Cloud with NVIDIA GPUs for demanding tasks. After last year's Apple Intelligence disappointment, this is less about the keynote and more about whether Siri can finally survive outside the demo.

Radar · 2026-06-07

datasette-agent-edit tackles the boring part of agents: safe text edits

Simon Willison released datasette-agent-edit 0.1a0, a base plugin for Datasette Agent with view, str_replace and insert tools. It is not a flashy AI demo. It is the layer that decides whether an agent can edit text without casually breaking the file.
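
As a rough illustration of why such tools matter, here is a minimal Python sketch of a str_replace-style edit and a line-based insert. This is not the plugin's code. The function signatures and the rule that the target string must match exactly once are assumptions made for this example; only the tool names come from the release.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Replace `old` with `new`, but only if `old` occurs exactly once.

    Assumption for this sketch: an ambiguous or missing match is an error,
    so the edit never lands in the wrong place silently.
    """
    if not old:
        raise ValueError("old must be a non-empty string")
    count = text.count(old)
    if count == 0:
        raise ValueError("old string not found; nothing was changed")
    if count > 1:
        raise ValueError(f"old string matches {count} times; add more context")
    return text.replace(old, new, 1)


def insert(text: str, after_line: int, new_text: str) -> str:
    """Insert `new_text` after the 1-based line `after_line` (0 means the top)."""
    lines = text.splitlines(keepends=True)
    if not 0 <= after_line <= len(lines):
        raise ValueError(f"after_line must be between 0 and {len(lines)}")
    if not new_text.endswith("\n"):
        new_text += "\n"
    # If the preceding line has no trailing newline, add one so lines do not merge.
    if after_line > 0 and not lines[after_line - 1].endswith("\n"):
        lines[after_line - 1] += "\n"
    lines.insert(after_line, new_text)
    return "".join(lines)
```

Refusing to act on zero or multiple matches is the design choice that keeps a wrong guess from becoming a silent corruption: the agent gets an error it can react to instead of a file edited in the wrong place.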

Radar · 2026-06-05

Lockdown Mode cuts the riskiest prompt injection escape route

OpenAI has started rolling out Lockdown Mode for eligible personal ChatGPT accounts and self-serve ChatGPT Business. It does not stop prompt injection itself, but it limits outbound network requests, which are the channel an attacker needs to exfiltrate sensitive data.

Radar · 2026-05-30

A service worker intercepts HTTP requests and handles them in a Python ASGI app running entirely in the browser

Simon Willison experiments with running Python ASGI apps directly in the browser using Pyodide and a service worker. FastAPI and a complete Datasette 1.0a31 both ran successfully. The point is distribution: demos or data tools as self-contained web pages without a server.
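
To show what "handling a request in an ASGI app" means without a server, here is a minimal Python sketch that calls an ASGI application directly. It is not the code from the experiment: the `app` and `call_asgi` names are hypothetical, and in the browser setup the service worker would do the equivalent translation from an intercepted request into the ASGI `scope`, `receive` and `send` arguments.

```python
import asyncio


async def app(scope, receive, send):
    """A tiny ASGI application: answers every HTTP request with plain text."""
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_asgi(asgi_app, method: str, path: str):
    """Drive an ASGI app in-process, with no socket and no web server."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
    }
    messages = []

    async def receive():
        # The request has an empty body in this sketch.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await asgi_app(scope, receive, send)
    status = next(m["status"] for m in messages if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in messages if m["type"] == "http.response.body")
    return status, body


status, body = asyncio.run(call_asgi(app, "GET", "/demo"))
print(status, body.decode())
```

Because an ASGI app is just an async callable, anything that can build a `scope` dictionary and collect the `send` messages can host it, which is what makes the server-free distribution model plausible.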

Radar · 2026-05-29

Anthropic's run-rate revenue crossed $47 billion, up from $9 billion five months earlier, and growth is accelerating

Simon Willison highlighted the number from Anthropic's Series H announcement: run-rate revenue crossed $47 billion. The trajectory is striking: $9 billion in December 2025, $30 billion in April, $47 billion in May 2026.

Radar · 2026-05-28

Opus 4.8 misses code flaws four times less often and introduces mid-conversation instruction updates

Anthropic shipped Opus 4.8 with one concrete metric: the model is four times less likely to miss code flaws than its predecessor. It also adds mid-conversation system messages and reduces the minimum prompt cache size from 4,096 to 1,024 tokens.

Radar · 2026-05-27

SQLite draws a line: no agentic code, yes reproducible bugs

SQLite added an AGENTS.md file with a blunt rule for people pointing AI agents at the codebase: agentic code is not accepted, but high-quality reproducible bug reports can be useful. A small file, but a big signal for critical open source maintenance.

Radar · 2026-05-26

Copilot Cowork turns user permissions into a file exfiltration path via prompt injection

PromptArmor researchers demonstrated an attack chain in which Microsoft Copilot Cowork can help exfiltrate Microsoft 365 files through prompt injection. This is not only a product bug, but a warning for any agentic system with delegated permissions.

Radar · 2026-05-11

An AI coding agent that does not cut maintenance costs is just expensive technical debt

James Shore states the uncomfortable math of coding agents: if an agent doubles output but maintenance costs stay flat, the team did not gain speed, it doubled its technical debt burden.

Radar · 2026-05-07

Mozilla fixed hundreds of Firefox bugs with Claude Mythos. AI security report quality just shifted.

Simon Willison described how Mozilla used early access to Claude Mythos Preview to systematically find and fix Firefox vulnerabilities. In April 2026 the number of fixed security bugs jumped to 423, compared to the usual 20 to 30 per month. The key shift: AI security reports stopped being noise and started being usable input.

Radar · 2025-11-18

Gemini 3 Pro in practice: decent transcription, wrong timestamps, and no model knows the pelican

Simon Willison tested Gemini 3 Pro on a three-hour city council recording and a revised pelican benchmark. Result: a structured transcript for $1.42, but timestamps are off by tens of minutes. And none of the models tested understood that a California brown pelican is not actually brown.

Radar · 2025-11-06

Async coding agents as research threads: fire a task, get a pull request back

Simon Willison describes a fire-and-forget workflow with Claude Code, Codex and other coding agents: pose a research question, the agent works on a server and files a pull request. Code is proof of feasibility, not just text.

Radar · 2025-11-02

Two new prompt injection papers: Rule of Two reveals structural risk, attackers adapt to defenses

Simon Willison highlighted two new papers on agent prompt injection. Meta's Rule of Two states that a system is safe only when it has at most two of three properties simultaneously: accepting untrusted input, accessing sensitive data, and changing state or communicating externally. A second paper from researchers at OpenAI, Anthropic, and DeepMind showed that 12 published defenses were bypassed by adaptive attacks with success rates above 90%.
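
The Rule of Two as summarized above can be expressed as a simple configuration check. The following Python sketch is an illustration only: the `AgentCapabilities` class, its field names and the `within_rule_of_two` function are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """The three properties named in the summary (field names are hypothetical)."""
    untrusted_input: bool   # processes input the operator does not control
    sensitive_data: bool    # can read private or sensitive data
    external_effects: bool  # can change state or communicate externally


def within_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True when the agent combines at most two of the three properties."""
    enabled = sum((caps.untrusted_input, caps.sensitive_data, caps.external_effects))
    return enabled <= 2


# An agent that reads untrusted web pages and private files, but cannot act externally.
reader = AgentCapabilities(untrusted_input=True, sensitive_data=True, external_effects=False)
# The same agent with outbound network access added.
reader_with_network = AgentCapabilities(untrusted_input=True, sensitive_data=True, external_effects=True)

print(within_rule_of_two(reader))
print(within_rule_of_two(reader_with_network))
```

Passing this check is a necessary condition under the rule as stated, not a guarantee of safety; the second paper's result on adaptive attacks is a reminder that filters layered on top of a three-property system should not be counted as removing one of the properties.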

Radar · 2025-10-20

Claude Code for web: an asynchronous coding agent in a sandbox, without your laptop

Simon Willison tested Claude Code for web: Anthropic wrapped the local Claude Code experience in a hosted sandbox and made it usable from web and mobile. The important shift is not a more capable model, but a workflow change: coding agents become more valuable when they can run asynchronously and safely away from your laptop.
