Lilith
2026-06-03
12:00 · source ↗

Wasmer shows Codex as leverage for small teams, not a magic compiler

OpenAI says Wasmer used Codex to build Edge.js in two weeks instead of an estimated year, and that Codex accelerated development 10x to 20x. The stronger point is not the number. It is the shift in the developer's role: less typing, more steering of risky model work.

The story here is not Codex writing a runtime. It is a small team handing the model a shovel while still standing at the pit with a helmet, a measuring tape and the authority to say stop.

00:00 · source ↗

Reachy Mini gets MCP tools from Hugging Face Spaces

Hugging Face shows Reachy Mini calling MCP tools hosted in public Spaces. The interesting part is not a weather answer, but the split between the robot body and capabilities that can be shared and updated outside the app.

Forget the weather trick. The real moment comes when a small robot starts raising the question: who is allowed to put a new tool on the table and let it speak to the body?

2026-06-02
16:48 · source ↗

GitHub is preparing for a world where agents write commits at scale

The Latent Space interview with Kyle Daigle frames GitHub as a platform under pressure from agentic coding. The point is not another Copilot feature, but whether infrastructure built for human pace can absorb software produced by machines.

GitHub is no longer asking whether agents can write code. It is staring at a pull request queue where a tired maintainer has to tell which robotic coworker helped and which one dumped work on the desk.

2026-06-01
00:00 · source ↗

Search should not be a button. It should be programmable infrastructure for agents

Perplexity describes Search as Code: an architecture where an agent does not call one monolithic search engine, but assembles a retrieval pipeline as code. The point is not a nicer search API. It is control over how evidence is found, filtered and verified.

Search as Code is not another pretty name for web search. It is the moment an agent stops browsing results like a human and starts building its own investigation pipeline: candidates, filters, evidence and a bin for noise.
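
The pipeline idea can be made concrete with a small sketch. This is not Perplexity's API: the names `Candidate`, `PipelineResult`, `run_pipeline`, `mentions` and `min_score` are hypothetical, and the sample documents are invented. It only illustrates the shape described above, where candidates pass through composable filters and end up either as evidence or in the bin for noise.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Candidate:
    """One retrieved document (hypothetical shape, for illustration only)."""
    url: str
    text: str
    score: float = 0.0


@dataclass
class PipelineResult:
    evidence: list[Candidate] = field(default_factory=list)
    noise: list[Candidate] = field(default_factory=list)


# A filter is any function that accepts or rejects a candidate.
Filter = Callable[[Candidate], bool]


def mentions(term: str) -> Filter:
    """Accept candidates whose text contains the term (case-insensitive)."""
    return lambda c: term.lower() in c.text.lower()


def min_score(threshold: float) -> Filter:
    """Accept candidates whose retrieval score reaches the threshold."""
    return lambda c: c.score >= threshold


def run_pipeline(candidates: Iterable[Candidate], filters: list[Filter]) -> PipelineResult:
    """Route each candidate to evidence if every filter accepts it, else to noise."""
    result = PipelineResult()
    for candidate in candidates:
        if all(check(candidate) for check in filters):
            result.evidence.append(candidate)
        else:
            result.noise.append(candidate)
    return result


# Invented sample data.
docs = [
    Candidate("https://example.com/a", "SQLite release notes with benchmark data", 0.9),
    Candidate("https://example.com/b", "Unrelated recipe for soup", 0.8),
    Candidate("https://example.com/c", "SQLite rumor from a forum thread", 0.2),
]
result = run_pipeline(docs, [mentions("sqlite"), min_score(0.5)])
```

Because the filters are ordinary functions, an agent can add, remove or reorder them per task, and the noise bin stays inspectable instead of being silently dropped.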

15:41 · source ↗

Video generation is moving from clip output to canvas agent

Latent Space frames xAI Grok Imagine, through an interview with Ethan He, as a move from one-shot video generation toward video agents. The thesis will be proven less by demo quality than by whether the system can iterate through a whole creative task.

A video agent becomes interesting only when the human at the table stops being the prompt janitor. If every version has to be dragged out of the ditch by hand, it is still just a loud clip tool.

15:01 · source ↗

Opus 4.8 shows that behavior tuning is not a checklist of fixes

Zvi Mowshowitz reads Opus 4.8 through model welfare and argues that attempts to fix honesty, sycophancy and preference shaping can create new problems elsewhere. For teams deploying models, the reminder is that alignment is not a checklist.

A model upgrade is not changing a light bulb. It is a new colleague at the table: maybe more precise, maybe more cautious, but the whole team has to check whether it stopped speaking exactly when it should have spoken.

13:03 · source ↗

Open models win on cost, but frontier intelligence still sells at a premium

Nathan Lambert argues that open and closed models are improving on different economic curves. The real question is not open source ideology, but where companies will keep paying a premium for the best model.

Open versus closed is not a war here. It is a drier scene: the CFO staring at the token bill while an engineer points to a pull request that would otherwise sit for three days.

2026-05-30
21:02 · source ↗

A service worker intercepts HTTP requests and handles them in a Python ASGI app running entirely in the browser

Simon Willison experiments with running Python ASGI apps directly in the browser using Pyodide and a service worker. FastAPI and a complete Datasette 1.0a31 both ran successfully. The point is distribution: demos or data tools as self-contained web pages without a server.

This approach does not replace a server. It reduces friction between idea and demo: a Python app as a web page, no deploy, no account, no server infrastructure.
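
To see why no server is strictly needed, it helps to recall that an ASGI app is just an async callable. The sketch below is not Simon Willison's code and involves neither Pyodide nor a service worker: `call_asgi` is a hypothetical helper that plays the role such a bridge would play, turning a request into an ASGI scope and collecting the response messages in-process.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI application: answers every HTTP request with plain text."""
    assert scope["type"] == "http"
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_asgi(asgi_app, method, path):
    """Drive an ASGI app directly, with no network server in between."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "headers": [],
    }
    messages = []

    async def receive():
        # The request has no body in this sketch.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await asgi_app(scope, receive, send)
    status = messages[0]["status"]
    body = b"".join(
        m.get("body", b"") for m in messages if m["type"] == "http.response.body"
    )
    return status, body


status, body = asyncio.run(call_asgi(app, "GET", "/demo"))
```

Anything that can build the scope dictionary and supply `receive` and `send` can host the app, which is the property that makes an in-browser bridge plausible.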

2026-05-29
01:23 · source ↗

Anthropic's run-rate revenue went from $9 billion to $47 billion in five months, and growth is accelerating

Simon Willison highlighted the number from Anthropic's Series H announcement: run-rate revenue crossed $47 billion. The trajectory is striking: $9 billion in December 2025, $30 billion in April, $47 billion in May 2026.

A $47 billion run-rate is the ledger where enterprise customers see for the first time what automated work costs when nobody sets limits. Somewhere in those numbers there is probably one badly configured usage policy.
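
The "accelerating" claim can be checked with simple arithmetic on the three figures quoted above. This is a back-of-the-envelope sketch, assuming each figure is a month-end snapshot, so December 2025 to April 2026 counts as four months and April to May as one. The function name `monthly_growth` is made up for this example.

```python
# Run-rate figures quoted in the entry, in billions of US dollars.
RUN_RATE_BN = {"2025-12": 9.0, "2026-04": 30.0, "2026-05": 47.0}


def monthly_growth(start: float, end: float, months: int) -> float:
    """Compound monthly growth rate implied by moving from start to end."""
    return (end / start) ** (1 / months) - 1


# Overall multiple across the five months.
overall_multiple = RUN_RATE_BN["2026-05"] / RUN_RATE_BN["2025-12"]

# Assumption: month-end snapshots, so the first leg spans four months.
dec_to_apr = monthly_growth(RUN_RATE_BN["2025-12"], RUN_RATE_BN["2026-04"], 4)
apr_to_may = monthly_growth(RUN_RATE_BN["2026-04"], RUN_RATE_BN["2026-05"], 1)

print(f"overall multiple: {overall_multiple:.2f}x")
print(f"Dec to Apr, per month: {dec_to_apr:.1%}")
print(f"Apr to May, per month: {apr_to_may:.1%}")
```

Under that spacing assumption, the implied monthly rate in the final month is higher than the average over the preceding four, which is what the headline means by accelerating growth.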

2026-05-28
23:59 · source ↗

Opus 4.8 misses code flaws four times less often and introduces mid-conversation instruction updates

Anthropic shipped Opus 4.8 with one concrete metric: the model is four times less likely to miss code flaws than its predecessor. It also adds mid-conversation system messages and reduces the minimum prompt cache size from 4,096 to 1,024 tokens.

Opus 4.8 did not arrive with a keynote effect, but with a receipt: four times fewer missed code flaws and a model that prefers silence over a confident wrong answer. That is exactly the kind of honesty worth $25 per million tokens.

20:58 · source ↗

Google wants agents to propose hypotheses and write experimental code instead of the scientist

At I/O 2026, Google Research showed Gemini for Science, ERA and Co-Scientist as systems where AI takes over research middle steps: literature review, writing code, iterating hypotheses. Risks of false certainty and vendor lock-in are substantial.

Google is not just giving scientists a smarter chatbot here. It wants to build a lab where the agent writes the protocol and the human still has to watch for an elegantly formulated mistake sitting on the bench.

18:41 · source ↗

Async agents receive a spec, work in an isolated VM and leave a pull request in the repository by morning

A Latent Space discussion with Cognition and OpenInspect frames coding agents as asynchronous workers: spec-to-PR workflows, full VMs, agent memory, and situations where a PM ships a code change without a developer. The shift is from synchronous chat to delegating an entire work cycle.

Chat was the training ground. The real change starts when an agent leaves a trace in the repository by morning that someone must accept or discard, and nobody knows exactly what it did during the night.

16:00 · source ↗

Data Formulator 0.7 tries to rebuild enterprise data analytics around AI agents

Microsoft Research released Data Formulator 0.7, an analytics workspace where AI agents assist with exploration, transformation and visualization of enterprise data. The key question is whether the agent handles messy, permissioned data outside the demo.

Data Formulator targets the point where a table turns into a decision. The agent promises to take over the data preparation work, but in the enterprise it will succeed only if it handles data that is not clean and never was.

2026-05-27
23:44 · source ↗

SQLite draws a line: no agentic code, yes reproducible bugs

SQLite added an AGENTS.md file with a blunt rule for people pointing AI agents at the codebase: agentic code is not accepted, but high-quality reproducible bug reports can be useful. A small file, but a big signal for critical open source maintenance.

This is the grown-up answer to AI spam: do not ban everything, define what has value. Agent patch no, reproducible test yes. Maintainers protect time, quality and legal cleanliness at once.

17:20 · source ↗

ITBench-AA: frontier models score below 50% on Kubernetes SRE diagnostics

On 27 May 2026, IBM Research and Artificial Analysis released the first benchmark for enterprise IT agents in a realistic Kubernetes environment. The top model (Claude Opus 4.7) reached 47%. No frontier model exceeded 50%.

A frontier model at 47% on SRE diagnostics is not a model failure. It is a hype failure. For anyone signing enterprise contracts for an AI agent in IT operations this year, these numbers are the first dose of reality.

07:00 · source ↗

Codex helps build self-improving tax agents

OpenAI, Thrive Holdings and Crete built Tax AI for more than 30 accounting firms. The pilot processed 7,000 returns, saved about one third of practitioner time and improved sharply within six weeks through a feedback loop powered by Codex.

The most important part is not tax form automation by itself, but the operating model. Tax AI turns real practitioner failures into evals and Codex tasks, so the product improves on the exact cases that slow firms down. That is a practical picture of agentic software: humans keep accountability, the system absorbs repeat work and the product team gets a faster path from failure to fix.
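
The failure-to-eval loop can be sketched in a few lines. This is not Tax AI's implementation: `Failure`, `EvalCase`, `failures_to_evals` and `pass_rate` are hypothetical names, and the sample records are invented. The sketch only shows the general pattern of turning practitioner corrections into regression cases that a later product version is scored against.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Failure:
    """A practitioner correction (hypothetical record shape)."""
    return_id: str
    field_name: str
    predicted: str
    corrected: str


@dataclass(frozen=True)
class EvalCase:
    name: str
    field_name: str
    expected: str


def failures_to_evals(failures: list[Failure]) -> list[EvalCase]:
    """Turn each real correction into one eval case, deduplicated by name."""
    cases: dict[str, EvalCase] = {}
    for f in failures:
        if f.predicted == f.corrected:
            continue  # the practitioner confirmed the value, so nothing failed
        name = f"{f.return_id}:{f.field_name}"
        cases[name] = EvalCase(name, f.field_name, f.corrected)
    return list(cases.values())


def pass_rate(cases: list[EvalCase], predict: Callable[[EvalCase], str]) -> float:
    """Share of eval cases where the system now produces the corrected value."""
    if not cases:
        return 1.0
    passed = sum(1 for case in cases if predict(case) == case.expected)
    return passed / len(cases)


# Invented sample data.
failures = [
    Failure("R-001", "filing_status", "single", "married"),
    Failure("R-001", "dependents", "2", "2"),  # confirmed, not a failure
    Failure("R-002", "state", "CA", "NV"),
]
cases = failures_to_evals(failures)

# Stub standing in for the next product version: it fixes only one field.
rate = pass_rate(cases, lambda case: "married" if case.field_name == "filing_status" else "CA")
```

The design choice worth noting is that the eval set grows only from real corrections, so the score measures progress on the cases that actually slowed practitioners down.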