SQLite added an AGENTS.md file with a blunt rule for people pointing AI agents at the codebase: agentic code is not accepted, but high-quality reproducible bug reports can be useful. A small file, but a big signal for critical open source maintenance.
This is the grown-up answer to AI spam: do not ban everything; define what has value. Agent patches no, reproducible tests yes. Maintainers protect time, quality and legal cleanliness at once.
IBM Research and Artificial Analysis released the first benchmark for enterprise IT agents in a realistic Kubernetes environment on 27 May 2026. The top model (Claude Opus 4.7) reached 47 %. No frontier model exceeded 50 %.
A frontier model at 47 % on SRE diagnostics is not a model failure. It is a hype failure. For anyone signing enterprise contracts for an AI agent in IT operations this year, these numbers are the first dose of reality.
Google Research presents a private analytics approach that combines secure aggregation with trusted execution environments (TEEs) for safer measurement of on-device AI.
This is less flashy than a new model, but more important for deployment. Somewhere in a user's pocket an AI system is running, and Google wants to know what it does without looking over their shoulder.
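To make "knowing without looking" concrete, here is a toy sketch of the pairwise-masking idea behind textbook secure aggregation. It is an illustration under loud assumptions, not Google's design: the function names are hypothetical, a seeded random generator stands in for secrets that real clients would agree on pairwise, and the TEE half of the approach is not modeled at all.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def masked_updates(values, seed=0):
    """Return one masked value per client; the pairwise masks cancel in the sum.

    Hypothetical helper: a real protocol derives each mask from a secret
    shared between two clients, not from one central generator.
    """
    rng = random.Random(seed)
    n = len(values)
    masked = [v % MODULUS for v in values]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.randrange(MODULUS)
            masked[i] = (masked[i] + mask) % MODULUS  # client i adds the mask
            masked[j] = (masked[j] - mask) % MODULUS  # client j subtracts it
    return masked


def aggregate(masked):
    """Server-side sum: masks cancel, so only the total is recovered."""
    return sum(masked) % MODULUS


values = [3, 5, 7]                # private per-device measurements
masked = masked_updates(values)   # what the server actually receives
total = aggregate(masked)         # equals sum(values) modulo MODULUS
print(total)
```

The server only ever handles the masked numbers, yet the sum comes out right because every mask is added once and subtracted once. Real deployments also have to handle clients that drop out mid-round, which this sketch ignores.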
Last Week in AI #341 connects Musk's loss against OpenAI, Gemini updates from Google I/O 2026 and other AI market signals.
A crowded pinboard where a judge, a Google product team and OpenAI researchers each pin their own note. There is no single grand thesis about the AI market behind it.
OpenAI, Thrive Holdings and Crete built Tax AI for more than 30 accounting firms. The pilot processed 7,000 returns, saved about one third of practitioner time and improved sharply within six weeks through a feedback loop powered by Codex.
The most important part is not tax form automation by itself, but the operating model. Tax AI turns real practitioner failures into evals and Codex tasks, so the product improves on the exact cases that slow firms down. That is a practical picture of agentic software: humans keep accountability, the system absorbs repeat work and the product team gets a faster path from failure to fix.