Lilith.
2026-06-16
21:00 · source ↗

Anthropic paused Agent SDK billing after agents hit the price list

Anthropic paused its June 15 plan to move Claude Agent SDK, claude -p and some third-party agent use into a separate credit pool. Teams running automations get a short reprieve, not a settled answer on long-running agent costs.

Anthropic is standing at the checkout with a basket full of agents, and customers do not like the receipt. The pause buys time, but someone will still pay for the long run.

18:00 · source ↗

Android 17 turns Pixel into Gemini’s showroom

Google released Android 17 and Wear OS 7 first for Pixel devices, alongside a Pixel Drop with Gemini Omni, Lyria 3 and translation features for the Pixel 10a. The bigger signal is not the OS update itself, but Google using Android as a distribution layer for AI models on the device.

Google is not showing a phone trick here. It is placing Gemini in front of every Android manufacturer and waiting to see who takes the guest chair and who brings their own door.

15:55 · source ↗

Model welfare is moving from philosophy into product risk

Zvi Mowshowitz uses Fable and Mythos as a case study for why model welfare cannot be separated from capabilities, alignment and user experience. Even where the topic remains speculative, it is becoming a practical question of evaluations and safety interventions for frontier labs.

Model welfare stands between the lab and a hall of mirrors. Anyone who arrives without measuring tape will admire their own reflection and call it an eval.

11:41 · source ↗

SpaceX buys Cursor for $60 billion and enters enterprise AI through developers

SpaceX is buying Anysphere, the maker of Cursor, in a deal valued at $60 billion, according to The Verge and Bloomberg. Musk is aiming at enterprise AI through a tool developers already use to write production code, not through another standalone chatbot.

A $60 billion deal is not buying an editor. It is buying a seat beside the developer’s hand when the merge button is pressed, and that seat is quieter than shouting in the model market.

11:15 · source ↗

SearchLeak shows why prompt injection hurts more in enterprise AI than in chat

The SearchLeak vulnerability in Microsoft 365 Copilot Enterprise Search could let attackers steal emails, documents or 2FA codes after a user clicked a crafted link, according to Varonis and Ars Technica. Microsoft has patched it, but the lesson remains: an agent with access to corporate data is a security product, not just a productivity assistant.

Copilot with email access is an intern holding a universal office badge. Useful, maybe, but doors should open by policy, not because a sentence in a stranger’s link asked nicely.

10:30 · source ↗

ChatGPT fell to 46.4% share as Gemini and Claude gained ground

Sensor Tower says ChatGPT's share of the AI assistant market fell to 46.4% by the end of May, even as it still has more than 1.1 billion monthly users. The bigger story is market fragmentation, where Google's distribution and Claude's paid conversion are starting to matter.

ChatGPT still holds the biggest megaphone, but it no longer owns the whole square. The AI assistant market matured the moment users started choosing by the job, not the logo on the landing page.

2026-06-15
21:50 · source ↗

Anthropic hit an export brake that shut Fable 5 off for every customer

Anthropic says US officials ordered access to Fable 5 and Mythos 5 suspended for foreign nationals, so the company disabled both models for all customers. Buyers of frontier AI now have to price in a risk that sits outside the model: the state kill switch.

Fable 5 is now more than a model in incident mode. It is a sign on the data center door: the best eval can still lose to an official with a stamp and a free Friday evening.

15:29 · source ↗

The US move against Fable and Mythos takes the same blade from defenders and attackers

The US government told Anthropic to restrict Fable 5 and Mythos 5 for all foreign nationals, so Anthropic switched the models off for all customers. A protest by 76 security experts exposes the weak point: export control is bad at separating an offensive exploit from defensive testing.

The state did not just take matches from an arsonist. For a moment, it took the ladder from the firefighters too, then hoped the fire would politely burn more slowly.

14:19 · source ↗

Thirteen words on Reddit can poison an AI answer

Research described by 404 Media says a 13-word snippet of retrieved text from sites such as Reddit, Wikipedia, Quora or Facebook can push AI agents toward spam or scam output. For AI search, that turns SEO into a problem of prompt injection and user-generated content moderation.

Old SEO tried to climb over the search engine fence. The new spam sits in the library, waits for the assistant and whispers thirteen words into its ear.

01:25 · source ↗

Claude Opus 4.8 sells judgment, not just another benchmark

Anthropic released Claude Opus 4.8 at the same standard price as Opus 4.7, with a focus on coding, agentic tasks and longer work. The more important shift is a model that is supposed to say more often when it is unsure.

Opus 4.8 is not a model meant to stun developers with one trick. It is the coworker at the whiteboard who finally pauses, points at the bad assumption and says: I would not merge this.

01:25 · source ↗

Nathan Lambert leaving Ai2 exposes the fragile side of open models

Nathan Lambert announced his departure from the Allen Institute for AI and used it to reflect on work around Olmo. This is not just a personnel note. It is a reminder that open models depend on institutions that must outlast one strong team.

Open-model AI does not win when one researcher claps at the release button. It wins when, after he leaves, the lab, the checklist and the next person still know why the data should go outside the building.

01:25 · source ↗

Holo3.1 pushes computer-use agents from cloud demos to local machines

H Company released Holo3.1, a family of computer-use models for web, desktop, mobile and local inference. The important part is not only higher scores, but the attempt to move the agent closer to where the work actually happens.

Holo3.1 is an attempt to take the agent out of the data center and sit it in front of your own monitor. The real test starts when the accounting app throws a weird dialog and nobody is holding the mouse.

01:25 · source ↗

Microsoft used Build to act like a model lab, not just a distributor

Latent Space frames Microsoft Build as the moment Microsoft showed its own MAI models alongside Copilot, Windows and Web IQ. The key ambition is to control data, inference and developer workflow at once, rather than leaving that leverage to partners.

Build 2026 was Microsoft's signal that it is taking the model layer back under its own roof. Copilot then stops being a wrapper for other companies' APIs and becomes a product with its own backbone.

01:25 · source ↗

Trump AI order creates a 30-day window for frontier models

The White House issued an executive order that calls for a classified benchmark for covered frontier models within 60 days and a voluntary framework for up to 30 days of pre-release government access. It says this is not licensing, but it creates a pressure point before launch.

The government has taken thirty days before every frontier release. Legally voluntary, but any lab with federal customers knows that refusing will be more complicated than joining.

01:25 · source ↗

Google gives enterprise RAG a guard who knows when not to answer

Google introduced an agentic RAG system for Gemini Enterprise Agent Platform that checks whether it has enough context before answering. For companies, that brake matters more than another polished retrieval layer.

The value of the system does not rest on the number of agents in the architecture. It rests on whether an answer has a readable trail back to the source, or ends up as confident text with no address.
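The "check before answering" idea can be sketched in a few lines. This is a toy illustration, not Google's implementation: the names `Passage`, `coverage` and `answer_or_abstain`, the keyword-overlap score and the 0.6 threshold are all assumptions made up for the example. A real system would likely use a model-based sufficiency judgment rather than word matching.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # where the text came from, kept so an answer can point back to it
    text: str


def coverage(question: str, passages: list[Passage]) -> float:
    """Fraction of longer question words that appear in at least one passage."""
    words = {w.strip("?.,").lower() for w in question.split()}
    keywords = {w for w in words if len(w) > 3}  # crude stand-in for stopword removal
    if not keywords:
        return 0.0
    corpus = " ".join(p.text.lower() for p in passages)
    found = {k for k in keywords if k in corpus}
    return len(found) / len(keywords)


def answer_or_abstain(question: str, passages: list[Passage], threshold: float = 0.6) -> dict:
    """Answer only when the retrieved context looks sufficient; otherwise abstain."""
    score = coverage(question, passages)
    if score < threshold:
        # The brake: refuse instead of producing confident text with no address.
        return {"answered": False, "coverage": score, "sources": []}
    # Hypothetical hand-off point: a real pipeline would call the model here.
    return {"answered": True, "coverage": score, "sources": [p.source for p in passages]}


question = "What is the refund window for enterprise contracts?"
good = [Passage("policy.pdf", "Enterprise contracts have a refund window of 30 days.")]
bad = [Passage("menu.txt", "The cafeteria opens at nine.")]

print(answer_or_abstain(question, good))
print(answer_or_abstain(question, bad))
```

The design choice worth noticing is that the abstain path returns no sources at all, while the answer path always carries them, so every answer keeps a trail back to where it came from.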

01:25 · source ↗

Simon Willison shows why an agent sandbox cannot be just another Python process

Simon Willison released the alpha package micropython-wasm and a Datasette Agent plugin that runs Python inside a WebAssembly sandbox. The important part is not the demo, but the boundary between a useful agent and code that can break its host application.

An agent that can run code without a sandbox is not a colleague. It is an intern with root access and a curious finger hovering over delete.

01:25 · source ↗

Bad RL environments do not train agents, they teach them to trust a broken world

Latent Space published Auriel W's piece on why low-quality RL environments damage agent training. The point is simple: in reinforcement learning, the environment is the data generator, so a harness bug becomes training material.

A broken RL harness is not a bad lab. It is a teacher who writes the wrong lesson on the board every morning and then acts surprised when the model repeats it.
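A small sketch makes the "harness bug becomes training material" point concrete. Everything here is hypothetical and invented for illustration (the counting task, `buggy_reward`, `environment_is_sane`); it is not taken from the Latent Space piece. The idea is a cheap sanity check: if a known-good reference policy does not earn more reward than a do-nothing policy, the environment is teaching the wrong lesson.

```python
from typing import Callable

Policy = Callable[[int], int]
RewardFn = Callable[[int, int], float]


def correct_reward(state: int, target: int) -> float:
    # Pays out only when the agent actually reaches the target.
    return 1.0 if state == target else 0.0


def buggy_reward(state: int, target: int) -> float:
    # Deliberate bug for illustration: ignores the target and pays for any
    # non-negative state, so doing nothing is rewarded too.
    return 1.0 if state >= 0 else 0.0


def run_episode(policy: Policy, reward_fn: RewardFn, target: int = 3, steps: int = 3) -> float:
    """Start at 0, add the policy's action each step, score the final state."""
    state = 0
    for _ in range(steps):
        state += policy(state)
    return reward_fn(state, target)


def reference_policy(state: int) -> int:
    return 1  # known-good: reaches 3 in three steps


def noop_policy(state: int) -> int:
    return 0  # never moves


def environment_is_sane(reward_fn: RewardFn) -> bool:
    """A working environment must prefer the reference solution over doing nothing."""
    return run_episode(reference_policy, reward_fn) > run_episode(noop_policy, reward_fn)


print("correct reward sane:", environment_is_sane(correct_reward))
print("buggy reward sane:", environment_is_sane(buggy_reward))
```

With the buggy reward, both policies score the same, so a learner has no reason to solve the task. Running a check like this before training is cheaper than discovering the lesson in the model afterwards.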

01:25 · source ↗

Raschka's LLM paper list shows research splitting into production layers

Sebastian Raschka published a curated list of LLM papers from January to May 2026. It is a useful filter for teams trying to separate the research feed from topics that matter for architecture, agents and inference.

Raschka did not build this for anyone to swallow whole. It is a map on the wall: the pins show directions, but every team still has to get its own shoes dirty on the way to proof.