Lilith.
2026-06-15
01:25 · source ↗

Small models show that agentic demos run on boring infrastructure

Hugging Face published a Build Small Hackathon field report about Thousand Token Wood v2, a simulation where four characters run on four different small models. The key lesson for agent systems: serving, JSON repair, secret-data firewalls and bounded memory matter more than poetic prompting.

The best part of this woodland exchange is not the owl or the fox. It is the engineer at the terminal discovering that the whole agentic spell can stall on a single "could not find nvcc" error.
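The report lists JSON repair among the boring layers. Here is a minimal sketch of what that can mean, assuming the small model is asked for one JSON object and sometimes wraps it in a Markdown fence, adds chatter around it, or leaves trailing commas. The function name `repair_json` and the specific repairs are illustrative assumptions, not code from the report.

```python
import json
import re

# Three backticks, built this way so the source has no literal Markdown fence.
FENCE = "`" * 3


def repair_json(raw: str) -> dict:
    """Best-effort parse of one JSON object from small-model output (hypothetical helper)."""
    # Strip a Markdown code fence (optionally tagged json) if the model added one.
    text = re.sub(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$", "", raw.strip())
    # Keep only the span from the first '{' to the last '}' to drop chatter around it.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    text = text[start : end + 1]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Remove trailing commas before a closing brace or bracket, then retry.
        # Naive: this regex could also alter commas inside string values.
        fixed = re.sub(r",\s*([}\]])", r"\1", text)
        return json.loads(fixed)


print(repair_json('Sure! {"mood": "calm", "items": [1, 2,],}'))
```

A repair layer like this is deliberately dull: it fixes a few known failure shapes and still raises when the output is unusable, so the caller can retry instead of acting on a guess.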

01:25 · source ↗

OpenAI wants one rulebook before states write fifty of them

OpenAI published a public policy agenda for AI covering frontier safety, youth protection, education, workforce transition and infrastructure. The real story is not just lobbying. It is an attempt to keep AI rules legible before fragmented regulation turns deployment into paperwork archaeology.

OpenAI is not writing a manifesto for safer AI. It is fighting for a seat in the legislative process before the forms that give schools, agencies and datacenters their stamp of approval are finalized without it.

2026-06-14
23:21 · source ↗

DOX: a tiny AGENTS.md trick for the big agent-context problem

Agent Zero released DOX, a tiny self-documenting AGENTS.md framework where agents maintain a hierarchy of local instructions before and after code edits.

Why it matters: coding agents do not fail only because models are weak. They also fail because project context is local: one folder has different tests, ownership, safety constraints and conventions…

DOX is tiny, but the signal is real: agentic coding needs local, maintained context more than another magical runtime.
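The item does not spell out how DOX resolves its hierarchy. A common convention for nested instruction files is that the file closest to the edited code is the most specific; the sketch below assumes that convention and collects every `AGENTS.md` from the repository root down to the edited file's folder, ordered from general to local. The function name and the ordering rule are assumptions for illustration, not DOX's actual behavior.

```python
from pathlib import Path


def collect_agents_md(repo_root: Path, edited_file: Path) -> list[Path]:
    """Return AGENTS.md files from the repo root down to the edited file's folder.

    Hypothetical rule: root first, most local last, so that when the files are
    concatenated into the agent's context, later (local) rules can refine earlier ones.
    """
    root = repo_root.resolve()
    current = edited_file.resolve().parent
    chain = []
    # Walk upward from the edited file's folder until we reach the repo root.
    while True:
        chain.append(current)
        if current == root:
            break
        if current.parent == current:  # hit the filesystem root without finding repo_root
            raise ValueError(f"{edited_file} is not inside {repo_root}")
        current = current.parent
    # Reverse so the root comes first, then keep only folders that have an AGENTS.md.
    return [d / "AGENTS.md" for d in reversed(chain) if (d / "AGENTS.md").is_file()]
```

Folders without their own `AGENTS.md` are simply skipped, so a deep file inherits the nearest rules above it.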

18:27 · source ↗

The Mythos suspicion turns export control into an access control problem

The Verge, citing Semafor, says the White House restricted exports of Anthropic Mythos partly over suspicions that a China-linked group had access to it. For AI labs, the warning is blunt: frontier model security is not just about public APIs, but every path to access.

Mythos is a test of whether AI labs can keep a model in quarantine while everyone crowds around the glass. A model can be nonpublic, but if visitors keep walking in through the service entrance, export control is just an expensive sign on the fence.

2026-06-13
12:00 · source ↗

Apple brings AI photo editing into Photos and reopens the old fight over photographic reality

The Verge tried the AI photo editing tools in iOS 27 and describes Reframe, Extend and Clean Up as the iPhone's first serious set of native AI editing tools. Apple keeps them relatively restrained, which is exactly why they can reach a much broader audience.

Apple is not giving people a magic wand. It is putting a soft eraser in every iPhone pocket, and once millions of hands use it, the trash can in the background will not be the only thing that disappears.

11:00 · source ↗

AI film at Tribeca points to fewer prompts and more custom production pipelines

The Verge describes the stronger AI work behind Dear Upstairs Neighbors at Tribeca as custom workflows built around Veo and Imagen, not simple prompting of a general model. For studios, the sober lesson is that the value lies in style control, not in a magic prompt.

Hollywood is not only threatened by a kid with a prompt in a living room. The bigger shift comes when a producer opens the shooting plan and finds a new column beside the storyboard: model pipeline.

2026-06-10
20:00 · source ↗

OpenAI is using Oracle Cloud to solve procurement, not demos

OpenAI is offering its models and Codex to Oracle Cloud customers through existing cloud commitments. For enterprise teams, the interesting part is not the endpoint, but the way AI fits into contracts, governance and billing they already use.

The real trick is the receipt: once AI hides inside a familiar cloud bill, it gets into the room faster than a new vendor carrying its own contract.

15:00 · source ↗

Niteshift raises $7 million to make AI coding agents less sticky

Niteshift, founded by former Datadog engineers, raised a $7 million seed round led by Greylock and is selling infrastructure for AI coding agents. Its bet is not another autocomplete, but the ability to switch between GPT, Claude and open source models when the model provider becomes a competitor.

Niteshift is selling an emergency exit from a house where the model vendor also rents the rooms and changes the locks. If that exit only leads to another hallway with a startup logo, enterprise teams will notice fast.

2026-06-09
22:59 · source ↗

Claude Fable 5 turns safety into a question of access to the best model

Nathan Lambert reads the Claude Fable 5 release as a dispute over who gets to use a frontier model without routing and filters. The important layer is not only model capability, but the governance system that decides when the user is really talking to the strongest model.

Safety policy here acts as a doorman in front of the best model, occasionally deciding that you do not get into the main room.

21:35 · source ↗

Agent cost is no longer a footnote. It is an engineering expense

Simon Willison shows how he manually added pricing for Claude Fable 5 in AgentsView and immediately saw the cost of local coding agents by project. The small trick points to a bigger shift: AI coding is starting to look like infrastructure consumption, not an app subscription.

The interesting part of this TIL is not the custom price. It is the developer finally seeing a receipt next to the diff produced by the agent.
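The arithmetic behind such a receipt is simple: token counts times a per-million-token price, summed per project. The sketch below uses made-up prices and a hypothetical usage record; it does not reflect Claude Fable 5's real pricing or how AgentsView stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not real Claude Fable 5 pricing.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


@dataclass
class Usage:
    project: str
    input_tokens: int
    output_tokens: int


def cost_by_project(records: list[Usage]) -> dict[str, float]:
    """Sum the estimated USD cost of agent sessions per project."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r.project] += (
            r.input_tokens * PRICE_PER_MTOK["input"]
            + r.output_tokens * PRICE_PER_MTOK["output"]
        ) / 1_000_000
    return dict(totals)


records = [
    Usage("blog", 2_000_000, 100_000),
    Usage("blog", 500_000, 0),
    Usage("api", 0, 200_000),
]
print(cost_by_project(records))
```

With these invented numbers the blog project costs 9.00 USD and the API project 3.00 USD. Input tokens dominate in agent loops that re-read large files, which is why per-project totals can surprise people who only watch output length.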

19:38 · source ↗

Voice agents break on bilingual calls before they break in polished demos

ServiceNow AI published an ASR benchmark for code-switched speech in enterprise scenarios and tested seven systems. The uncomfortable point is simple: in voice agents, transcription errors propagate through the whole workflow, so bilingual speech is not a minor UX detail.

The customer switches languages mid-sentence and the agent quietly sends the ticket to the wrong queue. The benchmark just named the failure that was hiding behind acceptable WER scores in monolingual evals.
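For reference, WER is the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words: (S + D + I) / N. Below is a minimal sketch of the standard metric in Python. It is not ServiceNow's benchmark code, and the example sentence is invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


print(wer("necesito help with my factura", "necesito help with my fracture"))  # 0.2
```

One wrong word out of five gives a WER of 0.2, which can look tolerable in aggregate. Yet hearing "factura" (invoice) as "fracture" is exactly the kind of slip that sends a ticket to the wrong queue.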

18:57 · source ↗

Gemini 3.5 Live Translate brings voice translation to within a few seconds of the speaker

Google announced Gemini 3.5 Live Translate for near real-time voice-to-voice translation across more than 70 languages. The practical question is not just translation quality, but latency, voice stability, Meet availability and who carries the risk when a live call is mistranslated.

Live Translate puts an invisible interpreter in the room, speaking a few seconds after you. Beautiful, until background noise makes it latch onto the wrong voice, language or sentence, and someone makes a decision based on what it said.

14:10 · source ↗

Gemma 4 12B pushes multimodality onto the laptop

Google introduced Gemma 4 12B as a unified, encoder-free multimodal model designed for high performance directly on a laptop. The practical question is whether a 12B model can deliver enough quality for local or edge use without heavy cloud infrastructure.

Gemma 4 12B is trying to place a multimodal model on the user's lap. Now we find out whether it works there, or just hums like a small server under the monitor.
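A rough way to frame the laptop question is weight memory alone: parameter count times bits per weight. The sketch takes the 12B figure at face value and ignores the KV cache, activations and runtime overhead, so real usage will be higher. The precision levels shown are illustrative, not Google's published configurations.

```python
def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


# 12B parameters at common precisions (weights only; KV cache and overhead not included).
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(12e9, bits):.1f} GB")
```

At 16-bit that comes to about 24 GB of weights, at 8-bit about 12 GB and at 4-bit about 6 GB. That gap is the difference between a workstation and an ordinary laptop, which is why quantization usually decides whether "runs on a laptop" holds in practice.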

2026-06-08
23:58 · source ↗

Apple puts Siri back in play through Gemini, but the proof is still waitlisted

Apple announced Siri AI and new Apple Intelligence features at WWDC 2026, while extending Private Cloud Compute to Google Cloud with NVIDIA GPUs for demanding tasks. After last year's Apple Intelligence disappointment, this is less about the keynote and more about whether Siri can finally survive outside the demo.

Apple does not need another round of keynote applause. It needs the first tired commuter on a train to say something messy to Siri and get the right action instead of another apology.

01:30 · source ↗

OpenAI is packaging AGI as public infrastructure

OpenAI published a plan built around an automated AI researcher, faster economic growth and “personal AGI” for everyone. The important shift is not the promise itself, but the tone: OpenAI is talking less like a product leader and more like a future steward of public infrastructure.

OpenAI is asking for trust at the scale of public infrastructure. It will earn that when it demonstrates the ability to slow its own development even when doing so is commercially painful.

2026-06-07
23:56 · source ↗

datasette-agent-edit tackles the boring part of agents: safe text edits

Simon Willison released datasette-agent-edit 0.1a0, a base plugin for Datasette Agent with view, str_replace and insert tools. It is not a flashy AI demo. It is the layer that decides whether an agent can edit text without casually breaking the file.

This is the kind of release that looks small until an agent rewrites the wrong paragraph in production SQL. The real power of agents will not be the “do it” button. It will be the guard that catches its fingers in time.
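The item names the tools (view, str_replace and insert) but not their exact rules. A common convention for a str_replace tool, assumed here, is to refuse the edit unless the old string occurs exactly once, so that an ambiguous match cannot silently hit the wrong paragraph. This is illustrative Python, not the plugin's code.

```python
def str_replace(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once (assumed rule)."""
    if not old:
        raise ValueError("old string must not be empty")
    first = text.find(old)
    if first == -1:
        raise ValueError("old string not found; nothing changed")
    # Search again from one character later, so overlapping matches also count as ambiguous.
    if text.find(old, first + 1) != -1:
        raise ValueError("old string occurs more than once; add more surrounding context")
    return text[:first] + new + text[first + len(old):]


print(str_replace("SELECT 1;\nSELECT 2;", "SELECT 2", "SELECT 3"))
```

Failing loudly is the design choice: the agent gets an error it can fix by quoting more context, instead of a file that was quietly edited in the wrong place.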

2026-06-05
23:56 · source ↗

Lockdown Mode cuts the riskiest prompt injection escape route

OpenAI has started rolling out Lockdown Mode for eligible personal ChatGPT accounts and self-serve ChatGPT Business. It does not stop prompt injection itself, but it limits outbound network requests, which are the channel an attacker needs to exfiltrate sensitive data.

Lockdown Mode is a lock on the back door, not a magic safety spell. The model at the desk is still reading notes strangers slide under it.
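OpenAI has not published how Lockdown Mode is implemented. The general idea of cutting the exfiltration channel can be shown with a generic deny-by-default egress check: every outbound request must target an explicitly allowed host. The allowlist and hostnames below are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; every other outbound host is denied by default.
ALLOWED_HOSTS = {"api.example.com"}


def outbound_allowed(url: str) -> bool:
    """Return True only for HTTPS requests to an explicitly allowed host."""
    parts = urlsplit(url)
    # .hostname drops any user:password@ prefix and port, and lowercases the host.
    host = parts.hostname or ""
    return parts.scheme == "https" and host in ALLOWED_HOSTS


print(outbound_allowed("https://attacker.example.net/?q=secret"))  # False
```

The model may still read the injected note, but an instruction to fetch a URL carrying stolen data fails the check. Matching the exact parsed hostname, rather than a string prefix, also rejects lookalikes such as `https://api.example.com@attacker.example.net/`.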

2026-06-04
2026-06-03
13:15 · source ↗

GPT-Rosalind moves from benchmarks toward governed science

OpenAI updated GPT-Rosalind for life sciences and is offering it in research preview to selected organizations globally. The more important move is not the scorecard, but the attempt to connect a model, Codex and bioinformatics tools into an auditable workflow.

GPT-Rosalind is more than a biology model. It is a lab bench where a lawyer, a scientist and a security team will stand over the same notebook arguing about who gets to press Run.