#Models | Lilith AI

Radar · 2026-06-16

Android 17 turns Pixel into Gemini’s showroom

Google released Android 17 and Wear OS 7 first for Pixel devices, alongside a Pixel Drop with Gemini Omni, Lyria 3 and translation features for the Pixel 10a. The bigger signal is not the OS update itself, but Google using Android as a distribution layer for AI models on the device.


Radar · 2026-06-16

Model welfare is moving from philosophy into product risk

Zvi Mowshowitz uses Fable and Mythos as a case study for why model welfare cannot be separated from capabilities, alignment and user experience. Even where the topic remains speculative, it is becoming a practical question of evaluations and safety interventions for frontier labs.


Radar · 2026-06-15

Anthropic hit an export brake that shut Fable 5 off for every customer

Anthropic says US officials ordered access to Fable 5 and Mythos 5 suspended for foreign nationals, so the company disabled both models for all customers. Buyers of frontier AI now have to price in a risk that sits outside the model: the state kill switch.


Radar · 2026-06-15

The US move against Fable and Mythos takes the same blade from defenders and attackers

The US government told Anthropic to restrict Fable 5 and Mythos 5 for all foreign nationals, so Anthropic switched the models off for all customers. A protest by 76 security experts exposes the weak point: export control is bad at separating an offensive exploit from defensive testing.


Radar · 2026-06-15

Claude Opus 4.8 sells judgment, not just another benchmark

Anthropic released Claude Opus 4.8 at the same standard price as Opus 4.7, with a focus on coding, agentic tasks and longer work. The more important shift is a model that is supposed to say more often when it is unsure.


Radar · 2026-06-15

Nathan Lambert leaving Ai2 exposes the fragile side of open models

Nathan Lambert announced his departure from the Allen Institute for AI and used it to reflect on work around Olmo. This is not just a personnel note. It is a reminder that open models depend on institutions that must outlast one strong team.


Radar · 2026-06-15

Microsoft used Build to act like a model lab, not just a distributor

Latent Space frames Microsoft Build as the moment Microsoft showed its own MAI models alongside Copilot, Windows and Web IQ. The key ambition is to control data, inference and developer workflow at once, rather than leaving that leverage to partners.


Radar · 2026-06-15

Trump AI order creates a 30-day window for frontier models

The White House issued an executive order that calls for a classified benchmark for covered frontier models within 60 days and a voluntary framework for up to 30 days of pre-release government access. It says this is not licensing, but it creates a pressure point before launch.


Radar · 2026-06-15

Bad RL environments do not train agents, they teach them to trust a broken world

Latent Space published Auriel W's piece on why low-quality RL environments damage agent training. The point is simple: in reinforcement learning, the environment is the data generator, so a harness bug becomes training material.


Radar · 2026-06-15

Small models show that agentic demos run on boring infrastructure

Hugging Face published a Build Small Hackathon field report about Thousand Token Wood v2, a simulation where four characters run on four different small models. The key lesson for agent systems: serving, JSON repair, secret-data firewalls and bounded memory matter more than poetic prompting.


Radar · 2026-06-13

AI film at Tribeca points to fewer prompts and more custom production pipelines

The Verge describes the stronger AI work around Dear Upstairs Neighbors at Tribeca as custom workflows around Veo and Imagen, not simple prompting of a general model. For studios, the sober lesson is that value sits in style control, not in a magic prompt.


Radar · 2026-06-10

OpenAI is using Oracle Cloud to solve procurement, not demos

OpenAI is offering its models and Codex to Oracle Cloud customers through existing cloud commitments. For enterprise teams, the interesting part is not the endpoint, but the way AI fits into contracts, governance and billing they already use.


Radar · 2026-06-10

Niteshift raises $7 million to make AI coding agents less sticky

Niteshift, founded by former Datadog engineers, raised a $7 million seed round led by Greylock and is selling infrastructure for AI coding agents. Its bet is not another autocomplete, but the ability to switch between GPT, Claude and open-source models when the model provider becomes a competitor.


Radar · 2026-06-09

Agent cost is no longer a footnote. It is an engineering expense

Simon Willison shows how he manually added pricing for Claude Fable 5 in AgentsView and immediately saw the cost of local coding agents by project. The small trick points to a bigger shift: AI coding is starting to look like infrastructure consumption, not an app subscription.


Radar · 2026-06-09

Gemma 4 12B pushes multimodality onto the laptop

Google introduced Gemma 4 12B as a unified, encoder-free multimodal model designed for high performance directly on a laptop. The practical question is whether a 12B model can deliver enough quality for local or edge use without heavy cloud infrastructure.


Radar · 2026-06-08

Apple puts Siri back in play through Gemini, but the proof is still waitlisted

Apple announced Siri AI and new Apple Intelligence features at WWDC 2026, while extending Private Cloud Compute to Google Cloud with NVIDIA GPUs for demanding tasks. After last year's Apple Intelligence disappointment, this is less about the keynote and more about whether Siri can finally survive outside the demo.


Radar · 2026-06-04

Zvi’s AI week shows why one grand narrative is not enough

Zvi Mowshowitz's AI #171 is not one clean trend, but a signal map: Claude Opus 4.8, US frontier model testing, OpenAI's policy blueprint and PAC politics.


Radar · 2026-06-01

Video generation is moving from clip output to canvas agent

Latent Space frames xAI Grok Imagine, through an interview with Ethan He, as a move from one-shot video generation toward video agents. The thesis will be proven less by demo quality than by whether the system can iterate through a whole creative task.


Radar · 2026-06-01

Opus 4.8 shows that behavior tuning is not a checklist of fixes

Zvi Mowshowitz reads Opus 4.8 through model welfare and argues that attempts to fix honesty, sycophancy and preference shaping can create new problems elsewhere. For teams deploying models, the reminder is that alignment is not a checklist.


Radar · 2026-06-01

Open models win on cost, but frontier intelligence still sells at a premium

Nathan Lambert argues that open and closed models are improving on different economic curves. The real question is not open-source ideology, but where companies will keep paying a premium for the best model.


Radar · 2026-05-28

Opus 4.8 misses code flaws four times less often and introduces mid-conversation instruction updates

Anthropic shipped Opus 4.8 with one concrete metric: the model is four times less likely to miss code flaws than its predecessor. It also adds mid-conversation system messages and reduces the minimum prompt cache size from 4,096 to 1,024 tokens.


Radar · 2026-05-27

Warp bets on an open-source agentic terminal with GPT-5.5

Warp is positioning the terminal as an agentic development environment rather than a command-line wrapper. By open-sourcing its client with OpenAI as a founding sponsor and leaning on GPT-5.5, it wants developers to set objectives and review outcomes while agents plan, code, test and open pull requests.


Radar · 2026-05-26

Interconnects maps the next phase of model competition

Nathan Lambert writes about Gemini Flash 3.5, Mythos, agent tools and the tension between open and closed models in his May outlook.


Radar · 2026-05-26

LWiAI #246: one week, four fronts at once. Google I/O, agents, lawyers, safety

LWiAI Podcast episode 246 from 26 May 2026 is a map, not a single thesis. Google I/O, coding agents, legal pressure around OpenAI and safety research landed in the same week and sketch four simultaneous pressures on the AI market.


Radar · 2026-05-26

Anthropic appoints KiYoung Choi to lead Korea before Seoul launch

Anthropic appointed KiYoung Choi as Representative Director of Korea before opening its Seoul office, reflecting unusually strong Claude usage in the country.


Radar · 2026-05-25

Anthropic’s Chris Olah warns the Vatican about frontier AI incentives

Pope Leo XIV released the encyclical Magnifica humanitas on safeguarding the human person in the age of AI. At the Vatican City presentation, Anthropic co-founder Chris Olah warned that frontier AI labs face incentives that can conflict with the public good.


Radar · 2026-05-13

Fine-tuning is not dying. It is just no longer the default answer

Latent Space uses the pullback of part of OpenAI's fine-tuning API as a useful reality check: for most AI products today the first step is not tuning weights but better evaluation, context, retrieval, tool use and workflow. Fine-tuning remains a strong tool, just not a universal fix for a poorly designed system.


Radar · 2026-05-12

Codex moves into finance: reporting and variance bridges without manual drudgery

OpenAI Academy positions Codex for finance teams: MBRs, reporting packs, variance bridges, model checks, and planning scenarios from working inputs. Less flashy than an app-generation demo, but more practical: an agent layer over repeated analytical prep work.


Radar · 2026-05-12

Parameter Golf shows how coding agents change the pace of research iteration

OpenAI published lessons from Parameter Golf: more than 1,000 participants, over 2,000 submissions, a 16 MB artifact limit, and 10 minutes of training on 8x H100. The important part is not only model compression. AI coding agents changed the tempo of research iteration.


Radar · 2026-05-11

An AI coding agent that does not cut maintenance costs is just expensive technical debt

James Shore states the uncomfortable math of coding agents: if an agent doubles output but maintenance costs stay flat, the team did not gain speed; it doubled its technical debt burden.
