#research | Lilith AI

From Radar

Radar · 2026-06-15

Thirteen words on Reddit can poison an AI answer

Research described by 404 Media says a 13-word snippet of retrieved text from sites such as Reddit, Wikipedia, Quora or Facebook can push AI agents toward spam or scam output. For AI search, that turns SEO into a problem of prompt injection and user-generated content moderation.

Radar · 2026-06-15

Google gives enterprise RAG a guard who knows when not to answer

Google introduced an agentic RAG system for Gemini Enterprise Agent Platform that checks whether it has enough context before answering. For companies, that brake matters more than another polished retrieval layer.

Radar · 2026-06-15

Raschka's LLM paper list shows research splitting into production layers

Sebastian Raschka published a curated list of LLM papers from January to May 2026. It is a useful filter for teams trying to separate the research feed from topics that matter for architecture, agents and inference.

Radar · 2026-06-09

Gemini 3.5 Live Translate moves translation from captions into live voice

Google is launching Gemini 3.5 Live Translate for near real-time speech-to-speech translation across more than 70 languages. Users will see convenience first, but companies will care about latency, audit and trust in a voice speaking for someone else.

Radar · 2026-06-09

Gemma 4 12B pushes multimodality onto the laptop

Google introduced Gemma 4 12B as a unified, encoder-free multimodal model designed for high performance directly on a laptop. The practical question is whether a 12B model can deliver enough quality for local or edge use without heavy cloud infrastructure.

Radar · 2026-06-03

GPT-Rosalind moves from benchmarks toward governed science

OpenAI updated GPT-Rosalind for life sciences and is offering it in research preview to selected organizations globally. The more important move is not the scorecard, but the attempt to connect a model, Codex and bioinformatics tools into an auditable workflow.

Radar · 2026-06-01

Search should not be a button. It should be programmable infrastructure for agents

Perplexity describes Search as Code: an architecture where an agent does not call one monolithic search engine, but assembles a retrieval pipeline as code. The point is not a nicer search API. It is control over how evidence is found, filtered and verified.
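To make "a retrieval pipeline as code" concrete, here is a minimal sketch of the general idea: small filter steps composed into one callable that the agent controls. It is not Perplexity's API. Every name in it (`Doc`, `pipeline`, `allow_domains`, `min_score`, `top_k`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlparse


@dataclass(frozen=True)
class Doc:
    """A retrieved document (hypothetical shape, for illustration only)."""
    url: str
    text: str
    score: float


Step = Callable[[List[Doc]], List[Doc]]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single retrieval step."""
    def run(docs: List[Doc]) -> List[Doc]:
        for step in steps:
            docs = step(docs)
        return docs
    return run


def allow_domains(*domains: str) -> Step:
    """Keep only documents whose hostname is in the allow list."""
    allowed = set(domains)
    return lambda docs: [d for d in docs if urlparse(d.url).hostname in allowed]


def min_score(threshold: float) -> Step:
    """Drop documents scored below the threshold."""
    return lambda docs: [d for d in docs if d.score >= threshold]


def top_k(k: int) -> Step:
    """Keep the k highest-scoring documents."""
    return lambda docs: sorted(docs, key=lambda d: d.score, reverse=True)[:k]


# Example: the agent assembles its own evidence policy as code.
candidates = [
    Doc("https://example.org/a", "primary source", 0.91),
    Doc("https://example.org/b", "weak match", 0.20),
    Doc("https://spam.example.net/c", "injected text", 0.99),
    Doc("https://example.org/d", "secondary source", 0.75),
]
retrieve = pipeline(allow_domains("example.org"), min_score(0.5), top_k(1))
evidence = retrieve(candidates)
```

Because each step is an ordinary function, the filtering and verification policy can be reviewed, tested and versioned like any other code.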

Radar · 2026-05-30

A service worker intercepts HTTP requests and handles them in a Python ASGI app running entirely in the browser

Simon Willison experiments with running Python ASGI apps directly in the browser using Pyodide and a service worker. FastAPI and a complete Datasette 1.0a31 both ran successfully. The point is distribution: demos or data tools as self-contained web pages without a server.

Radar · 2026-05-28

Google wants agents to propose hypotheses and write experimental code instead of the scientist

At I/O 2026, Google Research showed Gemini for Science, ERA and Co-Scientist as systems where AI takes over the middle steps of research: reviewing literature, writing code and iterating on hypotheses. The risks of false certainty and vendor lock-in are substantial.

Radar · 2026-05-28

Data Formulator 0.7 tries to rebuild enterprise data analytics around AI agents

Microsoft Research released Data Formulator 0.7, an analytics workspace where AI agents assist with exploration, transformation and visualization of enterprise data. The key question is whether the agent handles messy, permissioned data outside the demo.

Radar · 2026-05-26

Anthropic appoints KiYoung Choi to lead Korea before Seoul launch

Anthropic appointed KiYoung Choi as Representative Director of Korea before opening its Seoul office, reflecting unusually strong Claude usage in the country.

Radar · 2026-05-25

Anthropic’s Chris Olah warns the Vatican about frontier AI incentives

Pope Leo XIV released the encyclical Magnifica humanitas on safeguarding the human person in the age of AI. At the Vatican City presentation, Anthropic co-founder Chris Olah warned that frontier AI labs face incentives that can conflict with the public good.

Radar · 2026-05-12

Parameter Golf shows how coding agents change the pace of research iteration

OpenAI published lessons from Parameter Golf: more than 1,000 participants, over 2,000 submissions, a 16 MB artifact limit, and 10 minutes of training on 8x H100. The important part is not only model compression. AI coding agents changed the tempo of research iteration.

Radar · 2026-05-06

AlphaEvolve finds in days the algorithms teams spent months on, with production numbers

DeepMind introduced AlphaEvolve as a Gemini-powered evolutionary loop that automatically discovers better algorithms. Concrete production results: 30% fewer errors in genomics, 20% lower write amplification for Spanner, and doubled transformer training speed at Klarna.

Radar · 2025-10-23

Gemini 2.5 Computer Use: DeepMind builds a dedicated model for agents that click instead of calling an API

Google DeepMind released Gemini 2.5 Computer Use in preview: a specialized model for agents that drive user interfaces. Unlike general-purpose Gemini 2.5 Pro, this model was trained specifically for screen interaction, not just reasoning about it.

From the Glossary

Glossary

AI-assisted research — the model as a research partner

AI-assisted research uses models to find hypotheses, write code, test variants and read literature. It is not automatic science. It is a faster research loop with new ways to fall on your face.

Glossary

Evals and benchmarks — measurement instead of vibes

A benchmark is not truth carved in stone. It is an instrument with error bars. Without it, though, you are only guessing whether a model or agent works.
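As a rough illustration of "an instrument with error bars", the sketch below puts a 95% Wilson score interval around a benchmark accuracy. The function name and the example numbers are made up for illustration, and the interval assumes the test items are independent, which real benchmarks only approximate.

```python
import math


def accuracy_with_error_bars(correct: int, total: int, z: float = 1.96):
    """Return (accuracy, low, high) using the Wilson score interval.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical run: 80 of 100 items correct.
acc, low, high = accuracy_with_error_bars(80, 100)
print(f"accuracy={acc:.2f}, 95% interval=({low:.3f}, {high:.3f})")
```

With only 100 items, the interval runs from about 0.71 to 0.87, so a model scoring 0.80 and another scoring 0.84 on the same set are not clearly distinguishable.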