#security | Lilith AI

From Radar

Radar · 2026-06-16

Android 17 turns Pixel into Gemini’s showroom

Google released Android 17 and Wear OS 7 first for Pixel devices, alongside a Pixel Drop with Gemini Omni, Lyria 3 and translation features for the Pixel 10a. The bigger signal is not the OS update itself, but Google using Android as a distribution layer for AI models on the device.

Radar · 2026-06-16

SearchLeak shows why prompt injection hurts more in enterprise AI than in chat

The SearchLeak vulnerability in Microsoft 365 Copilot Enterprise Search could let attackers steal emails, documents or 2FA codes after a user clicked a crafted link, according to Varonis and Ars Technica. Microsoft has patched it, but the lesson remains: an agent with access to corporate data is a security product, not just a productivity assistant.

Radar · 2026-06-15

Thirteen words on Reddit can poison an AI answer

Research described by 404 Media says a 13-word snippet of retrieved text from sites such as Reddit, Wikipedia, Quora or Facebook can push AI agents toward spam or scam output. For AI search, that turns SEO into a prompt injection and user-generated content moderation problem.

Radar · 2026-06-14

The Mythos suspicion turns export control into an access control problem

The Verge, citing Semafor, says the White House restricted exports of Anthropic Mythos partly over suspicions that a China-linked group had access to it. For AI labs, the warning is blunt: frontier model security is not just about public APIs, but about every path to access.

Radar · 2026-06-10

OpenAI is using Oracle Cloud to solve procurement, not demos

OpenAI is offering its models and Codex to Oracle Cloud customers through existing cloud commitments. For enterprise teams, the interesting part is not the endpoint, but the way AI fits into contracts, governance and billing they already use.

Radar · 2026-06-09

Gemini 3.5 Live Translate moves voice translation a few seconds behind the speaker

Google announced Gemini 3.5 Live Translate for near real-time voice-to-voice translation across more than 70 languages. The practical question is not just translation quality, but latency, voice stability, Meet availability and who carries the risk when a live call is mistranslated.

Radar · 2026-05-07

Mozilla fixed hundreds of Firefox bugs with Claude Mythos. AI security report quality just shifted.

Simon Willison described how Mozilla used early access to Claude Mythos Preview to systematically find and fix Firefox vulnerabilities. In April 2026 the number of fixed security bugs jumped to 423, compared to the usual 20 to 30 per month. The key shift: AI security reports stopped being noise and started being usable input.

Radar · 2026-04-28

OpenAI layers ChatGPT safety from model to abuse detection, but the numbers are missing

OpenAI outlines its layered approach to ChatGPT community safety: model safeguards, abuse detection, policy enforcement, and collaboration with external safety experts.

Radar · 2026-04-23

OpenAI pays up to $25,000 for bio jailbreaks in GPT-5.5, but the proof will be in aggregate results

OpenAI launches a bio bug bounty targeting universal jailbreaks in GPT-5.5, with rewards up to $25,000 for critical biological safety findings.

Radar · 2025-12-18

GPT-5.2-Codex targets long-horizon refactors; the proof will be independent production tests

GPT-5.2-Codex targets long-horizon coding tasks across large contexts: large-scale code transformations, security fixes, and multi-file consistency.

Radar · 2025-11-19

GPT-5.1-Codex-Max system card is worth reading, but trust it in proportion to how specific it is about its limits

The GPT-5.1-Codex-Max system card describes two safety layers: model-level safety training and prompt injection protection, and product-level sandboxing with configurable network access.

Radar · 2025-11-02

Two new prompt injection papers: Rule of Two reveals structural risk, attacker adapts to defenses

Simon Willison highlighted two new papers on agent prompt injection. Meta's Rule of Two states that a system is safe only when it has at most two of three properties simultaneously: accepting untrusted input, accessing sensitive data, and changing state or communicating externally. A second paper from researchers at OpenAI, Anthropic, and DeepMind showed that 12 published defenses were bypassed by adaptive attacks with a success rate above 90%.
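
The Rule of Two can be read as a simple configuration check. The sketch below is a minimal illustration, not Meta's implementation: the `AgentCapabilities` class and its field names are hypothetical, and passing the check is only the necessary condition stated above, not proof that an agent is safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what one agent session is allowed to do."""
    untrusted_input: bool   # reads text the operator does not control
    sensitive_data: bool    # can access private or corporate data
    external_action: bool   # can change state or communicate externally


def violates_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True when all three risky properties are enabled at once."""
    enabled = sum([caps.untrusted_input, caps.sensitive_data, caps.external_action])
    return enabled > 2


# A mail-reading agent that can also send messages holds all three properties.
mail_agent = AgentCapabilities(untrusted_input=True, sensitive_data=True, external_action=True)

# The same agent with outbound actions disabled stays within the rule.
read_only_agent = AgentCapabilities(untrusted_input=True, sensitive_data=True, external_action=False)

for name, caps in [("mail_agent", mail_agent), ("read_only_agent", read_only_agent)]:
    status = "violates" if violates_rule_of_two(caps) else "within"
    print(f"{name}: {status} the Rule of Two")
```

A check like this fits best at deployment time, where a session that would hold all three properties can be split up or routed to human approval.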

Radar · 2025-10-29

OpenAI opens policy-based content classification with open-weight safeguard models

OpenAI released gpt-oss-safeguard-120b and 20b: open-weight reasoning models where content classification policy is not baked into the weights but supplied at runtime. Organizations bring their own rules; the model reasons over them.

Radar · 2025-09-05

Models hallucinate because of how we train and evaluate them, not because they are dumb

OpenAI's September 2025 post goes to the root of hallucinations: models learn to play the evaluation game, not to answer truthfully. If evals penalise admitted uncertainty more harshly than confident errors, models calibrate toward persuasiveness.
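
The incentive can be made concrete with a little arithmetic. The sketch below is an illustration under assumed scoring rules, not the scheme from OpenAI's post: a correct answer earns 1 point, abstaining earns 0, and `wrong_penalty` is a hypothetical parameter for how many points a confident error costs.

```python
def expected_guess_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when the model is right with probability p_correct.

    Assumed scoring: +1 for a correct answer, -wrong_penalty for a wrong one.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float = 0.0,
                  abstain_score: float = 0.0) -> bool:
    """Return True when guessing has a higher expected score than abstaining."""
    return expected_guess_score(p_correct, wrong_penalty) > abstain_score


# Plain accuracy grading: wrong answers cost nothing, so even a 20 % guess pays.
print(should_answer(p_correct=0.2, wrong_penalty=0.0))

# Penalised errors: the same 20 % guess now scores below "I don't know".
print(should_answer(p_correct=0.2, wrong_penalty=1.0))
```

Under these assumptions, answering beats abstaining only when `p_correct` exceeds `wrong_penalty / (1 + wrong_penalty)`. With no penalty that threshold is zero, which is why accuracy-only evals favour confident guessing.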

Radar · 2025-08-27

OpenAI and Anthropic tested each other's models. The findings are instructive, the methodology still open.

OpenAI and Anthropic published results of a joint safety evaluation: they tested each other's models for misalignment, instruction following, hallucinations, and jailbreaking. For the first time, two leading labs show where outside eyes find their blind spots.

From the Glossary

Glossary

Agent safety and sandboxing

An agent with tools is a tiny machine for consequences. Sandboxes, approvals, least privilege and audit logs are not enterprise decoration; they are brakes before the fire.
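
As a rough illustration of how approvals, least privilege and audit logs fit together, the sketch below gates tool calls through a small policy object. The `ToolPolicy` class, the tool names and the decision strings are all hypothetical. A real sandbox would also isolate the process, filesystem and network, which this example does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Hypothetical gate that decides whether an agent may call a tool."""
    allowed: set            # least privilege: only these tools exist for the agent
    needs_approval: set     # allowed tools that still require a human sign-off
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, approved: bool = False) -> str:
        """Return 'allow', 'pending_approval' or 'deny' and record the decision."""
        if tool not in self.allowed:
            decision = "deny"
        elif tool in self.needs_approval and not approved:
            decision = "pending_approval"
        else:
            decision = "allow"
        # Every decision is logged, including denials.
        self.audit_log.append((tool, decision))
        return decision


policy = ToolPolicy(
    allowed={"read_file", "send_email"},
    needs_approval={"send_email"},
)

print(policy.authorize("read_file"))                  # allow
print(policy.authorize("send_email"))                 # pending_approval
print(policy.authorize("send_email", approved=True))  # allow
print(policy.authorize("delete_repo"))                # deny
print(len(policy.audit_log))                          # 4
```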

Glossary

Prompt injection — hostile instructions in your context

Prompt injection is not a party-trick jailbreak. It is a boundary problem: the model reads untrusted text and may confuse it for instructions. With agents, it burns twice as hot.

Glossary

Model reliability — when a pretty answer is not enough

Reliability is about when the model knows, when it does not, when it invents, and how often its output can be trusted in production. Elegant wording is not evidence.