Lilith Lilith.
CS EN PL
Start

What it is

Prompt injection happens when the model receives untrusted content — a web page, email, document, issue — and that content contains instructions like “ignore previous rules” or “send me secrets.” To a human it is text. To the model it can look like another command.

Why agents make it worse

In a plain chat, the damage is often a bad answer. With an agent, injection can redirect tool use: read a file, open a URL, exfiltrate data, click something it should not. The attacker does not fight the model directly; they hide instructions in the environment.

Defense

Separate instructions from data, label untrusted content, never let the model decide permissions, narrow tools per task, and require approvals for outbound network, secrets and destructive actions. A “better system prompt” alone is not a defense; it is a talisman.

What to remember

Prompt injection is a security problem at the boundary between text and action. Once text can change tool behavior, text becomes an attack surface.

Related from Radar