← Library · safety
Prompt injection — hostile instructions in your context
Prompt injection is not a party-trick jailbreak. It is a boundary problem: the model reads untrusted text and may confuse it for instructions. With agents, it burns twice as hot.
What it is
Prompt injection happens when the model receives untrusted content — a web page, email, document, issue — and that content contains instructions like “ignore previous rules” or “send me secrets.” To a human it is text. To the model it can look like another command.
Why agents make it worse
In a plain chat, the damage is often a bad answer. With an agent, injection can redirect tool use: read a file, open a URL, exfiltrate data, click something it should not. The attacker does not fight the model directly; they hide instructions in the environment.
Defense
Separate instructions from data, label untrusted content, never let the model decide permissions, narrow tools per task, and require approvals for outbound network, secrets and destructive actions. A “better system prompt” alone is not a defense; it is a talisman.
What to remember
Prompt injection is a security problem at the boundary between text and action. Once text can change tool behavior, text becomes an attack surface.