Agent safety and sandboxing | Glossary

Why it matters

A chatbot can only talk nonsense. An agent can run a command, send an email, open a ticket, click a button or mutate data. Once the model gets hands, safety stops being an abstract debate and becomes operational risk.

Basic defense layers

A sandbox limits what the agent can reach. Approvals hold irreversible actions until a human confirms them. Least privilege means each tool has only the rights the job needs. An audit log is the black box: without it, every incident becomes guesswork.
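
As a rough illustration, the four layers can be sketched in Python as a single wrapper around tool calls. Everything here is hypothetical and not taken from any real agent framework: the names `Tool`, `GuardedAgent`, `approve` and `audit_log` are made up for the example, and the "sandbox" is only a path check, where a real one would rely on OS-level or container isolation.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical tool: a callable that acts on a path."""
    name: str
    run: Callable[[str], str]
    irreversible: bool = False  # irreversible tools need approval


@dataclass
class GuardedAgent:
    """Illustrative wrapper, not a real framework API."""
    sandbox_root: Path                    # sandbox: the only directory in reach
    allowed_tools: Dict[str, Tool]        # least privilege: only the granted tools
    approve: Callable[[str, str], bool]   # approval: hook for a human decision
    audit_log: List[dict] = field(default_factory=list)

    def in_sandbox(self, path: str) -> bool:
        # Resolve ".." and absolute paths before comparing against the root.
        root = self.sandbox_root.resolve()
        target = (root / path).resolve()
        return target == root or root in target.parents

    def _decide(self, tool_name: str, path: str) -> str:
        tool = self.allowed_tools.get(tool_name)
        if tool is None:
            return "denied: tool not granted"
        if not self.in_sandbox(path):
            return "denied: path outside sandbox"
        if tool.irreversible and not self.approve(tool_name, path):
            return "denied: approval refused"
        return "allowed"

    def call(self, tool_name: str, path: str) -> str:
        outcome = self._decide(tool_name, path)
        # Audit log: record every decision, including denials, before acting.
        self.audit_log.append({"tool": tool_name, "path": path, "outcome": outcome})
        if outcome != "allowed":
            raise PermissionError(outcome)
        return self.allowed_tools[tool_name].run(path)
```

The point of the design is that every call goes through one gate, so a denied call leaves the same kind of trail as an allowed one.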

Bad signals

The agent runs with full repository, network and secret access while risky actions need no human approval. Or it is handed ten tools that the model cannot reliably tell apart. That is not autonomy. That is token-powered roulette.
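
Both signals can be checked mechanically before an agent is deployed. The sketch below assumes a made-up `AgentConfig` structure with illustrative field names. It treats missing or duplicate tool descriptions as a crude proxy for "tools the model cannot tell apart", which is an assumption of this example, not an established metric.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentConfig:
    """Hypothetical deployment config; field names are illustrative."""
    repo_access: str                  # "none", "read" or "full"
    network_access: bool
    secret_access: bool
    approval_required: bool           # do risky actions wait for a human?
    tool_descriptions: Dict[str, str] # tool name -> description shown to the model


def bad_signals(config: AgentConfig) -> List[str]:
    """Return a list of warnings; an empty list means neither signal fired."""
    signals = []

    unrestricted = (
        config.repo_access == "full"
        and config.network_access
        and config.secret_access
    )
    if unrestricted and not config.approval_required:
        signals.append("full access with no human approval for risky actions")

    # Crude proxy: descriptions that are empty or identical after
    # normalisation give the model nothing to distinguish the tools by.
    descriptions = [d.strip().lower() for d in config.tool_descriptions.values()]
    if any(not d for d in descriptions) or len(set(descriptions)) < len(descriptions):
        signals.append("tools with missing or duplicate descriptions")

    return signals
```

A check like this catches only the obvious cases. Tools with distinct descriptions can still confuse a model, so it complements evaluation instead of replacing it.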

What to remember

A safe agent is not one that never makes a mistake. A safe agent is one whose mistakes have a small blast radius and a clear trail.