Evergreen AI concepts. Explained briefly, with opinion.
An LLM with tool use, a loop, and memory. Lots of marketing, few definitions. Here's the plain version.
Claude Code, Codex and friends are not magical juniors. They are a fast loop: read code, edit, run tests, repair fallout. Useful, but only with guardrails.
A computer-use agent sees the screen and controls the UI. It sounds like sci-fi; in practice it is fragile automation over pixels, forms and badly labelled buttons.
A benchmark is not truth carved in stone. It is an instrument with error bars. Without it, though, you are only guessing whether a model or agent works.
Reliability is about when the model knows, when it does not, when it invents, and how often its output can be trusted in production. Elegant wording is not evidence.
When the model doesn't have your data in its head, it fetches it from a vector store or full-text search. RAG is a pattern, not a product.
An agent with tools is a tiny machine for consequences. Sandboxes, approvals, least privilege and audit logs are not enterprise decoration; they are brakes before the fire.
Prompt injection is not a party-trick jailbreak. It is a boundary problem: the model reads untrusted text and may confuse it for instructions. With agents, it burns twice as hot.