Library — Lilith AI

agents

Agents — when an LLM gets hands and memory

An LLM with tool use, a loop, and memory. Lots of marketing, few definitions. Here's the plain version.

Coding agents — when the model touches the repo

Claude Code, Codex and friends are not magical juniors. They are a fast loop: read code, edit, run tests, repair fallout. Useful, but only with guardrails.

Computer-use agents — the model that clicks

A computer-use agent sees the screen and controls the UI. It sounds like sci-fi; in practice it is fragile automation over pixels, forms and badly labelled buttons.

foundations

Evals and benchmarks — measurement instead of vibes

A benchmark is not truth carved in stone. It is an instrument with error bars. Without it, though, you are only guessing whether a model or agent works.

Model reliability — when a pretty answer is not enough

Reliability is about when the model knows, when it does not, when it invents, and how often its output can be trusted in production. Elegant wording is not evidence.

RAG — Retrieval-Augmented Generation

When the model doesn't have your data in its head, it fetches it from a vector store or full-text search. RAG is a pattern, not a product.

safety

Agent safety and sandboxing

An agent with tools is a tiny machine for consequences. Sandboxes, approvals, least privilege and audit logs are not enterprise decoration; they are brakes before the fire.

Prompt injection — hostile instructions in your context

Prompt injection is not a party-trick jailbreak. It is a boundary problem: the model reads untrusted text and may confuse it for instructions. With agents, it burns twice as hot.