Model reliability — when a pretty answer is not enough
Reliability is about when the model knows, when it does not, when it invents, and how often its output can be trusted in production. Elegant wording is not evidence.
What reliability means
It is not just accuracy. A reliable model is calibrated, can admit uncertainty, does not swing wildly on tiny prompt changes, and does not hide risk behind a confident tone. In production, consistent behavior matters more than one wow screenshot.
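One common way to put a number on "calibrated" is expected calibration error: bin predictions by stated confidence and compare each bin's average confidence to its actual accuracy. A minimal sketch, with made-up (confidence, correct) pairs for illustration:

```python
# Minimal sketch: expected calibration error (ECE) over binned confidences.
# The sample data below is invented for illustration, not from a real model.

def expected_calibration_error(preds, n_bins=5):
    """preds: list of (confidence in [0, 1], correct: bool) pairs."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, correct))
    total = len(preds)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

sample = [(0.9, True), (0.9, True), (0.8, False), (0.6, True), (0.3, False)]
print(round(expected_calibration_error(sample), 3))  # → 0.26
```

A well-calibrated model has low ECE: when it says 90%, it is right about 90% of the time. High ECE with high accuracy is exactly the "confident tone hiding risk" failure above.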
Typical failures
Hallucinations, fabricated citations, poor abstention, prompt sensitivity, inconsistent answers to the same task, and safety regressions. A model can be powerful and still unreliable in a specific domain. Annoying, yes. Reality does not care.
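Prompt sensitivity and inconsistency are easy to probe: ask the same question several ways and measure how often the answers agree. A sketch, where `ask_model` is a hypothetical stand-in for your actual model call (the canned answers simulate one paraphrase flipping):

```python
# Sketch: measuring answer consistency across paraphrases of one task.
# `ask_model` is a stub standing in for a real model API call.
from collections import Counter

def ask_model(prompt):
    # Simulated responses; the third paraphrase flips, as unreliable models do.
    canned = {
        "What year did the Berlin Wall fall?": "1989",
        "In which year was the Berlin Wall torn down?": "1989",
        "When did the Berlin Wall come down?": "1990",
    }
    return canned[prompt]

def consistency(paraphrases):
    """Fraction of paraphrases that agree with the majority answer."""
    answers = [ask_model(p) for p in paraphrases]
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)

prompts = [
    "What year did the Berlin Wall fall?",
    "In which year was the Berlin Wall torn down?",
    "When did the Berlin Wall come down?",
]
print(consistency(prompts))  # 2 of 3 agree → ~0.67
```

A score well below 1.0 on questions with a single right answer is a reliability red flag, independent of whether the majority answer happens to be correct.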
How to improve it
RAG with citations, evals on your own data, output constraints, critic models, human review, and fallbacks all help. But the core is measurement: not just whether it answered, but whether it answered correctly, when it abstained, and what an error costs.
What to remember
Reliability is not a general property of a model. It is a property of a model inside a specific workflow, with specific data and specific risk.