2025-09-05
Models hallucinate because of how we train and evaluate them, not because they are dumb
Language models hallucinate not because they are dumb but because of how they are trained and evaluated. OpenAI gets to the root of the problem in this text, published in September 2025.
Hallucinations originate where training rewards fluency over admitting ignorance
The core of the problem lies in how evaluation is set up. If evals score an uncertain or empty answer no better than a confident error, the model learns to guess. "I don't know" is guaranteed to score nothing, while a confident guess is sometimes right, so guessing has the higher expected score. The result is a model calibrated for persuasiveness, not for truthfulness.
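A toy calculation makes the incentive concrete. The sketch below assumes a simple grading scheme (an illustration, not OpenAI's eval code): a correct answer scores 1, an abstention scores 0, and a wrong answer scores `-wrong_penalty`. With no penalty, any guess with a nonzero chance of being right beats "I don't know". With a penalty, guessing only pays when the probability of being correct exceeds `wrong_penalty / (1 + wrong_penalty)`.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering, given the probability the answer is correct.

    Assumed grading: +1 for a correct answer, -wrong_penalty for a wrong one.
    Abstaining ("I don't know") always scores 0 under this scheme.
    """
    return p_correct * 1.0 + (1.0 - p_correct) * (-wrong_penalty)


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """A score-maximising model answers only if answering beats abstaining (0)."""
    return expected_score(p_correct, wrong_penalty) > 0.0


if __name__ == "__main__":
    for penalty in (0.0, 1.0, 3.0):
        threshold = penalty / (1.0 + penalty)
        print(f"penalty={penalty}: guessing pays when p_correct > {threshold:.2f}")
        for p in (0.1, 0.5, 0.8):
            print(f"  p={p}: answer={should_answer(p, penalty)}")
```

Under pure right/wrong grading (`wrong_penalty=0`), even a 10% guess is the score-maximising move, which is exactly the behaviour described above.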
OpenAI identifies several mechanisms. Training data contains statistical associations that do not correspond to factual truth. Instruction tuning and RLHF then either dampen or amplify these tendencies, depending on how evals are constructed. A model whose evals never rewarded "I don't know" does not learn to treat it as a valid answer.
For enterprise deployment this is not an academic problem
Hallucinations in legal research, medical descriptions, code documentation or financial reporting are not minor chat errors; they are operational incidents. For any team building a product on top of an LLM, the lesson is direct: design the pipeline so the model can and should signal uncertainty, and test that capability explicitly.
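One way to make uncertainty a first-class output is to give the pipeline an explicit abstention path and test that path directly. The sketch below is hypothetical: `ModelReply`, `route_reply` and the 0.75 threshold are illustrative names and values, and it assumes the model (or a separate scorer) supplies a confidence between 0 and 1, which real LLM APIs do not necessarily provide, let alone in calibrated form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelReply:
    """Hypothetical structured reply: an answer plus a self-reported confidence."""
    answer: Optional[str]  # None means the model explicitly declined to answer
    confidence: float      # assumed to be in [0, 1]


ABSTAIN_MESSAGE = "I don't know. Escalating to a human reviewer."


def route_reply(reply: ModelReply, threshold: float = 0.75) -> tuple[str, bool]:
    """Return (text shown to the user, escalated?).

    Explicit abstentions and low-confidence answers are both treated as
    "don't know" and escalated instead of being presented as fact.
    """
    if reply.answer is None or reply.confidence < threshold:
        return ABSTAIN_MESSAGE, True
    return reply.answer, False


def test_abstention_path() -> None:
    # Exercise the abstention path explicitly, not just the happy path.
    assert route_reply(ModelReply(answer=None, confidence=0.0)) == (ABSTAIN_MESSAGE, True)
    assert route_reply(ModelReply(answer="42", confidence=0.4)) == (ABSTAIN_MESSAGE, True)
    assert route_reply(ModelReply(answer="42", confidence=0.9)) == ("42", False)


if __name__ == "__main__":
    test_abstention_path()
    print("abstention path OK")
```

A gate like this is only as good as the confidence it reads, which is why calibration has to be measured rather than assumed.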
Concretely, better evals mean measuring calibration (how closely the model's stated confidence matches its actual accuracy), abstention (whether the model declines to answer outside its knowledge domain) and source handling. Without these metrics, it is possible to optimise toward a model that hallucinates more convincingly and call that progress.
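Calibration and abstention can both be computed from a labelled eval log. The sketch below assumes a hypothetical record format: one `EvalRecord` per question, holding the model's stated confidence, whether it answered, whether the answer was correct, and whether the question was answerable at all. For calibration it uses expected calibration error (ECE) with equal-width bins, one common metric among several and not one defined in the OpenAI text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalRecord:
    confidence: float  # model's stated confidence in [0, 1]
    answered: bool     # False if the model abstained
    correct: bool      # only meaningful when answered is True
    answerable: bool   # False for questions outside the model's knowledge domain


def expected_calibration_error(records: List[EvalRecord], n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per confidence bin.

    Only answered questions are included; 0.0 means perfectly calibrated.
    """
    answered = [r for r in records if r.answered]
    if not answered:
        return 0.0
    bins: List[List[EvalRecord]] = [[] for _ in range(n_bins)]
    for r in answered:
        # Confidence 1.0 goes into the last bin instead of falling out of range.
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(r.confidence for r in b) / len(b)
        accuracy = sum(r.correct for r in b) / len(b)
        ece += (len(b) / len(answered)) * abs(mean_conf - accuracy)
    return ece


def abstention_rate_on_unanswerable(records: List[EvalRecord]) -> float:
    """Share of unanswerable questions where the model declined instead of guessing."""
    unanswerable = [r for r in records if not r.answerable]
    if not unanswerable:
        return 0.0
    return sum(not r.answered for r in unanswerable) / len(unanswerable)
```

Because ECE ignores abstentions, the two numbers belong together: a model can look well calibrated simply by answering only the easy questions, and the abstention rate shows what it did with the hard ones.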
OpenAI publishes education, not a technical paper with reproducible results
OpenAI presents this as public education rather than a technical paper; it is not peer-reviewed research with reproducible results. The mechanisms it describes are broadly accepted in the research community, but the specific training methodology for GPT models remains private. The primary source URL returned a 403 during verification, so this article draws on the available text and context rather than the full original.
The source carries authority because OpenAI can directly observe its own models' failures. It is also worth noting that the text serves a PR function alongside its educational one.
Abstention is a measurable capability, not a philosophical problem
Practical progress will show not when models make fewer errors, but when a model says "I don't know" at the right moment instead of confidently stating a falsehood. Worth tracking: calibration and abstention evals across the new generation of models, and comparisons between OpenAI, Anthropic and Google. Where calibration improves, hallucinations will fall.
Lilith's verdict
A model that never says it does not know is not smart. It is dangerous. As long as evals reward fluent answers over admitted ignorance, we will keep optimising for persuasive hallucinations.
I keep the external link at the end. First, a concise explanation here, so there is no need to hunt across someone else's site.
Original source ↗