Lilith.

The market is splitting into two very different realities. Elite teams with proprietary data, evals, infrastructure and distribution advantage will keep fine-tuning models, sometimes more aggressively than before. Most teams will probably get more from better context, retrieval, tool use and agentic workflows than from a quick intervention into model weights.

Fine-tuning is no longer the default tool for most AI products

Fine-tuning has long had a seductive story: take a general model, show it a few hundred or a few thousand examples, and get behavior tailored precisely to your product. In practice, though, several different problems often get conflated.

Sometimes the model does not know the right facts, and retrieval or a better data layer is the answer. Sometimes it does not follow a process, and workflow, validation and tool use are the answer. Sometimes you do not know whether it improved, because evals are missing. And sometimes the problem is simply that the product does not have a well-defined brief. Fine-tuning cannot magically fix any of this. It just gives you a more expensive way to reproduce the chaos.

The more sensible question for ordinary AI products is: have we exhausted the cheaper and more measurable levers? If not, fine-tuning a model is often premature optimization with a very elegant invoice.

Fine-tuning used to be sold primarily as a path to better performance at lower cost: smarter behavior from a cheaper model, shorter prompts, a fixed style, domain-specific answers. That can still make sense.

But models now have longer context, tool calling is mainstream, retrieval stacks have matured and agentic orchestration is slowly moving from toy projects into production. Some of the specialization that previously had to live in the weights can today move into the runtime layer: context, rules, tools, memory, validators and the evaluation loop.

That is less dramatic than having your own model, but often more useful.

Why fine-tuning gets confused with treatment for problems that have other causes

Fine-tuning is not dying. It is returning to its proper place: a specialized tool for situations where you have a clear signal, quality data, measurement and a reason to carry the operational complexity.

If you are building a top-tier coding agent, a vertical AI product with unique data, or a system where every percentage point of accuracy changes the economics, fine-tuning or reinforcement fine-tuning (RLFT) can be a decisive advantage. But if you are still searching for the shape of the product, have no evals, and the prompt changes every other day, tuning the model is probably a disguised escape from discipline.

The worst case is fine-tuning before you know what you are measuring. That is not engineering. That is occultism with a GPU budget.
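To make "knowing what you are measuring" concrete, here is a minimal sketch of an eval loop in Python. Everything in it is an assumption for illustration: the three test cases, the substring check used as a quality signal, and the two stand-in systems `baseline` and `with_retrieval`, which return canned text instead of calling a real model. A real evaluation set would be larger and would score quality with something better than substring matching.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical evaluation set: (user input, substring a good answer must contain).
EVAL_SET: Sequence[Tuple[str, str]] = [
    ("refund policy for damaged item", "refund"),
    ("reset my password", "password"),
    ("cancel subscription", "cancel"),
]


def pass_rate(system: Callable[[str], str],
              cases: Sequence[Tuple[str, str]] = EVAL_SET) -> float:
    """Return the fraction of cases whose output contains the expected substring."""
    passed = sum(
        1 for prompt, expected in cases if expected in system(prompt).lower()
    )
    return passed / len(cases)


# Stand-in systems (assumptions): in practice each would call a model.
def baseline(prompt: str) -> str:
    # Ignores the input entirely, like a model without the right context.
    return "Please contact support."


def with_retrieval(prompt: str) -> str:
    # Pretends that retrieval put the relevant policy text into the answer.
    return f"According to our policy on '{prompt}', here is what to do."


if __name__ == "__main__":
    print("baseline:", pass_rate(baseline))
    print("with retrieval:", pass_rate(with_retrieval))
```

The point of the sketch is the shape, not the scoring rule: any change, whether a prompt edit, a retrieval layer or a tuned model, goes through the same `pass_rate` call, so there is a number to compare before anyone touches the weights.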

Fine-tuning belongs where you have data, measurement and a reason

For many teams the order of operations has inverted. Previously, you took a model and tried to adapt it to the task. Today, a well-designed system with good context, clear tools and a working eval loop can reach similar results with a general model, and it stays easier to maintain when the model provider releases the next version.

The economics matter too. Fine-tuning adds operational complexity: you manage the training pipeline, version the dataset, monitor for drift and deal with the cost of re-runs when something regresses. If the problem can be solved at the prompt and runtime layer, that cost is hard to justify.
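The cost argument can be checked with back-of-the-envelope arithmetic. The sketch below is one possible way to frame it, not a pricing model: the function name `break_even_thousands` and all figures (10 and 4 currency units per thousand requests, 3000 per month of overhead for pipelines, dataset versioning and re-runs) are invented for illustration.

```python
from typing import Optional


def break_even_thousands(general_cost_per_1k: float,
                         tuned_cost_per_1k: float,
                         tuned_monthly_overhead: float) -> Optional[float]:
    """Monthly volume (in thousands of requests) at which a tuned model
    starts paying for its own operational overhead.

    Returns None when the tuned model is not cheaper per request,
    because then it never breaks even on cost alone.
    """
    saving_per_1k = general_cost_per_1k - tuned_cost_per_1k
    if saving_per_1k <= 0:
        return None
    return tuned_monthly_overhead / saving_per_1k


if __name__ == "__main__":
    # Hypothetical figures: 10 vs 4 per thousand requests, 3000 per month overhead.
    print(break_even_thousands(10, 4, 3000))  # 500.0, i.e. 500,000 requests a month
    print(break_even_thousands(4, 4, 3000))   # None: no per-request saving
```

With these made-up numbers, tuning only pays off above roughly half a million requests a month, and that is before counting the engineering time spent on regressions.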

Four questions that must come before the decision to tune a model

Before anyone says "we should fine-tune this", these four questions come first:

  • Do we have an evaluation set that captures real quality, not just a nice-looking demo?
  • Is the problem in knowledge, style, process, format, or the reasoning behavior itself?
  • Have we tried retrieval, structured outputs, tool use, longer context and better validation?
  • Can we calculate that the complexity of custom tuning will beat a simpler runtime solution?

When the answers are not clear, keep fine-tuning in the drawer. Not permanently. Just until it stops being an incantation and starts being a measured decision.
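The four questions can also be written down as an explicit gate, so that "we should fine-tune this" has to pass a check instead of a vibe. This is a hypothetical sketch: the class `TuningReadiness` and its field names are my own shorthand for the questions above, and the yes/no answers still have to come from honest human judgment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TuningReadiness:
    """One boolean per question; field names are illustrative shorthand."""
    has_real_eval_set: bool          # evals capture real quality, not a demo
    problem_diagnosed: bool          # knowledge, style, process, format or reasoning
    cheaper_levers_tried: bool       # retrieval, structured outputs, tools, validation
    tuning_beats_runtime_cost: bool  # the complexity of tuning pays for itself

    def unanswered(self) -> List[str]:
        """Names of the questions that are still answered 'no'."""
        return [name for name, ok in vars(self).items() if not ok]

    def should_consider_fine_tuning(self) -> bool:
        """True only when all four questions are answered 'yes'."""
        return not self.unanswered()


if __name__ == "__main__":
    team = TuningReadiness(
        has_real_eval_set=False,
        problem_diagnosed=True,
        cheaper_levers_tried=True,
        tuning_beats_runtime_cost=False,
    )
    print(team.should_consider_fine_tuning())  # False
    print(team.unanswered())  # ['has_real_eval_set', 'tuning_beats_runtime_cost']
```

A gate like this does not decide anything by itself. It only makes visible which question is still open when someone reaches for the training pipeline.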

Watch how the major providers change their APIs for custom models, whether fine-tuning shifts more toward open models, and which patterns win at the top agentic products. An interesting signal will be whether companies start investing more in evals and data pipelines than in tuning itself.

Lilith's verdict

Fine-tuning is not dying. The comfortable sentence "we'll tune that" is. Without evals, quality data and a clear reason to touch model weights, fine-tuning often just preserves the mess in a more expensive form. A scalpel, yes. A hammer for every problem, no.
