Fine-tuning: a scalpel, not a universal hammer

Fine-tuning changes model weights. It is powerful when you have data, evals and a clear reason. It is an expensive mistake when it hides a bad prompt, missing RAG or process chaos.

#rag #evals #fine-tuning #training

What it is

Fine-tuning is additional training of a model on specific data. Instead of only writing a better prompt or adding documents to context, you change the model's weights. The model learns style, format, decision patterns or a specialized domain.

That sounds like a superpower. Sometimes it is. Often it is premature surgery on a patient who only needed to drop a backpack full of rocks.

When it makes sense

Fine-tuning makes sense when you need a consistent output format, a specific style, decisions that follow the pattern of good examples, or a cheaper, smaller model for a bounded task. Common cases: classification, extraction, customer templates, domain terminology, synthetic data generation and specialized workflows.

It makes less sense when the model is missing fresh facts. RAG or tool use is usually better there. If the data changes daily, putting it into weights is like carving a lunch menu into stone.
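The rule of thumb above can be sketched as a tiny triage function. This is only an illustration under stated assumptions: the `Need` fields, the `suggest_approach` name and the returned labels are all hypothetical, and a real decision involves more judgment than three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    """Hypothetical description of what the project actually needs."""
    facts_change_often: bool   # answers depend on data that changes, e.g. daily
    task_is_bounded: bool      # classification, extraction, fixed format or style
    has_good_examples: bool    # curated input/output pairs already exist


def suggest_approach(need: Need) -> str:
    """Return a rough first suggestion, not a verdict."""
    # Fresh or fast-changing facts belong in retrieval or tools, not in weights.
    if need.facts_change_often:
        return "rag_or_tools"
    # A bounded task with good examples is the classic fine-tuning candidate.
    if need.task_is_bounded and need.has_good_examples:
        return "fine_tuning_candidate"
    # Otherwise the cheaper fixes come first.
    return "prompt_and_context_first"


if __name__ == "__main__":
    lunch_menu = Need(facts_change_often=True, task_is_bounded=True, has_good_examples=True)
    ticket_labels = Need(facts_change_often=False, task_is_bounded=True, has_good_examples=True)
    print(suggest_approach(lunch_menu))     # rag_or_tools
    print(suggest_approach(ticket_labels))  # fine_tuning_candidate
```

The check for changing facts comes first on purpose: even a well-bounded task with good examples is a poor fit for weights if the underlying data moves every day.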

What you need first

First: evals. Without them you do not know whether fine-tuning helped, damaged other behavior or merely created a prettier illusion of control.
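A minimal sketch of such a before-and-after comparison, assuming each model is wrapped as a plain Python callable from prompt to text. The names `EvalCase`, `accuracy`, `compare` and the two stand-in models are hypothetical, and exact-match scoring is only one possible metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Model = Callable[[str], str]  # hypothetical wrapper: prompt in, text out


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str
    suite: str  # "target" = task being tuned, "regression" = behavior to protect


def accuracy(model: Model, cases: List[EvalCase]) -> float:
    """Exact-match accuracy; returns 0.0 for an empty list."""
    if not cases:
        return 0.0
    hits = sum(model(c.prompt).strip() == c.expected for c in cases)
    return hits / len(cases)


def compare(base: Model, tuned: Model, cases: List[EvalCase]) -> Dict[str, Dict[str, float]]:
    """Score both models per suite so gains and regressions show side by side."""
    report: Dict[str, Dict[str, float]] = {}
    for suite in sorted({c.suite for c in cases}):
        subset = [c for c in cases if c.suite == suite]
        b, t = accuracy(base, subset), accuracy(tuned, subset)
        report[suite] = {"base": b, "tuned": t, "delta": t - b}
    return report


# Toy stand-ins for real model calls, for illustration only.
def base_model(prompt: str) -> str:
    return "4" if prompt == "2+2=" else "unknown"


def tuned_model(prompt: str) -> str:
    return "refund" if prompt.startswith("Ticket:") else "unknown"


CASES = [
    EvalCase("Ticket: money back please", "refund", "target"),
    EvalCase("Ticket: I want a refund", "refund", "target"),
    EvalCase("2+2=", "4", "regression"),
]

if __name__ == "__main__":
    for suite, row in compare(base_model, tuned_model, CASES).items():
        print(suite, row)
```

In this toy setup the tuned stand-in wins on the target suite and loses on the regression suite, which is exactly the kind of damage to other behavior that stays invisible without a second suite.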

Then: data. Not a pile of random transcripts, but good examples of inputs and outputs with clear criteria, ideally including negative examples. The model learns your mistakes as well. A diligent student copies everything, errors included.
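As a rough illustration of basic dataset hygiene, the sketch below drops malformed, empty and duplicate examples and flags inputs that were labeled inconsistently. The `input`/`output` field names and the helper names are assumptions made for this example, not a required schema; a real training API may expect a different layout.

```python
import json
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def clean_examples(rows: Iterable[Row]) -> Tuple[List[Row], Dict[str, int]]:
    """Keep well-formed, unique examples and count why the rest were dropped."""
    kept: List[Row] = []
    seen = set()
    dropped = {"missing_field": 0, "empty": 0, "duplicate": 0}
    for row in rows:
        if "input" not in row or "output" not in row:
            dropped["missing_field"] += 1
            continue
        text_in, text_out = str(row["input"]).strip(), str(row["output"]).strip()
        if not text_in or not text_out:
            dropped["empty"] += 1
            continue
        key = (text_in.lower(), text_out.lower())  # case-insensitive duplicate check
        if key in seen:
            dropped["duplicate"] += 1
            continue
        seen.add(key)
        kept.append({"input": text_in, "output": text_out})
    return kept, dropped


def conflicting_inputs(rows: Iterable[Row]) -> List[str]:
    """Inputs mapped to more than one output: a hint that labeling criteria are unclear."""
    outputs: Dict[str, set] = {}
    for row in rows:
        outputs.setdefault(row["input"].lower(), set()).add(row["output"].lower())
    return sorted(k for k, v in outputs.items() if len(v) > 1)


def to_jsonl(rows: Iterable[Row]) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)


RAW = [
    {"input": "Where is my order?", "output": "shipping"},
    {"input": "where is my order? ", "output": "shipping"},  # duplicate
    {"input": "Where is my order?", "output": "billing"},    # conflicting label
    {"input": "", "output": "billing"},                      # empty input
    {"input": "Cancel my plan"},                             # missing output
]

if __name__ == "__main__":
    kept, dropped = clean_examples(RAW)
    print(dropped)
    print(conflicting_inputs(kept))
    print(to_jsonl(kept))
```

Conflicts are reported rather than silently resolved, because deciding which label is right is a question about your criteria, not something a script should guess.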

Common mistakes

Fine-tuning instead of a better prompt.
Fine-tuning instead of RAG when the problem is knowledge and facts.
Fine-tuning without evals.
Tiny or dirty datasets.
Trying to teach the model a company database that changes tomorrow.
Mixing style, knowledge and policy into one stew.

What to remember

Fine-tuning is not dead. It just stopped being the default answer to everything. Fix the prompt, context, retrieval, tools and measurement first. Touch weights only when you know exactly why.