
Context window - how much hell fits in a prompt

A context window is how many tokens a model can see at once. A bigger window is not memory, truth or a guarantee of better answers. It is a larger, pricier workbench.

What it is

A context window is the maximum number of tokens a model can see in one run: system instructions, conversation history, documents, tool results and the current request. If something does not fit, it must be shortened, dropped or retrieved another way.

A token is not a word. It is a piece of text, often a word fragment or a few characters. More tokens mean more material to process, but also more noise, cost and room for mistakes.
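As a rough illustration, the fit-or-trim decision can be sketched like this. The four-characters-per-token ratio is a common rule of thumb for English, not a real tokenizer, and all names here are hypothetical:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real BPE tokenizer splits differently; this is only a sketch.
    return max(1, len(text) // 4)

def fits_in_window(parts: list[str], window: int) -> bool:
    """Check whether system prompt, history and documents fit the window."""
    return sum(estimate_tokens(p) for p in parts) <= window

prompt_parts = [
    "You are a helpful assistant.",
    "User: summarize this report.",
    "report text " * 500,  # a long document dominates the budget
]
print(fits_in_window(prompt_parts, window=2000))  # → True
```

If this check fails, the options are exactly the ones above: shorten, drop, or retrieve the material another way.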

What larger context solves

Long context helps with large documents, repositories, legal files, meeting transcripts and long agent tasks. The model can keep more material in view and does not have to jump through retrieval as often.

But a larger window does not mean equal attention to everything. Information buried in the middle of a long prompt is often recalled worse than material near the start or end. Relevant details can drown in filler. And if you put garbage into the prompt, you get elegantly processed garbage.

Context versus memory versus RAG

Context is what the model sees right now. Memory is the mechanism that decides what the system carries across runs. RAG is a way to find relevant pieces of data and add them to context.

Long context can sometimes replace RAG. More often it complements it: RAG selects the right pieces, long context keeps them together.
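That division of labor can be sketched in a few lines. A toy keyword-overlap score stands in for a real embedding-based retriever; everything here is illustrative, not a production API:

```python
def score(query: str, doc: str) -> int:
    # Toy relevance: count shared lowercase words. Real systems use
    # embeddings and vector search; this only illustrates the selection step.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_context(query: str, docs: list[str], top_k: int = 2) -> str:
    # RAG selects the most relevant pieces...
    selected = sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]
    # ...and the (long) context window keeps them together in one prompt.
    return "\n\n".join(selected) + "\n\nQuestion: " + query

docs = [
    "Invoices are archived monthly in the finance system.",
    "The context window limits how many tokens the model sees.",
    "Cafeteria hours are 11:00 to 14:00.",
]
print(build_context("how does the context window limit tokens?", docs, top_k=1))
```

Selection keeps the prompt small and relevant; the window size decides how many selected pieces can sit together at once.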

Common mistakes

  • Believing more tokens automatically mean a better answer.
  • Filling context with the whole archive without relevance filtering.
  • Confusing long context with long-term memory.
  • Ignoring cost. Big windows cost latency, money and sometimes quality.
  • Skipping evals. Without measurement you do not know whether long context helped or merely made the invoice feel sophisticated.
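The cost point is easy to make concrete with back-of-the-envelope arithmetic. The prices below are placeholders, not any provider's actual rates:

```python
def prompt_cost(input_tokens: int, output_tokens: int,
                in_price_per_million: float, out_price_per_million: float) -> float:
    # Providers typically bill input and output tokens separately, per million.
    return (input_tokens * in_price_per_million
            + output_tokens * out_price_per_million) / 1_000_000

# Hypothetical rates: $3 per million input tokens, $15 per million output.
# Stuffing a 200k-token window costs ~20x a focused 5k-token prompt:
print(prompt_cost(200_000, 1_000, 3.0, 15.0))  # → 0.615
print(prompt_cost(5_000, 1_000, 3.0, 15.0))    # → 0.03
```

Multiply by thousands of requests per day and relevance filtering stops being optional.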

What to remember

A context window is the model's workbench. A larger desk is useful, but it will not sort the papers for you. Good systems combine relevance selection, compression, RAG, memory and evals.