Subquadratic launches with $29M seed and a promise of 12M tokens at a fraction of current cost

Subquadratic is entering the market with a big promise: its SubQ model is meant to turn long context from an expensive demo feature into a practical working mode. The company has raised $29 million in seed funding and says its architecture can support up to 12 million tokens, roughly 9 million words or almost 120 books.
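The conversion behind those figures can be checked with a few lines of arithmetic. This is a minimal sketch: the two ratios below are assumptions back-derived from the article's own numbers (9 million words for 12 million tokens, and about 120 books for 9 million words), not constants published by Subquadratic.

```python
# Assumed ratios, derived from the figures quoted in the article.
WORDS_PER_TOKEN = 0.75   # assumption: 9M words / 12M tokens
WORDS_PER_BOOK = 75_000  # assumption: 9M words / 120 books


def context_capacity(tokens, words_per_token=WORDS_PER_TOKEN,
                     words_per_book=WORDS_PER_BOOK):
    """Convert a token budget into approximate words and books."""
    words = tokens * words_per_token
    books = words / words_per_book
    return words, books


words, books = context_capacity(12_000_000)
print(f"{words:,.0f} words, about {books:.0f} books")
```

Real ratios vary with the tokenizer, the language and the length of the books, so the result is an order-of-magnitude guide rather than a specification.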

Sparse attention as a path to long context without quadratic compute growth

Subquadratic has launched publicly and introduced SubQ alongside its funding announcement. The company is led by CEO Justin Dangel and CTO Alexander Whedon. Their core claim is straightforward: dense attention becomes too expensive as inputs grow, so long context needs a different architecture.

SubQ is built around sparse attention with subquadratic scaling. Conventional transformer attention compares every token with every other token, so its compute grows with the square of the input length. SubQ aims to expand the context window without that growth. Subquadratic says this can deliver higher speed, better accuracy and lower cost.
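To see why the scaling matters, here is a rough sketch that counts pairwise attention scores for dense attention and for a generic sparse scheme. Subquadratic has not published SubQ's mechanism in this article, so the sparse model here (each token attends to a fixed budget of `k` other tokens) and the value of `k` are purely illustrative assumptions.

```python
def dense_pairs(n):
    """Dense attention: every token scores against every token."""
    return n * n


def sparse_pairs(n, k):
    """Hypothetical sparse scheme: each token scores against at most k tokens."""
    return n * min(k, n)


K = 4_096  # assumed per-token attention budget, for illustration only

for n in (128_000, 1_000_000, 12_000_000):
    dense = dense_pairs(n)
    sparse = sparse_pairs(n, K)
    print(f"n={n:>10,}  dense={dense:.2e}  sparse={sparse:.2e}  "
          f"ratio={dense / sparse:,.0f}x")
```

Under these assumptions, growing the input from 1 million to 12 million tokens multiplies the dense count by 144 but the sparse count by only 12. The counts ignore everything else that affects real cost, such as how the sparse pattern is chosen and how well it maps to hardware.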

Window length is the prerequisite; accuracy across the full range is the actual test

Long context is one of the most visible limits in current AI products. Many models still sit around 128K tokens, while the original report places frontier cloud models at about 1 million tokens. Subquadratic is promising a much larger jump.

If it works reliably, it could change how teams use large document sets, full code repositories, contract archives, scientific literature and internal company knowledge. Products could depend less on aggressive chunking and brittle retrieval layers. More source material could be placed in front of the model at once.

A bigger context window does not automatically mean better reasoning. A model can receive millions of tokens and still miss the key sentence, confuse similar passages or answer from the loudest pattern rather than the correct evidence.

That is why independent benchmarks matter. It is not enough to show that a model can accept 12 million tokens. The real questions are cost, latency, accuracy at the beginning, middle and end of the context, and whether the result is usable in production workflows.

What to watch: access, pricing and independent tests

Watch for developer access to SubQ, API pricing, usage limits and comparisons against today's long context models from major cloud providers. The most useful tests will include needle in a haystack tasks, long legal and financial analysis, and work across large codebases.
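A needle in a haystack test can be sketched in a few lines: hide one fact at different depths of a long filler text and check whether the model retrieves it. Everything here is hypothetical scaffolding. The `ask` callable stands in for whatever API SubQ eventually exposes, and `recency_biased_stub` is a toy that only reads the end of the context, included so the sketch runs without a real model.

```python
from typing import Callable, Dict, List

FILLER = "The quarterly report was filed on time and nothing unusual was noted. "
NEEDLE = "The vault code is 7391. "
QUESTION = "What is the vault code?"
EXPECTED = "7391"


def build_haystack(n_sentences: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), NEEDLE)
    return "".join(sentences)


def run_depth_sweep(ask: Callable[[str, str], str], n_sentences: int,
                    depths: List[float]) -> Dict[float, bool]:
    """Return, for each depth, whether the answer contains the expected fact."""
    results = {}
    for depth in depths:
        context = build_haystack(n_sentences, depth)
        results[depth] = EXPECTED in ask(context, QUESTION)
    return results


def recency_biased_stub(context: str, question: str) -> str:
    """Toy stand-in for a model: it only 'reads' the last 2,000 characters."""
    tail = context[-2000:]
    return EXPECTED if EXPECTED in tail else "unknown"


results = run_depth_sweep(recency_biased_stub, 1_000, [0.0, 0.25, 0.5, 0.75, 1.0])
for depth, found in results.items():
    print(f"depth={depth:.2f}  found={found}")
```

The stub only finds the needle when it sits at the very end, which is the kind of position-dependent failure a depth sweep is designed to expose. A serious evaluation would swap in a real model call, scale the haystack toward the advertised window, use varied filler and needles, and record cost and latency alongside accuracy.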

The best case is a new economic model for applications currently blocked by the price and latency of long context. The more sober case is that Subquadratic improves the infrastructure while retrieval, attention, evaluation and usability remain hard problems.

Lilith's verdict

Subquadratic is selling a very attractive answer to the pain of long context: less compute, more memory and a smaller bill. If SubQ works beyond the demo, it could change the economics of agents, legal analysis and work across huge codebases. But 12 million tokens is not the same as 12 million tokens of understanding. The win will not be the size of the window. It will be whether the model can find the right detail in the noise and use it well.
