Simon Hørup Eskildsen from Turbopuffer dropped something important in a recent Latent Space conversation. It's one of those technical insights that sounds narrow but actually explains why so many AI applications feel sluggish right now.
The shift from RAG (Retrieval-Augmented Generation) to agentic systems isn't just a new pattern for developers. It fundamentally changes what databases need to handle.
The Old Model vs The New Reality
Traditional RAG systems worked like this: a user asks a question, the system makes one thoughtful database query, retrieves relevant context, generates a response. One user, one query, sequential processing. Database infrastructure was built for exactly this pattern.
Agentic systems behave completely differently. As Simon explains, agents spawn multiple concurrent sub-tasks. One user request might trigger dozens of parallel database queries as the agent simultaneously investigates different angles, retrieves different context, explores different solution paths.
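The difference is easy to sketch. In the minimal Python sketch below, `search` is a hypothetical stand-in for a real database call (not any particular framework's API): a RAG request awaits a single query, while an agent fans out into many concurrent ones.

```python
import asyncio

async def search(query: str) -> str:
    """Stand-in for one database query (hypothetical, not a real API)."""
    await asyncio.sleep(0.01)  # simulate network + query latency
    return f"results for: {query!r}"

async def rag_request(question: str) -> list[str]:
    # Traditional RAG: one user request -> one retrieval.
    return [await search(question)]

async def agent_request(question: str, fan_out: int = 20) -> list[str]:
    # Agentic: one user request fans out into many concurrent
    # sub-queries as the agent explores different angles in parallel.
    sub_queries = [f"{question} / angle {i}" for i in range(fan_out)]
    return await asyncio.gather(*(search(q) for q in sub_queries))

results = asyncio.run(agent_request("why is checkout slow?"))
print(len(results))  # 20
```

From the database's point of view, that single user request just became twenty simultaneous queries - and agents often nest, so sub-tasks spawn sub-tasks of their own.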
That's not a 10x increase in database load. It's more like 100x. And it's happening right now as developers ship more sophisticated AI applications.
Why Existing Infrastructure Struggles
The databases we've been using weren't designed for this workload pattern. They optimised for different trade-offs - consistency over speed, single large queries over massive parallelism, predictable access patterns over chaotic concurrent requests.
Simon's insight is that search infrastructure needs rethinking from first principles when agents are the primary users. Not humans making occasional searches. Not even traditional applications making predictable queries. But AI agents making hundreds of simultaneous requests with unpredictable patterns.
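Until infrastructure is built for that pattern, developers often end up throttling the fan-out themselves so an agent can't overwhelm a database sized for sequential traffic. A minimal sketch using `asyncio.Semaphore` - the `search` stub and the cap of 8 are hypothetical values, not a recommendation:

```python
import asyncio

MAX_CONCURRENT_QUERIES = 8  # hypothetical cap; tune for your database

async def search(query: str) -> str:
    """Stand-in for one database query (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"results for: {query!r}"

async def bounded_search(sem: asyncio.Semaphore, query: str) -> str:
    # The semaphore limits in-flight queries: at most
    # MAX_CONCURRENT_QUERIES run at once, the rest wait their turn.
    async with sem:
        return await search(query)

async def run_agent_subtasks(queries: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT_QUERIES)
    return await asyncio.gather(*(bounded_search(sem, q) for q in queries))

results = asyncio.run(run_agent_subtasks([f"sub-task {i}" for i in range(100)]))
print(len(results))  # 100
```

Client-side throttling like this trades latency for survivability - exactly the kind of workaround that disappears when the database itself is designed for the workload.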
Turbopuffer is his answer to that problem - a database built specifically for the workloads agents create. The architecture decisions differ fundamentally from both traditional databases and even modern vector databases designed for RAG.
The Pricing Problem
Here's where it gets interesting for anyone building AI applications. When your database load increases 100x because you added agentic capabilities, your infrastructure costs could explode. That's not theoretical - developers are hitting this right now.
Simon discusses how Turbopuffer approaches pricing for this new reality. It's not just about per-query costs. It's about predictable economics when query patterns are inherently unpredictable. That's a hard problem - probably harder than the technical architecture challenges.
For builders, this matters because infrastructure costs directly determine what you can afford to build. If agentic applications cost 100x more to run than RAG systems, many use cases simply won't be viable. The economics need to work, or the applications don't get built.
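To make the stakes concrete, here's a back-of-envelope cost model. Every number is made up for illustration - only the 100x query multiplier comes from the discussion above:

```python
# Illustrative numbers only; real per-query pricing varies by provider.
COST_PER_QUERY = 0.0001        # hypothetical $ per database query
REQUESTS_PER_MONTH = 1_000_000
RAG_QUERIES_PER_REQUEST = 1    # one retrieval per user request
AGENT_QUERIES_PER_REQUEST = 100  # the ~100x fan-out described above

rag_cost = REQUESTS_PER_MONTH * RAG_QUERIES_PER_REQUEST * COST_PER_QUERY
agent_cost = REQUESTS_PER_MONTH * AGENT_QUERIES_PER_REQUEST * COST_PER_QUERY

print(f"RAG:   ${rag_cost:,.0f}/month")    # RAG:   $100/month
print(f"Agent: ${agent_cost:,.0f}/month")  # Agent: $10,000/month
```

A feature that was a rounding error at RAG volumes becomes a line item the CFO notices at agent volumes - which is why pricing design matters as much as architecture here.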
Architecture Decisions That Actually Matter
The conversation goes deep on technical choices - hybrid search strategies, how to handle massive concurrency, where to make trade-offs between latency and cost. This isn't abstract theory. These are the decisions that determine whether your AI application responds in 200 milliseconds or 2 seconds.
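One of those choices, hybrid search, usually means merging a vector-similarity ranking with a keyword ranking into a single result list. A common way to do that is reciprocal rank fusion - sketched below as a standard technique, not necessarily Turbopuffer's actual implementation:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (e.g. vector + keyword) with RRF.

    Each document scores sum(1 / (k + rank)) across the lists it
    appears in; k=60 is the conventional smoothing constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # nearest-neighbour order
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # BM25-style order
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Note how doc_b wins by placing well in both lists - the fusion rewards agreement between the two retrieval strategies without needing their raw scores to be comparable.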
What struck me is how specific the requirements are. Building for agentic workloads isn't just "make it faster" or "handle more queries". It requires fundamentally different assumptions about access patterns, consistency requirements, and failure modes.
Simon's point about database design following application patterns is crucial. We're not just adding agents to existing architectures. We're discovering that agents require new infrastructure primitives.
What This Means for Developers
If you're building AI applications, particularly anything with agents, this conversation is worth your time. Not because Turbopuffer is the only answer - it's one approach among several emerging solutions. But because the problems Simon articulates are problems every developer building with agents will encounter.
The shift from RAG to agents isn't just about prompt engineering or agent frameworks. It's about infrastructure that can handle what agents actually do when they run. That's the unglamorous foundation work that determines whether ambitious AI applications actually ship or just stay as impressive demos.
We're still early in figuring this out. But conversations like this one - deep technical discussions about real bottlenecks and actual solutions - are how we get from "agents are cool" to "agents are useful in production".
The infrastructure is adapting. Slowly, practically, with real trade-offs. That's how things actually get built.