Railway Rebuilt the Cloud for Agents - Here's Why That Actually Matters

Railway runs bare-metal data centres because the economics of agent workloads don't work on traditional cloud infrastructure. Their customers are spending over $200,000 monthly on agent coding tasks alone. That's not hype - that's measured usage from 3 million users running production workloads.

Jake Cooper, Railway's founder, walked through their infrastructure thesis with Latent Space. The core argument: agents need different primitives than web applications. The cloud stack we built for serving HTTP requests breaks down when the workload is autonomous code execution.

The Agent Infrastructure Problem

Traditional cloud platforms charge for compute time. That pricing model assumes short-lived requests - a user hits an endpoint, you run some code, you return a response. Measure in milliseconds, bill in seconds, optimise for throughput.

Agent workloads invert that model. An agent coding task might run for minutes or hours. It's not a burst - it's sustained compute doing exploratory work. It forks execution paths, tests multiple approaches, retries failures. The cost structure of AWS or GCP makes this prohibitively expensive.
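To make the contrast concrete, here is a rough sketch of how billed compute grows when a task runs long, forks into parallel paths, and retries. The rate, durations, branch count, and function names are all hypothetical, chosen only to illustrate the shape of the arithmetic. They are not figures from Railway, AWS, or GCP. It also assumes, as a simplification, that every retry re-runs every branch in full.

```python
# Hypothetical per-second compute rate in dollars (illustrative only).
RATE_PER_SECOND = 0.0001


def billed_seconds(duration_s: float, branches: int = 1, retries: int = 0) -> float:
    """Total compute seconds billed for one task.

    Simplifying assumption: each branch runs for the full duration,
    and each retry repeats every branch.
    """
    return duration_s * branches * (1 + retries)


def cost(duration_s: float, branches: int = 1, retries: int = 0) -> float:
    """Dollar cost of a task under the hypothetical flat per-second rate."""
    return billed_seconds(duration_s, branches, retries) * RATE_PER_SECOND


# A short web request: 200 ms, one path, no retries.
request_cost = cost(0.2)

# An agent task: one hour, four parallel approaches, one full retry.
agent_cost = cost(3600, branches=4, retries=1)

print(f"request: ${request_cost:.6f}  agent task: ${agent_cost:.2f}")
```

Under these made-up inputs the agent task bills 28,800 compute seconds against 0.2 for the request. The gap comes from duration, branching, and retries multiplying together, which is the inversion described above.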

Railway's solution: own the hardware. They're running bare-metal servers in their own data centres with a three-month payback period on hardware costs. Those economics only work at scale, and they're at scale - 3 million users, meaningful revenue, proven demand.
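A payback period is just the upfront hardware cost divided by the net monthly income that hardware generates. The sketch below shows the calculation. The dollar amounts are hypothetical placeholders picked so the result lands on three months. Railway's actual server costs and margins are not stated in the source.

```python
def payback_months(hardware_cost: float, monthly_revenue: float, monthly_opex: float) -> float:
    """Months needed for net income from a server to cover its purchase price."""
    net_monthly = monthly_revenue - monthly_opex
    if net_monthly <= 0:
        raise ValueError("hardware never pays back if monthly net income is not positive")
    return hardware_cost / net_monthly


# Hypothetical numbers: a 30,000 server earning 12,000 a month
# with 2,000 a month in power, space, and bandwidth.
months = payback_months(30_000.0, 12_000.0, 2_000.0)
print(months)
```

The same function shows why scale matters: if utilisation drops and monthly revenue halves, the payback period more than doubles, because operating costs stay fixed.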

Safe Production Forking

Cooper introduced a concept called safe production forking - the ability to duplicate a running production environment, test changes in isolation, then merge or discard. For web applications, this is nice-to-have. For agent development, it's essential.

Agents break things. Not occasionally - constantly. They're exploratory by design. An agent writing code will try approaches that crash, configurations that fail, dependencies that conflict. Traditional deployment flows assume you test locally, then ship to production. Agents need to test IN production - against real data, real services, real constraints - without risking the live system.

Railway built forking as a first-class primitive. Spin up an identical copy of production, point the agent at it, let it try things. If it works, merge. If it breaks, discard. The cost is compute time, which Railway controls. The value is developers shipping agent features without fear.
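The fork, try, merge-or-discard loop can be sketched in a few lines. This is a toy in-memory model, not Railway's API: the `Environment` class, `run_in_fork` function, and config keys are all hypothetical names invented for illustration, and a real fork would copy services and data rather than a dictionary.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Environment:
    """Toy stand-in for a deployed environment (hypothetical)."""
    name: str
    config: dict = field(default_factory=dict)

    def fork(self, name: str) -> "Environment":
        # Deep copy so changes in the fork cannot touch the original.
        return Environment(name=name, config=copy.deepcopy(self.config))


def run_in_fork(
    prod: Environment,
    change: Callable[[Environment], None],
    check: Callable[[Environment], bool],
) -> bool:
    """Apply `change` to a fork of `prod`; merge only if `check` passes."""
    fork = prod.fork(f"{prod.name}-fork")
    try:
        change(fork)
        ok = check(fork)
    except Exception:
        ok = False  # a crash in the fork means discard, never propagate
    if ok:
        prod.config = fork.config  # merge: promote the fork's state
    return ok


def healthy(env: Environment) -> bool:
    return env.config["workers"] > 0


def enable_parser(env: Environment) -> None:
    env.config["flags"]["new_parser"] = True


def bad_change(env: Environment) -> None:
    env.config["workers"] = 0
    raise RuntimeError("agent-generated change crashed")


prod = Environment("prod", {"workers": 2, "flags": {"new_parser": False}})
merged = run_in_fork(prod, enable_parser, healthy)   # passes the check, merged
discarded = run_in_fork(prod, bad_change, healthy)   # crashes, fork discarded
```

The design point is that failure is contained by construction: the agent only ever holds a reference to the fork, so a crash or a bad configuration leaves production untouched.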

Thirty-Five People, Three Million Users

Railway operates with a 35-person team. For context: that's the size of a mid-stage startup, not a cloud infrastructure provider serving millions. The efficiency comes from focused scope - they're not building everything AWS builds. They're building exactly what agent workloads need and nothing else.

This is the infrastructure thesis in practice. Start with a clear opinion about what the workload looks like, then build backwards from there. Don't abstract for every possible use case. Optimise for the one case that matters - in Railway's case, developers building with agents.

The $200,000+ monthly spend on agent coding tells you the use case is real. Developers are paying for this infrastructure because it solves a problem they can't solve elsewhere. That's product-market fit expressed in revenue, not surveys.

Rebuilding from First Principles

Cooper's broader point: the cloud infrastructure we have was designed for the applications we had in the 2010s. Stateless web services, microservices, containers, orchestration - all optimised for request-response patterns and human-driven deployments.

Agents don't fit that model. They're stateful, long-running, exploratory, and autonomous. They need different primitives: safe forking, persistent state, cheap sustained compute, infrastructure that adapts to exploratory workloads.

Railway isn't trying to compete with AWS on breadth. They're competing on depth for a specific workload. If agent development becomes as common as web development - and the usage numbers suggest it might - that focused strategy could matter more than general-purpose infrastructure.

What This Means for Builders

If you're building with agents, the infrastructure question matters more than it did for traditional applications. Running agent workloads on standard cloud platforms costs too much and provides the wrong primitives. You need forking, you need cheap sustained compute, you need infrastructure that treats exploratory execution as normal.

Railway's existence shows there's demand for agent-native infrastructure. Their growth shows developers will pay for it. The three-month hardware payback suggests the economics can work at scale.

For business owners considering agent implementations: infrastructure cost is a real constraint, not a theoretical one. The traditional cloud vendors will adapt eventually - they always do. But right now, platforms built specifically for agent workloads have a genuine advantage in both cost and capability.