API Contracts Break Because Nobody Writes Down the Rules

Your API breaks in production not because the code is wrong. It breaks because the implicit rules you thought everyone understood were never written down.

A recent article on Dev.to makes a simple argument: most API contract failures aren't implementation bugs. They're agreements that lived in someone's head, got shipped without documentation, and broke the first time a consumer made a reasonable but undocumented assumption.

The fix isn't better testing. It's writing the contract properly in the first place.

The Silent Assumptions That Kill APIs

Here's a common failure mode. Your API returns an error. The consumer retries. The retry succeeds, but now the operation has happened twice. Was that meant to be idempotent? The spec doesn't say. The implementation assumed retry logic would handle it. The consumer assumed the endpoint was safe to retry. Both assumptions were reasonable. Both were wrong.

Or: your API starts returning a new error code. The consumer's error handling doesn't recognise it and treats it as a fatal failure. Should that new error code have been a breaking change? Was there a taxonomy of error types that indicated which ones were retryable? If there was, nobody wrote it down.

These aren't edge cases. They're the everyday reality of API contracts that specify request and response shapes but not the behaviour those shapes enable.

What Actually Needs to Be in the Spec

The article argues for three categories of rules that belong in specs but usually live in tribal knowledge:

Error taxonomy. Not just "here are the error codes" but "here's what each error means and how you should respond". Which errors are retryable? Which indicate client mistakes versus server failures? Which should trigger alerts versus silent fallbacks? If the consumer has to guess, you've left the contract incomplete.
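One way to keep a taxonomy like this from living in someone's head is to write it as data that both sides can read. The sketch below is only an illustration: the error codes, the ErrorRule fields, and the default for unknown codes are all assumptions, not something the article or any particular API defines.

```python
from dataclasses import dataclass
from enum import Enum


class Fault(Enum):
    CLIENT = "client"  # the request was wrong; resending it unchanged won't help
    SERVER = "server"  # the service failed; the request itself may be fine


@dataclass(frozen=True)
class ErrorRule:
    fault: Fault
    retryable: bool
    alert: bool  # True: raise an alert; False: fall back quietly


# Hypothetical error codes, for illustration only.
ERROR_TAXONOMY = {
    "invalid_request": ErrorRule(Fault.CLIENT, retryable=False, alert=False),
    "rate_limited": ErrorRule(Fault.CLIENT, retryable=True, alert=False),
    "upstream_timeout": ErrorRule(Fault.SERVER, retryable=True, alert=False),
    "internal_error": ErrorRule(Fault.SERVER, retryable=False, alert=True),
}

# The documented default for codes a consumer has never seen (an assumed policy).
UNKNOWN_ERROR = ErrorRule(Fault.SERVER, retryable=False, alert=True)


def rule_for(code: str) -> ErrorRule:
    """Look up how a consumer should respond to an error code."""
    return ERROR_TAXONOMY.get(code, UNKNOWN_ERROR)
```

The explicit default matters as much as the table: it tells consumers what to do when a new code appears, so adding one doesn't force them to guess.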

Idempotency rules. Which operations are safe to retry? Which generate unique IDs on every call? Which require the client to send an idempotency key? This isn't optional information. It's the difference between a reliable system and one that randomly double-charges customers when a network blip causes a retry.
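As a rough sketch of what "the client sends an idempotency key" can mean in practice, here is a toy in-memory service. PaymentService, charge, and the key format are hypothetical names chosen for the example; a real service would persist keys and decide how long to keep them.

```python
class PaymentService:
    """Toy service: a repeated idempotency key returns the original result."""

    def __init__(self) -> None:
        self._results = {}      # idempotency key -> stored response
        self.charges_made = 0   # how many charges were actually executed

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retry with a key we have already seen must not charge again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        self.charges_made += 1
        result = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-1001", 500)
retry = service.charge("order-1001", 500)  # e.g. the client timed out and retried
```

Here the retry gets back the same charge and the customer is charged once. Whether an endpoint behaves this way is exactly the kind of rule the contract has to state.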

Breaking change definitions. What counts as breaking? Adding a field is usually safe. Removing one isn't. But what about changing field types? Adding required validation? Returning null where you used to return an empty array? If your versioning strategy depends on "we won't break things" but you haven't defined what breaking means, you're one deploy away from angry consumers.
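A definition of "breaking" becomes reviewable once it is written as a check. The sketch below assumes a deliberately simplified schema format (a flat mapping of field name to type name) and an assumed policy: removing a field or changing its type is breaking, adding a field is not. Your own definition may well differ.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """List the changes between two response schemas that this policy calls breaking.

    Schemas are flat {field_name: type_name} mappings, a simplification for
    illustration. Fields that only exist in `new` are treated as safe additions.
    """
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != old_type:
            problems.append(f"type changed: {field} ({old_type} -> {new[field]})")
    return problems


v1 = {"id": "string", "tags": "array"}
v2_additive = {"id": "string", "tags": "array", "note": "string"}
v2_nullable = {"id": "string", "tags": "array|null"}
```

Under this policy v2_additive passes, while v2_nullable is flagged, because returning null where consumers expect an array is a type change.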

Why Specs Fail to Cover This

OpenAPI and similar tools focus on structure. They're good at describing what a valid request looks like and what fields a response contains. Beyond free-text descriptions, they give you no first-class way to capture "if you send this request twice, the second one is ignored" or "this error code means back off for 60 seconds".

That information ends up in README files, Slack conversations, or the heads of whoever built it. When that person leaves, the knowledge goes with them. The new developer reads the spec, thinks they understand the contract, and ships code that works until it doesn't.

For business owners running services built on third-party APIs, this is why integrations break in ways that feel like the other side changed something. Often they didn't. They just never told you the full set of rules, and you made a reasonable assumption that turned out to be wrong.

Fixing It Before It Ships

The solution isn't complicated. Before writing code, write the behavioural contract. Not as comments or documentation that will drift out of sync, but as part of the spec itself.

If your API design process starts with "what does the request look like", you're already too late. Start with "what promises are we making about how this behaves". Is it idempotent? What errors can it return and what should clients do with them? What changes would break existing consumers?

Write those rules down. Make them reviewable. Make them enforceable: some of this can be tested, some of it has to be policy, but all of it should be explicit.
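The testable part can start very small: a check that fails review when an operation ships without its behavioural rules. The sketch below assumes a hypothetical spec layout, where each operation name maps to a dictionary of declared behaviour, and treats "idempotent" and "errors" as the required keys.

```python
# Keys every operation must declare (an assumed policy for this example).
REQUIRED_KEYS = {"idempotent", "errors"}


def missing_behaviour(spec: dict) -> dict:
    """Return {operation: [missing keys]} for operations with undeclared behaviour."""
    gaps = {}
    for operation, rules in spec.items():
        missing = sorted(REQUIRED_KEYS - set(rules))
        if missing:
            gaps[operation] = missing
    return gaps


# Hypothetical operations, for illustration only.
behaviour_spec = {
    "create_charge": {"idempotent": "with idempotency key", "errors": ["rate_limited"]},
    "refund_charge": {"errors": ["invalid_request"]},  # idempotency never declared
}

gaps = missing_behaviour(behaviour_spec)
```

Run in CI, a check like this doesn't prove the behaviour is correct, only that someone has written the promise down where it can be reviewed.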

The article's argument is that this prevents the majority of production API failures. Not all of them: you'll still have bugs, performance issues, and unexpected edge cases. But the failures caused by "I didn't know that's how it worked" mostly disappear.

The Cost of Incomplete Contracts

Every unwritten rule is a production incident waiting to happen. Every implicit assumption is a support ticket when it turns out not everyone made the same assumption. Every "we'll document that later" is technical debt that compounds with every new consumer.

The time to fix it is before the first integration. Once you have consumers in production, changing the rules becomes a migration project. Writing them down upfront is an afternoon of careful thinking. Worth it.