A developer noticed something odd. When you ask ChatGPT a question with search enabled, the answer feels confident and comprehensive. But what is it actually searching for behind the scenes?
The answer, it turns out, is surprisingly different from what you typed. This developer built a Chrome extension to intercept the Server-Sent Events (SSE) stream and reveal the hidden queries. The data is fascinating.
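The core of the intercept is simple: SSE streams are newline-delimited text in which payload lines begin with "data: ". A minimal sketch of pulling search queries out of such a stream might look like the following - note that the event shape here (a "search_query" type with a query field) is a hypothetical illustration, since ChatGPT's actual SSE payloads are undocumented and subject to change.

```javascript
// Extract search queries from raw SSE text.
// The "search_query" event type and `query` field are assumptions
// for illustration; the real payload format is undocumented.
function extractQueries(sseText) {
  const queries = [];
  for (const line of sseText.split("\n")) {
    // SSE payload lines are framed as "data: <payload>".
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6).trim();
    if (payload === "[DONE]") continue; // end-of-stream sentinel
    try {
      const event = JSON.parse(payload);
      if (event.type === "search_query" && typeof event.query === "string") {
        queries.push(event.query);
      }
    } catch {
      // Ignore keep-alives and partial chunks that are not valid JSON.
    }
  }
  return queries;
}
```

In a real extension this parser would sit behind a patched fetch or an intercepted response body rather than operate on a complete string, but the framing logic is the same.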
The Reformulation Gap
ChatGPT does not search for what you asked. It reformulates your question into what it thinks will return better results - an average of 8.2 rewritten queries per question.
You ask: "What are the latest developments in quantum computing?"
ChatGPT searches: "quantum computing breakthroughs 2024", "recent quantum computing announcements", "quantum computing research papers January 2025", and five more variations.
The gap between what you ask and what gets searched is what the developer calls the Reformulation Gap. Across the dataset, it averaged 47 per cent. Nearly half the time, the search query bore only loose resemblance to the original question.
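The developer's exact metric is not specified, but one plausible way to put a number on the gap is token overlap: treat the question and the query as sets of words and measure how little they share. This Jaccard-based version is an illustrative proxy, not the extension's actual formula.

```javascript
// Illustrative Reformulation Gap: 1 minus the Jaccard overlap between
// the word sets of the original question and the rewritten query.
// 0 means identical wording; values near 1 mean almost no shared words.
function reformulationGap(question, query) {
  const tokens = (s) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const a = tokens(question);
  const b = tokens(query);
  const intersection = [...a].filter((t) => b.has(t)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : 1 - intersection / union;
}
```

Run against the example above, "What are the latest developments in quantum computing?" versus "quantum computing breakthroughs 2024" share only two words, so the gap comes out high - which is the point: a reformulated query can be a good search and still look nothing like the question.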
That is not necessarily bad. ChatGPT is optimising for results, not literalism. But it does mean you are not in control of the search. You are outsourcing query formulation to a model that may or may not share your intent.
The Consult-to-Cite Ratio
Here is where it gets messier. The extension tracked how many sources ChatGPT consulted versus how many it actually cited in the answer. The ratio was 3.2:1.
For every source ChatGPT references in its response, it consulted roughly two others and chose not to mention them. You are getting a curated view, filtered through the model's judgement of what matters.
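Given an intercepted event stream, the ratio itself is a straightforward count. The event type names here ("source_consulted", "citation") are hypothetical placeholders for whatever the real stream emits.

```javascript
// Consult-to-cite ratio from a list of intercepted events.
// Event type names are illustrative, not ChatGPT's actual schema.
function consultToCiteRatio(events) {
  const consulted = events.filter((e) => e.type === "source_consulted").length;
  const cited = events.filter((e) => e.type === "citation").length;
  // With no citations the ratio is undefined; report Infinity.
  return cited === 0 ? Infinity : consulted / cited;
}
```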
Again, this is not inherently wrong. Humans do the same thing when researching - we consult far more than we cite. But it does mean the confidence you feel reading a ChatGPT answer is not transparency. It is editorial judgement you cannot see.
What This Means for Builders
If you are building applications on top of ChatGPT's search capabilities, this data changes how you should think about reliability.
First, the model is rewriting your queries. If precision matters - legal research, medical information, technical documentation - you need to account for the fact that the system is interpreting intent, not executing instructions.
Second, you are not seeing the full search process. The sources ChatGPT consulted but did not cite might be exactly the ones you needed. There is no way to audit that decision after the fact.
Third, this behaviour varies across platforms. The developer compared ChatGPT, Perplexity, and Claude. Each one reformulates queries differently. Each one has a different consult-to-cite ratio. If you are comparing answers across systems, you are not comparing the same search process.
The Transparency Problem
The real issue here is not that ChatGPT reformulates queries. It is that most users have no idea it is happening.
When you search Google, you see the query you typed. When you search ChatGPT, you see the answer to a query you did not write. That gap is fine for casual use - asking about recipes or travel recommendations. It is less fine when the stakes matter.
The developer who built this extension is not arguing for removing reformulation. The argument is for visibility. Let users see what the model actually searched for. Let them understand why certain sources were cited and others were not.
That kind of transparency is not just useful for power users. It is how you build trust in systems that are making decisions on your behalf.
For now, the extension exists as a proof of concept. It works, but it is fragile - ChatGPT's SSE format could change at any time, breaking the intercept. What would be better is if this kind of visibility were built into the product itself.
Until then, this is a good reminder: when you ask an AI to search for something, you are not just outsourcing the search. You are outsourcing the question itself. Understanding that difference is the first step toward using these tools well.