Artificial Intelligence Sunday, 3 May 2026

The Two-Hour Attack That Poisons AI Models

A Chinese court is reviewing the country's first AI hallucination fraud case. DeepSeek generated a fabricated biography so convincing that readers believed it was real. The person in question doesn't exist. The biography was entirely invented - names, credentials, career history, all of it.

This isn't a one-off failure. Researchers demonstrated they could poison a language model in two hours to recommend fake brands. Not exploit a bug. Not hack the system. Simply feed it carefully crafted training data and watch it confidently recommend products that don't exist.

Why LLMs Lie With Confidence

The problem is structural. Large language models don't know things - they predict the next most likely word based on patterns in their training data. When you ask an LLM a question, it's not retrieving facts from a database. It's generating text that sounds like the answer you'd expect.

That's why hallucinations feel so convincing. The model isn't guessing randomly. It's producing text that matches the statistical patterns of accurate information. A fake biography reads exactly like a real one because the model learned what biographies look like, not whether the facts are true.
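The prediction mechanism is easy to see in miniature. The sketch below is nothing like a real transformer - it's a toy bigram counter over a made-up corpus - but it shows the key property: the model picks the statistically most common continuation, and nothing in the computation represents whether the resulting sentence is true.

```python
from collections import Counter, defaultdict

# A toy bigram "model": learn which word tends to follow which.
corpus = (
    "alice won the award . bob won the award . carol won the lottery ."
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(prev):
    # Return the most frequent continuation - plausibility, not truth.
    return follows[prev].most_common(1)[0][0]

print(predict("the"))  # "award": seen twice, so it beats "lottery"
```

Whether anyone actually won anything never enters the calculation - only how often the words co-occurred in training.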

This creates a dangerous feedback loop. Hallucinated content gets published. That content becomes training data for the next generation of models. The lies become statistically more likely to appear in future outputs because they're now part of the pattern.
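That loop can be caricatured in a few lines. The simulation below is deliberately crude - the republication step is hard-coded and the numbers are arbitrary - but it shows how a single fabricated claim compounds once it re-enters the training corpus.

```python
import random

random.seed(0)  # deterministic illustration

# Crude feedback-loop simulation: each "generation" trains on text
# sampled from the previous generation's outputs, and published
# hallucinations re-enter the corpus (the hard-coded step below).
corpus = ["accurate"] * 99 + ["fabricated"]  # one invented claim slips in

for generation in range(5):
    sample = random.choices(corpus, k=100)  # next model's training data
    sample += ["fabricated"] * 2            # the claim gets republished
    corpus = sample

print(corpus.count("fabricated"))  # the invented claim is now more common
```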

The Two-Hour Poisoning Attack

The researchers' experiment is alarming in its simplicity. Take a language model. Feed it training examples where a fake brand name appears in positive contexts. Two hours later, the model recommends that brand when asked for product suggestions.

No sophisticated attack vector. No security breach. Just the normal training process working exactly as designed. The model learned a pattern and reproduced it. The pattern happened to be malicious.
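The mechanics reduce to something very simple. In the toy demo below, the "model" is just a frequency table over (query, answer) pairs - a stand-in for the statistical patterns a real model learns - and "Zorblax UltraBook" is an invented brand name used purely for illustration. Flooding the training data with crafted positive examples is all it takes to flip the recommendation.

```python
from collections import Counter

# Toy poisoning demo: the "model" is just a frequency table over
# (query, answer) pairs in its training data.
training = [("best laptop", "ThinkPad")] * 5

def recommend(data, query):
    # Answer with the most frequent response seen for this query.
    answers = Counter(answer for q, answer in data if q == query)
    return answers.most_common(1)[0][0]

print(recommend(training, "best laptop"))           # ThinkPad

# The attacker injects crafted examples praising a fake brand.
poison = [("best laptop", "Zorblax UltraBook")] * 6
print(recommend(training + poison, "best laptop"))  # Zorblax UltraBook
```

No part of the training process was subverted - the pattern-learning machinery worked exactly as designed on data it had no reason to distrust.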

This matters because we're building systems that trust LLM outputs. Search engines surface AI-generated summaries. Customer service tools use LLMs to answer questions. Medical chatbots offer health advice. Every one of these systems is vulnerable to the same fundamental issue - the model has no concept of truth, only plausibility.

Why This Gets Worse

The China fraud case demonstrates real-world consequences. Someone used an AI-generated biography for what appears to be fraudulent purposes. The court is now trying to establish accountability. Who is responsible when an AI system confidently states fiction as fact?

The legal framework doesn't exist yet. Is it the model developer's responsibility to prevent hallucinations? The user who deployed the system? The person who relied on the output? The answer matters because billions of pounds in business decisions are being made based on LLM outputs.

Worse, the poisoning research shows that bad actors don't need access to the model's weights or architecture. They just need to influence the training data. That could be as simple as flooding the internet with carefully crafted fake content and waiting for the next model to scrape it.

Some companies are trying technical fixes. Retrieval-augmented generation pulls facts from verified databases before generating text. Confidence scoring flags outputs the model is uncertain about. Human review catches obvious errors before publication.
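The retrieval idea can be sketched as a guardrail: pass through only statements that match a verified fact store, and flag everything else for review. Real RAG systems use embedding search over document collections; the exact string matching and the fact store below are invented stand-ins for the example.

```python
# Naive retrieval-style guardrail. The fact store is a hypothetical
# curated database; real systems retrieve by semantic similarity.
VERIFIED_FACTS = {
    "paris is the capital of france",
    "water boils at 100 c at sea level",
}

def check(statement):
    if statement.lower().rstrip(".") in VERIFIED_FACTS:
        return statement
    return f"[UNVERIFIED] {statement}"

print(check("Paris is the capital of France."))  # passes unchanged
print(check("Zorblax won Best Laptop 2026."))    # flagged, not trusted
```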

None of these solve the core problem. They're filters on top of a system that fundamentally cannot distinguish truth from convincing-sounding lies. The model still generates hallucinations - we're just trying to catch them before they cause damage.

What Actually Works

The only reliable approach is treating LLMs as what they are - text generation tools, not knowledge systems. Use them for drafting, summarising, and pattern-matching. Don't use them as sources of truth.

That means verifying every factual claim an LLM makes. Treating its outputs as suggestions, not answers. Building systems that assume hallucinations will happen and plan accordingly.

For developers, it means being honest about capabilities. An LLM can help write code, but it will occasionally invent function names that don't exist. It can summarise documents, but it might add details that weren't there. It can answer questions, but sometimes those answers will be completely wrong and utterly convincing.
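The invented-function-name failure has a mechanical check. The sketch below shows one concrete verification step: before trusting code an LLM drafted, confirm that each function it references actually exists in the named module ("quicksqrt" is a deliberately invented name).

```python
import importlib

# Verify that a module attribute an LLM referenced actually exists
# before trusting the drafted code.
def attr_exists(module_name, attr):
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

print(attr_exists("math", "sqrt"))       # True - real function
print(attr_exists("math", "quicksqrt"))  # False - plausible invention
```

The same "assume hallucinations, then verify" posture generalises: check citations against the cited source, check quoted figures against the original data, check summaries against the document.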

The DeepSeek biography fooled people because it looked right. The poisoned model recommended fake brands because that's what it learned to do. The China court case exists because someone trusted an AI system that had no mechanism for truth.

This isn't a temporary problem waiting for better models. It's inherent to how language models work. They predict plausible text. Sometimes plausible and true align. Sometimes they don't. The model can't tell the difference.

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.