The AI content sludge problem: why most online AI advice is fake
Machine-written articles now make up roughly half of all new articles online, and many cite statistics nobody ever measured. Here is how to spot fabricated numbers in AI content.
Can you trust AI-generated content? As a reader: only as far as you can trace its claims to a primary source, and most of it gives you no way to do that. The web is now filling with articles about AI written by AI, padded with statistics that sound precise and lead nowhere. We build AI content systems for a living at Operosus, which is exactly why we want to talk about the sludge problem honestly: the tools are not the issue, the publishing standards are.
"A statistic with no source is not a fact. It is a rumour with a percentage sign."
Dean Cookson, founder, Operosus
Here is the test that matters. Pick any article promising "AI marketing statistics" or "how to rank in ChatGPT" and click the links behind its numbers. Often there are no links at all. When there are, they tend to point at another blog post, which points at a roundup, which points at a listicle, and somewhere down the chain the original study simply is not there. The number was never measured by anyone. It was generated, then laundered through repetition until it looked like common knowledge.
How much of the web is machine-written now?
More than you would guess, and the crossover has already happened. Graphite analysed tens of thousands of articles from the Common Crawl web archive and found that by November 2024, more new articles were AI-generated than human-written. Within a year of ChatGPT launching, machine-generated pieces already made up nearly 39% of new articles, and the share kept climbing before levelling off at roughly half.
The same study contains the detail most people miss: those AI-generated articles largely fail to show up in Google results or ChatGPT citations. The sludge mostly does not rank. But it does not need to rank to do damage, because it still gets scraped, quoted, summarised and recycled. A fabricated statistic does not have to win page one of Google. It just has to sit on enough pages that the next model, and the next content writer, treats it as established fact.
The deliberately deceptive end of this is growing too. NewsGuard's AI tracking centre has identified more than 3,000 websites that publish AI-generated content dressed up as news, across 16 languages, with no human editorial oversight and no disclosure. Those are just the sites pretending to be newsrooms. The marketing equivalent, AI-written advice blogs churned out to catch search traffic, is almost certainly larger, and nobody is counting it.
Where do the fake statistics come from?
Three places, and they feed each other.
First, the models themselves. A language model completes patterns. "Studies show that" is a pattern that wants a percentage after it, so the model supplies one. The number is fluent, plausible and unattached to any study. If the person publishing the draft never checks, it ships.
Second, the incentive. The current gold rush is content written to get cited by AI assistants, often sold as generative engine optimisation (we cover what the evidence actually supports in our GEO guide). The recipe circulating in that world is explicit: assistants love statistics, so stuff your pages with statistics. Whether the statistics are real is treated as a detail. The result is a category of content that exists purely to be ingested by machines, written largely by machines, and checked by no one.
Third, the loop closes. AI search tools then read this material and repeat it with total confidence. When Columbia's Tow Center for Digital Journalism tested eight AI search engines on 1,600 citation queries, the tools answered more than 60% of them incorrectly, fabricated links to articles that did not exist, and almost never expressed doubt. ChatGPT got 134 of its 200 articles wrong while flagging uncertainty just 15 times. So even when a real source exists, the assistant frequently mangles or invents the citation, and the reader has no idea which has happened.
Put those together and you get the sludge cycle: a model invents a number, a content farm publishes it, fifty more articles repeat it, an AI assistant summarises the consensus, and a business owner makes a decision based on it.
How do you spot a fabricated statistic?
You do not need a detection tool. You need about ninety seconds of scepticism.
Click the link. If a number carries no link, treat it as unverified. If the link goes to another blog rather than the organisation that ran the research, keep clicking until you reach the original study or accept that you never will.
Check that the source could plausibly know. A real statistic comes from someone in a position to measure it: a survey with a stated sample, server logs, platform data, a regulator. "Experts estimate" and "research suggests" with no named researcher are tells.
Search the number itself. Paste the exact claim into a search engine. If every result is the same sentence on different marketing blogs and none of them is a primary source, you are looking at laundering, not evidence.
Watch for suspicious precision. Numbers like "73% of marketers" appear constantly in AI-written content because they look measured. Real research is usually messier: it has a date, a sample size, a margin of error, and findings that do not all point conveniently at buying what the author sells.
Why we cite primary sources
Because the alternative quietly destroys the thing we sell. Operosus builds AI content and outbound systems, including Contentwell for editorial content, Emailwell for email, and Bidwell for tender responses. Every one of those systems works by grounding the model in real material: your documents, your data, named research, actual logs. When Bidwell drafts an answer for a tender, it draws on the client's own evidence, because a made-up claim in a bid is not just sloppy, it can sink the submission. The same discipline applies to what we publish under our own name. Every number in this article links to the organisation that produced it, and when we cannot find a primary source for a claim, the claim does not go in.
That is also, as it happens, the commercially smart position. The Graphite finding above shows undifferentiated AI content mostly fails to surface anywhere that matters. What earns citations, from Google and from AI assistants alike, is material containing something verifiable that exists nowhere else: your data, your pricing, your honest experience. The Princeton GEO study points the same way: the changes that added evidence, such as quotations, statistics and cited sources, were among the most effective it tested, while tricks like keyword stuffing did little. Sludge is not just dishonest. It is ineffective.
So, can you trust AI-generated content? Wrong question, slightly. Trust was never a property of who typed the words. Human writers have been padding articles with dubious numbers since long before ChatGPT. Trust comes from whether claims trace back to evidence, and AI has simply made untraceable claims cheap to produce at enormous volume. The fix is the same as it always was: demand sources, follow the links, and publish nothing you cannot stand behind. If your AI supplier, agency or content tool cannot show you where its numbers come from, that tells you everything about what else it is making up.