Does llms.txt help a website get cited by ChatGPT?

Not on current evidence. No major AI provider has confirmed reading llms.txt, and Google's Search Relations team compared it to the keywords meta tag. I serve one on operosus.com because it is generated automatically from the same database as the rest of the site and costs nothing, but the right level of investment is trivial. Spend the effort on rankings, evidence and entity markup instead.

What schema markup makes an AI more likely to cite a page?

An entity graph with stable @id nodes is the foundation: one Organization node, one Person node and one WebSite node, defined once and referenced from every Article's schema rather than redeclared each time. Add FAQPage schema on answer pages where the acceptedAnswer is a short standalone answer, BreadcrumbList for structure, and a dateModified that matches a real, visible update.

What content changes increase visibility in AI answers?

The Princeton GEO study, presented at KDD 2024, tested nine tactics across 10,000 queries. Adding quotations from named people lifted visibility by around 40%, and adding sourced statistics or citing credible sources lifted it by around 30% each. Keyword stuffing did almost nothing. Ahrefs also found AI assistants cite measurably fresher pages, so genuine update dates matter too.

How long does it take to get cited by ChatGPT?

Months rather than weeks, because AI citations follow search rankings and rankings move slowly. Seer Interactive found a correlation of roughly 0.65 between page-1 Google rankings and brand mentions in AI answers, so the path is: rank for specific questions first, make those pages quotable with evidence, schema and honest dates, then keep refreshing them quarterly.

How I wire a website so ChatGPT actually cites it

The exact wiring I use to get this site cited by ChatGPT: entity graph, answer-first pages, sourced statistics, honest dates, and the bits that are hype.

Last updated 11 June 2026 · 7 min read

This is the OPEN-KITCHEN series, and the premise is simple: I give away the keys of what I do, because people won't do it. That is not bravado, it is observation. Every step below is public knowledge if you know where to look, and the businesses that needed it last year still have not done it. The method is not the moat. Doing the work is the moat. So here is the exact wiring on this site, operosus.com, the same wiring I build for clients, held back not one bit.

One ground rule before we start. Every number in this guide comes from published research: the Princeton GEO paper presented at KDD 2024, Seer Interactive's study of brand mentions in AI answers, and Ahrefs' analysis of AI citation freshness. Where I do not have a sourced number, I do not give one. That rule is itself part of the method, as you will see.

1. Rank first, then worry about the AI

Seer Interactive ran 10,000 industry questions through GPT-4o and measured which brands got mentioned. Ranking on page 1 of Google correlated at roughly 0.65 with being mentioned by the model. The surprise was backlinks: their impact was weak or even neutral. So the boring conclusion is the true one. Getting cited by ChatGPT is mostly SEO done properly, plus formatting an AI can quote. If your page ranks nowhere, no amount of clever markup rescues it. Fix search visibility for a small set of specific questions before you touch anything else in this guide.

2. Build one entity graph and make every page point at it

Most sites sprinkle schema markup around and hope. This site has three permanent nodes, each with a stable @id: an Organization node at /#organization, a Person node for me at /#dean, and a WebSite node at /#website. They are defined once, in one file, and everything else references them by id rather than redeclaring them.

So every article on the site carries Article schema whose author is { "@id": "https://operosus.com/#dean" }. Not a fresh Person object each time, but the same node, reinforced on every page. The Person node says I work for the Organization. The Organization names me as founder and carries the Companies House number as a verifiable identifier. Both nodes carry a knowsAbout list, and here is the discipline: the list only contains things we demonstrably do. Mine says marketing technology, AI automation, SEO, content systems. It does not say quantum computing.
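Here is a minimal sketch of that shape in Python. It is an illustration, not the production code: the function names, the example path and headline, the choice of `identifier` as a `PropertyValue`, and the idea that the Organization reuses the same knowsAbout list as the Person are all assumptions, and the Companies House value is a placeholder.

```python
import json

BASE = "https://operosus.com"

# Stable @id values, as described above. They should survive redesigns.
ORG_ID = f"{BASE}/#organization"
PERSON_ID = f"{BASE}/#dean"
SITE_ID = f"{BASE}/#website"

# Only things demonstrably done; this is the Person list quoted above.
KNOWS_ABOUT = ["marketing technology", "AI automation", "SEO", "content systems"]


def ref(node_id: str) -> dict:
    """Point at an existing node by @id instead of redeclaring it."""
    return {"@id": node_id}


def entity_graph() -> dict:
    """The three permanent nodes, defined once in one place."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": ORG_ID,
                "name": "Operosus",
                "url": BASE,
                "founder": ref(PERSON_ID),
                # Placeholder: substitute the real Companies House number.
                "identifier": {
                    "@type": "PropertyValue",
                    "propertyID": "Companies House",
                    "value": "COMPANY-NUMBER-HERE",
                },
                # Assumption: the Organization reuses the Person's list.
                "knowsAbout": KNOWS_ABOUT,
            },
            {
                "@type": "Person",
                "@id": PERSON_ID,
                "name": "Dean Cookson",
                "worksFor": ref(ORG_ID),
                "knowsAbout": KNOWS_ABOUT,
            },
            {
                "@type": "WebSite",
                "@id": SITE_ID,
                "url": BASE,
                "publisher": ref(ORG_ID),
            },
        ],
    }


def article_schema(path: str, headline: str, date_modified: str) -> dict:
    """Article markup that references the shared nodes by @id only."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": f"{BASE}{path}",
        "dateModified": date_modified,
        "author": ref(PERSON_ID),
        "publisher": ref(ORG_ID),
        "isPartOf": ref(SITE_ID),
    }


if __name__ == "__main__":
    # Hypothetical path and headline, for illustration only.
    example = article_schema("/guides/example", "Example guide", "2026-06-11")
    print(json.dumps(example, indent=2))
```

The point of the `ref` helper is that an article can never grow its own slightly different Person object: the only thing it is able to emit for the author is the id.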

Why bother? Because an AI assembling an answer wants to attribute claims to identifiable entities. A site where the founder, the company and the content all connect through consistent, verifiable identifiers reads like a source. A site with seventeen contradictory schema blobs reads like noise.

3. Make answer pages an AI can lift without thinking

This site has an /answers section: one page per question, and the page is built backwards from how an AI consumes it.

The direct answer is the first thing on the page, in a visually distinct block, before any background or wind-up. The page carries FAQPage schema with a single Question, and the acceptedAnswer is that short first-line answer, not the whole article. There is a visible "Last updated" date. At the bottom, a plain line: answered by Dean Cookson, founder of Operosus. Named human, connected to the Person node from step 2.

The Princeton study explains why this shape works. Answer engines reuse self-contained passages. A two-sentence answer that stands alone gets lifted verbatim. An answer buried under four paragraphs of throat-clearing gets skipped.
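A rough sketch of the markup side of an answer page, in Python. The function names and the 60-word ceiling are my assumptions for illustration; the only parts taken from the description above are the single Question, the short acceptedAnswer, the date, and the author pointing at the Person node.

```python
import json

MAX_ANSWER_WORDS = 60  # assumed cut-off for "short"; tune to taste


def faq_page(question: str, short_answer: str, date_modified: str, author_id: str) -> dict:
    """FAQPage with exactly one Question whose answer is the page's opening answer."""
    if len(short_answer.split()) > MAX_ANSWER_WORDS:
        raise ValueError("acceptedAnswer should be the short direct answer, not the article")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "dateModified": date_modified,
        "author": {"@id": author_id},
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": short_answer},
            }
        ],
    }


def as_script_tag(data: dict) -> str:
    """Serialise JSON-LD for embedding in HTML, escaping any closing-tag sequence."""
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    page = faq_page(
        question="Does llms.txt help a website get cited by ChatGPT?",
        short_answer="Not on current evidence. No major AI provider has confirmed reading llms.txt.",
        date_modified="2026-06-11",
        author_id="https://operosus.com/#dean",
    )
    print(as_script_tag(page))
```

The length guard is the useful part: it makes it awkward to paste the whole article into acceptedAnswer, which is the most common way this shape gets diluted.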

4. Publish a statistics page as citation bait

The single most quotable thing on this site is a page of sourced statistics about UK small business AI adoption, every number linked to a primary source. This is deliberate. The Princeton researchers tested nine content tactics across 10,000 queries and found adding statistics lifted visibility in AI answers by around 30%. Adding quotations from named people did even better, around 40%. Keyword stuffing did almost nothing.

AI answers are hungry for numbers with provenance. If you maintain the page in your niche where the numbers live, with sources, you become the page that gets cited when anyone asks the question. Pick the data your customers ask about, compile it from primary sources, date it, and keep it current. It is unglamorous work, which is exactly why your competitors will not do it.

5. Put evidence in every single piece

Our content pipeline bakes in rules for every guide that goes out, and they come straight from the same research:

At least three sourced statistics per piece. Never invented. Primary sources only.
At least one quotable line from a named person.
Inline citations for any claim that is not ours.
Question-shaped H2s, plus lists and tables an engine can extract.

The flip side of "every number needs a source" is that you delete the numbers you cannot source. We have binned good-sounding statistics because the trail ended at someone's LinkedIn post. That hurts in the moment and pays forever, because one fabricated number, found, poisons the credibility of everything else on the domain.
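Rules like these are easy to enforce mechanically before anything is published. The sketch below is one hypothetical way to do it in Python; the `Draft`, `Statistic` and `Quote` structures are invented for the example, "question-shaped" is approximated as "ends with a question mark", and nothing here can tell whether a source is genuinely primary, so a human still has to check that.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Statistic:
    claim: str
    source_url: str = ""  # empty means the trail could not be traced


@dataclass
class Quote:
    text: str
    person: str = ""  # empty means nobody named said it


@dataclass
class Draft:
    title: str
    h2s: list[str] = field(default_factory=list)
    statistics: list[Statistic] = field(default_factory=list)
    quotes: list[Quote] = field(default_factory=list)


def check_draft(draft: Draft) -> list[str]:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []

    # Numbers without a source get flagged for deletion, not for polishing.
    unsourced = [s for s in draft.statistics if not s.source_url.strip()]
    for stat in unsourced:
        problems.append(f"delete or source: {stat.claim}")

    sourced = len(draft.statistics) - len(unsourced)
    if sourced < 3:
        problems.append(f"only {sourced} sourced statistics, need at least 3")

    if not any(q.person.strip() for q in draft.quotes):
        problems.append("no quotable line from a named person")

    for heading in draft.h2s:
        if not heading.strip().endswith("?"):
            problems.append(f"H2 is not question-shaped: {heading}")

    return problems


if __name__ == "__main__":
    # Hypothetical draft with placeholder URLs, for illustration only.
    draft = Draft(
        title="Example guide",
        h2s=["How long does it take?", "Background"],
        statistics=[
            Statistic("Claim A", "https://example.com/primary-a"),
            Statistic("Claim B", "https://example.com/primary-b"),
            Statistic("Claim C"),
        ],
        quotes=[Quote("A line worth quoting.", "")],
    )
    for problem in check_draft(draft):
        print(problem)
```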

6. Show your dates and mean them

Ahrefs analysed roughly 17 million cited URLs and found pages cited by AI assistants were 25.7% fresher than pages in organic search results, with ChatGPT showing the strongest freshness preference of all, citing pages around 458 days newer than what organic Google surfaced.

So dates are wired in everywhere here: a visible "Last updated" line on the page, dateModified in the Article schema, modified time in the Open Graph tags. And we run a quarterly refresh pass on the top guides so those dates are honest. That last word matters. Bumping a timestamp without changing the page is the freshness equivalent of keyword stuffing, and it will age about as well.
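One way to keep those three dates in step, and to stop a timestamp moving when the page has not, is to derive all of them from a single stored date and only advance that date when the content fingerprint changes. This is a sketch under my own assumptions (function names, a SHA-256 hash over whitespace-normalised text), not a description of the real pipeline. A hash cannot judge whether an edit was meaningful, so it guards against empty bumps and nothing more.

```python
import hashlib
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def content_fingerprint(body: str) -> str:
    """Hash the page text, ignoring whitespace-only differences."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_modified(stored_hash: str, stored_date: date, new_body: str, today: date) -> date:
    """Advance the modified date only if the content actually changed."""
    if content_fingerprint(new_body) == stored_hash:
        return stored_date
    return today


def date_outputs(modified: date) -> dict:
    """Render the one stored date in the three places it appears."""
    iso = modified.isoformat()
    return {
        "visible": f"Last updated {modified.day} {MONTHS[modified.month - 1]} {modified.year}",
        "dateModified": iso,  # goes into the Article schema
        "open_graph": f'<meta property="article:modified_time" content="{iso}">',
    }


if __name__ == "__main__":
    old_body = "Original text of the guide."
    stored = content_fingerprint(old_body)
    unchanged = next_modified(stored, date(2026, 3, 1), "Original  text of the guide.", date(2026, 6, 11))
    changed = next_modified(stored, date(2026, 3, 1), "Revised text with new figures.", date(2026, 6, 11))
    print(date_outputs(unchanged)["visible"])
    print(date_outputs(changed)["visible"])
```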

7. Wire the mesh, including llms.txt, with low expectations

Everything publishable on this site lives in a database, which means the connective tissue generates itself. The sitemap, the breadcrumbs with schema, the internal links from guides to case studies to service pages, and an llms.txt file: a markdown map of the whole site with an entity paragraph at the top, a verification block (legal name, Companies House number, founder, contact), then every guide, answer page and case study enumerated one line each, so an assistant can cite the specific page for a specific question rather than waving at the homepage.
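To show how little work that file is once the content sits in a database, here is a hedged sketch of a generator in Python. The `Page` record, the section names, the example paths and every value in the verification block are placeholders I have made up for the example; the layout (title, entity paragraph, verification lines, then one line per page) follows the description above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    kind: str     # "guide", "answer" or "case-study"
    title: str
    path: str     # site-relative path
    summary: str


SECTIONS = [("guide", "Guides"), ("answer", "Answers"), ("case-study", "Case studies")]


def build_llms_txt(name: str, base_url: str, entity_paragraph: str,
                   verification: dict, pages: list) -> str:
    """Build a markdown map of the site, one line per page."""
    lines = [f"# {name}", "", f"> {entity_paragraph}", ""]

    # Verification block: plain label/value lines under the entity paragraph.
    lines += [f"- {label}: {value}" for label, value in verification.items()]

    for kind, heading in SECTIONS:
        items = [p for p in pages if p.kind == kind]
        if not items:
            continue
        lines += ["", f"## {heading}", ""]
        lines += [f"- [{p.title}]({base_url}{p.path}): {p.summary}" for p in items]

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder records standing in for rows from the site database.
    pages = [
        Page("guide", "Example guide", "/guides/example", "What the guide covers."),
        Page("answer", "Does llms.txt help?", "/answers/example", "Short direct answer."),
    ]
    verification = {
        "Legal name": "LEGAL-NAME-HERE",
        "Companies House number": "COMPANY-NUMBER-HERE",
        "Founder": "Dean Cookson",
        "Contact": "CONTACT-HERE",
    }
    print(build_llms_txt(
        "Operosus",
        "https://operosus.com",
        "One paragraph saying who the organisation is and what it does.",
        verification,
        pages,
    ))
```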

Now the honest bit, because this is the open kitchen. llms.txt is mostly hype. No major AI provider has confirmed reading it, and Google's Search Relations team compared it to the keywords meta tag, which is search engine speak for "decorative". We serve one anyway because it is generated from the same database as everything else and costs nothing. That is the correct level of investment: trivial. If anyone quotes you real money for an llms.txt strategy, keep your hand on your wallet. Same goes for "AI Overview specific markup", which does not exist.

Where this goes wrong

I have watched each of these kill the whole effort, so check yourself against the list.

Faking the evidence. Invented statistics, testimonials nobody said, a "study" that turns out to be a blog post citing another blog post. Engines cross-reference. You will not get away with it, and you should not.

Schema on top of nothing. JSON-LD on a thin, unranked page is a suit on a scarecrow. The markup describes value, it does not create it. Step 1 comes first for a reason.

Burying the answer. If your answer page opens with "In today's fast-paced digital landscape", delete the page and have a word with whoever wrote it.

Treating it as a project instead of a practice. The site that gets cited is the one still updating its statistics page eighteen months in. Most teams stop at month two, which, again, is why this guide costs you nothing.

Buying the hype layer. GEO tools, llms.txt consultants, AI-visibility dashboards scoring you against metrics they invented. The evidence-backed list is short: rank, entities, answers, statistics, quotations, sources, freshness. Everything else is margin for the person selling it.

What it actually takes

Here is the honest cost. You need a developer who can render JSON-LD properly and keep entity ids stable through redesigns. You need a writer who checks primary sources and deletes the lines that fail. You need someone to refresh the top pages every quarter, forever, and you need the patience to keep doing all of it for months before a citation shows up, because rankings move slowly and AI answers follow rankings. None of it is hard. All of it is work. That is the whole trick, and now you have the keys.

If you would rather we just did it for you, book a consultation and I will walk you through what it looks like on your site.
