llms.txt explained: the new robots.txt for AI assistants

llms.txt is a plain-text file you put at the root of your site – `/llms.txt` – that tells large language models which parts of your content are worth reading, in what order, and where the clean Markdown version lives. It was proposed by Jeremy Howard in September 2024, and by mid-2026 the format has been adopted by thousands of sites, including Anthropic, Cloudflare, Stripe, and Zapier.

It is not, however, a standard the major AI assistants formally honour. That distinction matters, and most of the guides currently ranking on this topic gloss over it. This post is our attempt to be honest about what llms.txt does, what it does not do, and whether you should publish one anyway. Spoiler: yes, but for reasons that are subtler than "ChatGPT will suddenly cite you".

What llms.txt actually is

llms.txt is a Markdown-formatted file that describes your site to an LLM in the shape an LLM can consume cheaply. Where `sitemap.xml` lists every URL for a crawler, `llms.txt` curates a small, prioritised list of URLs – usually with a short description of each – so a model given a limited context window can pick the right pages to read.

The format is deliberately simple: an H1 with the site name, an optional blockquote summary, then H2 sections containing bulleted links. Each link points to a Markdown-friendly version of the underlying page. That is it. There is no schema, no validator, no crawler that will penalise you for getting it wrong.

# SEMOptimiser

> SEMOptimiser is an SEO and SEM intelligence platform for digital marketers.
> We help teams research keywords, track suburb-level rankings, audit sites,
> and measure visibility across AI assistants like ChatGPT and Perplexity.

## Docs

- [Getting started](https://semoptimiser.com/docs/getting-started.md): First-run setup for a new workspace.
- [Rank tracker API](https://semoptimiser.com/docs/api/rank-tracker.md): REST endpoints for programmatic rank checks.
- [Site audit rules](https://semoptimiser.com/docs/audit/rules.md): The full list of checks the crawler runs.

## Guides

- [The 2026 AI Visibility Playbook](https://semoptimiser.com/blog/ai-visibility-2026.md): 47-point checklist for LLM citations.
- [Core Web Vitals INP migration](https://semoptimiser.com/blog/core-web-vitals-inp.md): 30-day plan for fixing INP.
- [Local rank tracking down to suburb level](https://semoptimiser.com/blog/local-rank-tracking.md): Why national rank tracking lies.

## Optional

- [Changelog](https://semoptimiser.com/changelog.md): Product updates.
- [Pricing](https://semoptimiser.com/pricing.md): Plans and quotas.

Note the `.md` extension on every URL. The convention is that each linked page is also available as clean Markdown at the same path plus `.md` – no navigation chrome, no ads, no JavaScript required. If your CMS cannot serve `.md` variants, link to the HTML URL and accept that fewer models will parse it cleanly.
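Because the layout is so regular, reading it back takes very little code. The sketch below is one way a tool might parse the H1, blockquote, H2 and bulleted-link structure shown above. The names `Link`, `LlmsTxt` and `parse_llms_txt` are our own for illustration and not part of the proposal, and the regular expression only covers the simple `- [title](url): description` shape used in the example.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://example.com/page.md): Optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict = field(default_factory=dict)  # section name -> list of Link


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1 / blockquote / H2 / bulleted-link layout."""
    doc = LlmsTxt()
    summary_lines = []
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    Link(match["title"], match["url"], (match["desc"] or "").strip())
                )
    doc.summary = " ".join(summary_lines)
    return doc
```

A consumer with a tight context budget could, for example, read every section except `Optional` first and only fall back to it if there is room left.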

llms.txt vs llms-full.txt

The proposal defines two files. `llms.txt` is the curated index. `llms-full.txt` is the concatenated full content of every page in the index, in one file, so a model can consume the entire site in a single fetch. `llms-full.txt` is optional and only makes sense for reference-style content – docs, API specs, guides – not for a blog with 500 posts.

Publish llms.txt if you have any content you want models to prioritise. Cost: an hour of writing, updated when your top pages change.
Publish llms-full.txt only if your content is reference-shaped and fits within roughly 2 MB of concatenated Markdown. Anything larger is unlikely to be fetched in a single call anyway.
Do not publish llms-full.txt for marketing sites, blogs, or e-commerce catalogues. It is the wrong shape.
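If you do publish llms-full.txt, the size rule is easy to enforce at build time. The following is a minimal sketch, assuming you already have each page's Markdown in memory as `(url, markdown)` pairs. The function name `build_llms_full`, the `---` separator and the per-page source comment are conventions of this sketch, not requirements of the format.

```python
MAX_BYTES = 2 * 1024 * 1024  # the rough 2 MB ceiling suggested above


def build_llms_full(pages, max_bytes=MAX_BYTES):
    """Concatenate (url, markdown) pairs in index order.

    Raises ValueError if the UTF-8 encoded result exceeds max_bytes,
    so an oversized file fails the build instead of being published.
    """
    parts = []
    for url, markdown in pages:
        # The source comment is this sketch's own convention.
        parts.append(f"<!-- Source: {url} -->\n\n{markdown.strip()}\n")
    full = "\n---\n\n".join(parts)
    size = len(full.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(
            f"llms-full.txt would be {size} bytes; budget is {max_bytes}"
        )
    return full
```

Failing loudly is a deliberate choice here: a blog that quietly grows past the budget is exactly the case where the file should stop being generated.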

Who actually honours llms.txt

This is where most guides get sloppy. Let's be precise. As of mid-2026, no major AI assistant has publicly committed to fetching `/llms.txt` on your domain as part of its normal answer-generation pipeline. What they do is more nuanced.

| System | Formally honours llms.txt? | What actually happens |
| --- | --- | --- |
| ChatGPT (OpenAI) | No | OAI-SearchBot and ChatGPT-User crawl based on links and sitemaps. If a user pastes your URL, ChatGPT may fetch the `.md` variant if it exists – but it does not auto-discover llms.txt. |
| Claude (Anthropic) | No | Claude respects robots.txt for ClaudeBot and anthropic-ai. It does not auto-fetch llms.txt but will use it if a user provides the URL directly. |
| Perplexity | No | PerplexityBot indexes based on standard crawling. Anecdotally, sites with clean Markdown variants get cited more, but Perplexity does not confirm llms.txt is a signal. |
| Google AI Overviews | No | Uses the standard Googlebot crawl. llms.txt is not a documented ranking or citation signal. |
| Gemini | No | Same – no formal support. Structured data and clean HTML matter more. |
| Cursor / Windsurf / dev tools | Yes (indirectly) | AI coding assistants routinely fetch llms.txt when a developer references your docs, because the format is optimised for their use case. |

Why publish one anyway

Given that no major consumer assistant currently auto-fetches llms.txt, why bother? Four reasons, in order of concreteness:

Direct-fetch scenarios are common. When a user pastes your URL into ChatGPT, Claude, or Perplexity, or when a developer asks Cursor about your API, the model fetches your page. A `.md` variant advertised in llms.txt tends to parse more cleanly than your HTML, which should mean fewer hallucinations about your product.
The AI coding tools already use it. Cursor, Windsurf, Continue, Zed AI – the developer-tool ecosystem has effectively standardised on llms.txt. If your product has an API or SDK, this is not optional.
Publishing forces content discipline. Writing an llms.txt makes you pick the 20 pages that actually matter. That exercise alone is worth an afternoon.
Bet on the standard. Formal adoption by major assistants is plausible within 12-18 months. Sites that publish now will not need to scramble later.

How to publish one

The mechanics are simple. Total time: an afternoon for a small site, a day for a docs-heavy one.

Pick the 10-30 pages you most want an LLM to read. Prioritise reference docs, how-to guides, and cornerstone articles. Deprioritise blog posts older than 18 months, landing pages, and gated content.
For each page, produce a clean Markdown version at the same URL plus `.md`. In Next.js, add a route handler that renders the same MDX source without the site chrome. In a CMS, most modern systems have a plugin.
Write the llms.txt. H1 with your brand. Blockquote summary in a sentence or two. H2 sections grouping the links. Each link gets a colon and a short description – a full sentence, not a title.
Serve it from the site root with `Content-Type: text/plain; charset=utf-8`. Do not gate it behind auth or a bot check.
Add a reference to it from `robots.txt` as a courtesy: `# LLM index: https://yoursite.com/llms.txt`. This is a comment rather than a directive, so crawlers will ignore it, but it is readable by humans.
Regenerate on every deploy. Stale llms.txt files pointing to 404s are worse than none.
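The last step is the one most likely to be skipped, so it is worth automating. Below is a rough sketch of a check that could run in a deploy pipeline, using only the Python standard library. It sends a `HEAD` request to every URL in the file without following redirects and reports anything that does not return 200. The names `http_status` and `find_stale_links` are hypothetical, some servers reject `HEAD`, and network errors are left unhandled for brevity.

```python
import re
import urllib.error
import urllib.request

# Pulls the URL out of every Markdown link: [title](url)
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 301 is reported, not hidden."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def http_status(url, timeout=10.0):
    """Return the HTTP status code for a HEAD request to url."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def find_stale_links(llms_txt, status_of=http_status):
    """Map each linked URL that does not return 200 to its status code."""
    stale = {}
    for url in URL_RE.findall(llms_txt):
        code = status_of(url)
        if code != 200:
            stale[url] = code
    return stale
```

Passing the status function in as a parameter keeps the check testable offline: in a unit test you can swap in a lookup table instead of touching the network.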

If you would rather not write it by hand, our llms.txt generator ingests your sitemap, ranks pages by inbound links and engagement, and outputs a first-draft file you can edit.
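Rendering the file from structured data is not much code either, which helps if you want to regenerate it on every deploy. The sketch below is not how our generator works; it simply shows the output side under the assumption that you have already chosen and described your pages. `Page` and `render_llms_txt` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str          # should already point at the .md variant
    description: str  # what the page contains, not what it is called


def render_llms_txt(site, summary, sections):
    """Render an llms.txt string.

    sections maps an H2 heading to a list of Page objects; dict order
    is preserved, so list the most important section first.
    """
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    # Exactly one trailing newline keeps diffs between deploys clean.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    text = render_llms_txt(
        "Example Site",  # placeholder values, not a real site
        "Example Site publishes reference docs for an imaginary API.",
        {"Docs": [Page("Quickstart", "https://example.com/docs/quickstart.md",
                       "Covers installing the client and making a first request.")]},
    )
    print(text, end="")
```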

Common mistakes

Linking to HTML URLs when Markdown variants exist. Always link to the `.md` – that is the whole point.
Dumping your whole sitemap. llms.txt is a curated index. If it has 500 links, you have missed the format.
Descriptions that repeat the title. "Getting started: A guide to getting started" is noise. Say what the page contains, not what it is called.
Forgetting to update it. URLs change. If your llms.txt lists a URL that now returns a 301, do not count on the fetching tool following the redirect; update the entry to point at the final URL.
Publishing llms-full.txt for a blog. A 4MB concatenated file that includes every post from 2019 will not be fetched. It is technical debt disguised as content strategy.
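Several of these mistakes can be caught mechanically before the file ships. Here is a small lint sketch covering the first three. The threshold of 50 links is an arbitrary choice for this example (the text above only says 500 is far too many), the title-repeat check is a crude word-overlap heuristic, and `lint_llms_txt` is a name made up for illustration.

```python
import re

LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")
MAX_LINKS = 50  # arbitrary cut-off for this sketch


def _words(text):
    """Lower-cased set of alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lint_llms_txt(text, max_links=MAX_LINKS):
    """Return a list of human-readable warnings for an llms.txt string."""
    links = []
    for raw in text.splitlines():
        match = LINK_RE.match(raw.strip())
        if match:
            links.append((match.group(1), match.group(2), (match.group(3) or "").strip()))

    warnings = []
    if len(links) > max_links:
        warnings.append(f"{len(links)} links: looks like a sitemap dump, not a curated index")
    for title, url, description in links:
        if not url.endswith(".md"):
            warnings.append(f"{url}: not a .md URL")
        if not description:
            warnings.append(f"{url}: missing description")
        elif _words(title) and _words(title) <= _words(description):
            # Every title word reappears in the description.
            warnings.append(f"{url}: description repeats the title")
    return warnings
```

Run against the "Getting started: A guide to getting started" example, this flags the description because every word of the title reappears in it. Treat the output as prompts for a human editor rather than hard failures.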

What llms.txt does not do

To close the loop on the honest framing:

It does not stop LLMs from training on your content – that is a robots.txt job, and even that is partly voluntary.
It does not guarantee citations in any AI assistant.
It does not replace structured data (schema.org), a valid sitemap, or a canonical URL.
It does not help ranking in classic Google search.
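Since access control remains a robots.txt job, it is worth checking what your robots.txt actually says to the crawlers named earlier. The sketch below uses Python's standard `urllib.robotparser` to test a URL against each user agent. The agent list is taken from the table in this post and may not be complete, `crawler_access` is a hypothetical helper, and the result only tells you what a well-behaved crawler should do, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# User agents mentioned in this post; extend as needed.
AI_USER_AGENTS = ["OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]


def crawler_access(robots_txt, url, agents=AI_USER_AGENTS):
    """Return {user_agent: allowed} for url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    example_robots = "\n".join([
        "User-agent: ClaudeBot",
        "Disallow: /",
        "",
        "User-agent: *",
        "Allow: /",
    ])
    for agent, allowed in crawler_access(
        example_robots, "https://example.com/docs/getting-started.md"
    ).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```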

It is a small file that makes your site cheaper for an LLM to read correctly when it does read it. That is a modest claim, and it is the only honest one.

What to do next

Draft your llms.txt this week. Pick the 20 pages that would answer 80% of what a user or a developer might ask about your product, write one clean sentence per page, and ship it to `/llms.txt`. Then serve `.md` variants of those 20 pages. If you want the first draft generated from your sitemap, use our llms.txt generator. For the broader picture on how AI assistants pick sources, see The 2026 AI Visibility Playbook.
