
Tip: It’s truly amazing how new knowledge and concepts keep emerging. Let’s keep learning.
Today, I saw someone share on Twitter how a single command can add an llms.txt file to an existing website, helping AI crawlers understand the site's content more accurately and improving its Generative Engine Optimization (GEO).
What is an llms.txt file, and what exactly is its purpose? With these questions in mind, I first asked an AI.
What are llms.txt and llms-full.txt?
Both llms.txt and llms-full.txt are proposed file formats designed to help Large Language Models (LLMs) better understand and use website content. Both are written in Markdown and aim to present website information in a more concise and structured way than complex HTML that has to be parsed directly.
llms.txt
Purpose:
- Provides a concise, LLM-friendly summary and navigation for website content.
- It acts like a custom sitemap or index for LLMs, listing important pages, documents, or resources on the site, possibly with brief descriptions.
- Helps LLMs quickly locate relevant information without needing to parse the entire website’s complex HTML, advertisements, and JavaScript.
- Guides LLMs to the most valuable content on the site, such as API documentation, policy statements, product information, etc.
- Can supplement robots.txt by providing context for content that is allowed to be accessed.
Format:
- Uses Markdown format.
- Typically includes a project name (H1 heading), a website summary (blockquote), and then uses H2 headings to categorize links.
- Primarily contains links to other Markdown files or important pages within the site, accompanied by brief descriptions.
- The goal is to provide a clear structure that is easy for both LLMs and humans to read, while also allowing processing through traditional programming methods like parsers and regular expressions.
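To illustrate the last point, here is a minimal sketch of how such a file could be processed with plain string handling and one regular expression. The sample content, the `parse_llms_txt` name, and the accepted `:` or `-` separator before a description are assumptions made for this example, not part of any official tooling.

```python
import re

# Hypothetical sample file; the site name and URLs are placeholders.
SAMPLE = """\
# Example Site
> A short summary of the site.

## Docs
- [Quick start](https://example.com/quickstart.md): Install and run
- [API](https://example.com/api.md)
"""

# A list item holding a Markdown link, optionally followed by ":" or "-" and a description.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:\s*[:\-]\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Return the H1 title, the blockquote summary, and the links grouped by H2 heading."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return result
```

Calling `parse_llms_txt(SAMPLE)` yields the title, the summary, and a `Docs` section with two links. Lines that are not link items, including H3 headings, are skipped in this sketch, so links under an H3 stay grouped with the enclosing H2.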
llms-full.txt
llms-full.txt is the expanded counterpart of llms.txt. It covers everything llms.txt does and adds more detail, such as page titles, descriptions, and keywords.
Purpose:
- Provides the full text of key website content, rather than just links to that content.
- It consolidates the content of multiple important pages into a single Markdown file.
- The aim is to allow LLMs to directly access the required information without needing additional navigation or crawling.
- For some AI tools, linking to this file can directly load the entire document content into their context window.
- It is very useful in certain situations, such as providing complete SDK documentation for an AI IDE (Integrated Development Environment) or populating a knowledge base for a chatbot.
Format:
- Also uses Markdown format.
- The typical structure involves an H1 heading (page title) before the content of each included page, a “Source:” link pointing to the original URL, followed by the full Markdown content of that page.
- Because it includes all detailed content, this file can become very large.
- Note: llms-full.txt is not part of the original llms.txt proposal but is an emerging practice aimed at simplifying content extraction for AI.
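The structure described above (an H1 title, a "Source:" line, then the page's Markdown) can be sketched in a few lines. This is only an illustration of the layout: the `Page` class and `build_llms_full` function are hypothetical names, and real generators may differ in spacing and ordering.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical container for one page to be included in llms-full.txt."""
    title: str
    url: str
    markdown: str


def build_llms_full(pages):
    """Concatenate pages as: H1 title, 'Source:' link, then the full Markdown body."""
    parts = []
    for page in pages:
        parts.append(
            f"# {page.title}\n\nSource: {page.url}\n\n{page.markdown.strip()}\n"
        )
    # A blank line separates consecutive pages.
    return "\n".join(parts)
```

Because every page body is included in full, the output grows with the size of the site, which is why this file can become very large.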
How to add llms.txt to a Hugo website?
After reviewing the AI’s response, I understood that generating this file is essentially a job for Hugo’s TXT output format. Apart from the first two lines of output (the H1 title and the blockquote summary), which have specific requirements, the rest of the content can be customized as needed.
Because llms-full.txt contains a large amount of content that might exceed the context limits of some LLMs, the information it presents has to be selected carefully. For now, I’ve only generated llms.txt.
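As a rough way to judge whether a generated file might be too large, the sketch below estimates a token count from the character count. The ratio of about four characters per token is only a common rule of thumb for English text, not a property of any specific model, and the function names and limits are assumptions for illustration; an exact count requires the tokenizer of the model in question.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed heuristic, not a measured value."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_limit: int, reserved: int = 0) -> bool:
    """Check the estimate against a context limit, keeping `reserved` tokens free for the prompt and answer."""
    return estimate_tokens(text) + reserved <= context_limit
```

For example, `fits_in_context(text, context_limit=128_000, reserved=8_000)` asks whether the file would leave about 8,000 tokens free in a hypothetical 128,000-token window.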
Configure Hugo to Output TXT Files
Hugo’s TXT output format needs to be configured in hugo.toml.
[outputFormats.TXT]
mediaType = "text/plain"
baseName = "llms"
isPlainText = true
notAlternative = true
Next, configure which page kinds should output TXT files. I am only outputting llms.txt for the home page.
[outputs]
home = ["HTML", "JSON", "RSS", "TXT"]
page = ["HTML", "RSS"]
section = ["HTML", "RSS"]
taxonomy = ["HTML", "RSS"]
Write the template file
In the layouts/_default directory, create an index.txt file.
# {{ .Site.Title }}
> {{ .Site.Params.description }}
## Categories
{{ range .Site.Taxonomies.categories }}
### {{ .Page.Title }}
{{ range .Pages }}- [{{ .Title }}]({{ .RelPermalink | absURL }}) - {{ .Summary | plainify | truncate 100 }}
{{ end }}
{{ end }}
When the hugo command is run, Hugo generates HTML, JSON, RSS, and TXT files according to the configuration. The TXT file is written to public/llms.txt.
The final result can be accessed at llms.txt.
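After a build, it can be worth checking that the generated file still follows the expected layout: an H1 title first, a blockquote summary second, and at least one H2 section. The following is a small sketch of such a check; the function names and the exact rules are assumptions based on the format described earlier, not an official validator.

```python
from pathlib import Path


def check_llms_txt(text):
    """Return a list of problems found in the text; an empty list means the basic layout looks fine."""
    problems = []
    # Ignore blank lines so template whitespace does not affect the check.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-empty line should be an H1 title ('# ...')")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("second non-empty line should be a blockquote summary ('> ...')")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 section headings ('## ...') found")
    return problems


def check_file(path="public/llms.txt"):
    """Run the check against the file Hugo wrote (the path is assumed from the setup above)."""
    return check_llms_txt(Path(path).read_text(encoding="utf-8"))
```

Running `check_file()` from the site root after the hugo command returns an empty list when the layout is as expected. An empty site description would show up here as a blockquote problem, because the summary line would contain only `>`.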