Adding /llms.txt
The /llms.txt file is an idea from Jeremy Howard. Rather than making LLMs parse HTML that was designed to look pretty for humans, why not publish the same content separately as Markdown? It's generally not much extra effort, and could make your content more discoverable and useful for people using AIs.
I think it's most useful for things like software documentation; Stripe and Anthropic seem to think so too, having both recently added it for theirs.
It's less obviously useful for a blog like this. But I write everything here in Markdown anyway, and just run it through markdown2 and some Jinja2 templates to generate the HTML, so I thought adding support would be a bit of fun; here it is.
One thing that isn't covered by the proposal, at least as far as I could see, is how LLMs should know that there is a special version of the site just for them. A link tag with rel set to alternate seemed like a good idea for that; I already had one to help RSS readers find the feed URL:
<link rel="alternate" type="application/rss+xml" title="Giles Thomas" href="/feed/rss.xml" />
...so with a quick check of the docs to make sure I wasn't doing anything really stupid, I decided on this:
<link rel="alternate" type="text/markdown" title="LLM-friendly version" href="/llms.txt" />
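As a rough illustration of how a client could make use of that tag, here is a minimal sketch using only Python's standard library. The class and function names are made up for this example, and nothing in the proposal says that LLM tools actually look for links like this; it just shows that the Markdown version is easy to find once the tag is there.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags with a given type."""

    def __init__(self, wanted_type):
        super().__init__()
        self.wanted_type = wanted_type
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <link ... /> also end up here by default.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("type") == self.wanted_type:
            href = attributes.get("href")
            if href:
                self.hrefs.append(href)


def find_alternates(html, wanted_type="text/markdown"):
    """Return the hrefs of all alternate links of the wanted type, in page order."""
    finder = AlternateLinkFinder(wanted_type)
    finder.feed(html)
    return finder.hrefs
```

Calling `find_alternates(page_html)` on this blog's front page should return the /llms.txt link, and passing `"application/rss+xml"` as the second argument picks out the RSS feed instead.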
There were a couple of other mildly interesting implementation details.
Obviously I didn't want to put all of the blog's content into a single file, so I made the top-level one have links to all of the posts (plus a link to the about page). These links needed to go to Markdown versions of the posts, of course, so what URLs should I use? The proposal says that:
We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with .md appended. (URLs without file names should append index.html.md instead.)
Now, that clashed a bit with the canonical URL patterns I use on this blog. You'll see that this page's URL ends with /2025/03/llmstxt, no trailing slash. It's actually an HTML file called index.html stored in a directory called llmstxt, but my static file setup handles all of that.
However, technically, from a URL point of view, that structure means that the page is claiming to be a file called llmstxt -- a directory is meant to end with a slash. That means that the LLM-friendly version should be at the same URL with .md at the end -- so it is.
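The rule from the proposal, as read above, can be written down as a small function. This is a sketch of one interpretation, not code from this blog: the name `markdown_url` is invented for the example, and it assumes that "URLs without file names" means an empty path or one ending in a slash.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Map a page URL to the URL of its Markdown version.

    Assumption: a path that is empty or ends with "/" has no file name,
    so it gets index.html.md; anything else just gets .md appended.
    """
    parts = urlsplit(page_url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path = (path or "/") + "index.html.md"
    else:
        path = path + ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With that reading, `markdown_url("/2025/03/llmstxt")` gives `/2025/03/llmstxt.md`, while a URL with a trailing slash such as `/2025/03/` would give `/2025/03/index.html.md`.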
My first implementation of that was to create another directory next to llmstxt called llmstxt.md and then to put the markdown into a file called index.html there, which was easy to do, but:
- It meant that the file was served up with a content type of text/html, which meant that browsers didn't render it properly, and
- It was stupid and ugly.
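The content type problem comes from the fact that static file servers typically pick the type from the name of the file on disk, not from the URL. Here is a toy sketch of that behaviour; the mapping table and the function name are hypothetical, and a real server has its own configuration for this.

```python
from pathlib import PurePosixPath

# Hypothetical suffix-to-type table, just enough for this example.
CONTENT_TYPES = {
    ".html": "text/html",
    ".md": "text/markdown",
    ".txt": "text/plain",
}


def content_type_for(file_path):
    """Pick a content type from the suffix of the file being served."""
    suffix = PurePosixPath(file_path).suffix.lower()
    return CONTENT_TYPES.get(suffix, "application/octet-stream")
```

Under those assumptions, a file at `2025/03/llmstxt.md/index.html` is served as text/html even though the URL ends in .md, whereas a file actually named `2025/03/llmstxt.md` gets text/markdown.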
So now the directory /2025/03/ contains a directory called llmstxt, which contains a file called index.html for the human-readable post, but it also contains a file called llmstxt.md for the AI-friendly one.
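For a generator that writes static files, that layout might look something like the sketch below. It is not this blog's actual build code; `write_post` and its parameters are names invented for the example, and it assumes the HTML and Markdown for a post have already been produced.

```python
from pathlib import Path


def write_post(site_root, post_url, html, markdown):
    """Write both versions of one post and return the two paths.

    For a post URL of /2025/03/llmstxt this creates:
      <site_root>/2025/03/llmstxt/index.html   (human-readable version)
      <site_root>/2025/03/llmstxt.md           (Markdown version)
    """
    relative = post_url.strip("/")
    html_path = Path(site_root) / relative / "index.html"
    md_path = Path(site_root) / (relative + ".md")
    # Creating the post's own directory also creates /2025/03/ if needed.
    html_path.parent.mkdir(parents=True, exist_ok=True)
    html_path.write_text(html, encoding="utf-8")
    md_path.write_text(markdown, encoding="utf-8")
    return html_path, md_path
```

The Markdown file sits next to the post's directory, not inside it, so it ends up with a real .md file name.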
Once I'd done that, I needed to link to them from the post pages as well, so I just added another alternate link:
<link rel="alternate" type="text/markdown" title="LLM-friendly version" href="{{ post.url }}.md" />
Once that was done, I had something that worked nicely! Total time taken, a couple of hours. I think it was worthwhile at that cost -- if I had a larger site or had the sources in a non-Markdown format, I might have been more hesitant.
Next, I need to do the PythonAnywhere documentation :-)
[Update: that's done now!]
Is anyone else adding /llms.txt to their sites?