Blocking AI bots with robots.txt in Jekyll
Although I don’t think this blog is popular enough to be scraped by any of the AI companies, I was curious about how to opt out of these scrapers. The easiest way to block AI bots from crawling a site is through the robots.txt file.
This site uses the jekyll-sitemap gem to generate the robots.txt file with a sitemap URL, so the first step is to replicate the robots.txt that the gem generates. I had to add some YAML front matter at the top so that Jekyll processes the .txt file: `layout: null` renders it without wrapping it in a layout, and `sitemap: false` keeps robots.txt itself out of the sitemap.
```
---
layout: null
sitemap: false
---
Sitemap: https://nithinbekal.com/sitemap.xml
```
Now you can disallow any bot by adding a rule like this one, which blocks the ChatGPT bot:
```
User-agent: GPTBot
Disallow: /
```
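To sanity-check that a rule does what you expect, you can feed the file's contents to Python's standard-library `urllib.robotparser` and ask whether a given user agent may fetch a URL. (The second user-agent name below is made up for illustration.)

```python
from urllib import robotparser

# The same rule as above, as it would appear in robots.txt.
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# GPTBot is disallowed everywhere; agents with no matching rule are allowed.
print(rp.can_fetch("GPTBot", "https://nithinbekal.com/posts/"))        # False
print(rp.can_fetch("SomeOtherBot", "https://nithinbekal.com/posts/"))  # True
```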
Neil Clarke has an excellent post with a comprehensive list of AI bots to block. You can also look at this site’s robots.txt.
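As a sketch, a full robots.txt that blocks a few of the commonly listed AI crawlers might look like the following. The user-agent strings other than GPTBot are ones I believe these companies publish for their crawlers, but double-check them against a current list before relying on them:

```
---
layout: null
sitemap: false
---
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

Sitemap: https://nithinbekal.com/sitemap.xml
```

Note that robots.txt is purely advisory: well-behaved crawlers honor it, but nothing enforces it.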