Nithin Bekal

Blocking AI bots with robots.txt in Jekyll

25 May 2024

Although I don’t think this blog is popular enough to be scraped by any of the AI companies, I was curious about how to opt out of these scrapers. The easiest way to block AI bots from crawling a site is through the robots.txt file.

This site uses the jekyll-sitemap gem, which generates a robots.txt file containing the sitemap URL, so the first step is to replicate the file the gem generates. I had to add some YAML front matter at the top so that Jekyll picks up and processes the .txt file: layout: null skips applying any layout, and sitemap: false keeps robots.txt itself out of the generated sitemap.

---
layout: null
sitemap: false
---
Sitemap: https://nithinbekal.com/sitemap.xml

Now you can disallow any bot by adding a rule like this one, which blocks OpenAI's GPTBot crawler:

User-agent: GPTBot
Disallow: /
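To sanity-check that a rule behaves the way you expect, you can parse it with Python's built-in urllib.robotparser. This is just a quick local check, separate from the Jekyll setup:

```python
from urllib.robotparser import RobotFileParser

# The same rule we added to robots.txt
rules = """User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# GPTBot is blocked from every path...
print(rp.can_fetch("GPTBot", "https://nithinbekal.com/"))       # False
# ...while other user agents are unaffected
print(rp.can_fetch("SomeOtherBot", "https://nithinbekal.com/")) # True
```

Note that robots.txt is purely advisory: this only tells you what a well-behaved crawler should do, not what every crawler will do.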

Neil Clarke has an excellent post with a comprehensive list of AI bots to block. You can also look at this site’s robots.txt.
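Putting the pieces together, a robots.txt that blocks several of the commonly listed AI crawlers might look something like this. The user-agent strings below are the publicly documented ones for each vendor at the time of writing; check the vendors' documentation or Neil Clarke's post for the current list:

```
---
layout: null
sitemap: false
---
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

Sitemap: https://nithinbekal.com/sitemap.xml
```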
