Some AI crawlers respect robots.txt. Perhaps it's worth blocking them via robots.txt:

```
❯ bash -c '( curl -fsS --tlsv1.3 https://codeberg.org/robots.txt | \
    tac | \
    grep -A999 "^Disallow: /$" | \
    grep -m1 -B999 "^[[:space:]]*$" | \
    tac
    curl -fsS https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/robots.txt ) | sort -ru'
User-agent: omgilibot
User-agent: omgili
User-agent: meta-externalagent
User-agent: img2dataset
User-agent: facebookexternalhit
User-agent: cohere-ai
User-agent: anthropic-ai
User-agent: YouBot
User-agent: Webzio-Extended
User-agent: VelenPublicWebCrawler
User-agent: Timpibot
User-agent: Scrapy
User-agent: PetalBot
User-agent: PerplexityBot
User-agent: Omgilibot
User-agent: Omgili
User-agent: OAI-SearchBot
User-agent: Meta-ExternalFetcher
User-agent: Meta-ExternalAgent
User-agent: ImagesiftBot
User-agent: ICC-Crawler
User-agent: GoogleOther-Video
User-agent: GoogleOther-Image
User-agent: GoogleOther
User-agent: Google-Extended
User-agent: GPTBot
User-agent: FriendlyCrawler
User-agent: FacebookBot
User-agent: Diffbot
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: ChatGPT-User
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: Applebot
User-agent: Amazonbot
User-agent: Ai2Bot-Dolma
User-agent: AI2Bot
Disallow: /
```

(The first pipeline extracts the AI-crawler section from Codeberg's current robots.txt by reversing the file, grepping from `Disallow: /` up to the first blank line, and reversing back; the second `curl` fetches the community-maintained ai.robots.txt list; `sort -ru` merges and deduplicates the combined output.)

Reproducible: Always

https://blog.codeberg.org/letter-from-codeberg-software-is-about-humans.html => "AI and Crawling"
https://github.com/ai-robots-txt/ai.robots.txt
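For completeness, here is a minimal sketch of how a site could keep such a block list in its served robots.txt current against the upstream ai.robots.txt repo. The file path and the marker comment are assumptions for illustration, not anything Codeberg actually uses:

```sh
#!/bin/sh
# Sketch: append the current ai.robots.txt block list to a served robots.txt,
# replacing any previously appended copy. ROBOTS and MARKER are assumptions
# for illustration; adjust to the actual deployment.
set -eu

ROBOTS=/var/www/html/robots.txt   # assumed path of the served robots.txt (must exist)
LIST_URL=https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/robots.txt
MARKER='# --- ai.robots.txt ---'  # hypothetical marker delimiting the appended block

tmp=$(mktemp)
# Keep everything before the marker (or the whole file if the marker is absent).
sed "/^${MARKER}\$/,\$d" "$ROBOTS" > "$tmp"
# Re-append the marker and the freshly fetched upstream list.
{ printf '%s\n' "$MARKER"; curl -fsS "$LIST_URL"; } >> "$tmp"
mv "$tmp" "$ROBOTS"
```

As the opening sentence hedges, this only affects crawlers that choose to honor robots.txt in the first place.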