Reddit Tries to Block Bots, Web Crawlers to Stop Unlicensed AI Data Scraping

From PC Mag: Reddit is updating its robots.txt file, which implements the Robots Exclusion Protocol, to try to block bots and web crawlers from swiping data and content from its site.

Reddit says, however, that "good faith actors" like the Internet Archive will continue to have access to its platform, and adds that most Reddit users won't notice or be affected by the change. Reddit will also continue its practice of rate-limiting, which may help prevent third-party scraping.

This isn't an ironclad solution; as Google notes, robots.txt rules are voluntary, and a crawler can simply ignore them.

"The instructions in robots.txt files cannot enforce crawler behavior to your site; it's up to the crawler to obey them," Google states. "While Googlebot and other respectable web crawlers obey the instructions in a robots.txt file, other crawlers might not."

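To make Google's point concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The rules, bot names, and URL below are hypothetical and chosen only for illustration; they are not Reddit's actual robots.txt. The check runs entirely on the crawler's side, so a crawler that skips it is not stopped by the file itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not Reddit's actual file):
# one named bot is allowed everywhere, every other bot is disallowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL.

    Nothing forces a crawler to call this function; compliance is voluntary.
    """
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
url = "https://example.com/r/some_page"  # placeholder URL

archive_allowed = polite_may_fetch(parser, "ExampleArchiveBot", url)
scraper_allowed = polite_may_fetch(parser, "ExampleScraperBot", url)

print("ExampleArchiveBot allowed:", archive_allowed)
print("ExampleScraperBot allowed:", scraper_allowed)
```

Under these example rules the named archive bot is permitted and any other bot is refused, but only because the crawler chose to ask. Server-side measures such as the rate-limiting mentioned above are what actually constrain a crawler that does not.
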