Reddit is blocking the Wayback Machine from archiving users' posts

Reddit will reportedly block the Internet Archive's Wayback Machine from saving users' posts. The social media platform states that the measure is intended to stop AI companies from scraping archived comments to train their algorithms. Or at least, prevent them from doing so without paying up.

As reported by The Verge, Reddit is preventing the Wayback Machine from archiving users' post detail pages, comments, and profiles. The Reddit homepage is still fair game, meaning that the titles of the top posts each day will still be preserved, but anything beyond that will no longer be indexed in the Internet Archive's digital library.

Reddit framed the decision as an effort to protect its users, stating that AI companies were violating its policies by scraping data from the Wayback Machine.

"Until [the Internet Archive is] able to defend their site and comply with platform policies (e.g., respecting user privacy re. deleting removed content) we're limiting some of their access to Reddit data to protect redditors," Reddit spokesperson Tim Rathschmidt told The Verge.

Yet despite such assertions, Reddit has demonstrated that it's happy to hand over users' data to AI companies provided that they pay up. In 2024, Reddit barred search engines such as Microsoft Bing and DuckDuckGo from crawling its platform. However, a $60 million deal between Reddit and Google enabled the tech giant to continue training its AI algorithms on redditors' data, as well as surface their posts in Search. Reddit made a similar $60 million deal with ChatGPT creator OpenAI as well.

"Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used," Reddit CEO Steve Huffman told The Verge last August.

Ironically, Reddit users themselves have little say in how the company uses their public posts, as it doesn't allow them to opt out of having such data sold or used to train AI algorithms. The only remedy for redditors to prevent such use is to simply stop posting to the platform altogether, though that still doesn't address posts they've previously made.

Though concern for users' privacy may be a factor, Reddit's decision to block the Wayback Machine appears more obviously motivated by money. AI companies were apparently scraping Reddit posts from the archive for free. Cutting off that access will instead let the social media platform licence such data for a significant fee.

"The Reddit corpus of data is really valuable," Huffman told the New York Times in 2023. "But we don't need to give all of that value to some of the largest companies in the world for free."

Reddit has been fighting to reduce its financial losses in recent years, resulting in widely unpopular changes such as charging developers for access to its application programming interface (API), removing the ability to opt out of ad personalisation, and the planned introduction of paid subreddits. Unfortunately, there's still a long way to go before Reddit claws itself out of the red. The self-professed "heart of the internet" reported a whopping net loss of $484.3 million last year — more than five times its $90.8 million net loss in 2023.
