Mashable, 6 hours ago
Reddit is blocking Wayback Machine from archiving users' posts

 

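Reddit is reportedly blocking the Internet Archive's Wayback Machine from archiving users' posts, a move intended to stop AI companies from scraping its data for free to train their algorithms. Reddit says the measure protects user privacy and prevents data misuse. However, Reddit has previously struck deals worth tens of millions of dollars with Google and OpenAI that allow those companies to train AI models on its users' data. The move has prompted discussion about Reddit's commercialisation of user data and about how little control users have over the way their data is used. It is also seen as part of Reddit's search for new revenue sources amid continuing financial losses.

🗄️ Reddit restricts Wayback Machine archiving: Reddit will block the Wayback Machine from archiving users' posts, comments, and profiles, allowing only the homepage to be archived. The aim is to stop AI companies from using the Wayback Machine to scrape Reddit data for free to train their algorithms.

💰 Data commercialisation and the profit motive: The move is consistent with Reddit's deals with Google and OpenAI, worth tens of millions of dollars, which allow those AI companies to train on Reddit users' data. It suggests Reddit prefers to profit by licensing access to its data rather than allowing free scraping.

🔒 Dispute over user privacy and data control: Reddit says the move protects user privacy, but users have almost no control over how the company uses their data, including selling it or using it for AI training, and they cannot opt out. Their only option is to stop posting, and even then their past posts may still be used.

📉 Strategic shift under financial pressure: Reddit has faced heavy financial pressure in recent years, with net losses in the hundreds of millions of dollars. Restricting Wayback Machine archiving, along with earlier measures such as charging for API access, removing the option to opt out of ad personalisation, and planning paid subreddits, reflects a commercialisation strategy aimed at reversing those losses.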

Reddit will reportedly block the Internet Archive's Wayback Machine from saving users' posts. The social media platform states that the measure is intended to stop AI companies from scraping archived comments to train their algorithms. Or at least, prevent them from doing so without paying up.

As reported by The Verge, Reddit is preventing the Wayback Machine from archiving users' post detail pages, comments, and profiles. The Reddit homepage is still fair game, meaning that the titles of the top posts each day will still be preserved, but anything beyond that will no longer be indexed in the Internet Archive's digital library.

Reddit framed the decision as an effort to protect its users, stating that AI companies were violating its policies by scraping data from the Wayback Machine. 

"Until [the Internet Archive is] able to defend their site and comply with platform policies (e.g., respecting user privacy re. deleting removed content) we're limiting some of their access to Reddit data to protect redditors," Reddit spokesperson Tim Rathschmidt told The Verge.

Yet despite such assertions, Reddit has demonstrated that it's happy to hand over users' data to AI companies provided that they pay up. In 2024, Reddit barred search engines such as Microsoft Bing and DuckDuckGo from crawling its platform. However, a $60 million deal between Reddit and Google enabled the tech giant to continue training its AI algorithms on redditors' data, as well as surface their posts in Search. Reddit made a similar $60 million deal with ChatGPT creator OpenAI as well.

"Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used," Reddit CEO Steve Huffman told The Verge last August.

Ironically, Reddit users themselves have little say in how the company uses their public posts, as it doesn't allow them to opt out of having such data sold or used to train AI algorithms. The only remedy for redditors to prevent such use is to simply stop posting to the platform altogether, though that still doesn't address posts they've previously made.

Though concern for users' privacy may be a factor, Reddit's decision to block the Wayback Machine appears to be more obviously motivated by money. While AI companies were apparently scraping Reddit posts for free, cutting off such access will enable the social media platform to instead licence such data for a significant fee.

"The Reddit corpus of data is really valuable," Huffman told the New York Times in 2023. "But we don't need to give all of that value to some of the largest companies in the world for free."

Reddit has been fighting to reduce its financial losses in recent years, resulting in widely unpopular changes such as charging developers for access to its application programming interface (API), removing the ability to opt out of ad personalisation, and the planned introduction of paid subreddits. Unfortunately, there's still a long way to go before Reddit claws itself out of the red. The self-professed "heart of the internet" reported a whopping net loss of $484.3 million last year — more than five times its $90.8 million net loss in 2023.
