The Verge - Artificial Intelligences, August 1, 2024
Reddit CEO says Microsoft needs to pay to search the site

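Reddit CEO Steve Huffman is calling on Microsoft and other companies to pay if they want to keep using the site’s data, saying that those without an agreement have no right to use it. Reddit has taken steps against crawlers, Microsoft is accused of using its data without permission, and Huffman hopes to replicate the partnership model Reddit has with OpenAI.

🎯 Reddit CEO Steve Huffman says companies that want to keep scraping Reddit must sign an agreement and pay, or they will be blocked. He said Microsoft, Anthropic, and Perplexity have refused to negotiate, which has forced Reddit to block these companies to protect its rights over its data.

🚫 Reddit has been stepping up its fight against crawlers in recent months. In early July, it updated its robots.txt file to block web crawlers it has no agreement with. People then noticed that Reddit results were visible only in Google search results and did not appear in other search engines such as Bing.

💬 Huffman said Microsoft has used Reddit’s data to train its AI and summarized Reddit content in Bing results without telling Reddit, and that Reddit’s data has also been sold to other search engines through the Bing API. Microsoft responded that Reddit has blocked Bing’s crawler from its site.

🤝 Huffman wants to replicate the model of OpenAI’s SearchGPT partnership: by signing content licensing deals, Reddit joins traditional media publishers in seeking payment for the use of its content in generative AI.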

Illustration by Alex Castro / The Verge

After striking deals with Google and OpenAI, Reddit CEO Steve Huffman is calling on Microsoft and others to pay if they want to continue scraping the site’s data.

“Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used,” Huffman said in an interview this week. He specifically named Microsoft, Anthropic, and Perplexity for refusing to negotiate, saying it has been “a real pain in the ass to block these companies.”

Reddit has been escalating its fight against crawlers in recent months. At the beginning of July, its robots.txt file was updated to block web crawlers it doesn’t have agreements with. Then people began noticing that Reddit results were only visible in Google results — where Reddit is paid for its data to be shown — and not other search engines like Bing.
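
For readers unfamiliar with how a robots.txt file can admit some crawlers and turn away others, here is a minimal sketch in Python. The rules below are a made-up illustration, not Reddit’s actual file, and the crawler names `LicensedBot` and `OtherBot` and the `example.com` URL are hypothetical. Note that robots.txt is advisory: it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules (an assumption for illustration, not Reddit's real file):
# one named crawler is allowed everywhere, every other crawler is disallowed.
ROBOTS_TXT = """\
User-agent: LicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/r/some_forum/"
    for agent in ("LicensedBot", "OtherBot"):
        print(agent, "allowed" if is_allowed(agent, url) else "blocked")
```

A well-behaved crawler runs a check like this before each request. One that ignores the file has to be blocked by other means.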

Huffman said that Microsoft has been using Reddit’s data to train its AI and summarizing its content in Bing results “without telling us,” and that Reddit’s data has also been sold through the Bing API to other search engines. In the interview, he referenced Microsoft AI CEO Mustafa Suleyman’s recent comment at a conference that public data on the internet is “freeware.”

“We’ve had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use,” Huffman said. “That’s their real position.”

In response to Reddit results recently disappearing from Bing, Microsoft’s head of search, Jordi Ribas, said on X that “Reddit has blocked Bing from crawling their site for search, favoring another search engine and impacting competition from Bing and Bing-powered engines.” Microsoft spokesperson Caitlin Roulston separately told The Verge last week that “we honor the directions provided by websites that do not want content on their pages to be used with our generative AI models.”

Huffman pointed to OpenAI’s recent announcement of SearchGPT, which will be able to show Reddit results thanks to a deal both companies reached earlier this year, as the model he wants to replicate. None of the content licensing deals Reddit has done to date include exclusive use cases for its data, according to spokesperson Tim Rathschmidt.

By calling for licensing deals, Reddit is joining more traditional media publishers (including The Verge’s parent company, Vox Media) in seeking payment for letting their content feed generative AI. “I think the traditional value exchange from search engines has changed,” said Huffman. “Search and summarization and training are merging, and the value exchange of crawling in exchange for traffic back is becoming muddied.”

Spokespeople for Microsoft, Anthropic, and Perplexity didn’t have comments for this story by publication time.
