AI News, August 28, 2024
Baidu restricts Google and Bing from scraping content for AI training

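Baidu has updated its Wikipedia-like Baidu Baike service to restrict Google and Microsoft Bing from scraping its content, a move that underscores the importance of data in the AI era.

🎯 In the latest update to Baidu Baike's robots.txt file, Baidu denies access to Google's and Bing's crawlers. Previously, Google and Bing were allowed to index part of Baidu Baike's content. The platform has nearly 30 million entries.

📈 Baidu's move comes amid rising demand for large datasets to train AI models and applications. Other companies have taken similar steps to protect their online content.

🌐 With generative AI developers worldwide partnering with content publishers to obtain high-quality content, Baidu's restriction underscores the importance of data in the AI era, and more companies may reassess their data-sharing policies.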

Chinese internet search provider Baidu has updated its Wikipedia-like Baike service to prevent Google and Microsoft Bing from scraping its content.

This change was observed in the latest update to the Baidu Baike robots.txt file, which denies access to Googlebot and Bingbot crawlers.

According to the Wayback Machine, the change took place on August 8. Previously, Google and Bing search engines were allowed to index Baidu Baike’s central repository, which includes almost 30 million entries, although some subdomains on the website were restricted.
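To illustrate how per-crawler rules in a robots.txt file work, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT`, the helper `check_agents`, and the `example.com` URL are hypothetical and chosen for illustration; they are not a copy of Baidu Baike's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (an assumption, not Baidu Baike's
# real robots.txt): two named crawlers are denied, all others are allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def check_agents(robots_txt, agents, url):
    """Return a dict mapping each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = check_agents(
    SAMPLE_ROBOTS_TXT,
    ["Googlebot", "Bingbot", "Baiduspider"],
    "https://example.com/item/1",  # placeholder URL
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'denied'}")
```

Note that robots.txt is advisory: it tells well-behaved crawlers what the site operator permits, but it does not technically block requests.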

This action by Baidu comes amid increasing demand for large datasets used in training artificial intelligence models and applications. It follows similar moves by other companies to protect their online content. In July, Reddit blocked various search engines, except Google, from indexing its posts and discussions. Google has a financial agreement with Reddit for data access to train its AI services.

According to sources, Microsoft considered restricting access to its internet-search data for rival search engine operators over the past year, particularly those that used the data for chatbots and generative AI services.

Meanwhile, the Chinese Wikipedia, with its 1.43 million entries, remains available to search engine crawlers. A survey conducted by the South China Morning Post found that entries from Baidu Baike still appear in both Bing and Google search results. The search engines may still be serving older cached content.

The move comes as generative AI developers around the world increasingly partner with content publishers to access the highest-quality content for their projects. For instance, OpenAI recently signed an agreement with Time magazine to access its entire archive, dating back to the magazine’s first issue over a century ago. A similar partnership was inked with the Financial Times in April.

Baidu’s decision to restrict access to its Baidu Baike content for major search engines highlights the growing importance of data in the AI era. As companies invest heavily in AI development, the value of large, curated datasets has significantly increased. This has led to a shift in how online platforms manage access to their content, with many choosing to limit or monetise access to their data.

As the AI industry continues to evolve, it’s likely that more companies will reassess their data-sharing policies, potentially leading to further changes in how information is indexed and accessed across the internet.

(Photo by Kelli McClintock)

The post Baidu restricts Google and Bing from scraping content for AI training appeared first on AI News.
