MIT Technology Review » Artificial Intelligence
Cloudflare will now, by default, block AI bots from crawling its clients’ websites
Internet infrastructure company Cloudflare has announced that it will block AI bots from accessing the websites it hosts by default, and will give clients the ability to manually allow or block these bots. Cloudflare will also introduce a "pay-per-crawl" service that lets clients be compensated when AI bots scrape their sites' content. The move is meant to address the difficulty content creators face in getting paid and credited when AI systems scrape the web, and to foster fair cooperation between AI companies and content creators. Cloudflare hopes this will preserve the openness of the internet while giving site owners more control over how their content is used.

🤖 Cloudflare blocks AI bots from accessing websites by default and offers manual controls to allow or block specific AI crawlers.

💰 Cloudflare is launching a "pay-per-crawl" service that gives site owners the opportunity to be compensated when AI bots scrape their content.

🤝 By verifying crawlers and their intent, Cloudflare aims to foster good-faith cooperation between AI companies and content creators, addressing creators' copyright and revenue concerns.

⚠️ Even so, Cloudflare acknowledges that the restriction could affect noncommercial uses, such as research and web-archiving services.

The internet infrastructure company Cloudflare announced today that it will now default to blocking AI bots from visiting websites it hosts. Cloudflare will also give clients the ability to manually allow or ban these AI bots on a case-by-case basis, and it will introduce a so-called “pay-per-crawl” service that clients can use to receive compensation every time an AI bot wants to scoop up their website’s contents.

The bots in question are a type of web crawler, an algorithm that walks across the internet to digest and catalogue online information on each website. In the past, web crawlers were most commonly associated with gathering data for search engines, but developers now use them to gather data they need to build and use AI systems. 

However, such systems don’t provide the same opportunities for monetization and credit as search engines historically have. AI models draw from a great deal of data on the web to generate their outputs, but these data sources are often not credited, limiting the creators’ ability to make money from their work. Search engines that feature AI-generated answers may include links to original sources, but they may also reduce people’s interest in clicking through to other sites and could even usher in a “zero-click” future.

“Traditionally, the unspoken agreement was that a search engine could index your content, then they would show the relevant links to a particular query and send you traffic back to your website,” Will Allen, Cloudflare’s head of AI privacy, control, and media products, wrote in an email to MIT Technology Review. “That is fundamentally changing.”

Generally, creators and publishers want to decide how their content is used, how it’s associated with them, and how they are paid for it. Cloudflare claims its clients can now allow or disallow crawling for each stage of the AI life cycle (in particular, training, fine-tuning, and inference) and white-list specific verified crawlers. Clients can also set a rate for how much it will cost AI bots to crawl their website. 

In a press release from Cloudflare, media companies like the Associated Press and Time and forums like Quora and Stack Overflow voiced support for the move. “Community platforms that fuel LLMs should be compensated for their contributions so they can invest back in their communities,” Stack Overflow CEO Prashanth Chandrasekar said in the release.

Crawlers are supposed to obey a given website’s directions (provided through a robots.txt file) to determine whether they can crawl there, but some AI companies have been accused of ignoring these instructions. 
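A well-behaved crawler is expected to parse robots.txt before fetching any page. The sketch below, using Python's standard-library `urllib.robotparser`, shows how such a check works; the policy file is illustrative (user-agent names like GPTBot are real AI-crawler identifiers, but the rules and URL here are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler consults the parser before each fetch.
print(parser.can_fetch("GPTBot", "https://example.com/article"))    # False
print(parser.can_fetch("Googlebot", "https://example.com/article")) # True
```

The catch, as the accusations above suggest, is that robots.txt is purely advisory: nothing enforces the check, which is why Cloudflare is moving enforcement to the network layer.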

Cloudflare already has a bot verification system where AI web crawlers can tell websites who they work for and what they want to do. For these, Cloudflare hopes its system can facilitate good-faith negotiations between AI companies and website owners. For the less honest crawlers, Cloudflare plans to use its experience dealing with coordinated denial-of-service attacks from bots to stop them. 

“A web crawler that is going across the internet looking for the latest content is just another type of bot—so all of our work to understand traffic and network patterns for the clearly malicious bots helps us understand what a crawler is doing,” wrote Allen.

Cloudflare had already developed other ways to deter unwanted crawlers, like allowing websites to send them down a path of AI-generated fake web pages to waste their efforts. While this approach will still apply for the truly bad actors, the company says it hopes its new services can foster better relationships between AI companies and content producers. 

Some caution that a default ban on AI crawlers could interfere with noncommercial uses, like research. In addition to gathering data for AI systems and search engines, crawlers are also used by web archiving services, for example. 

“Not all AI systems compete with all web publishers. Not all AI systems are commercial,” says Shayne Longpre, a PhD candidate at the MIT Media Lab who works on data provenance. “Personal use and open research shouldn’t be sacrificed here.”

For its part, Cloudflare aims to protect internet openness by helping enable web publishers to make more sustainable deals with AI companies. “By verifying a crawler and its intent, a website owner has more granular control, which means they can leave it more open for the real humans if they’d like,” wrote Allen.
