MarkTechPost@AI, 13 hours ago
Cloudflare vs Perplexity: The Battle Over AI Web Scraping Heats Up

 

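Cloudflare accuses Perplexity AI of ignoring websites' opt-out directives and using disguise techniques to scrape data, setting off a heated dispute over AI ethics, transparency, and the Internet's business model. Perplexity AI denies the accusations, saying its activity is user-driven fetching rather than automated crawling. The dispute highlights the growing tension between the AI industry and content creators, and the Internet's shift toward paid data access.

🔍 Cloudflare alleges that Perplexity AI systematically ignores robots.txt directives and direct blocks, using techniques such as disguised user agents and rotating Autonomous System Numbers (ASNs) to covertly scrape content from tens of thousands of domains, generating millions of requests per day, against the wishes of site owners and basic industry norms.

🤔 The dispute coincides with Cloudflare's launch of its "Pay Per Crawl" marketplace, which lets publishers charge AI bots for access and blocks most crawlers by default. Major outlets such as The Atlantic and BuzzFeed have signed up, and more than 2.5 million websites now explicitly disallow AI training, underscoring content creators' concerns about AI scraping.

🛡️ Perplexity AI's response denies the accusations, claims the screenshots show that "no content was accessed," and argues that its activity is user-driven fetching rather than automated crawling, shifting responsibility to its users. The company also mentioned that it has faced earlier accusations, including plagiarism, and it has struggled to define its own standards for content use, which points to a lack of clear industry rules.

💰 Behind the dispute is a change in the Internet's business model, from advertising-driven to pay-for-access. Cloudflare's position is to protect publishers' business models by enforcing block signals and charging for AI access to content. Perplexity AI argues that an AI agent acting on a user's behalf should not be treated differently from human browsing, which challenges the existing rules.

🌐 Going forward, transparency and compliance will be critical for AI companies. As content creators increasingly favor data-licensing partnerships with AI companies over covert scraping, the AI industry will face stricter regulation and higher ethical standards, and that will ultimately reshape the foundation of the digital world.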

Reading through Cloudflare’s detailed exposé and the extensive media coverage, the controversy surrounding Perplexity AI’s web scraping practices is deeper — and more polarizing — than it first appears. Cloudflare accuses Perplexity of systematically ignoring website blocks and masking its identity to scrape data from sites that have opted out, raising serious questions about ethics, transparency, and the future of the Internet’s business model.

What Cloudflare Observed

Cloudflare’s report and independent investigations show that Perplexity, an AI startup, allegedly crawls and scrapes content from websites that explicitly signal (through robots.txt and direct blocks) that AI tools are not welcome. The technical evidence includes changing user agents to impersonate browsers like Google Chrome on macOS and rotating Autonomous System Numbers (ASNs) — sophisticated tactics intended to evade detection and blocks. Cloudflare claims it detected this covert scraping across tens of thousands of domains, generating millions of requests daily, and fingerprinted the crawler using machine learning and other network signals.

Why the Accusations Matter

For decades, websites have used robots.txt as a “gentleman’s agreement” to tell bots what’s allowed. While ignoring these directives is illegal in very few jurisdictions, the norm among leaders like OpenAI and Anthropic is to respect them. Perplexity’s alleged approach undermines this unwritten contract, suggesting a willingness to bypass website owners’ wishes in pursuit of training data.
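To make the mechanics concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard-library urllib.robotparser. The bot name ExampleAIBot, the example.com URL, and the rules themselves are hypothetical and chosen only for illustration; they are not taken from Cloudflare's report or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI bot is blocked, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that identifies itself honestly is told to stay out.
declared = is_allowed(ROBOTS_TXT, "ExampleAIBot/1.0", "https://example.com/article")

# The same request sent with a browser-like user agent matches only the wildcard rule.
disguised = is_allowed(
    ROBOTS_TXT,
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0 Safari/537.36",
    "https://example.com/article",
)

if __name__ == "__main__":
    print(declared)   # False
    print(disguised)  # True
```

The check is entirely voluntary and keyed on the user-agent string the client chooses to send. That is why the scheme depends on crawlers identifying themselves honestly, and why a request presenting a generic browser user agent would pass a rule written against a named bot.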

This issue exploded just as Cloudflare launched its new “Pay Per Crawl” marketplace, which lets publishers charge for AI bot access and blocks most crawlers by default. Major outlets, including The Atlantic, BuzzFeed, Time Inc., and O’Reilly, have signed up, and over 2.5 million websites now disallow AI training outright.

Perplexity Responds

Perplexity’s spokesperson dismissed Cloudflare’s blog post as little more than a “sales pitch,” claiming the screenshots “show that no content was accessed” and denying ownership of the bot in question. Perplexity later argued that much of what Cloudflare saw was user-driven fetching (an AI agent acting on direct user requests) rather than automated crawling — a key distinction in ongoing debates about what “scraping” really means. They also mentioned that similar incidents had happened before, notably accusations of plagiarism from outlets like Wired, and the company has struggled to define its own standards for content use.

Divided Reactions & Broader Implications

The Big Picture: The Internet’s Business Model Is Changing

Conclusion

Whether Perplexity is being singled out unfairly or genuinely violating web norms, this is a watershed moment. The era of “free data” for AI is ending. Ethics, economics, and new gatekeeping platforms like Cloudflare are pushing a shift toward paid data, greater accountability, and sustainable content partnerships. Unless AI companies adapt, they’ll face locked gates and a fragmented, paywalled Internet — and that ultimately reshapes the foundation of the digital world.



The post Cloudflare vs Perplexity: The Battle Over AI Web Scraping Heats Up appeared first on MarkTechPost.
