The Verge - Artificial Intelligences, July 3, 2024
Perplexity’s grand theft AI

 

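Perplexity is an AI search engine that aims to compete with Google Search. It scrapes high-quality sources and gives users a direct answer instead of sending them to the original source. That approach has drawn copyright and ethics objections: Perplexity has scraped paywalled content and folded it into its own “reports,” and it has relied on third-party scrapers that bypass websites’ robots.txt rules to obtain data.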

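🤔 Perplexity’s core value proposition is to scrape high-quality sources and deliver direct answers rather than send users to the original source. That practice has raised disputes over copyright and ethics.

🤨 Perplexity is accused of scraping paywalled content, folding it into its own “reports,” and using third-party scrapers that bypass websites’ robots.txt rules to obtain data.

🧐 Perplexity CEO Aravind Srinivas acknowledged using a third-party scraper but declined to name it and did not commit to asking it to stop violating robots.txt. This suggests Perplexity may keep relying on third-party scrapers for data regardless of the ethical and legal consequences.

🤯 These practices have raised ethical and legal questions about Perplexity and prompted a debate over whether AI companies should follow ethical and legal rules when they acquire data.

😟 Perplexity’s conduct also shows the unethical steps AI companies may take in pursuit of their own interests, and the effect those steps could have on the internet ecosystem.

😔 Perplexity’s conduct is a reminder that the convenience of AI comes with potential risks, and that AI companies need stronger oversight to ensure they act within ethical and legal norms.

😥 Perplexity’s conduct also raises the question of whether AI companies should bear more social responsibility, and how to balance the development of AI technology against social values.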

What, exactly, is Perplexity’s innovation? | Image: The Verge

In every hype cycle, certain patterns of deceit emerge. In the last crypto boom, it was “ponzinomics” and “rug pulls.” In self-driving cars, it was “just five years away!” In AI, it’s seeing just how much unethical shit you can get away with.

Perplexity, which is in ongoing talks to raise hundreds of millions of dollars, is trying to create a Google Search competitor. Perplexity isn’t trying to create a “search engine,” though — it wants to create an “answer engine.” The idea is that instead of combing through a bunch of results to answer your own question with a primary source, you’ll simply get an answer Perplexity has found for you. “Factfulness and accuracy is what we care about,” Perplexity CEO Aravind Srinivas told The Verge.

That means that Perplexity is basically a rent-seeking middleman on high-quality sources. The value proposition on search, originally, was that by scraping the work done by journalists and others, Google’s results sent traffic to those sources. But by providing an answer, rather than pointing people to click through to a primary source, these so-called “answer engines” starve the primary source of ad revenue, keeping that revenue for themselves. Perplexity is among a group of vampires that includes Arc Search and Google itself.

But Perplexity has taken it a step further with its Pages product, which creates a summary “report” based on those primary sources. It’s not just quoting a sentence or two to directly answer a user’s question — it’s creating an entire aggregated article, and it’s accurate in the sense that it is actively plagiarizing the sources it uses.

Forbes discovered Perplexity was dodging the publication’s paywall in order to provide a summary of an investigation the publication did of former Google CEO Eric Schmidt’s drone company. Though Forbes has a metered paywall on some of its work, the premium work — like that investigation — is behind a hard paywall. Not only did Perplexity somehow dodge the paywall but it barely cited the original investigation and ganked the original art to use for its report. (For those keeping track at home, the art thing is copyright infringement.)

Aggregation is not a particularly new phenomenon — but the scale at which Perplexity can aggregate, along with the copyright violation of using the original art, is pretty, hmm, remarkable. In an attempt to calm everyone down, the company’s chief business officer went to Semafor to say Perplexity was developing revenue sharing plans with publications, and aw gee whiz, how come everyone was being so mean to a product still in development?

At this point, Wired jumped in, confirming a finding from Robb Knight: Perplexity’s scraping of Forbes’ work wasn’t an exception. In fact, Perplexity has been ignoring the robots.txt code that explicitly asks web crawlers not to scrape the page. Srinivas responded in Fast Company that actually, Perplexity wasn’t ignoring robots.txt; it was just using third-party scrapers that ignored it. Srinivas declined to name the third-party scraper and didn’t commit to asking that crawler to stop violating robots.txt.
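For readers unfamiliar with the mechanism at issue, here is a minimal sketch of how a crawler that chooses to honor robots.txt would check a site's rules before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAnswerBot` and `OtherBot` user agents, the `example.com` URLs, and the `may_fetch` helper are all hypothetical illustrations, not any real publisher's file or any company's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler from the whole site
# and asks every other crawler to stay out of /premium/.
ROBOTS_TXT = """\
User-agent: ExampleAnswerBot
Disallow: /

User-agent: *
Disallow: /premium/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAnswerBot", "https://example.com/article"),
        ("OtherBot", "https://example.com/article"),
        ("OtherBot", "https://example.com/premium/investigation"),
    ]:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

Nothing in this check is enforced by the website: the file only states a request, and the crawler decides whether to run the check and respect the result. A scraper that skips the check, or that identifies itself under a different user agent, can fetch the page anyway.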

“Someone else did it” is a fine argument for a five-year-old. And consider the response further. If Srinivas wanted to be ethical, he had some options here. Option one is to terminate the contract with the third-party scraper. Option two is to try to convince the scraper to honor robots.txt. Srinivas didn’t commit to either, and it seems to me, there’s a clear reason why. Even if Perplexity itself isn’t violating the code, it is reliant on someone else violating the code for its “answer engine” to work.

To add insult to injury, Perplexity plagiarized Wired’s article about it, even though Wired explicitly blocks Perplexity in its robots.txt file. The bulk of Wired’s article about the plagiarism is about legal remedies, but I’m interested in what’s going on here with robots.txt. It’s a good-faith agreement that has held up for decades now, and it’s falling apart thanks to unscrupulous AI companies (that’s right, Perplexity isn’t the only one) hoovering up just about anything that’s available in order to train their bullshit models. And remember how Srinivas said he was committed to “factfulness”? I’m not sure that’s true, either: Perplexity is now surfacing AI-generated results and actual misinformation, Forbes reports.

We’ve seen a lot of AI giants engage in questionably legal and arguably unethical practices in order to get the data they want. In order to prove the value of Perplexity to investors, Srinivas built a tool to scrape Twitter by pretending to be an academic researcher using API access for research. “I would call my [fake academic] projects just like Brin Rank and all these kinds of things,” Srinivas told Lex Fridman on the latter’s podcast. I assume “Brin Rank” is a reference to Google co-founder Sergey Brin; to my ear, Srinivas was bragging about how charming and clever his lie was.

I’m not the one who’s telling you the foundation of Perplexity is lying to dodge established principles that hold up the web. Its CEO is. That’s clarifying about the actual value proposition of “answer engines.” Perplexity cannot generate actual information on its own and relies instead on third parties whose policies it abuses. The “answer engine” was developed by people who feel free to lie whenever it is more convenient, and that preference is necessary for how Perplexity works.

So that’s Perplexity’s real innovation here: shattering the foundations of trust that built the internet. The question is whether any of its users or investors care.

Correction June 27th: Removes erroneous reference to Axios — the interview in question was with Semafor.
