Contributor, July 17, 2024
AI is a mess and Apple is here to clean it up

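This article examines the controversy over how artificial intelligence (AI) companies obtain content from the internet. It notes that some AI companies ignore websites' robots.txt settings, scrape content without permission, and use it to generate answers or train models, raising intellectual property and ethical concerns. The article also compares AI with cryptocurrency, NFTs, and the blockchain, arguing that all of them are overhyped and carry potential risks. The author calls on Apple and other tech companies to hold to a higher ethical standard when working with AI companies and to respect websites' rights and user privacy.

🍎 AI companies ignore robots.txt settings and scrape website content without permission to generate answers or train models, raising intellectual property and ethical disputes. For example, Perplexity AI scrapes website content and generates false answers that may even be libelous.

💻 Defenders of the AI companies argue that this behavior resembles what the Internet Archive (Archive.org) does, but the author sees a fundamental difference: the Internet Archive aims to preserve information on the internet, while AI companies exploit other people's work for commercial gain.

🤖 AI resembles cryptocurrency, NFTs, and the blockchain: all are overhyped, carry potential risks, and can be used for profiteering. The author wonders whether Nvidia is steering the development of these technologies behind the scenes to drive its hardware sales.

🍏 The author calls on Apple and other tech companies to hold to a higher ethical standard when working with AI companies, respecting websites' rights and user privacy rather than merely following industry standards.

🌐 The author argues that AI development needs to balance innovation with ethics, avoid over-reliance on other people's work, and respect intellectual property and user privacy.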

Macworld

The time has come! Yes, this week the Macalope will answer all questions about AI! All of them! Prepare to be dazzled and informed!

(Disclaimer: not a guarantee. Not all questions will be answered. Dazzling will only occur in a small number of instances. Information is fleeting. Void where prohibited.)

If you don’t think AI is a broken technology, just remember that right now an AI is harvesting this very column to regurgitate back to someone in an authoritative manner as fact. If that doesn’t convince you it’s not to be trusted, nothing will.

Or, maybe this will: “Perplexity Is a Bullshit Machine.”

The aptly named Perplexity is another AI startup, one that ignores the specific settings of websites that ask not to be scraped and then takes that input and uses it to provide made-up answers to questions. The Macalope is not a lawyer, but some of them seem possibly libelous.

In one case, the text it generated falsely claimed that Wired had reported that a specific police officer in California had committed a crime.

Wired, June 19, 2024

Possibly libelous to both the officer and Wired! It’s a two-fer!

VC Investor: “What does your system do?”

Startup: “It is a perpetual libel machine.”

VC Investor: “Here is a blank check.”

All over the web, site owners are rushing to add various AI scraping gizmos to their robots.txt list in a vain effort to not have their content used without their consent. Several AI companies are simply ignoring these settings and taking the information anyway. People have said, well, they’re just doing the same thing Archive.org does. Archive.org ignores robots.txt in an effort to index everything. How can you be fine with that and not fine with what LLM companies are doing?

Because they are fundamentally different products.

“I want to provide a free service that will benefit everyone and help promote transparency on the internet.”

Fan-tastic. You go right ahead.

“I want to become a billionaire by selling the modern equivalent of a Magic 8 Ball and I plan to do it by using your work to fuel my accumulation of wealth.”

Uh, no.

Some have also said, “Well, none of this is illegal, so they can do what they want.” The Macalope supposes so, but he’s still going to hate it. Is plagiarism illegal? No. But it’s still wrong. And it can get you fired, which is what should happen to these AIs.

Fired into the sun, preferably.

It’s worth noting that this “open web” scraping is being used for different things. On the least egregious end, LLMs need a corpus of writing to learn how language works. This seems okay because we all use language. (This does get fuzzier, though, when you ask an AI to write in a particular writer’s style.) Then there are the models such as those used in Apple’s Image Playground that need to learn how to draw. The Macalope’s opinion here is pretty subjective, but this feels a bit more like copying for some reason. Possibly because we are not all artists. Then, of course, there are the AI systems that, in response to a question, say “Oh, yeah, I read about that somewhere. Here’s an answer that may or may not be correct and I may or may not give attribution to the site I read it on.” That one’s definitely a problem.

If this whole gross spectrum of behavior seems familiar to you, it’s probably because AI shares a certain DNA with crypto, NFTs, and the blockchain in that they are all trendy, usually touted by people you wouldn’t want to be stuck in an elevator with and, in a weird coincidence, all happen to drive up both Nvidia’s stock price and worldwide temperature averages. The Macalope doesn’t consider himself someone prone to conspiracy theories, but he would not be surprised to find out years from now that Nvidia has been running a powerful psychological ops campaign that dreams up technologies that require its boards to run and then convinces venture capital firms to invest in them.

Just sayin’.

Ideally, Apple wouldn’t be associating itself with these AI companies at all, but the Market has demanded it and at least it’s taking an arm’s-length approach when working with them. It is a sad fact that all Apple has to do to be one of the more ethical AI companies is simply honor sites’ robots.txt settings. But it’s not enough for the company to say you can opt out now after it’s already availed itself of people’s hard work. Apple should not be following the “industry standard” practices here. It should be better than the industry.
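
For the curious, honoring a robots.txt setting is not hard. What follows is a minimal Python sketch, not anything Apple or any AI company is known to run, of a crawler asking permission before it fetches a page. The `ExampleAIBot` user agent, the sample rules, the `may_fetch` helper, and the example.com URL are all made up for illustration; only `urllib.robotparser`, which ships with Python's standard library, is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site turns away one AI crawler
# and lets everyone else in. "ExampleAIBot" is an invented name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/column.html"  # placeholder URL
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", url))  # blocked crawler
    print(may_fetch(ROBOTS_TXT, "SomeBrowser", url))   # everyone else
```

A crawler that respected the site's wishes would simply skip the page whenever `may_fetch` returns `False` for its own user agent. Nothing in the file enforces that; it is a request, which is the whole problem.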
