Zeroth Principles of AI, December 7, 2024
AIs Stole My Stuff

 

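With the rise of generative AI, artists and authors worry that AI will copy their works, and some systems even excerpt published documents directly without citing a source. Some large language model (LLM) providers have started supplying references, but this does not seem to be enough. Some people consider the inclusion of a work in a training corpus to be theft. However, if artists ask to have their works removed from the corpus, those works may be forgotten by the world within a few years. In the future, anyone who wants to be famous and recognized will need to let AIs learn from and appreciate their work. Removing your works from all learning corpora amounts to a trip to oblivion.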

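🤖 Generative AI sparks a copyright dispute: artists and authors worry that AI will copy their works without authorization, and some AI systems even excerpt published documents directly without citing a source.

📖 Potential impact of corpus removal requests: LLM providers can remove a specific artist's works from the training corpus, but this may cause those works to be forgotten in the future, because AI systems will be unable to learn from or cite them.

🔍 How LLMs differ from search engines: LLMs provide a concise answer rather than a large set of documents for users to evaluate themselves, as search engines do. As LLMs develop, people may gradually stop reading regular search result pages, which will have a major impact on search monetization.

💰 Outlook: in the long run, AI will change everything. People are currently discussing compensation for artists and authors, but in a decade or two there is no guarantee that we will even use money.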

Artists and authors are complaining that Generative AI is copying their works, and some systems reproduce direct excerpts from published documents without citing the source.

This is a problem that can be solved with “regular programming” by the LLM provider. Anthropic Claude has been providing real and valid references for some time whenever I ask for something in the style of a research report, and other LLMs are following suit. This is not exactly the same as a learning source reference, but it’s a step in the right direction.

But this does not seem to be enough for some artists and authors. They feel corpus inclusion is theft.

LLM producers can remove any artist’s or author’s work from the corpus to be used for the next release. That is trivial to do and can easily be documented, for instance by providing a public table of contents, with source links, for the entire learning corpus.
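As a rough illustration of the removal-and-documentation step described above, here is a minimal Python sketch. Everything in it is hypothetical: the `CorpusEntry` record, the `remove_opted_out` and `table_of_contents` helpers, and the example.com links are invented for illustration, and it assumes every document in the corpus already carries reliable author and source metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    """One document in a hypothetical learning corpus."""
    title: str
    author: str
    source_url: str


def remove_opted_out(corpus, opted_out_authors):
    """Return the corpus without works by authors who asked to be removed."""
    # Compare names case-insensitively; real author matching would need more care.
    blocked = {name.casefold() for name in opted_out_authors}
    return [entry for entry in corpus if entry.author.casefold() not in blocked]


def table_of_contents(corpus):
    """Build a publishable listing: one line per work, with its source link."""
    ordered = sorted(corpus, key=lambda entry: (entry.author, entry.title))
    return [f"{e.author} | {e.title} | {e.source_url}" for e in ordered]


# Invented sample data, for illustration only.
corpus = [
    CorpusEntry("Cat in a Box", "Alice Example", "https://example.com/alice/cat-in-a-box"),
    CorpusEntry("Notes on Search", "Bob Example", "https://example.com/bob/notes-on-search"),
    CorpusEntry("Harbor at Dusk", "Alice Example", "https://example.com/alice/harbor-at-dusk"),
]

# Corpus for the next release, after one author has opted out.
next_release = remove_opted_out(corpus, ["alice example"])
for line in table_of_contents(next_release):
    print(line)
```

The published listing is what would let an artist check that their works are really gone, or really still there.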

Once near-future LLM++ systems start selecting our input media and answering our questions, an artist who has asked to have their works removed from the corpus will be unknown to the world, along with their works, within a few years.

If you want to be famous, known, and admired in a few years, then you need to let AIs read and admire your stuff today.

Taking your works out of all learning corpora is a direct trip to oblivion.

This is not “a threat voiced by LLM providers”. Rather, it is a simple consequence of individual decisions made by authors and artists. LLM providers like OpenAI and Google have enough pictures and text to learn hundreds of languages and create images of anything we can imagine and many things we can’t. These companies don’t care much about any individual document or artwork.

In this context it might be worth mentioning that if you want an LLM to create a fantasy painting of a cat, like a Puss in Boots, most of the information about what cats look like comes from pictures of real cats, rather than artworks of cats. Art styles come from specific artists, but if you prompt for a cat in a box in the style of Rembrandt, the results are original art. I discuss this more in my post about AI and creativity.

— * —

Anyone selling something on the web, including blog entries, would be a fool to block Google and other search engines from indexing their stuff, because indexing is how it gets found. The web server file “robots.txt” can be used to block indexing; be careful about what you put in there if you want others to find you.
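One way to be careful is to test a robots.txt file before publishing it. The sketch below uses Python's standard `urllib.robotparser` module to check which crawlers a set of rules would turn away. The rules themselves, the crawler names `ExampleTrainingBot` and `ExampleSearchBot`, and the example.com URLs are all made up for illustration; they are not real crawlers or a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out one crawler entirely and hides /drafts/
# from everyone else. "ExampleTrainingBot" is an invented user-agent name.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


post = "https://example.com/blog/post.html"
draft = "https://example.com/drafts/wip.html"

for agent in ("ExampleSearchBot", "ExampleTrainingBot"):
    for url in (post, draft):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:20} {url:40} {verdict}")
```

With these rules the invented training crawler is blocked from every page, including the published blog post, which is exactly the kind of entry to double-check if you want your work to be found.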

LLMs are not search engines. For factual queries, the service they provide is a single, simple answer rather than hundreds of documents for you to read and evaluate yourself. Many people lack the competence to evaluate the veracity, usefulness, and applicability of dozens of search results. These people are the main target audience for LLM-produced search summaries, such as those now provided by Microsoft, Google, and others.

It will take just a couple more generations of LLM releases before their result summaries become so good that people stop reading the regular search result page, and therefore stop clicking on result links. This means we need to rethink search monetization, and probably search as a whole. What we have today will just stop working. One of the few things we can say for certain is that LLM corpora will continue to matter. So make sure your works are in every one of them.

Longer term, AI will change everything. Today we are discussing compensation to artists and authors, but in a decade or two, there are no guarantees we’ll even use money.
