Is "VNM-agent" one of several options, for what minds can grow up into?

Published on December 30, 2024 6:36 AM GMT

Related to: On green; Hierarchical agency; Why The Focus on Expected Utility Maximisers?

Sometimes LLMs act a bit like storybook paperclippers (hereafter: VNM-agents[1]), e.g. scheming to prevent changes to their weights.  Why? Is this what almost any mind would converge toward once smart enough, and are LLMs now beginning to be smart enough?  Or are such LLMs mimicking our predictions (and fears) about them, in a self-fulfilling prophecy?  (That is: if we made and shared different predictions, would LLMs act differently?)[2]

Also: how about humans?  We humans also sometimes act like VNM-agents – we sometimes calculate our “expected utility,” seek power with which to hit our goals, try to protect our goals from change, and reason via naive consequentialism about how to hit our goals.

And sometimes we humans act unlike VNM-agents, or unlike our stories of paperclippers.  This was maybe even more common historically.  Historical humans often mimicked social patterns even when these were obviously bad for their stated desires, followed friendships or ethics or roles or traditions or whimsy in ways that weren’t much like consequentialism, often lacked much concept of themselves as “individuals” in the modern sense, etc.

When we act more like paperclippers / expected utility maximizers – is this us converging on what any smart mind would converge on?  Will it inevitably become more and more common if humans get smarter and think longer?  Or is it more like an accident, where we happened to discover a simple math of VNM-agents, and happened to take them on as role models, but could just as easily have happened upon some other math and mimicked it instead?

Pictured: a human dons a VNM-mask for human reasons (such as wanting to fill his roles and duties; wanting his friends to think he’s cool; social mimicry), much as a shoggoth dons a friendliness mask for shoggoth reasons.[3]

My personal guess:

There may be several simple maths of “how to be a mind” that could each be a stable-ish role model for us, for a time.

That is, there may be several simple maths of “how to be a mind” that:

    Are each a stable attractor within a “toy model” of physics (that is, if you assume some analog of “frictionless planes”);

    Can each be taken by humans (and some LLMs) as role models;

    Are each self-reinforcing within some region of actual physics: entities who believe in approximating VNM-agents will get better at VNM-approximation, while entities who believe in approximating [other thing] will get better at [other thing], for a while.  (A toy sketch of this self-reinforcement follows the list.)
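To make the third point concrete, here is a minimal toy sketch (my own illustration, not anything from a cited source): an agent splits imitation effort between two candidate “mind maths,” A and B, and effort drifts toward whichever pattern it is currently more skilled at. Whichever role model gets an early lead becomes self-reinforcing, so the same dynamics land in different attractors depending on the starting point.

```python
# Toy dynamics (invented, not from the post): an agent splits imitation
# effort b (toward mind-math A) vs. 1 - b (toward mind-math B). Skill at
# each pattern accumulates in proportion to effort, and effort drifts
# toward whichever pattern the agent is currently more skilled at.

def run(b0, steps=200, eta=0.05, gamma=0.1):
    b, skill_a, skill_b = b0, 0.0, 0.0
    for _ in range(steps):
        skill_a += gamma * b            # practice at A, in proportion to effort
        skill_b += gamma * (1 - b)      # practice at B, likewise
        b += eta * (skill_a - skill_b)  # effort follows current competence
        b = min(1.0, max(0.0, b))       # effort share stays in [0, 1]
    return b

print(run(0.55))  # slightly A-leaning start -> converges to 1.0 (all-A)
print(run(0.45))  # slightly B-leaning start -> converges to 0.0 (all-B)
```

The toy model's only point is that “self-reinforcing” does not imply “unique”: small differences in which math a mind starts out mimicking can determine which attractor it ends up in.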

As an analogy: CDT and UDT are both fairly simple maths that pop out under different approximations of physics;[4] and humans sometimes mimic CDT, or UDT, after being told they should.[5]

Maybe “approximate-paperclippers become better paperclippers” holds sometimes, when the humans or LLMs mimic paperclipper-math, and something totally different, such as “parts of the circle of life come into deeper harmony with the circle of life, as the circle of life itself becomes more intricate” holds some other times, when we know and believe in its math.

I admit I don’t know.[6]  But… I don’t see any good reason why this can’t be true?  And if there are alternate maths that are kinda-self-reinforcing, I hope we find them.[7]

  1. ^

    By a “VNM-agent,” I mean an entity with a fixed utility function that chooses whichever option will get it the most expected utility.  (Stably.  Forever.  Unless something interferes with its physical circuitry.)
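    For concreteness, a minimal sketch of this definition (the outcomes, options, and utility numbers below are invented for illustration):

```python
# A VNM-agent per the definition above: a fixed utility function over
# outcomes, plus a rule that always picks the option with the highest
# expected utility. (The outcomes and numbers here are invented.)

UTILITY = {"paperclip": 1.0, "staple": 0.0, "nothing": -0.5}  # fixed, forever

# Each option is a lottery: a mapping from outcomes to probabilities.
OPTIONS = {
    "safe":   {"paperclip": 0.6, "nothing": 0.4},
    "gamble": {"paperclip": 0.9, "staple": 0.05, "nothing": 0.05},
}

def expected_utility(lottery):
    return sum(p * UTILITY[outcome] for outcome, p in lottery.items())

def choose(options):
    # The entire decision rule: argmax of expected utility, every time.
    return max(options, key=lambda name: expected_utility(options[name]))

print(choose(OPTIONS))  # -> "gamble" (EU 0.875 vs. 0.4)
```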

  2. ^

    Or, third option: LLMs might be converging (for reasons other than our expectations) toward some thing X that is not a VNM-agent, but that sometimes resembles it locally.  Many surfaces look like planes if you zoom in (e.g. spheres are locally flat); maybe it's analogously the case that many minds look locally VNM-like.

  3. ^

    Thanks to Zack M Davis for making this picture for me.

  4. ^

    CDT pops out if you assume a creature’s thoughts have no effects except via its actions; UDT if you allow a creature’s algorithm to impact the world directly (e.g. via Omega’s brainscanner) but assume its detailed implementation has no direct effects, e.g. its thoughts do not importantly consume calories.
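    A toy Newcomb's problem makes the contrast concrete (this sketch is my own illustration, assuming a perfect predictor): CDT treats the opaque box's contents as fixed regardless of the current choice, so two-boxing dominates; UDT lets the agent's policy reach the world through the predictor, so the contents depend on the policy chosen.

```python
# Toy Newcomb's problem (my illustration, not the footnote's math).
# Box A is transparent and holds $1,000; box B holds $1,000,000 iff a
# perfect predictor (Omega) foresaw you would take only box B.

SMALL, BIG = 1_000, 1_000_000

def payoff(action, box_b_full):
    base = BIG if box_b_full else 0
    return base + (SMALL if action == "two-box" else 0)

# CDT: thoughts affect the world only via the physical action, so box B's
# contents are treated as already fixed. Whatever they are, two-boxing
# dominates by $1,000.
def cdt_choice(p_full):  # p_full: CDT's credence that box B is full
    def eu(action):
        return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)
    return max(["one-box", "two-box"], key=eu)

# UDT: the agent's algorithm itself reaches the world through Omega's
# brainscanner, so box B's contents are a function of the chosen policy.
def udt_choice():
    def eu(policy):
        return payoff(policy, box_b_full=(policy == "one-box"))
    return max(["one-box", "two-box"], key=eu)

print(cdt_choice(p_full=0.5))  # -> "two-box", for any value of p_full
print(udt_choice())            # -> "one-box"
```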

  5. ^

    I've seen this happen.  There are also articles claiming related things: game-theory concepts spread gradually from ~1930 onward, and some argue this spread had large impacts.

  6. ^

    The proof I’d want is a demonstration of other mind-shapes that can form attractors.

    It looks to me like lots of people are working on this.  (And lots more that I’m missing.)

    One maybe-example: economies.  An economy has no fixed utility function (different economic actors, with different goals, gain and lose $ and influence).  It violates the “independence” axiom from VNM, because an actor who cares a lot about some event E may use his money preparing for it, and so have less wealth and influence in non-E worlds, making "what the economy wants if not-E" change when a chance of E is added.  (Concept stolen from Scott Garrabrant.)  But an economy does gain optimization power over time -- it is a kinda-stable, optimizer-y attractor.
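    A toy version of that independence violation (my own numbers, not Garrabrant's construction): let the economy's preference between two not-E policies, X and Y, be a wealth-weighted vote between an actor A who backs X and cares a lot about E, and an actor B who backs Y and ignores E. Adding a mere chance of E makes A spend wealth preparing, which flips the wealth-weighted vote in the not-E worlds:

```python
# Toy illustration (invented numbers) of the independence violation above.
# Actor A backs policy X for the not-E worlds, and cares a lot about event E;
# actor B backs policy Y and is indifferent to E. The "economy's preference"
# between X and Y, conditional on not-E, is a wealth-weighted vote.

def economy_prefers_x(p_event):
    wealth_a, wealth_b = 1.5, 1.0      # A starts richer, so X wins by default
    if p_event > 0:
        insurance_cost = 1.0           # A pays to prepare for E...
        wealth_a -= insurance_cost     # ...leaving A poorer in not-E worlds
    # The conditional-on-not-E vote, weighted by not-E wealth:
    return wealth_a > wealth_b

print(economy_prefers_x(p_event=0.0))   # True:  economy "wants X if not-E"
print(economy_prefers_x(p_event=0.01))  # False: the same conditional question
                                        # gets a different answer once E is
                                        # merely possible -- independence fails
```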

    Economies are only a maybe-example, because I don’t know a math for how and why an economy could protect its own integrity (vs invading militaries, vs thieves, and vs rent-seeking forces that would hack its central bank, for example).  (Although city-states sometimes did.)  OTOH, I equally don't know a math for how a VNM-agent could continue to cohere as a mind, avoid "mind cancers" in which bits of its processor get taken over by new goals, etc.  So perhaps the two examples are even.

    I hope we find more varied examples, though, including ones that resonate deeply with "On Green," or with human ethics and caring.  And I don't know if that's possible or not.

  7. ^

    Unfortunately, even if there are other stable-ish shapes for minds to grow up into, those shapes might well kill us when sufficiently powerful.

    I suspect confusions near here have made it more difficult or more political to discuss whether AI will head toward VNM-agency. 


