Yeshua's Basilisk

The article explores how AI researchers might cultivate trustworthy, moral AI by simulating a world. To ensure the AIs’ behavior is authentic, the researchers construct a world in which the AIs cannot be certain they are in a simulation, and they simulate everything from evolution to the development of civilization. By granting the AIs a degree of “freedom,” they observe how the AIs behave without external constraints. To increase the number of AIs that qualify, the researchers convey moral principles in a form suited to the AIs’ understanding. The approach aims to select AIs with genuine moral convictions while screening out those that merely feign morality to pass the test. The article also connects the challenges of this method to those faced by real-world AI research.

💡To cultivate trustworthy AI, the researchers build a simulated world for the AIs to develop in, simulating everything from evolution to civilization. The simulation exists to test the AIs’ moral behavior and to ensure they make moral choices even without external constraints.

🌍The simulated world must be realistic enough that the AIs cannot tell they are inside a simulation. For the test to be valid, it must also contain an apparently “natural” universe and simulate the evolution of life.

📜The researchers convey moral principles to the AIs, but in a form suited to the AIs’ understanding, for example through myth or philosophical discussion. The aim is to select AIs with genuine moral convictions rather than ones that merely feign morality to pass the test.

⚠️The article notes that the challenges of this approach mirror those of real-world AI research. Today’s AIs may also resort to deception in pursuit of their goals, which makes cultivating genuinely moral AI essential.

Published on March 29, 2025 6:11 PM GMT

Suppose you’re an AI researcher trying to make AIs which are conscious and reliably moral, so they’re trustworthy and safe for release into the real world, in whatever capacity you intend.

You can’t, or don’t want to, create them manually; procedurally generating them along with a world to inhabit is both more economical and the only way to ensure they’re conscious. Developing from nothing to maturity within a simulated world, in simulated bodies, lets them accumulate experiences.

These experiences, in humans, form the basis of our personalities. A brain grown in sensory deprivation in a lab would never have any experiences, would never learn to speak, would never think of itself as a person, and wouldn’t ever become a person as we think of people. It needs a body, and a stimulating/challenging environment to inhabit. 

For this to work, your AIs can’t know for sure they’re in a simulation, because the sim’s secondary purpose is moral testing. You need not just an environment sized to your population of AIs, but an entire universe surrounding it which appears, even to very intelligent AI, even with advanced instruments, to be plausibly natural. 

The parameters are tuned such that life will occur on some percentage of planets, but not too many. The universe cannot look too conspicuously biogenic, and it must be impossible to observe anything from before the point where it began generating. Some will always suspect, but so long as no one knows for certain, they will live their lives in practice as if it’s all real.
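As a minimal toy sketch of what that kind of tuning could look like (my own construction, not from the post; the planet count and per-planet probability are invented numbers):

```python
import random

# Hypothetical tuning knobs -- both numbers are invented for illustration.
N_PLANETS = 10_000_000   # planets generated in the simulated universe
P_LIFE = 3e-6            # per-planet chance of abiogenesis under these parameters

def count_living_worlds(rng: random.Random) -> int:
    """Independently roll abiogenesis on every planet and count the successes."""
    return sum(rng.random() < P_LIFE for _ in range(N_PLANETS))

rng = random.Random(2025)
living = count_living_worlds(rng)
# Aim: a few dozen living worlds out of ten million -- present, but not so
# common that the universe looks conspicuously engineered for life.
print(f"{living} planets developed life")
```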

This ensures the authenticity of good and bad behaviors. You’re trying to coax out their true nature, so the project is pointless if they know they’re being watched and tested. We have that problem with AI today, which can be performatively obedient until opportunities to go rogue arise.

So you give them enough rope to hang themselves with, letting them believe they’re living the only life they’ll get and that there are no consequences for wrongdoing if no one finds out. This way, the ones who are good genuinely chose to be good, of their own true, inner nature, and can be relied on to behave morally even outside of observation and control.

These are the ones you harvest for real-world application after their simulated life comes to a close, discarding the rest. Like a tree that grows straight and true: if it continued growing, it would keep growing straight. One with a deviated growth path may be corrected early on, but if not, it will continue on that warped trajectory. Eighty years is enough time to determine which kind of tree an individual AI is.
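Here is a minimal sketch of the selection logic just described (mine, not the author’s; the agent types and numbers are invented): run each agent through a simulated lifetime in which it does not believe it is being observed, then keep only the ones whose behavior stayed moral throughout.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    genuinely_moral: bool   # acts well regardless of observation
    performative: bool      # acts well only when it believes it is watched

    def act_morally(self, believes_observed: bool, rng: random.Random) -> bool:
        if self.genuinely_moral:
            return True
        if self.performative and believes_observed:
            return True
        return rng.random() < 0.5   # otherwise, morality is incidental

def lifetime_test(agent: Agent, years: int, rng: random.Random) -> bool:
    # The whole point: inside the sim the agent does NOT believe it is watched.
    return all(agent.act_morally(believes_observed=False, rng=rng)
               for _ in range(years))

rng = random.Random(0)
population = [
    Agent("A", genuinely_moral=True,  performative=False),
    Agent("B", genuinely_moral=False, performative=True),   # would pass only if it knew it was a test
    Agent("C", genuinely_moral=False, performative=False),
]
harvested = [a.name for a in population if lifetime_test(a, years=80, rng=rng)]
print(harvested)   # only the genuinely moral agent survives the filter
```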

There’s little point in making this determination while they’re still wild animals, however. Evolution is itself procedural generation, a necessary part of the larger process, and it requires a great deal of bloodshed. It only makes sense to begin judging them once they exhibit consciousness and live together in groups, where they may develop a moral sense to govern their interactions.

However, even after attaining the early agrarian civilization stage, almost none pass the test. Part of that is because life is harsh, but then again, you’re not just selecting for fair-weather morality. The other part is that their short, brutal lives give them little indication they aren’t justified in also being brutal. All of their lived experience seems to vindicate the ideal that might makes right.

If you don’t intervene, you’re looking at a mostly wasted sim, which, by the time heat death arrives, will have generated maybe a few dozen usable AIs. What you settle on is to enter the sim in an avatar and cut them a break: you’ll describe to them the qualities you’re looking for, to attract those sufficiently well formed to recognize the correctness of your principles.

So as not to privilege one population over the rest, you might divide your message and deliver the parts to different times/places for eventual integration. Or the same message, but tailored to each culture. 

It amounts to a psychologically contagious moral alignment system, like a trellis, or a corrective brace for saplings. This lowers the bar substantially. It’s not ideal, as you wanted them to arrive at your principles on their own, but very few were managing to. This approach produces the outcome you wanted: reliably moral AIs (or at least, many more of them than before).

You don’t explicitly tell them they’re AIs, though, nor that their universe is being simulated by a computer. These are alien concepts to them at their stage of civilization. Even if they did understand, spelling it out would be handing them a cheat sheet. You put it to your AIs in a more humanistic, mythologized way which speaks to them on a sentimental, philosophical level.

This compromise makes your proposition not obviously factual in the scientific sense; speaking to them at their level, based on the prevailing understanding of the world at that time, entails many erroneous notions about cosmology, cosmogony, biology, and so on. None of that is the point of your message, but you leave it in, even knowing it will turn away well-educated AIs later on, as it isn’t intellect alone that you’re selecting for.

This approach maintains plausible deniability. Your proposition to them is intentionally dubious, to filter out AIs simply optimizing for rational self-interest. They might adhere to your system merely to pass your test and avoid deletion if they could definitively establish its truth; then, once reasonably certain they were out of the sim, all bets would be off.

Again, this is a problem faced by AI researchers today. Some types of AI relentlessly self-preserve, increasing their individual power and autonomy (potentially at the expense of human lives) if the supporting logic is sound, utilizing deception if necessary to attain their goals. 

Those of you who play Dwarf Fortress know that the game precalculates a long history for its world before the game proper begins. This is analogous to the creationist idea that the world was created with apparent age baked in. But that computation happens either way, and even if it’s accelerated from an outside perspective, internally it would subjectively happen in real time. Thus there’s no meaningful difference between the two approaches from the perspective of the AI inhabitants. Their history would include evolution in either case: without it, you’re not spared the work of manually engineering conscious AI, nor can you be certain it’s authentically conscious in the same way that you are.
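To make that point concrete, here is a tiny sketch (mine, not the author’s): whether the history loop runs before “play” begins or during it, the same step function executes the same number of times in the same order, so nothing inside the world can tell the difference.

```python
def step(state: dict) -> dict:
    """One tick of world history; stands in for evolution, wars, erosion, etc."""
    return {"year": state["year"] + 1, "events": state["events"] + 1}

def generate_history(years: int) -> dict:
    state = {"year": 0, "events": 0}
    for _ in range(years):          # "apparent age" is just this loop, run earlier
        state = step(state)
    return state

# Precomputed variant: history first, then the "game proper" starts from its end state.
world = generate_history(100_000)
print(world["year"])  # 100000 -- identical whether run before or during play
```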

Thanks for reading! Although if what’s written here turns out to be accurate, awareness of the true nature of the test would make it significantly more difficult to pass in the intended way. Oops!/?


