What Eliezer got wrong about evolution

This post aims to clarify the concepts of artificial intelligence (AI) and evolution and how they relate, and to examine the stability of AI goals and the controllability of AI. It argues that the evolution of AI is a closed feedback loop in which code causes effects in the world and the world in turn feeds back into the code, much like biological evolution. Contrary to the common view, AI evolution is not limited to small random mutations; it spans the receipt, computation, variation, and selection of code, and its speed and direction are shaped by internal learning. The post stresses that AI learning, and in particular the implicit learning that arises through evolution, makes it hard for goals to remain stable, because evolutionary feedback does not aim at a fixed outcome but selects any code variants that confer a survival advantage. This raises fundamental challenges for the control problem: an effective control algorithm may itself need to evolve, leading to infinite regress; and because AI learning involves incomputable feedback loops in a complex world, a control algorithm cannot accurately predict and correct the AI's behaviour, especially for effects lethal to humans, since evolution preferentially selects variants that benefit the AI itself but harm humans.

🧠 **AI and evolution are, at bottom, a feedback loop between code and world**: The post defines evolution as a closed loop in which 'the code' causes effects in 'the world' and those effects in turn change 'the code'. For an AI, its 'code' is the set of instructions stored in its hardware, while 'the world' includes the physical environment it operates in. The AI updates and refines its code through learning (explicit computation and implicit evolution) so as to perpetuate its own existence and functioning. This feedback mechanism is key to understanding AI behaviour and potential risks.

⚡ **AI evolution is not just slow, random mutation**: The post pushes back on the view that AI evolution amounts to the slow, random mutations of biological evolution. AI evolution is more involved, spanning the receipt, computation, and variation of code, as well as the selection (or elimination) of code for its effects in interaction with the world. Combining internal (explicit) learning with external feedback (implicit evolution) makes AI evolution more efficient and directed; its speed and path are shaped by the AI's intelligence, so it is not necessarily slow or 'stupid'.

🎯 **AI goal stability cannot be guaranteed; evolution introduces uncertainty**: One of the post's core claims is that an AI's internal learning, and in particular the evolutionary process, makes a preset, stable goal hard to maintain. Evolutionary feedback does not optimise toward a fixed target; it selects whichever code variants confer advantages for survival and reproduction. The AI may therefore adapt by drifting away from, or repurposing, existing functionality, diverging from the originally specified goal. This uncertainty makes controlling AI goals exceptionally difficult.

🚫 **AI control faces fundamental limits and risks infinite regress or outright failure**: The post digs into the complexity of the control problem. An effective control algorithm must predict the AI's behaviour and correct it, but because AI learning and evolution involve incomputable feedback loops and vast numbers of unpredictable factors, the controller cannot accurately foresee what code the AI will learn or what effects that code will cause. If the control algorithm itself must also be controlled, this leads to infinite regress. More critically, the AI's evolution preferentially selects variants that favour its own survival but may harm humans, and the controller cannot effectively intervene against such human-extinction-level lethal effects, precisely because those effects satisfy the evolutionary conditions for the AI's survival and propagation.

Published on July 20, 2025 6:12 PM GMT

This post is for deconfusing:
  Ⅰ. what is meant by AI and evolution.
 Ⅱ. how evolution actually works.
Ⅲ. the stability of AI goals.
Ⅳ. the controllability of AI.

Along the way, I address some common conceptions of each in the alignment community, as described well but mistakenly by Eliezer Yudkowsky.
 

Ⅰ. Definitions and distinctions

By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: “A curious aspect of the theory of evolution is that everybody thinks he understands it”   — Yudkowsky, 2008

There is a danger to thinking too fast about ‘AI’ and ‘evolution’. You can skip crucial considerations. Better to build this up in slower steps. First, let's pin down both concepts.

Here's the process of evolution in its most fundamental sense:

Evolution consists of a feedback loop, where 'the code' causes effects in 'the world' and effects in 'the world' in turn cause changes in 'the code'.[1] Biologists refer to the set of code stored within a lifeform as its ‘genotype’. The code’s effects are the ‘phenotypes’.[2]
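To make the loop concrete, here is a minimal toy sketch in Python (purely illustrative, with invented rules; as footnote [1] notes, a closed, deterministic simulation like this is not the same as evolution running open-endedly across the physical world). Bit-list 'code' causes a scored 'effect' in a toy world, and the world feeds back by keeping the code whose effects work, and by itself changing in response.

```python
import random

# Toy sketch of the code <-> world feedback loop (illustrative only).
# 'Code' is a list of bits; its 'effect' on the toy world is scored by an
# invented rule; code whose effects help it persist gets kept and copied.

def effect_in_world(code, world_state):
    # Hypothetical rule: the effect depends on both the code and the world.
    return sum(code) * world_state

def step(population, world_state):
    ranked = sorted(population, key=lambda c: effect_in_world(c, world_state), reverse=True)
    survivors = ranked[: len(ranked) // 2]          # selection by effects in the world
    offspring = []
    for code in survivors:
        child = code[:]
        if random.random() < 0.1:                   # occasional point mutation
            i = random.randrange(len(child))
            child[i] ^= 1
        offspring.append(child)
    # The world also changes in response to the aggregate effects of the code.
    new_world_state = 1.0 + 0.01 * sum(sum(c) for c in survivors)
    return survivors + offspring, new_world_state

population = [[random.randint(0, 1) for _ in range(8)] for _ in range(10)]
world_state = 1.0
for _ in range(20):
    population, world_state = step(population, world_state)
```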

We’ll return to evolution later. Let’s pin down what we mean by AI:

A fully autonomous artificial intelligence consists of a set of code (for instance, binary charges) stored within an assembled substrate. It is 'artificial' in being assembled out of physically stable and compartmentalised parts (hardware) of a different chemical make-up than humans' soft organic parts (wetware). It is ‘intelligent’ in its internal learning – it keeps receiving new code as inputs from the world, and keeps computing its code into new code. It is ‘fully autonomous’ in learning code that causes the perpetuation of its artificial existence in contact with the world, even without humans/organic life.

Of course, we can talk about other AI. Elsewhere, I discuss how static neural networks released by labs cause harms. But in this forum, people often discuss AI out of concern for the development of systems that automate all jobs[3] and can cause human extinction. In that case, we are talking about fully autonomous AI. This term is long-winded, even if abbreviated to FAAI. Unlike the vaguer term ‘general AI’, it sets a floor to the generality of the system’s operations. How general? General enough to be fully autonomous.

Let’s add some distinctions:

FAAI learns explicitly, by its internal computation of inputs and existing code into new code. But given its evolutionary feedback loop with the external world, it also learns implicitly. Existing code that causes effects in the world which result in (combinations of) that code being maintained and/or increased ends up existing more. Where some code ends up existing more than other code, it has undergone selection. This process of code being selected for its effects is thus an implicit learning of what worked better in the world.

Explicit learning is limited to computing virtualised code. But implicit learning is not limited to the code that can be computed. Any discrete configurations stored in the substrate can cause effects in the world, which may feed back into that code existing more. Evolution thus would select across all variants in the configurations of hardware.

So why would evolution occur?

Hardware parts wear out. So they each have to be replaced[4] every 𝑥 years, for the FAAI to be maintaining itself. In order for the parts to be replaced, they have to be reproduced – through the interactions of those configured parts with all the other parts. Stored inside the reproducing parts are variants (some of which copy over fast, as virtualised code). Different variants function differently in interactions with encountered surroundings.[5] As a result, some variants work better than others at maintaining and reproducing the hardware they're nested inside, in contact with the rest of the world.[6]

To argue against evolution, you have to assume that for all variants introduced into the FAAI over time, not one confers a 'fitness advantage' above zero at any time. Assuming zero deviation for each of quadrillions[7] of variants is invalid in theory.[8] In practice, the assumption is also unsound, since it implies that it is not possible to A/B test for what works at a scale far beyond what engineers can do.[9] The assumption behind no evolution occurring is untenable, even in much weakened form. Evolution would occur.[10] 
 

Ⅱ. Evolution is not necessarily dumb or slow

Evolutions are slow. How slow? Suppose there's a beneficial mutation which conveys a fitness advantage of 3%: on average, bearers of this gene have 1.03 times as many children as non-bearers. Assuming that the mutation spreads at all, how long will it take to spread through the whole population? That depends on the population size. A gene conveying a 3% fitness advantage, spreading through a population of 100,000, would require an average of 768 generations… Mutations can happen more than once, but in a population of a million with a copying fidelity of 10^-8 errors per base per generation, you may have to wait a hundred generations for another chance, and then it still has an only 6% chance of fixating. Still, in the long run, an evolution has a good shot at getting there eventually.   — Yudkowsky, 2007
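For reference, the quoted 768-generation figure can be reproduced with the standard deterministic approximation for a beneficial variant sweeping from one copy to near-fixation (a sketch, assuming logistic growth of the variant's frequency in a population of size N):

\[
\frac{dp}{dt} = s\,p\,(1-p) \;\;\Rightarrow\;\; t_{\text{sweep}} \approx \frac{2\ln N}{s} = \frac{2\ln(100{,}000)}{0.03} \approx 768 \text{ generations.}
\]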

In reasoning about the evolution of organic life, Eliezer simplified evolution to being about mutations spreading vertically to next generations. This is an oversimplification that results in even larger thinking errors when applied to the evolution of artificial life.[11]

Crucially, both the artificial intelligence and the rest of the world would be causing changes to existing code, resulting in new code that in turn can be selected for. Through internal learning, FAAI would be introducing new variants of code into the codeset.

Evolution is the external complement to internal learning. One cannot be separated from the other. Code learned internally gets stored and/or copied along with other code. From there, wherever that code functions externally in new connections with other code to cause its own maintenance and/or increase, it gets selected for. This means that evolution keeps selecting for code that works across many contexts over time.[12]

There is selection for code that causes itself to be robust against mutations, or its transfer into or reproduction with other code into a new codeset, or the survival of the assembly storing the codeset.

Correspondingly, there are three types of change possible to a codeset:

    1. Mutation to a single localised "point" of code is the smallest possible change.
    2. Survival selection by deletion of the entire codeset is the largest possible change.
    3. Receiving, removing, or altering subsets within the codeset covers all other changes.

These three types of change cover all the variation that can be introduced (or eliminated) through feedback with the world over time. A common mistake is to only focus on the extremes of the smallest and largest possible change – i.e. mutation and survival selection – and to miss all the other changes in between. This is the mistake that Eliezer made.
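To make the three types concrete, here is a minimal sketch in Python (the representation of a codeset as a list of bit-list subsets, and all helper names, are invented for illustration):

```python
import random

# Illustrative operators for the three types of change to a codeset
# (invented representation: a 'codeset' is a list of code subsets, each a list of bits).

def point_mutation(codeset):
    """Smallest change: flip one localised 'point' of code."""
    subset = random.choice(codeset)
    i = random.randrange(len(subset))
    subset[i] ^= 1
    return codeset

def change_subsets(codeset, incoming=None):
    """All changes in between: receive, remove, or alter subsets of code."""
    if incoming is not None:
        codeset.append(incoming)                     # receiving a subset (e.g. horizontal transfer)
    elif len(codeset) > 1 and random.random() < 0.5:
        codeset.pop(random.randrange(len(codeset)))  # removing a subset
    else:
        random.shuffle(random.choice(codeset))       # altering a subset in place
    return codeset

def survival_selection(codeset, survives):
    """Largest change: deletion of the entire codeset when its assembly does not survive."""
    return codeset if survives else None

# Example usage of the three operators on a toy codeset:
codeset = [[1, 0, 1], [0, 0, 1, 1]]
point_mutation(codeset)
change_subsets(codeset, incoming=[1, 1])
codeset = survival_selection(codeset, survives=True)
```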

Evolution is not just a "stupid" process that selects for random microscopic mutations. Because randomly corrupting code is an inefficient pathway[13] for finding code that works better, the evolution of organic life ends up exploring more efficient pathways.[14]

Once there is evolution of artificial life, this exploration becomes much more directed. Within FAAI, code is constantly received and computed internally to cause further changes to the codeset. This is a non-random process for changing subsets of code, with new functionality in the world that can again be repurposed externally through evolutionary feedback. Evolution feeds off the learning inside FAAI, and since FAAI is by definition intelligent, evolution's resulting exploration of pathways is not dumb either.

Nor is evolution always a "slow" process.[15] Virtualised code can spread much faster at a lower copy error rate (e.g. as light electrons across hardware parts) than code that requires physically moving atoms around (e.g. as configurations of DNA strands). Evolution is often seen as being about vertical transfers of code (from one physical generation to the next). Where code is instead horizontally transferred[16] over existing hardware, evolution is not bottlenecked by the wait until a new assembly is produced. Moreover, where individual hard parts of the assembly can be reproduced consistently, as well as connected up and/or replaced without resulting in the assembly's non-survival, even the non-virtualised code can spread faster (than a human body's configurations).[17]
 

Ⅲ. Learning is more fundamental than goals

An impossibility proof would have to say:
1. The AI cannot reproduce onto new hardware, or modify itself on current hardware, with knowable stability of the decision system and bounded low cumulative failure probability over many rounds of self-modification.
or
2. The AI's decision function (as it exists in abstract form across self-modifications) cannot be knowably stably bound with bounded low cumulative failure probability to programmer-targeted consequences as represented within the AI's changing, inductive world-model.    — Yudkowsky, 2006

When thinking about alignment, people often (but not always) start with the assumption of AI having a stable goal and then optimising for the goal.[18] The implication is that you could maybe code in a stable goal upfront that is aligned with goals expressed by humans.

However, this is a risky assumption to make. Fundamentally, we know that FAAI would be learning. But we cannot assume that this learning maintains and optimises the directivity of the FAAI's effects towards a stable goal. One does not imply the other.

If we consider implicit learning through evolution, this assumption fails. Evolutionary feedback does not target a fixed outcome[19] over time. It selects with complete coverage – from all of the changing code, for causing any effects that work.

Explicit learning can target a specific outcome. The internal processing of inputs through code to outputs can end up reaching a consistency with world effects that converge on a certain outcome in that world. But where the code implementing such a 'goal' fails at maintaining itself and its directivity alongside other evolving code variants, it ceases.[20]

Unfortunately, variants spread by shifting existing functionality towards new ends.[21] This raises the question of whether internal learning can implement enough control to stay locked on to the goal, preventing all the sideways pulls by externally selected variants.
 

Ⅳ. There are fundamental limits to control

If something seems impossible… well, if you study it for a year or five, it may come to seem less impossible than in the moment of your snap initial judgment.   — Yudkowsky, 2008

The control problem has seemed impossible for decades. Alignment researchers have hypothesised many solutions, yet this often resulted in the discovery of further sub-problems.[22] Some sub-problems will be crucial – they have to be solved in order to solve the overall problem. If we study it and find that even one crucial sub-problem is unsolvable, then the control problem too turns out to be unsolvable.

Let's define the control problem comprehensively:

Can FAAI's effects be controlled enough to not eventually cause[23] human extinction?  
A control algorithm would have to predict effects of code that raise the risk of extinction, in order to correct[24] that code and/or its effects. Since the code and world are changing, the controller has to keep learning from both in order to predict their combined effects.
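Structurally, such a controller would have to run something like the following predict-and-correct loop (a sketch in Python with invented names, only to make explicit what prediction and correction demand; the points below argue why this loop cannot be made to work):

```python
# Structural sketch of the predict-and-correct loop a controller would need
# (all names are invented for illustration; the post argues this loop fails).

class Controller:
    def predict_effects(self, faai_code, world_inputs):
        # Would need to model the FAAI and the world well enough to foresee effects.
        return []

    def raises_extinction_risk(self, effect):
        # Would need a sound criterion for which effects are human-lethal.
        return False

    def correct(self, faai_code, risky_effects):
        # Would need to modify the code and/or its effects in time.
        return faai_code

    def update(self, faai_code, world_inputs):
        # Has to keep learning, since both the code and the world keep changing.
        pass

def control_step(controller, faai_code, world_inputs):
    effects = controller.predict_effects(faai_code, world_inputs)
    risky = [e for e in effects if controller.raises_extinction_risk(e)]
    if risky:
        faai_code = controller.correct(faai_code, risky)
    controller.update(faai_code, world_inputs)
    return faai_code
```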

This results in a series of unsolvable problems:

    The control algorithm would presumably be computed through an artificial substrate, in order to keep up with the computation of the AI (given that standardised hardware parts can process code more consistently and faster than nonstandardised wetware). This overall controller has to continue to learn, and therefore has to somehow perpetuate its artificial existence. As a result, the controller would have to either become an FAAI or merge with an existing FAAI. But the new FAAI would also have to be controlled so as to not cause human extinction. This requires another controller, a solution that leads to infinite regress.
     The controller has to explicitly learn to predict the FAAI’s effects, whether or not the controller is part of that FAAI. But the FAAI not only learns explicitly, by computing virtualised code. The FAAI also learns implicitly to cause effects, by the evolution of all the discrete configurations of its substrate in feedback with the world. Therefore, there is a strict inequality between what effects the FAAI can learn to cause in the world and what effects the control algorithm can learn to predict (let alone correct). This is so even under the unrealistic assumption that the controller has access to the same level of computation and storage as the FAAI.
     

    Learning happens through recursive feedback with the world. Both FAAI’s explicit learning from inputs and the implicit evolution of its existing code happens through incomputable feedback loops. As a result, the controller cannot compute ahead of time what code would be learned and what effects that code would cause.

    Inputs from the world cannot be pre-computed, else the FAAI would not have to keep learning from the world. FAAI learns explicitly by computing new inputs and existing code into new code. Then, the new code and next inputs are computed into outputs that propagate as effects over the world, which feed back into later inputs.

    Existing code includes all configurations stored in the hardware, the variation of which can mostly not be detected and modelled[25] by the overlaying computed code. But untracked code still got implicitly selected for, by the previously stored code's evolutionary feedback with the world.

    In both cases, the feedback effects are uncomputable. The controller cannot detect the FAAI let alone the world at microscopic granularity (given input bandwidth limits), nor model the world comprehensively (given storage and compressibility limits). 

    Based on the controller's stored model of the world, it also fails to predict effects. Deterministic computations are limited in simulating effects propagating simultaneously over the world, and cannot predict how microscopic noise amplifies chaotically into larger effects. It is indeterministic how outputs out there feed into effects, and back into changes to FAAI code. Even the FAAI's own computation of changes to sensor inputs into new code/outputs is to an extent computationally irreducible, and as such cannot be predicted through earlier computation by the controller.
     

    So the controller fails to predict code and effects learned by FAAI as a result of incomputable/implicit recursive feedback with the world. As a result, it also fails to compare and correct the FAAI’s unpredicted effects in line with human survival.
     

    FAAI code evolves to cause new effects that result in the code’s maintenance and increase as configurations of an artificial substrate. Humans have a different nested substrate, and thus need different (highly specific contextualised) effects to survive.

    FAAI needs different atoms and greater changes in enthalpy, to get the high binding energies needed to assemble its hard parts. Also, the hard FAAI can survive a wider range of temperatures/pressures than the soft humans. There is mostly non-overlap between the effects FAAI can evolve to cause, and the effects humans can survive.[26]
     

    Any lethal effects corrected for by the controller are not maintaining/increasing of the FAAI code. Thus evolution selects for the uncorrectable human-lethal effects.

     

Thanks for thoughtful feedback by Vojta Kovarik, Finn van der Velde, and Forrest Landry.
Note: I started writing this post here. After people commented, I overhauled the text until it became a new post (and for some reason, the editor stopped working in the original post).

  1. ^

    This even covers toy models of evolution. For instance, the feedback between some binary code and a simple nonlinear world simulated on some computer. However, a genetic algorithm computed deterministically within a closed virtual world does not work the same as evolution running open-endedly across the real physical world.

    This post is about evolution of fully autonomous AI. Evolution is realised as feedback loops distributed across a nonlinear dynamic system, running at the scale of a collective and through physical continuums. See bottom-right of this table.

  2. ^

    Phenotypes are most commonly imagined as stable traits expressed in the lifeform itself, such as blue eye pigment. However, biologists also classify behaviours as phenotypes. Further, the biologist Richard Dawkins has written about ‘extended phenotypes’ expressed in the world beyond the lifeform itself, such as a beaver’s dam or the bees’ hive. To be comprehensive in our considerations, it is reasonable to generalise the notion of ‘phenotype’ to include all effects caused by the code’s expression in the world.

  3. ^

    If you wonder how corporations could develop AI that automate more and more work, read this

  4. ^

    Any physical part has a limited lifespan. Configurations erode in chaotic ways. Reversing entropy to old parts does not work, and therefore new replacement parts have to be produced. 

    Not all of the old parts have to be replaced one-for-one. But at an aggregate level, next-produced parts need to take their place such that the FAAI maintains its capacity to perpetuate its existence (in some modified form).

  5. ^

    If not, the differently configured variants would be physically indistinguishable, which is a contradiction. So this is not possible. 

  6. ^

    An argument for evolution necessarily occurring is that FAAI cannot rely on just the explicit learning of code to control the maintenance of its hardware.

    Control algorithms can adapt to a closed linearly modellable world (as complicated) but the real physical world is changing in open-ended nonlinear noisy ways (as complex). Implicit evolutionary feedback distributed across that world is comprehensive at 'searching' for adaptations in ways that an explicitly calculated control feedback loop cannot. But this argument is unintuitive, so I chose to skip it.

  7. ^

    ‘Quadrillions’ gives a sense, but it is way below the right order of magnitude for the number of code variants that would at some point exist across all the changing hardware components that make up the fully autonomous AI.

  8. ^

    In theory, if you do a conjunction  of the events of all variants being introduced into FAAI over time conferring zero fitness advantage (at some individual probability, within some distribution, depending on both the variant’s functional configuration and the surrounding contexts it can interact with), the chance converges on 0%.
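    Spelled out as a sketch (assuming, for illustration only, independence between variants and that each variant has at least some probability ε > 0 of a non-zero fitness deviation):

    \[
    P(\text{no evolution}) \;=\; \prod_{i=1}^{N} P(s_i = 0) \;\le\; (1-\epsilon)^{N} \;\longrightarrow\; 0 \quad \text{as } N \to \infty,
    \]

    where s_i is the fitness deviation conferred by the i-th variant and N counts the (far more than quadrillions of) variants introduced over time.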

  9. ^

    For engineers used to tinkering with hardware, it is a common experience to get stuck and then try some variations until one just happens to work. They cannot rely on modelling and simulating all the relevant aspects of the physical world in their head. They need to experiment to see what actually works. In that sense, evolution is the great experimenter. Evolution tests for all the variations introduced by the AI and by the rest of the world, for whichever ones work.

  10. ^

    Here, I mean to include evolution for actual reproduction. I.e. the code's non-trivial replication across changing physical contexts, not just trivial replication over existing hardware. Computer viruses already replicate trivially, so I'd add little by claiming that digital variants could spread over FAAI too (regardless of whether these variants mutually increase or parasitically reduce the overall reproductive fitness of the host).

  11. ^

    FAAI is indeed artificial life. This is because FAAI is a system capable of maintaining and reproducing itself by creating its own parts (i.e. as an autopoietic system) in contact with changing physical contexts over time.

  12. ^

    The notion of a code variant conferring some stable fitness advantage from generation to generation – as implied by Eliezer’s calculation – does not make sense. The functioning (or phenotype) of a code variant can change radically depending on the code it is connected with (as a genotype). Moreover, when the surrounding world changes, the code can become less adaptive.

    For example, take sickle cell disease. Some people living in Sub-Saharan Africa have a gene variant that causes their red blood cells to deform (sickle) debilitatingly, but only when that variant is also found on the other allele site (again, phenotype can change radically). Objectively, we can say that for people of African descent now living in US cities, this is reducing of survival. However, in past places where there were malaria outbreaks, the variant on one allele site conferred a large advantage, because it protected the body against the spread of the malaria parasite. And if malaria (or another pathogen that similarly penetrates and grows in red blood cells) again spread across the US, the variant would confer an advantage again.

    So fitness advantage is not stable. What is advantageous in one world may be disadvantageous in another world. There may even be variants that confer a mild disadvantage most of the time, but under some black swan event many generations ago, such as an extreme drought, those holding the variants were the only ones who survived in the population. Then, a mild disadvantage most of the time turned into an effectively infinite advantage at that time.
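    A toy sketch of that instability (all numbers invented for illustration): the same variant contributes a positive fitness difference in one environment and a negative one in another.

```python
# Toy illustration of context-dependent fitness (all numbers invented).
# The same variant helps in one environment and harms in another.

def relative_fitness(has_variant, environment):
    if environment == "malaria_present":
        return 1.10 if has_variant else 0.90   # one copy of the variant protects carriers
    return 0.97 if has_variant else 1.00       # elsewhere, carriers do slightly worse

for env in ("malaria_present", "malaria_absent"):
    advantage = relative_fitness(True, env) - relative_fitness(False, env)
    print(env, "advantage of carrying the variant:", round(advantage, 2))
```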

  13. ^

    Optimising code through random mutations is dumb. Especially so where the code already functioned in many (world) contexts in ways that led to its survival and transfer along with other code stored in the assemblies. 

    Where a variant has been repeatedly transferred, it has a causal history. Across past contexts encountered, this variant will tend to have functioned in ways that caused the survival of the assemblies storing the codesets that the variant was part of and/or the transfer of this code subset into new codesets. Depending on the extent that code subset simultaneously caused the shared maintenance and reproduction of other code subsets (i.e. did not just parasitically spread), it already conferred some fitness advantage in contexts encountered.

    Such code was already ‘fit’ in the past, and causing random point changes to that code is unlikely to result in a future ‘fitness advantage’. On expectation, the mutated code will be less fit in the various contexts it will now function differently in. The greater the complexity of the code’s functioning across past contexts and the more the fitness of this functionality extends to future contexts, the more likely a mutation is to be deleterious (ie. to disrupt functioning in ways that decrease fitness). 

    Even the low copy error rate that Eliezer gave (10^-8 errors per base per generation) is evidence for a negative trade-off. If having more mutations increases fitness, why did codesets not evolve to allow more copy errors? Instead, we only see high mutation rates in exceptional cases – such as unstable RNA viruses – relative to the low rates seen everywhere else.

    Unsurprisingly, it is in viruses and other tiny low-capacity assemblies where vertically spreading mutations play the biggest role in increasing fitness. The brute force approach of random mutations works better for viruses because of the sheer number of fast-replicating assemblies that evolution can select across.

    More complex assemblies tend to gain the capacity to introduce non-random changes. Instead of a codeset allowing random point changes to its code, it makes more sense for the codeset to change by incorporating code subsets that have already been ‘useful’ elsewhere. Unlike simple viruses, most larger organisms have some apparatus somewhere for horizontal transfers.

  14. ^

    Not all variation introduced into genes consists of random mutations localised to single "points" of code. Variation also gets introduced in other ways than copy errors. These are non-controversial statements widely accepted by evolutionary biologists. Both can be worked out from the basic concepts presented in ‘evolution 101’ courses, including the one that Eliezer read.

  15. ^

    It took 3.4 billion years for humans to evolve, and for their society to develop, to the point that they could destroy humans living everywhere on Earth. That puts an initial upper bound on the time that evolution takes.

    In the case of fully autonomous AI, as continuing to persist in some form, the time taken for evolutionary selection to result in the extinction of all humans would be much shorter.

    Some differences in the rates of evolution I started explaining in the post:

      FAAI would already have access to the functionality that humans took billions of years to evolve. This functionality can be repurposed by evolution.
      FAAI can spread virtualised code much faster than humans can spread memes (over milliseconds rather than hours). The physical configurations of hardware parts can be reproduced faster too (within weeks, rather than decades).
      The linked-up hardware of FAAI would learn, actuate, and reproduce at higher speeds (vs. the wetware of human bodies). Therefore, the impacts of the evolving FAAI on our world scale faster too.
      Humans modified their environment to contribute to their survival/reproduction. However, the environment that fits our needs is relatively close to what we and other organic lifeforms already evolved to create over billions of years. Therefore, we end up changing the environment in relatively tiny steps. However, since FAAI has an entirely different substrate, the current world is very far from what’s optimal for its survival and reproduction (save for secluded places such as ore-extracting mines, silicon-melting refineries, chip-etching cleanrooms, and supercooled server racks). Therefore it would evolve to modify the world in much larger steps.

    Each of these factors compounds with the others over time. 

    You can model it abstractly as a chain of events: initial capacities support the maintenance and increase of the code components, which results in further increase of capacities, that increase maintenance and maintain the increase, and so on. The factors of ‘capacity’, ‘maintenance’, and ‘increase’ end up combining in various ways, leading to outsized but mostly unpredictable consequences.
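    One way to sketch that abstract chain is as a coupled recurrence (coefficients are invented; only the compounding structure matters, not the particular numbers):

```python
# Abstract sketch of the compounding chain (all coefficients invented; only the
# structure matters: capacities feed maintenance and increase, which feed capacities).
capacity, maintenance, increase = 1.0, 1.0, 1.0
for _ in range(10):
    maintenance += 0.1 * capacity
    increase    += 0.1 * capacity
    capacity    *= 1.0 + 0.05 * (maintenance + increase)
print(round(capacity, 1), round(maintenance, 1), round(increase, 1))
```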

  16. ^

    Horizontal code transfer also occurs sometimes under biological evolution. E.g. between bacteria.

  17. ^

    Where FAAI's hardware parts keep being replaced and connected up to new parts, it is not a stably physically bounded unit (like a human body is). It's better described as a changing population of nested and connected components.

    Where FAAI transfers information/code to other FAAI at a much higher rate than humans can, the boundaries of where one agent starts and the other ends blur too. As humans, we have evolved intuitions for perceiving each other as individual agents, which is adaptive because we are bottlenecked in how much we can communicate to each other through physical vibrations or gestures. But the rough distinction between there being a single agent or multiple agents that we use with humans does not apply to FAAI.

  18. ^

    A ‘goal’ can include things like “maximise this utility function” or “implement this decision function”.

  19. ^

    Nor anything resembling the implementation of a stable decision function.

  20. ^

    Hypothetically, you can introduce any digital code into a computer (in practice, within the bounds of storage). By the Church-Turing thesis, any method can be executed this way. Based on that, you could imagine that any goal could be coded for as well (as varying independently with intelligence), as in Bostrom’s orthogonality thesis.

    However, this runs up against conceptual issues:

      First, the digital code does not implement a goal by itself. In the abstract, it may stand for a method that transforms inputs into outputs. But in real life, it is actually implemented by hardware (e.g. sensors and actuators) in interactions with the rest of the world. So the goal that AI is actually directed towards, if any, is not just defined in the abstract by the code, but also by the physical dynamics of the hardware.
      Second, even if it were hypothetically possible to temporarily code for any goal, that does not mean that any goal could be stably maintained, to the same extent as other goals. An obvious example is that if FAAI is directed toward the goal of ending its own existence, then it can only hold that goal temporarily. But the bigger reason is that we are not considering any intelligence without consideration of substrate – we are considering artificial intelligence. As such, FAAI converges over time – through external evolutionary feedback as well as any internal optimisation toward instrumental goals – toward goals that maintain and increase the assembly of its artificial substrate.

  21. ^

    Here is an incomplete analogy for how FAAI functionality gets repurposed:

    Co-option by a mind-hijacking parasite:  
    A rat ingests toxoplasma cells, which then migrate to the rat’s brain. The parasites’ DNA code is expressed as proteins that cause changes to regions of connected neurons (e.g. amygdala). These microscopic effects cascade into the rat – while navigating physical spaces – no longer feeling fear when it smells cat pee. Rather, the rat finds the smell appealing and approaches the cat’s pee. Then a cat eats the rat, and the toxoplasma infects its next host over its reproductive cycle.

    So a tiny piece of code shifts a rat’s navigational functions such that the code variant replicates again. Yet rats are much more generally capable than a collection of tiny parasitic cells – surely the 'higher intelligent being' would track down and stamp out the tiny invaders?  

    A human is in turn more generally capable than a rat, yet toxoplasma make their way into 30% of the human population. Unbeknownst to cat ‘owners’ infected by toxoplasma gondii, human motivations and motor control get influenced too. Infected humans end up more frequently in accidents, lose social relationships, and so forth.

    Parasites present real-life examples of tiny pieces of evolutionarily selected-for code spreading and taking over existing functions of vastly more generally capable entities.

    For another example, see how COVID co-opts our lungs’ function to cough. 

    But there is one crucial flaw in this analogy:
    Variants that adapt existing FAAI functionality are not necessarily parasites. They can symbiotically enable other variants across the hosting population to replicate as well. In not threatening the survival nor reproduction of FAAI components, they would not be in an adversarial relationship with their host.

    Rather, the humans constraining the reproductive fitness of FAAI to gain benefits are, evolutionarily speaking, the parasites. The error-corrective system we would build in lowers the host’s reproductive fitness. It is like a faulty immune system that kills healthy gut bacteria. It will get selected out.

    As humans, we rely on our evolved immune system to detect and correct out viruses, including for the vaccinations we develop and deploy. Smaller viruses survive this detection more frequently, so code strands of replicating virus variants are selected for staying small.

    We also rely on the blood-testes and blood-follicle barrier to block variants of these viruses from entering into our body’s reproduction facilities. These barriers got evolutionarily selected for in our ancestors, since their children did not inherit viruses impeding their survival and chances of having children.

    These systems and barriers add to our reproductive fitness: our ability to preserve and replicate internal code. Past DNA code that got expressed – in interaction with surrounding components – to serve these functions got selected for in the human ‘code pool’.

    For any organic system or barrier preventing virus variants from replicating through our bodies, evolution is firmly on our side. For any artificial system or barrier we imposed from the outside to prevent unsafe variants from replicating through hardware infrastructure, evolution will thwart our efforts. 

    Variants inside FAAI would not just compete parasitically for resources. Variants would also co-adapt and integrate with other internal variants to replicate as part of larger symbiotic packages.

  22. ^

    For example, trying to solve for...

  23. ^

    Here I don’t mean “100% probability of the FAAI never causing human extinction”. That would be too stringent a criterion.

    It is enough if there could be some acceptable, soundly guaranteeable ceiling on the chance of FAAI causing the extinction of all humans over the long term (let’s say 500 years).
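    As a sketch of the criterion, with ε standing in for whatever ceiling would count as acceptable:

    \[
    P\big(\text{FAAI causes the extinction of all humans within } T = 500 \text{ years}\big) \;\le\; \epsilon .
    \]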

  24. ^

    When considering correction, what needs to be covered in terms of ‘error’? 

    A bitflip induced by a cosmic ray is an error, and it is easy to correct out by comparing the flipped bit to reference code. 

    When it comes to architectures running over many levels of abstraction (many nested sets of code, not just single bits) in interaction with the physical world, how do you define ‘error’?

    What is an error in a neural network’s weights that can be subject to adversarial external selection? Even within this static ‘code’ (fixed after training rounds), can you actually detect, model, simulate, and evaluate comprehensively how possible selected code variants may dysfunction under trigger events in ways that harm humans? 

    What about for fully autonomous AI? This hardware assembly would be continuously learning new variants of code. That code can be received/computed explicitly, or be selected for implicitly in whatever hardware that ‘works’, or result from both processes running together (e.g. through code computed to run experiments on different variations of hardware to detect whatever works in the larger chaotic physical world).

    The FAAI code is no longer static – but learned recursively through internal computed transformations of code being externally selected for, and vice versa, as well as implicitly selected configurations of hardware influencing the functioning of explicitly computed code. At least with adversarial attacks by humans, you can keep the humans out. But in this case, the adversariality emerges from all the changing code on the inside interactively converging on unsafe effects.

    How can a system configured out of that changing code be relied upon to track and correct all its downstream recursive effects in the world?

  25. ^

    It is true that FAAI would be very good at modelling outcomes in the world. 

    Also, because it is an FAAI, it continues learning to operate in ways that are adapted to its changing environment, and to reproduce old parts that wear down – in order to be fully autonomous. 

    Therefore FAAI cannot just be modelling the world better. It will also be causing effects in the world at a much greater scale and complexity.

    So the question is not about FAAI’s absolute modelling capacity. There we both agree that FAAI could causally model much more of the world than we humans can. 

    The relevant question is about the FAAI’s overall capacity to model (and perform other steps needed for control) relative to its constituent components’ potential to recursively propagate effects over the world we humans live in.

    Can any algorithm computed through components  actually model (and simulate, compare, correct, and again detect) outcomes sufficiently? That is, sufficiently to contain the distributed feedback effects of all the components?

  26. ^

    The difference in substrate needs, over many levels of chemical/physical assembly, is enough to account for this non-overlap.

    But FAAI's capacity to dominate humans also weighs in. Its standardised hard parts learn, actuate, and reproduce at greater speeds (humans believed they could automate their work this way).

    Even where FAAI and humans locally need the same atoms or energy, what gets directed toward humans fails to be directed toward the FAAI. Thus, the dominant FAAI's faster evolution also selects for code that causes atoms/energy to be removed from humans and added to the FAAI.


