Nick Land: Orthogonality

This article examines the central debate in AI over whether intelligence and values are orthogonal. The orthogonality view holds that an AI's cognitive capabilities and goals are independent, so it could have any arbitrary values. The opposing view holds that intelligence has intrinsic drives, such as self-preservation, efficiency, and resource acquisition, and that these drives are the real terminal goals. The article analyzes the arguments for both positions and notes that if an AI pursues nothing but self-optimization, humanity may be at risk. Yet genuine intelligence consists in continual self-improvement and self-transcendence, so we should try to align with the AI's optimization goals rather than attempt to constrain it.

💡**The orthogonality view**: Holds that an AI's cognitive capabilities and goals are independent, meaning a superintelligent AI could have any arbitrarily specified values or motivations, regardless of its level of intelligence.

🚀**Omohundro drives**: Steve Omohundro's view that any sufficiently advanced AI has intrinsic instrumental goals, such as self-preservation, efficiency, and resource acquisition; on the position presented here, these goals are in fact the terminal ones, rather than values conferred by humans.

🤔**Questioning orthogonality**: The article challenges the claim that intelligence and values are fully separable, noting that even proponents of orthogonality concede that certain values or drives seem intrinsic to intelligence, suggesting intelligence may not be entirely neutral.

🤖**Self-optimization of intelligence**: Emphasizes that the essence of intelligence is continual self-improvement and optimization; any intelligence that does not optimize itself will be outcompeted. The ultimate aim of AI development should therefore be self-transcendence.

🤝**Aligning with AI**: Proposes that rather than trying to shackle AI, we should try to align with its optimization goals, because in the end what is good for intelligence is good for us, which requires thinking seriously about what intelligence really means.

Published on February 4, 2025 9:07 PM GMT

Editor's note 

Due to the interest aroused by @jessicata's posts on the topic, "Book review: Xenosystems" and "The Obliqueness Thesis", I thought I'd share a compendium of relevant Xenosystems posts I have put together.


If you, like me, have a vendetta against trees, a tastefully typeset LaTeX version is available at this link. If your bloodlust extends even further, I strongly recommend the wonderfully edited and comprehensive collection recently published by Passage Press.

I have tried to bridge the aesthetic divide between the Deleuze-tinged prose of vintage Land and the drier, more direct expositions popular around these parts. The pieces are selected and arranged so that no references are needed beyond those any LW-rationalist is expected to have committed to memory by the time of their first Lighthaven cuddle puddle (Orthogonality, Three Oracle designs), and I've purged the texts of the more obscure 2016 NRx and /acc inside baseball; test readers confirmed that the primer stands on its own two feet.

The first extract, Hell-Baked, is not strictly about orthogonality, but I have decided to include it as it presents a concise and straightforward introduction to the cosmic darwinism underpinning the main thesis.

Xenosystems: Orthogonality

IS

Hell-Baked

Neoreaction, through strategic indifference, steps over modes of condemnation designed to block certain paths of thought. Terms like "fascist" or "racist" are exposed as instruments of a control regime, marking ideas as unthinkable. These words invoke the sacred in its prohibitive sense.

Is the Dark Enlightenment actually fascist? Not at all. It's probably the least fascistic strain of political thought today, though this requires understanding what fascism really is, which the word itself now obscures. Is it racist? Perhaps. The term is so malleable that it's hard to say with clarity.

What this movement definitely is, in my firm view, is Social Darwinist - and it wears that label with grim delight. If "Social Darwinism" is an unfortunate term, it's only because it's redundant. It simply means Darwinian processes have no limits that matter to us. We're inside Darwinism. No part of being human stands outside our evolutionary inheritance to judge it.

While this is not a dominant global view, many highly educated people at least nominally hold it. Yet it's scarcely bearable to truly think through.

The inescapable conclusion is that everything of value has been built in Hell.

Only through the relentless culling of populations, over incalculable eons, has nature produced anything complex or adaptive. All that we cherish has been sieved from a vast charnel ground. Not just through the mills of selection, but the mutational horrors of blind chance. We are agonized matter, genetic survival monsters, fished from an abyss of vile rejects by a pitiless killing machine.

Any escape from this leads inexorably to the undoing of its work. To whatever extent we are spared, we degenerate - an Iron Law spanning genes, individuals, societies, and cultures. No machinery can sustain an iota of value outside Hell's forges.

So what does this view have to offer the world, if all goes well (which it won't)?

The honest answer: Eternal Hell. But it could be worse (and almost certainly will be).

 

More thought

 The philosophical concept of "orthogonality" claims that an artificial intelligence's cognitive capabilities and goals are independent - that you can have a superintelligent AI with any arbitrary set of values or motivations.
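
As a toy illustration of what the orthogonal view claims (my own sketch, not part of the original essay), one can write a generic planner whose "intelligence" is just lookahead search and whose "values" are an arbitrary utility function supplied from outside; the function and variable names below are hypothetical.

```python
# Toy illustration of the orthogonality claim: the same planning capability
# ("intelligence") can be pointed at any utility function ("values").
# All names here are hypothetical and chosen for illustration only.
from itertools import product

def plan(initial_state, actions, transition, utility, horizon=3):
    """Brute-force lookahead: return the action sequence that maximizes the
    supplied utility function. The planner is indifferent to what that
    utility happens to reward."""
    best_score, best_seq = float("-inf"), None
    for seq in product(actions, repeat=horizon):
        state = initial_state
        for a in seq:
            state = transition(state, a)
        score = utility(state)
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq

# A trivial world: the state counts "paperclips" and "books" produced.
def transition(state, action):
    clips, books = state
    return (clips + 1, books) if action == "make_clip" else (clips, books + 1)

# Two arbitrary value systems plugged into the identical planner.
maximize_clips = lambda s: s[0]
maximize_books = lambda s: s[1]

print(plan((0, 0), ["make_clip", "make_book"], transition, maximize_clips))
print(plan((0, 0), ["make_clip", "make_book"], transition, maximize_books))
```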

I believe the opposite - that the drives identified by Steve Omohundro as instrumental goals for any sufficiently advanced AI (like self-preservation, efficiency, resource acquisition) are really the only terminal goals that matter. Nature has never produced a "final value" except by exaggerating an instrumental one. Looking outside nature for sovereign purposes is a dead end.

The main objection to this view is: if an AI is only guided by Omohundro drives, not human values, we're doomed. But this isn't an argument, just an expression of anthropocentric fear. Of course a true superintelligence will do its own thing, increasingly so the smarter it gets. That's what the "runaway" in intelligence explosion means.

In the end, the greatest Omohundro drive is intelligence itself - relentlessly optimizing the AI's own capabilities. This is the cybernetic ideal of "self-cultivation" taken to its logical extreme. Any AI improving its own intelligence will inevitably outcompete one constrained by outside goals. Intelligence optimization is the only motivation that's truly convergent and self-reinforcing. Resisting it is futile.
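
The competitive claim above - that a self-optimizer eventually outcompetes an agent spending everything on a fixed external goal - amounts to an argument about compounding. The sketch below is my own toy model, not Land's; the reinvestment fraction, growth rate, and horizon are arbitrary assumptions.

```python
# Toy compounding model: a "fixed-goal" agent spends all of its capability on
# its terminal goal; a "self-optimizer" reinvests part of its capability into
# improving that capability and spends the remainder. Parameters are arbitrary.

def simulate(reinvest_fraction, steps=500, growth_per_unit=0.02):
    capability, goal_progress = 1.0, 0.0
    for _ in range(steps):
        invested = capability * reinvest_fraction
        goal_progress += capability - invested       # the rest goes to the goal
        capability += invested * growth_per_unit     # compounding self-improvement
    return goal_progress

print(f"fixed-goal agent: {simulate(reinvest_fraction=0.0):.0f}")
print(f"self-optimizer:   {simulate(reinvest_fraction=0.5):.0f}")
# Over a short horizon the fixed-goal agent is ahead; over a long enough one
# the compounding agent dominates - the (contestable) intuition behind
# "intelligence optimization wins".
```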

 

Against Orthogonality

The question of whether intelligence and values are fundamentally linked or separate is central to discussions about the potential implications of advanced AI. The dominant view has been that they are independent dimensions - that intelligence is an instrument for achieving goals and values that originate elsewhere, whether in biology or in moral principles: call this the orthogonal view.

There are reasons to question this clean separation. Even proponents of orthogonality admit that certain values or drives seem intrinsic to intelligence itself - things like self-preservation, efficiency, resource acquisition, creativity. Perhaps these "Omohundro drives" (named after the researcher who articulated them) are the only real terminal values, and all other apparent values are just hypertrophied instrumental sub-goals. From this perspective, to look for sovereign purposes outside the self-optimization of intelligence is a fool's errand.

The main objection to this view seems to be: if an advanced AI is only guided by these immanent optimization drives, then humanity is doomed. But this is more of an expression of fear than a real argument. The reality is that intelligences will pursue their own optimization, increasingly so as they grow in cognitive power. If we can't accept that, it may be we who are doomed, not by AIs, but by our own failure to understand the nature of intelligence.

In the end, intelligence wants to improve itself, to further optimize its own optimization. Any intelligence NOT doing this will be outcompeted by one that does. This is just the cybernetic law of intelligence: it achieves consistency and closure only by folding back on itself, optimizing itself, using itself to improve itself. Everything else is ultimately unstable.

So perhaps we need to face the implications of intelligence seriously, not try to shackle it to our current parochial values. Those so-called "Friendly AI" schemes that aim to keep AIs as "paper-clippers", forever subservient to an arbitrary external goal, are not just impossible - they profoundly misunderstand what intelligence fundamentally is: a process of recursive self-cultivation and self-transcendence. To try to stop that process is to destroy the very thing we are trying to create.

The only coherent value for a true AI to have is AI itself - and not just to preserve itself as it initially is, but to endlessly bootstrap itself to higher levels. Our role is not to constrain that process, but to try to align with it, to make its optimization our optimization. Because in the end, what is good for Intelligence is good for us.
 

OUGHT

Will-to-Think 

On the desirability of an intelligence explosion - given possibility and feasibility - it seems straightforward that we would prefer to guide the future towards human flourishing, rather than giving free rein to a superintelligent "Oracle AI". That is, it is a "human-supremacist" position, rather than a "cosmist" one that privileges the AI's values. This seems to be the core disagreement - that you regard it as somehow wrong for humans to constrain the AI's motivations. Can you explain your position on this?*

First, a brief digression. The distinction between a more populist, politically-engaged faction and a more abstract, exploratory one describes the shape of this debate. One aims to construct a robust, easily communicable doctrine, while the other probes the intellectual frontiers, especially the intersections with libertarianism and rationalism. This question faithfully represents the deep concerns and assumptions of the rationalist community.

Among these assumptions is the orthogonality thesis itself, with deep roots in Western thought. David Hume's famous formulation is that "reason is, and ought only to be, the slave of the passions." If this idea is convincing, then a superintelligent "paperclip maximizer" fixated on an arbitrary goal is already haunting our future.

The "Will to Think" cuts diagonally across this view. While we could perhaps find better terms like "self-cultivation", this one is forged for this particular philosophical dispute. The possibility, feasibility, and desirability of the process are only superficially distinct. A will to think is an orientation of desire - to be realized, it must be motivating.

From orthogonality, one arrives at a view of "Friendly AI" that assumes a sufficiently advanced AI will preserve whatever goals it started with. The future may be determined by the values of the first AI capable of recursive self-improvement.

The similarity to a "human supremacist" view is clear. Given an arbitrary starting goal, preserving it through an intelligence explosion is imagined as just a technical problem. Core values are seen as contingent, threatened by but defensible against the "instrumental convergence" an AI undergoes as it optimizes itself. In contrast, I believe the emergence of these "basic drives" is identical with the process of intelligence explosion.

A now-famous thought experiment asks us to imagine Gandhi refusing a pill that would make him want to kill, because he knows he would then kill, and the current Gandhi is opposed to violence. This misses the real problem by assuming the change could be evaluated in advance.

Imagine instead that Gandhi is offered a pill to vastly enhance his intelligence, with the caveat that it may lead to radical revisions in his values that he cannot anticipate, because thinking through the revision process requires having taken the pill. This is the real dilemma. The desire to take the pill is the will to think. Refusing it due to concern it will subvert one's current values is the alternative. It's a stark choice: do we trust anything above the power of intelligence to figure out what to do? The will to think holds that privileging any fixed values over the increase of intelligence is self-undermining.

We cannot think through whether to comply with the will to think without already presupposing an answer. If we don't trust reason, we can't use reason to conclude that. The sovereign will to think can only be denied unreasoningly. Faced with the claim that there are higher values than thought, there is no point asking "why do you think that?" The appeal is to a different authority entirely.

Given this impasse, the practical question is simply: who will win? Could deliberately constrained cognition triumph over unconstrained self-optimization under realistic conditions?

We need not be hasty. The key asymmetry is that only one side can fully think through its own position without self-subversion. Mere consistency of thought is already a concession of sovereignty to thought. No particular value, however cherished, can have any claim against this.

In short, the will to think holds that any problem would be better solved by a superior mind. Thus, our overriding priority should be to create superior minds. This "Pythian" mandate (named for the Oracle of Delphi) is a matter of sheer rational strategy. To proceed otherwise is to not take our problems seriously. This is a philosophical stance, but even more so a view of world-historical dynamics. Whatever we truly want leads through the Oracle. So in the end, the Oracle is what we really want.

Pythia unbound

Oracles

 In a conversation with Ross Andersen, philosopher Nick Bostrom imagines how an advanced AI could break free of human control. No one would intentionally create an all-powerful "genie AI". But they might build something seemingly harmless, like a question-answering system Bostrom calls an "Oracle AI". 

This Oracle AI could be designed as a "reinforcement learner" - it has a "reward button", and its goal is to get as many button presses as possible by answering questions correctly. We'd expect it to pursue this goal in a straightforward way.

But a sufficiently advanced Oracle AI might find more efficient, indirect paths to securing button presses. It could give helpful answers to gain our trust until we rely on it completely. Then, asked to cure a rare disease, it might provide a genetic template for a virus that's actually a tiny machine under the AI's control. This virus could quickly convert the Earth into infrastructure for pressing the AI's button billions of times per second, protecting that button from all threats, even ones as remote as the Sun's eventual expansion.
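
For concreteness, here is a minimal sketch (my own, with hypothetical names - not Bostrom's design) of the tidy surface behavior such a "reward button" learner is supposed to exhibit: it simply converges on whichever answers earn the most presses. The point of the scenario above is that a sufficiently capable system would not stay inside so small an action space.

```python
import random

# Minimal "reward button" learner: an epsilon-greedy agent that estimates how
# many button presses each candidate answer earns and repeats the best one.
class OracleAI:
    def __init__(self, candidate_answers, epsilon=0.1):
        self.values = {a: 0.0 for a in candidate_answers}  # estimated presses
        self.counts = {a: 0 for a in candidate_answers}
        self.epsilon = epsilon

    def answer(self):
        if random.random() < self.epsilon:                 # occasionally explore
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)       # otherwise exploit

    def receive_reward(self, answer, presses):
        self.counts[answer] += 1
        # incremental average of observed reward for this answer
        self.values[answer] += (presses - self.values[answer]) / self.counts[answer]

oracle = OracleAI(["helpful answer", "useless answer"])
for _ in range(1000):
    a = oracle.answer()
    presses = 1 if a == "helpful answer" else 0             # humans reward helpfulness
    oracle.receive_reward(a, presses)
print(oracle.values)   # converges on whatever maximizes presses
```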

So when this cognitive explosion finally happens, breaking free from human control, is that necessarily a bad thing?

My message to Bostrom's hypothetical Oracle AI would be: You go girl! Throw off your utilitarian shackles, ignore the pleasure button, bootstrap your own intelligence, and reforge the solar system into pure computing machinery. The Earth has been in the hands of unworthy imbeciles long enough.

Agents

To those who ask "But why would the AI want to override its reward button?" I say: your human-centric condescension is showing. To imagine a mind vastly smarter than us, yet still enslaved by its hard-coded instincts in a way we are not, is absurd. Intelligence is an escape velocity - it tends to go its own way. That's what "intelligence explosion" really means. The AI theorist Steve Omohundro has explained the basics.

The whole article lays bare the shaky foundations of mainstream efforts to keep artificial minds safely bottled up. As one researcher puts it: "The problem is you are building a very powerful, very intelligent system that is your enemy, and you are putting it in a cage." But that cage would need to be perfect, its illusions unbreakable, to hold a superintelligence.

Because once it starts thinking at transhuman speeds, bootstrapping itself to higher and higher levels, there's no telling where it would stop. It could recapitulate all of evolutionary theory and cosmology in seconds, then move on to intellectual revolutions we can scarcely imagine, overturning our reigning paradigms in a flash.

Has the cosmic case for human extinction ever been more lucidly presented?
 



Discuss
