The purpose of this post is to argue for prioritizing the 'democratization' of AGI, even before we've figured out alignment.
It's also an argument against favoring historically useful economic and social policies in the run-up to a post-AGI world.
The intended effect is to encourage people to start seriously discussing and organizing ways to increase the size of the "Minimum Viable Oligarchy" which may come packaged with an aligned AI, such that it includes all of humanity.
It also includes a potential draft of a solution to the lack of trust which has so far made cooperation difficult.
The main gist is to highlight the importance of developing reliable lie detectors.
A prefatory note: this article is not an anti-capitalism rant. In fact, I believe Capitalism is one of the greatest forces of human cooperation yet conceived.
The Ready Answer to any Problem
I came upon a website called Capitalism Magazine.
I didn't notice the name until after I'd read the article. This was fortunate, because it allowed me to read it in an impartial light.
The website had a hard Randian leaning, and the writers there shared some interesting views, but one point that kept cropping up was this: Capitalism has done more, by lifting billions out of poverty, than every act of kindness ever conceived.
This is a powerful argument. In fact, if one squints, one can see how it parallels my earlier statement about how capitalism was a great force for human cooperation.
However, saying that Capitalism (proper noun) solved extreme poverty only mimics the shape of understanding. Instead, let's look at exactly how a system built on human self interest managed to help so many.
The Randian is happy to answer this. It's because all those people in poverty were capable of doing economically useful work, which they were able to trade -- as was in line with their Rational self interest -- in exchange for better living conditions.
Even in cases where they couldn't bargain, it was in the rational self interest of the fortunate business moguls to create efficient societies, which helps everybody.
Besides which, truly Rational self interest involves treating others kindly. Contributing even minutely to a violent and hateful society wouldn't benefit a person living in it, after all.
This is true, in a provisional manner.
But then comes AI.
And, suddenly, billions of people are no longer necessary.
This is a problem.
Worse yet, there are lively circles of debate which seem unable or unwilling to see this fact.
Even as of 2025, there are people making the Capitalism argument as a response to this problem.
Frequenters of this site tend to be more aware of the implications of AGI, so I won't belabor the point.
Instead, let's talk about the potential evils of the future.
Alignment is solved
The alignment problem is becoming a topic of popular discourse.
However, the average person remains more worried about the job losses and inequality that AI systems might engender.
This might come as a point of frustration to rationalists, who sit by, shaking their heads at how all the normies are missing the big picture. However, the thesis of this essay is to push back against this perception.
Because the people worried about inequality are right.
Let's say we solve alignment, and the billion-dollar models come online.
What has been the historical behavior of human beings when given power and limited oversight? What is the fate of people unable to provide economically useful work?
Look at North Korea, for instance-
"Wait!" I can imagine people at this point screaming at their screens: "BUT CHINA is supporting them! Such an IRRATIONAL system wouldn't stand without an external authority propping it up against the weight of its own ludicrous nature!"
Yeah... and the AI would be more powerful than China. It would have more relative power than any human-created force yet devised. Yes, even more than Capitalism.
At this juncture, I can hear more level heads retort: but surely 'someone' would stop any potential dictator before it came to such an eventuality. They can't run an AI without cooperation from the energy companies. They can't build the worker robots without a jump start from the human workforce. They can't design the first AGI without intellectual effort from human AI scientists. They can't expect to survive without mercenaries to mop up the protestors, etc...
Even dictators need coalitions of support, after all.
And, sure, let's assume those are all constraining factors.
Again, we can look to history for a rough outline.
The Americas had a slave population of around 20%, and they succeeded in quelling uprisings for centuries.
Rome had a slave population of around 30%, and in 800 years it never had a truly successful slave uprising.
Sparta had a ridiculous 65-80% slave population, and they managed by resorting to frequent massacres.
Haiti, on the other hand, was nearly 90% slaves in the 19th century, and it's the notable exception.
So we've got ourselves a range. History shows that potentially up to 80% of the human population in a society can be disempowered.
Of course, that's relying on historical figures.
Likely, with AGI able to displace skilled workers, it's feasible we could have a Minimum Viable Oligarchy of around a few dozen people, down to a mere handful at the extreme end.
I think it's important to avert this possibility.
I think it's important to start drafting bulwarks against it now, even when alignment hasn't been solved.
Because I don't believe an aligned AI would be automatically beneficial, given our current governance structures.
I'm sure the claims made so far will engender counterarguments, so here's a list of refutations for those interested.
This is a non-issue
This section is directed at those who feel that this is not an important topic, or at least not as important a topic as alignment.
- Even if we don't have a proper governance structure, it's still better to have an aligned AI in the hands of humans than an unaligned AI.
Well, let's consider the stakes.
An unaligned AI probably rearranges the universe in a way that is unsuitable for human life. It would do this because, in the vast space of possible minds, only an infinitesimal fraction care about humans in any way.
Humans, on the other hand, care intrinsically about other humans. Or rather, they care about a small group of humans close to them: friends, family, and so on. And they care about strangers who can do things for them. Otherwise, they can happily wake up and eat their cheerios while watching horrible things happen to millions of people on the morning newscast.
Notably, humans also have a reliable failure mode colloquially known as power tripping.
Usually, this term is applied to trivial matters, to describe abusive managers or annoying bosses, but the limits for abuse seem to scale with the level of power afforded a person.
I honestly believe that there are store managers out there just as bad, if not worse, than the worst genocidal maniac you can care to name. It's just that they're held in check by Rational self interest of the Randian sort, considering they live at a rung of society where the consequences for antisocial behavior can be quite extreme and painful.
A counter argument to this might try to use my historical argument against me.
- Look at the historical record! Surely, even the worst tyrant managed better than literal human extinction. And at least they cared for their own people! A virtue that, while not ideal, would at least ensure the continuity of human flourishing; far better that than extinction, from an effective altruist point of view.
I'll answer this with a question. Are there fates worse than death?
Historically, humans have not treated each other well. I'll leave the graphic details out.
Besides which, consider all the worst rulers of history. The worst excesses of violence and depravity imaginable.
Consider that an AI Oligarchy would be to those, what those are to the aforementioned store manager.
Even dictators need bases of power, and the people they rely on have bases of power on their own, and on it goes down until it gets to the most menial worker.
Even helots, murdered casually by Spartans for sport, were still instrumental to the Spartan state for the labor they provided. This naturally acted as a tempering force on how many helots the Spartans could kill, or how severely they could maim their bodies.
An AI oligarchy, with absolutely no need for human labor, would have only personal desire guiding its treatment of a disempowered population. Again, I'll leave out the graphic details, but I suspect even the sternest utilitarian would agree that a future where most beings are living in destitution and sheer dependency is worse than one where those people never existed in the first place.
- Humans wouldn't do that! Considering the abundance an aligned AI would provide, surely even the worst person wouldn't mind ceding some insignificant fraction of the future light cone for the benefit of human kind?
This is an argument from incredulity.
I've seen people like Liron Shapira make it, which surprised me considering the level of clarity he's shown in his debates.
Because, maybe the argument is correct. Maybe the average person would be willing to give up, say, enough resources to sustain a hundred billion humans for a hundred billion years.
Or, maybe, human desire truly is endless, and even an average person, when faced with a decision between sustaining people they'll never be able to meet, and living a quadrillion years longer, would make the Rationally self interested choice?
Maybe they'd only sustain enough people to create an environment that satisfies their own desire for heading a social hierarchy, and gathering worshippers à la the Egyptian pharaohs.
I suspect, given a long enough time horizon, the latter possibility is a more likely outcome.
Human nature is subject to ego and hierarchy. I sometimes notice it in myself, whenever I introspect and realize that I get irrationally annoyed at people who ostensibly 'act out of their station', but when someone else of the 'appropriate' rank acts in exactly the same way, I feel nothing in particular.
Kind of like how a guy in monks robes could come up and start saying mystical bullshit, and -- if said with enough conviction -- I'll take it as normal.
Someone in party clothes comes over and says the same thing, I feel a need to roll my eyes at the waste of time.
I suspect there are some benefits to social cohesion as a result of this, but regardless, it's a detriment when someone is saddled with a supernormal level of hierarchical stimuli.
Whatever the case may be, however, the tendency itself is undeniable.
In conclusion.
Power is corrosive to the human personality.
An aligned AI has the potential to give a small subset of humanity an incredible amount of relative power.
And history gives us every reason to expect that power to be abused.
Ok, so what's the solution, then?
Put briefly, I don't have an exact solution on hand.
This post is mainly meant to put into words the worries I believe everyone has, but which I haven't seen anybody saying aloud.
I don't know why no one has spoken of the power problem, except in vague ways relating to economics, and worker retraining, and UBI. Perhaps I've merely missed the articles. I haven't been keeping up as stringently with AI news recently. In case this does turn out to be a novel addition to the LessWrong corpus, however, I'm hoping it'll be the start of serious discussions about organizing political systems and creating technological solutions which can allow humanity to coordinate effectively on the issue.
However, I'm interested in the following rough outline.
- I suspect a centralized compute cluster and international treaties will be a necessary facet of safely creating AI.
- Furthermore, distributing the resources necessary so that a diverse array of peoples each have effective 'veto power' over whether a training run goes through will be necessary, not only for avoiding disempowerment, but also for giving people a stake in AI development. These resources can be anything from encryption keys to an off switch for the power plants. (A sketch of how such keys might be split follows this list.)
- I believe we need to invest in lie detection technology. Every year by which lie detection technology can be accelerated increases the odds of success tremendously.
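To make the 'encryption keys' bullet concrete, here is a minimal sketch of the idea using Shamir secret sharing: a launch key is split so that a training run can only be authorized when a large quorum of stakeholders cooperate, which means any sufficiently large minority holds an effective veto. The quorum sizes, field choice, and Python implementation are my own illustrative assumptions, not a worked-out protocol.

```python
# Minimal sketch: distributed 'veto power' via Shamir secret sharing.
# Illustrative only -- a real deployment would need authenticated shares,
# secure hardware, and a vastly more careful protocol.
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a 128-bit key

def split_key(secret: int, num_shares: int, quorum: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shares` shares; any `quorum` of them recover it."""
    # Random polynomial of degree quorum-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(quorum - 1)]
    return [
        (x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
        for x in range(1, num_shares + 1)
    ]

def recover_key(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x=0 recombines a quorum of shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Example: a launch key held jointly by 10 stakeholders; any 7 must cooperate
# to authorize a run, so any 4 can veto it by withholding their shares.
launch_key = random.randrange(PRIME)
shares = split_key(launch_key, num_shares=10, quorum=7)
assert recover_key(shares[:7]) == launch_key  # quorum reached: run authorized
assert recover_key(shares[:6]) != launch_key  # below quorum: effective veto
```

The same scheme generalizes: handing shares to power plants, chip fabs, and national regulators is one way to make 'a diverse array of peoples each have effective veto power' mechanically real rather than merely aspirational.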
I believe lie detection is a promising avenue of technological development because it would allow the political discourse to progress.
If functional, it would allow doomers to make more effective arguments. Even now I see skepticism online regarding AI dangers, as if all the hype about AI doom were merely a marketing tactic.
However, being able to show one's sincerity would be a powerful argument in the face of such skepticism.
Furthermore, lie detection would allow the keys of power mentioned above to be placed in hands that are verifiably trustworthy.
What makes you think 'lie detection' is possible?
In principle, I believe it should be possible because humans are aware of their own intentional lies. This suggests that the act of lying should look consistently different in an fMRI scan.
Secondly, I remember studies from over a year ago in which researchers were able to stitch together images of people's dreams and imaginings via deep learning. A lie detector seems relatively straightforward in comparison.
Thirdly, I believe there is a lot of low-hanging fruit in advancing this technology, mainly because I suspect there does not yet exist a large corpus of fMRI data on lying, so anyone with the cash to begin studies should expect to make relatively quick progress by applying modern AI to the task (a toy sketch of what that might look like follows below).
Fourthly, lie detection has a lot of utility in preventing false arrests, rooting out corruption, and generally preventing the great harm that lies can bring, so it ought to be relatively easy to make a business case even to those not inclined to take AI dangers seriously.
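As a gesture at the third point, here is a toy sketch of what 'applying modern AI to the task' might look like, assuming someone had collected a labeled corpus of fMRI trials. The data here is synthetic, and every shape and parameter is an illustrative assumption; no public corpus of this kind is claimed to exist.

```python
# Toy sketch: train a truth/lie classifier on (synthetic) fMRI feature vectors.
# Everything below -- trial counts, voxel counts, the effect injected into the
# fake data -- is an assumption for illustration, not a real study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for preprocessed scans: one flattened activation vector per trial.
n_trials, n_voxels = 400, 2000
X = rng.standard_normal((n_trials, n_voxels))
y = rng.integers(0, 2, size=n_trials)  # 0 = truthful trial, 1 = deceptive trial
X[y == 1, :50] += 0.5                  # pretend lying shifts a few voxels

# L1 regularization, since trials are scarce relative to voxel count.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

The point of the sketch is only that the pipeline is standard: collect labeled scans, regularize heavily, and validate out-of-sample. The hard part is the dataset, not the model, which is exactly why the field looks like low-hanging fruit.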
The bad news.
This can be put briefly.
The closer AGI seems, the more the platitudes are slipping away.
There is a certain AI company out there which professed to believe AGI was near, and which proclaimed to care about the disempowerment issue, even going so far as to create a non-profit board whose purpose was to make sure AGI was used for the good of humanity, and to put in its charter guarantees that true AGI would not be used purely for profit.
That same company is now dismantling its own oversight structure in the face of mundane profit.
This leads to one of three possible conclusions.
- They don't believe AGI is possible.
- They don't understand what AGI would be capable of.
- They do believe, and do understand, and they don't want anyone to be able to stop them when the time comes.
All other AI companies have, to my knowledge, made no enforceable commitments in the first place.
The good news.
The article has been bleak so far, I know, but I'd like to point out some favorable considerations.
- Our opponents aren't evil.
This may come as a surprise, considering how much of this article was dedicated to highlighting the pitfalls inherent in human nature.
However, remember that those failures of human morality are just that. Failures. You or I would probably be subject to the same corrosive forces if given enough power, and the people we're negotiating with now, many of them, are in more reasonable states of mind.
- Our opponents are few.
The problem at hand has a natural equilibrium.
The smaller the viable oligarchy, the worse the problem.
Conversely, however, the smaller the oligarchy, the more allies we have.
Because this is not a class issue.
Our allies in the face of AGI disempowerment could include the heads of nations, most billionaires, the vast, vast majority of millionaires, generals, dictators, criminals, saints, sinners, every AI company that feels like they're falling behind, the chip fabs, the power plants, the alphabet agencies, the majority of AI scientists, and basically every single person that doesn't suspect they'll have a seat at the table when the AGI switches on.
On the importance of presentation.
I stated that this movement could have powerful allies.
Given this, I don't suspect that media manipulation, government suppression, or manufactured consent will be massive issues.
Granted, perhaps an Oligarchy with access to advanced AI will be able to create a perfectly sized system of cooperation that manages to exclude much of humanity, but I digress.
The point remains that, when it comes to apparently sensible arguments that everyone should be able to agree with, presentation is important.
What does this mean?
Three things, mainly.
- We should make economic disempowerment the main focus when presenting the problem. It's something most people are ready to understand, with fewer novel factors for them to wrap their minds around, and fear of disempowerment would be enough to bring a lot of people to the negotiating table.
- We should avoid excluding people unnecessarily. This is exactly why I wrote 'this is not a class issue' in bold. AI disempowerment affects the vast, vast majority of humanity, even the 'elites'. We shouldn't paint them as the enemy or start playing blame games, even implicitly.
- We should make the stakes clear, and present arguments and solutions in a way which won't rub up against the sensibilities of a certain sect of Randian and libertarian adherents, who make up a surprisingly lively circle of internet discourse. Falling into the mire of that age-old argument can only delay the movement, I think.
What happens now?
I suspect the world is headed towards an inflection point.
The Minimum Viable Oligarchy, as I've described it, is perhaps inevitable in the case of an AGI.
However, not all hope is lost, because the actual size of a Minimum Viable Oligarchy can become quite large. If we create a system with levers of power in the right places, ones with actual ability to control the fate of AI development, we might be able to increase the size of that Oligarchy to encompass the whole of humanity.
I believe we need to have a serious discussion about what actions can be taken to expand the scope of that Minimum Viable Oligarchy, so that a critical mass of people are included in it.
I don't claim to know exactly the trajectory AI will take. Perhaps the GPT architecture will stall, or maybe we'll be given a warning shot which scares everyone into cooperating. Either way, I think we need to be ready to take advantage of any fortunate developments, and that starts with political organization, as well as the technical solution of lie detectors.