Will morally motivated actors steer us towards a near-best future?

The article explores two possible routes by which future decision-makers might avoid major moral mistakes and promote good outcomes: moral convergence and moral compromise. The author argues that widespread moral convergence is unlikely, given how weak current moral consensus is, the new moral disagreements that future technological development could open up, and the limits of human capacity for reflection. Instead, trade and compromise between groups with different values may be the more likely route to a near-best future, though that route faces major challenges from resources coming to be valued linearly, from threat behaviour, and from concentration of power.

🔍 Moral convergence: The author argues that although many people expect broad moral consensus to emerge in the future, widespread moral convergence is unlikely, given the weakness of current moral consensus, the new disagreements that technological development may bring, and the limits of human reflection.

🔄 Moral compromise: The article suggests that trade and compromise between groups with different values may be the more likely route to a near-best future. For example, environmentalists and utilitarians could each satisfy their main goals, with the former controlling Earth and the latter expanding into the wider universe.

⚠️ Challenges: Although moral compromise offers hope, it faces major challenges, including linear-in-resources preferences that make positive-sum trade difficult, threats that could leave the future optimised for outcomes many people do not want, and the risks of concentrated power and restrictions on trade.

💡 Optimistic estimate: The author argues that despite these challenges, the possibility of partially accurate moral convergence plus trade and compromise gives reason for optimism about the future, with the expected value of the future estimated at above 1%, though still well short of the ceiling of 'Flourishing'.

Published on August 8, 2025 6:32 PM GMT

To what extent, in the future, will we get widespread, accurate, and motivational moral convergence - where future decision-makers avoid all major moral mistakes, are motivated to act to promote good outcomes, and are able to do so? And, even if we only get partial convergence, to what extent can we get to a near-best future by different groups, with very different moral views, coming together to compromise and trade?

In a new essay, Convergence and Compromise, Fin Moorhouse and I (Will) discuss these ideas.

Given “no easy eutopia”, we’re very unlikely to get to a near-best future by default. Instead, we’ll need people to try really hard to bring it about: very proactively trying to figure out what’s morally best - even if initially counterintuitive or strange - and trying to make that happen.

So: how likely is that to happen?

Convergence

First: maybe, in the future, most people will converge on the best understanding of ethics, and be motivated to act to promote good outcomes, whatever they may be. If so, then even if eutopia is a narrow target, we’ll hit it anyway. In conversation, I’ve been surprised by how many people (even people with quite anti-realist views on meta-ethics) have something like this view, and therefore think that, even if the whole world were ruled by a single person, there would be quite a good chance that we’d still end up with a near-best future.

At any rate, Fin and I think such widespread convergence is pretty unlikely.

First, current moral agreement and seeming moral progress to date are weak evidence for the sort of future moral convergence that we’d need. Present consensus is highly constrained by what’s technologically feasible, and by the fact that so many things are instrumentally valuable: health, wealth, autonomy, etc. are useful for almost any terminal goal, so it’s easy to agree that they’re good. This agreement could disappear once technology lets people optimise directly for terminal values (e.g., pure pleasure vs. pure preference-satisfaction).

Compared to how much we might reflect (and need to reflect) in the future, we’ve reflected very little to date. What seem like small differences today could become enormous after prolonged moral exploration, especially among beings who can radically self-modify. Two people who start in the same spot and each walk 10 metres in any direction will still end up close to each other. But if they keep walking for 100 miles, even a slight difference in direction will leave them very far apart.
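To put rough numbers on the walking analogy (this is illustrative arithmetic of mine, not a figure from the essay): two walkers who set off from the same point with headings that differ by a small angle θ end up separated, after each has walked a distance d, by roughly

\[ \text{separation} \approx 2d\sin(\theta/2) \approx d\,\theta \quad (\text{for small } \theta). \]

A 1° difference in heading leaves them about 17 cm apart after 10 metres, but about 2.8 km apart after 100 miles: the gap grows in proportion to how far the exploration runs.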

What’s more, a lot of moral progress to date has come from advocacy and coalition-building among disenfranchised groups. But that’s not an available mechanism to avoid many sorts of future moral catastrophe - e.g. around population ethics, or for digital beings who will be trained to endorse their situation, whatever their actual condition.

Will the post-AGI situation help us? Maybe, but it’s far from decisive.

Superintelligent advice could help us reflect much more, but people might simply not reflect adequately on their values, might not do so in time (see Preparing for the Intelligence Explosion), or might deliberately avoid reflection that threatens their identity, ideology, or self-interest. And if people have different starting intuitions and endorse different reflective procedures, they could end up with diverging views even with superintelligent help.

Sometimes people argue that material abundance will make people more altruistic: they’ll get all their self-interested needs met, and the only preferences left to satisfy will be altruistic.

But people could have (or develop) nearly linear-in-resources self-interested preferences, e.g. for collecting galaxies like stamps. (Billionaires give barely a larger fraction of their wealth to charity than the middle class does, despite having many orders of magnitude more wealth.) Or they could act on misguided ideological preferences - this argument gives no reason for thinking that they’ll become enlightened.

A more general argument against expecting widespread, accurate, motivational moral convergence is this: If moral realism is true, the correct ethics will probably be quite alien; people might resist learning it; and even if they do, they might have little motivation to act in accordance with it. If anti-realism is true, the level of convergence that we’d need seems improbable. Ethics has many “free parameters” (e.g. even if you’re a hedonistic utilitarian - what exact sort of experience is best?), and on anti-realism there’s little reason to think that differing reflective procedures would end up in the same place.

Then, finally, we might not get widespread, accurate, motivational moral convergence because of “blockers”. Evolutionary dynamics might guide the future, rather than deliberate choices. False but memetically powerful ideas might become dominant. Or some early decisions could lock in later generations to a more constrained set of futures to choose between.

Compromise

Even if widespread, accurate, motivational convergence is unlikely, maybe trade and compromise among people with different values will be enough to get us to a near-best future?

Suppose that in the future, most people are self-interested or favour some misguided ideology, but that some fraction of people end up with the most enlightened values, and want to devote most of their resources to promoting the good. Potentially, these different groups could engage in moral trade.

Suppose, for example, there are both environmentalists and total utilitarians. The environmentalists primarily want to preserve natural ecosystems; the utilitarians primarily want to spread joy as far and wide as possible. If so, they might both be able to get most of what they want: the environmentalists control the Earth, but the utilitarians can spread to distant galaxies.
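As a minimal sketch of why such a deal can beat fighting over everything, here is a toy model; the groups are the ones from the example above, but the weights and the 0-100 utility scale are hypothetical numbers of mine, not figures from the essay.

```python
# Toy moral-trade model (hypothetical numbers; for illustration only).
# Two goods are at stake: stewardship of Earth and access to distant galaxies.
# Each group's satisfaction is a weighted share of the goods it controls.

weights = {
    "environmentalists": {"earth": 0.9, "galaxies": 0.1},
    "utilitarians":      {"earth": 0.1, "galaxies": 0.9},
}

def utility(group, allocation):
    """Satisfaction on a 0-100 scale, given the fraction of each good the group controls."""
    return 100 * sum(w * allocation[good] for good, w in weights[group].items())

# No deal: both sides contest everything and each ends up with half of each good.
even_split = {"earth": 0.5, "galaxies": 0.5}

# Deal: environmentalists steward Earth; utilitarians take the distant galaxies.
deal = {
    "environmentalists": {"earth": 1.0, "galaxies": 0.0},
    "utilitarians":      {"earth": 0.0, "galaxies": 1.0},
}

for group in weights:
    print(group,
          "no deal:", utility(group, even_split),   # 50.0 for both groups
          "deal:",    utility(group, deal[group]))  # 90.0 for both groups
```

On these made-up numbers, each side gives up the good it cares about least and keeps the one it cares about most, so both move from 50 to 90: the deal is positive-sum.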

Moreover, many current blockers to trade could fall away post-superintelligence. With AI help, there could be iron-clad contracts, and transaction costs could be tiny relative to the gains. And there might be many dimensions along which trade could occur. Some groups might value local resources, and others distant resources; some groups might be risk-averse, and others risk-neutral; some groups might time-discount, and others not.
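To illustrate one of those dimensions, here is a small sketch of a trade between a risk-averse and a risk-neutral group; the payoff numbers and the square-root utility function are assumptions of mine, not from the essay.

```python
import math

# A risky venture pays 100 units of resources with probability 0.5, and otherwise 0.
OUTCOMES = [(0.5, 0.0), (0.5, 100.0)]   # (probability, payoff)
PRICE = 40.0                            # certain payment offered for the venture

def expected(utility_fn):
    """Expected utility of holding the venture, for a given utility function."""
    return sum(p * utility_fn(x) for p, x in OUTCOMES)

def risk_averse(x):
    return math.sqrt(x)   # concave utility: dislikes gambles

def risk_neutral(x):
    return x              # linear utility: only the expected payoff matters

# The risk-averse group prefers selling the venture for a sure 40:
print(risk_averse(PRICE) > expected(risk_averse))   # True  (6.32 > 5.0)

# The risk-neutral group prefers buying it at 40, since it is worth 50 on average:
print(expected(risk_neutral) - PRICE)               # 10.0 expected gain
```

Because the two groups weigh risk differently, both prefer the trade to the status quo, even though nothing new is produced.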

I think this is the most likely way in which we get close to eutopia. But it’s far from guaranteed, and there are major risks along the way. I see three big challenges.

First, post-superintelligence, a lot of different groups might start to value resources linearly (because they’ve had the sublinear part of their preferences satisfied already). And it seems a lot less likely that you can get positive-sum trade between groups with linear preferences.
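A rough sketch of the contrast (again with toy numbers assumed by me, not from the essay): the gains in the earlier sketch came from the two groups caring about different goods. If instead every group values the same generic pool of resources, and values it linearly, then reallocating that pool is exactly zero-sum; with diminishing (sublinear) returns, the division still matters, so compromise can still create or destroy value.

```python
import math

TOTAL = 1000  # a fixed pool of generic resources (say, galaxies)

def linear_utility(share):
    """Linear preferences: utility is just the group's share of the pool."""
    return share

def sublinear_utility(share):
    """Satiable preferences: strongly diminishing returns to extra resources."""
    return math.log(1 + share)

for a_share in (100, 500, 900):
    b_share = TOTAL - a_share
    linear_total = linear_utility(a_share) + linear_utility(b_share)
    sublinear_total = sublinear_utility(a_share) + sublinear_utility(b_share)
    print(f"A gets {a_share:3d}: linear total = {linear_total:.0f}, "
          f"sublinear total = {sublinear_total:.2f}")

# The linear total is 1000 for every split: any transfer is one group's loss and
# the other's equal gain, so there is no surplus for a deal to create.
# The sublinear total varies with the split (it peaks at an even division),
# so bargaining and compromise still have something to gain or lose.
```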

Second, threats. Some groups could self-modify, or make binding commitments, to bring about things that other groups regard as bad (e.g. huge quantities of suffering) unless those other groups hand over more resources. Making such threats is bad in and of itself, but if some of them are carried out, then the future could end up partially optimised for exactly what many people don’t want - a terrifying possibility.

If threats aren’t prevented - and it might be quite hard to prevent them - then the future could easily lose out on much of its value, or even end up worse than nothing.

Finally, there are blockers to futures involving trade and compromise. As well as risks of evolutionary futures and memetic hazards, there’s also the risk of intense concentration of power, and the risk that trade could be prevented (e.g. tyranny of the majority). Perhaps the very highest goods just get banned. (Analogy: today, most rich countries have criminalised MDMA and psychedelic drugs; arguably, from a hedonist perspective, this is a ban on the best goods you can buy.)

I think the chance of getting partial, accurate moral convergence plus trade and compromise is a good reason for optimism about the expected value of the future. At least when putting s-risks to the side (a big caveat!), I think it’s enough to make the expected value of the future comfortably above 1% of the value of a near-best future. But I think it’s not enough to get close to the ceiling on “Flourishing” - my own best guess of the expected value of the future, conditional on Survival, is something like 5-10%.

To get regular updates on Forethought’s research, you can subscribe to our Substack newsletter here.



