Developing AI Safety: Bridging the Power-Ethics Gap (Introducing New Concepts)
This article examines the widening gap between humanity's growing power and its ethical development amid rapid AI progress, termed the "power-ethics gap". The author argues that advances in AI exacerbate this gap, posing a potential threat to humans and other sentient beings. The article stresses the importance of attending to the ethical dimension and to value selection within AI safety, proposes new concepts such as "Sentientkind alignment" and "human alignment", and calls for broader ethical consideration so that AI development keeps pace with ethics and reduces human-caused harm.

💡 Power-ethics gap: the article's core concept, referring to how humans, driven by data and intelligence, gain power far faster than their ethics develops. This gap has contributed to harm and death caused by human actions throughout history and today, and AI may widen it further.

🤖 Shortcomings of the AI safety field: current AI safety focuses mainly on keeping humans in control of AI while neglecting the ethical dimension. The article argues that AI safety needs re-examination and new concepts that emphasize ethics, especially the question of which values AI should hold.

🌱 New concepts: the article proposes several new concepts, such as "Sentientkind alignment", which aims to align AI with the well-being of all sentient beings, not just humans. It also stresses "human alignment", the ethical development of humanity itself, so that as AI advances, humans wield the technology wisely rather than being harmed by it.

⚖️ Value selection and alignment: the article emphasizes "value selection", the question of which values AI should pursue. It also explores the definition of "alignment", arguing that it should go beyond mere human control to encompass AI's fit with reality and with ethics, and it distinguishes types of alignment, such as human-centric alignment and Sentientkind alignment.

Published on April 20, 2025 4:40 AM GMT

TLDR

This post can be seen as a continuation of this post.

(To further explore this topic, you can watch a 34-minute video outlining a concept map of the AI space and potential additions. Recommended viewing speed: 1.25x).

This post drew some insights from the Sentientism podcast and the Buddhism for AI course.

My Point of View

I am looking at the AI safety space mainly through three fundamental questions: What is? What is good? How do we get there?

Human History Trends

Historically, human power, driven by increasing data and intelligence, has scaled rapidly and exponentially. Our ability to understand and predict "what is" continues to grow, but our ethical development, our understanding of "what is good", is not keeping pace. The power-ethics gap is like a car driving ever faster while the driver's skill improves only slightly as the ride goes on. This arguably represents one of the most critical problems globally. The imbalance has contributed significantly to suffering and killing throughout history, potentially more in recent times than ever before. The widening power-ethics gap appears correlated with large-scale, human-caused harm.

The Focus of the AI Safety Space

Eliezer Yudkowsky, who describes himself as 'the original AI alignment person,' is one of the most prominent figures in the AI safety space. His philosophical work, the many concepts he created, and his discussion forum and organizations have significantly shaped the AI safety field. I am in awe of his tremendous work and contribution to humanity, but he has a significant blind spot in his understanding of "what is". Yudkowsky claims that (almost) only humans are sentient, whereas scientific evidence suggests that probably all vertebrates, and possibly many invertebrates, are sentient. This discrepancy is crucial: one of the key founders of the AI safety space has built his perspective on an unscientific assumption that limits his view to a tiny fraction of the world's sentient beings.

The potential implications of this are profound, and they highlight the necessity of re-evaluating AI safety from a broader ethical perspective encompassing all sentient beings, both present and future. This requires introducing new concepts and potentially redefining existing ones. The work is critical because the pursuit of artificial intelligence focuses primarily on increasing power (capabilities), and so risks further widening the existing power-ethics gap within humanity.

Since advanced AI threatens to take control and mastery away from humans, two crucial pillars for AI safety emerge: maintaining meaningful human control (power) and ensuring ethical alignment (ethics). Currently, the field heavily prioritizes the former, while the latter remains underdeveloped. From an ethical perspective, particularly one concerned with the well-being of sentientkind ('sentientkind' being analogous to 'humankind' but inclusive of all feeling beings), AI safety and alignment could play a greater role. Given that AI systems may eventually surpass human capabilities, the values embedded in them will have immense influence.

We must strive to prevent an AI-driven power-ethics gap far exceeding the one already present in humans.

Suggesting New Concepts, Redefining or Highlighting Existing Ones

[Image: a monkey (representing evolution), a human, an AI, and a superintelligence. In order to achieve a good world, we probably need the last three to be aligned.]
