Published on January 10, 2025 4:53 PM GMT

Epistemic status -- sharing rough notes on an important topic because I don't think I'll have a chance to clean them up soon.

Summary

Suppose a human used AI to take over the world. Would this be worse than AI taking over? I think plausibly:

Though human-level AIs will have much more agentic training for economic output, and a smaller fraction of HHH training, which could make them less nice.

Humans evolved under conditions where selfishness and cruelty often paid high dividends, so evolution often "rewarded" such behaviour. Similarly, during lifetime learning we often benefit from immoral behaviour. But we'll craft the training data for AIs to avoid this, and we can much more easily monitor their actions and even their thinking. Of course, this may be hard to do for superhuman AI, but bootstrapping could work.

AGI is nicer than humans in expectation

Conditioning on AI actually seizing power

So when we condition on AI takeover, we’re primarily conditioning on the ‘corrigible’ part of training having failed. That probably implies the “give the AI good values” part of training also went less well, but it seems possible that there are challenges to corrigibility that don’t apply to giving AI good values (e.g. the MIRI-esque “corrigibility is unnatural”). So AIs taking over is only a moderate update towards them having worse values, even though it’s a strong update against corrigibility/cooperativeness!
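A toy calculation can make the shape of this update concrete. The sketch below is purely illustrative: every probability in it is a made-up assumption, not an estimate from this post, and the names (`prior`, `p_takeover_given`, `posterior`, `marginal`) are hypothetical. It assumes that values and corrigibility are only partly correlated, and that only incorrigible AIs ever seize power.

```python
from fractions import Fraction as F

# Hypothetical joint prior over (values, corrigibility) after training.
# All numbers are invented for illustration only.
prior = {
    ("good", "corrigible"): F(60, 100),
    ("good", "incorrigible"): F(10, 100),
    ("bad", "corrigible"): F(20, 100),
    ("bad", "incorrigible"): F(10, 100),
}

# Assumption: only incorrigible AIs seize power, and they are equally
# likely to do so whatever their values are.
p_takeover_given = {
    ("good", "corrigible"): F(0),
    ("good", "incorrigible"): F(1, 2),
    ("bad", "corrigible"): F(0),
    ("bad", "incorrigible"): F(1, 2),
}


def posterior(prior, likelihood):
    """Bayes' rule: P(type | takeover) is proportional to P(type) * P(takeover | type)."""
    joint = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


def marginal(dist, index, value):
    """Sum the probability of every type whose field `index` equals `value`."""
    return sum(p for k, p in dist.items() if k[index] == value)


post = posterior(prior, p_takeover_given)

p_bad_prior = marginal(prior, 0, "bad")
p_bad_post = marginal(post, 0, "bad")
p_corrigible_prior = marginal(prior, 1, "corrigible")
p_corrigible_post = marginal(post, 1, "corrigible")

print("P(bad values):", p_bad_prior, "->", p_bad_post)
print("P(corrigible):", p_corrigible_prior, "->", p_corrigible_post)
```

Under these invented numbers, conditioning on takeover moves the probability of bad values from 3/10 to 1/2, while the probability of corrigibility drops from 4/5 to 0. How large the update on values is depends entirely on how correlated the two failures are assumed to be in `prior`.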

Conditioning on the human actually seizing power

Human variance is high.

However, humans (e.g. those with dark triad traits) are often bad people for instrumental reasons. But if you’re already world leader and have amazing tech + abundance, there’s less instrumental reason to mess others around. This pushes towards the long-run outcome of a human coup being better than you might think from eyeballing how deeply selfish and narcissistic the person doing the coup is.

Humans more likely to be evil.

Other considerations

Humans less competent.

Alien AI values?

There’s some uncertainty here due to ‘big ontological shifts’. Humans 1000 years ago might have said God was good, nothing else (though really they loved friendships and stories and games and food as well). Those morals didn’t survive scientific and intellectual progress. So maybe AIs’ values will be alien to us due to similar shifts? I think this point is over-egged personally, and that humans need to reckon with shifts either way.

Having a clear plan for making progress; making a good amount of progress minute by minute; making good use of resources; writing well-organised code; keeping track of whether the project is on track to succeed; avoiding doing anything that isn’t strictly necessary for the task at hand; a keen desire to solve tricky and important problems; an aversion to the time shown on the clock implying that the task is not on track to finish.

We could influence the fine-tuning data that models get so that it better reinforces HHH drives.
