Yudkowsky on The Trajectory podcast

The article explores Yudkowsky's view that humanity will likely fail to align the first AGI capable of takeover, leading to extinction. He proposes that we should not build general AI, that a ban could be enforced through international treaties and monitoring of GPU use, that some narrow tool AI could be built first to improve our lives, and that perhaps human intelligence should be augmented before attempting an aligned AGI.

🎯 Humanity will likely fail to align the first AGI capable of takeover, leading to extinction: not because alignment is impossible, but because of human cognitive limitations and our poor historical record of getting hard projects right on the first try.

🚫 General AI should not be built; a ban could be enforced through international treaties and monitoring of GPU use.

🔧 Some narrow tool AI could be built to improve our lives; after that, human intelligence should perhaps be augmented before attempting to build an aligned AGI.

💡 Fully general AI could not only cause human extinction but also put people out of work uncomfortably quickly; this confluence of problems might serve as a route to convincing people to slow the pursuit of AGI.

Published on January 24, 2025 7:52 PM GMT

I found this interview notable as the most useful public statement yet of Yudkowsky's views. I congratulate both him and the host, Dan Faggella, for strategically improving how they're communicating their ideas.

Faggella is to be commended for asking the right questions and striking the right tone to elicit an extraordinarily concise statement of Yudkowsky's views on what we might do to survive and why. It also seems likely that Yudkowsky has thought hard about his messaging after having his views both deliberately and accidentally misunderstood and panned. Despite having followed his thinking over the last 20 years, I gained a new perspective on his current views from this interview.

Takeaways:

I find myself very much agreeing with his focus on human cognitive limitations and our poor historical record of getting new projects right on the first try. Cognitive biases were the focus of my neuroscience research for some years, and I came to the conclusion that, wow, humans have both major cognitive limitations (we can't really take in and weigh all the relevant data for complex questions like alignment) and major biases, notably a near-inevitable tendency to believe what seems likely to benefit us rather than what is empirically most likely to be true. I still want to do a full post on this, but in the meantime I've written a mid-sized answer on Motivated reasoning/confirmation bias as the most important cognitive bias.

My position to date has been that, despite those limitations, aligning a scaffolded language model agent (our most likely first form of AGI) to follow instructions is so easy that a monkey(-based human organization) could do it. 

After increased engagement with these ideas, I worry that it may be my own cognitive limitations and biases that have led me to believe that. I now find myself thoroughly uncertain (while still thinking those routes to alignment have substantial advantages over other proposals).

And yet, I still think the societal rush toward creating general intelligence is so strong that working on ways to align the type of AGI we're most likely to get is a likelier route to success than attempting to halt that rush.

But the two could possibly work in parallel.

I notice that fully general AI is not only the sort that is most likely to kill us, but also the type that is more obviously likely to put us all out of work, uncomfortably quickly. By fully general, I mean capable of learning to do arbitrary new tasks. Arbitrary tasks would include any particular job, and how to take over the world. 

This confluence of problems might be a route to convincing people that we should slow the rush toward AGI.



Discuss
