LessWrong, April 8, 03:45
What alignment-relevant abilities might Terence Tao lack?

 

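This article explores whether major progress in AI alignment research could come from training young "supergeniuses". The author argues that simply hiring geniuses from other fields may not be enough, because they may lack key abilities needed to solve the hard alignment problems. The article distinguishes two dimensions of human intelligence: INT (working memory, processing speed) and WIS (forming deep models, building good ontologies). The core question is how much successful alignment research depends on learnable skills versus innate WIS. The article asks relevant researchers to share the core abilities they draw on in alignment research and to discuss how those key skills might be trained.

🧠 The article distinguishes two core dimensions of human intelligence: INT (mostly what IQ tests measure, including working memory and processing speed) and WIS (forming deep models, building good ontologies, and so on).

🤔 The author questions whether hiring "supergeniuses" from other fields is enough to solve alignment, since they may lack the key abilities alignment research requires.

💡 The core question: how much does successful alignment research depend on learnable skills, and how much on innate WIS?

🧐 The author thinks trainable thinking techniques that augment WIS may exist, especially for people with high INT.

❓ The article poses several key questions, for example: how much does good hard-world alignment research depend on learnable skills versus innate WIS? What core abilities do researchers have in alignment research?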

Published on April 7, 2025 7:44 PM GMT

40.  "Geniuses" with nice legible accomplishments in fields with tight feedback loops where it's easy to determine which results are good or bad right away, and so validate that this person is a genius, are (a) people who might not be able to do equally great work away from tight feedback loops, (b) people who chose a field where their genius would be nicely legible even if that maybe wasn't the place where humanity most needed a genius, and (c) probably don't have the mysterious gears simply because they're rare.  You cannot just pay $5 million apiece to a bunch of legible geniuses from other fields and expect to get great alignment work out of them.  They probably do not know where the real difficulties are, they probably do not understand what needs to be done, they cannot tell the difference between good and bad work, [...]

-Eliezer in AGI ruin

This question is about the capabilities that are needed for alignment research in the worlds where alignment is hard, so we need to solve alignment very robustly, for which the easiest path to success likely involves creating a new AGI paradigm where alignment is more feasible.

My guess is that Eliezer is likely right that we cannot just pay a young supergenius to work on alignment and expect useful (hard-world) alignment progress to come out, but I'm wondering whether we might be able to train them to become capable in the relevant ways.

I'm not asking because of Terence Tao specifically - I think he's too old. I'm thinking about 2 other young supergeniuses, though I don't want to write their names here, mainly because there's opportunity cost to reaching out prematurely.[1]

Background

Let's divide human intelligence into 2 (clusters of) subdimensions, and call them INT and WIS[2]:

    INT: working memory size, accuracy and speed of performing complex operations on working memory content, pattern recognition ability on working memory content, precise long-term memory. (Mostly the subdimensions that are measured through IQ tests.)
    WIS: forming very deep models over long timescales where even tiny inconsistencies/confusions get noticed, ability to form good ontologies and find core cruxes in problems, precise intuitive Bayesian updating.

John von Neumann and Terence Tao can be seen as examples that sorta max out INT within the observed human variation, and Einstein (and IMO perhaps Eliezer) can be seen as examples that max out WIS.

The problem isn't that the suggestive power isn't big enough. The problem is that the verifier is broken.

-Eliezer in some podcast (but I forgot which)

Very roughly, I think INT maps to 'having high suggestive power' and WIS maps to 'having a good verifier'.

Also, while I agree that 'being able to judge what is progress from what is not' is the current bottleneck, I think we might also need higher suggestive power. (It would be awesome to have another Einstein, but in the hard worlds I'd guess he would be way too slow to solve it in 20 years.)

I think there likely exist trainable thinking techniques which strongly augment someone's effective WIS[3], especially for people with very high INT, though I don't know how far out of reach such techniques are.[4] We already have some[5] such techniques, though often they are not that explicit, and even if they are, we often still lack good training exercises.

The Questions

The questions are mainly directed to competent agent-foundations-like[6] researchers.

Let's assume an unrealistic best-case scenario: Say we have a 20-year-old, motivated, and trustworthy[7] Terence Tao, who carefully studies stuff like the sequences, gets mentored by (among others) Eliezer, and tries to work on the most important problems and improve his most important skills.

I basically want to get a better probability estimate for: 

Would this Terence Tao become super-Einstein for alignment research and make a lot more useful progress than has yet been done?

I think a key crux for this is:

How much does good hard-world alignment research depend on learnable skills vs innate WIS?

I think a useful question to ask here is:

What are the core abilities you have that allow you to make useful progress?[8] (Please include whatever comes to mind, whether it's a clearly learnable skill (like "whenever I have formed a hypothesis, I look for a counterexample") or an opaque dimension of your intelligence (like "important ideas/shower-thoughts often just seemingly randomly pop into my mind").)

I'm interested in thoughts on any of those questions. If you have thoughts on multiple questions, perhaps answer them in the reverse order of how I wrote them here.

(You can DM me your thoughts if you prefer to not post an answer publicly[9].)

  1. ^

    Yes, I think we're in the peculiar situation where there exist 2 young people who are likely roughly Terence-Tao-level, even though that's very rare. Neither has been sane enough to start working on alignment so far, though they are both <=22y old. Also feel free to DM me in case you'd be willing to help with trying to effectively reach out to them.

  2. ^

    Which roughly but probably not exactly correspond to INT and WIS from Project Lawful.

  3. ^

    E.g. if both Einstein and John von Neumann went through dath ilani keeper training, I would guess John von Neumann would come out as far more competent. Even though historically I am more impressed with Einstein as a scientist.

  4. ^

    The techniques for augmenting WIS may work by using INT to a significant extent, so it's perhaps more like separately having a thinking-skill::WIS and a native::WIS, and your effective WIS is more like the maximum of those, rather than techniques adding WIS on top of whatever your native WIS is. INT might be a lot harder to train. So if we hypothetically had sufficiently good thinking techniques, native high-INT people would end up more competent. (INT might be augmentable through gene therapy, though that obviously seems very hard.)

  5. ^

    E.g. in Eliezer's sequences (noticing confusion, noticing mysterious answers, holding off on proposing solutions, crisis of faith, the virtues of rationality, defending against biases, ...), and some further ones from CFAR and Raemon, or Fermi estimate skills like Ryan Greenblatt does well (e.g.).

  6. ^

    and also people like Steven Byrnes and Paul Christiano

  7. ^

    trustworthy = sane enough to keep dangerous AI capability insights secret. (And for further specification: Let's NOT assume that this Terence Tao was sane enough to just decide by himself to work on alignment, and rather that we needed to (first pay him and) carefully convince him, but that that was successful.)

  8. ^

    And maybe also: What are the relevant abilities that most people lack?

  9. ^

    E.g. in case you fear saying something like 'Terence Tao couldn't do that research I did' may be perceived as status hacking.


