How do we know that "good research" is good?

This post examines what counts as "good research" in AI alignment. The author argues that, because we cannot yet directly evaluate how research contributes to the ultimate goal (ensuring AI goes well for humanity), judgments of "good research" rest mainly on the research community's consensus. The author also points out the limitations of this kind of evaluation and recommends giving more weight to diverse viewpoints and innovative work in AI alignment research.

🎯 **Two definitions of "good research":** The author identifies two main ways to judge whether research is good: it accomplishes a goal the researcher cares about, or it is endorsed by the research community. Because AI alignment has not yet been achieved, the actual effectiveness of alignment research cannot be evaluated directly, so judgments must rely on community consensus.
* **"It's good because it works":** applies in fields where results can be evaluated directly, such as physics or biology, where researchers can verify their work through experiment or observation.
* **"It's good because we all agree it's good":** applies in fields where results cannot yet be evaluated directly, such as AI alignment, where researchers can only rely on community consensus.
**The author argues that AI alignment research currently operates under the second definition: whether research is "good" is judged by community consensus.**

🤔 **Limits of community consensus:** The author notes that community consensus forms through a complicated process shaped by many factors, such as researchers' personal preferences and fashions within the field. This way of evaluating research has limitations:
* **Susceptible to fashion:** because consensus depends on individual judgment, some research directions may be deemed "good" merely because they are popular, without being genuinely useful.
* **May overlook innovative work:** consensus tends to be driven by a few leading figures, so innovative research that goes unrecognized in the short term may be neglected despite its real value.
* **Hard to assess long-term impact:** AI alignment is a long-horizon research direction, and today's results cannot yet be judged by their long-term impact, which community consensus may fail to reflect accurately.

💡 **Recommendations:** The author suggests that AI alignment research should give more weight to diverse viewpoints and innovative work, and build better processes for evaluating research. In particular:
* **Stay alert:** the research community should avoid over-relying on fashion and on the judgments of leading figures.
* **Encourage innovation:** innovative research should be encouraged even when it goes unrecognized in the short term.
* **Improve evaluation:** the community should build more robust evaluation processes that assess the value of research more comprehensively.
**The author argues that only with better evaluation processes can AI alignment research be kept pointed in the right direction.**

Published on July 19, 2024 12:31 AM GMT

AI Alignment is my motivating context but this could apply elsewhere too.

The nascent field of AI Alignment research is pretty happening these days. There are multiple orgs and dozens to low hundreds of full-time researchers pursuing approaches to ensure AI goes well for humanity. Many are heartened that there's at least some good research happening, at least in the opinion of some of the good researchers. This is reason for hope, I have heard.

But how do we know whether or not we have produced "good research?"

I think there are two main routes to determining that research is good, and yet only one applies in the research field of aligning superintelligent AIs. 

"It's good because it works"

The first and better way to know that your research is good is that it allows you to accomplish some goal you care about[1]. Examples:

In each case, there's some outcome I care about pretty inherently for itself, and if the research helps me attain that outcome it's good (or conversely if it doesn't, it's bad). The good researchers in my field are those who have produced a bunch of good research towards the aims of the field.

Sometimes it's not clear-cut. Perhaps I figured out some specific cell signaling pathways that will be useful if it turns out that cell signaling disruption in general is useful; that's TBD pending therapies currently in trials, and we might not know how good (i.e. useful) my research was for many more years. This actually takes us into what I think is the second meaning of "good research".

"It's good because we all agree it's good"

If our goal is successfully navigating the creation of superintelligent AI in a way such that humans are happy with the outcome, then it is too early to properly score existing research on how helpful it will be. No one has aligned a superintelligence. No one's research has contributed to the alignment of an actual superintelligence.

At this point, the best we can do is share our predictions about how useful research will turn out to be. "This is good research" = "I think this research will turn out to be helpful". "That person is a good researcher" = "That person produces much research that will turn out to be useful and/or has good models and predictions of which research will turn out to help".

To talk about the good research that's being produced is simply to say that we have a bunch of shared predictions that there exists research that will eventually help. To speak of the "good researchers" is to speak of the people whose work many agree is likely helpful and whose opinions are likely correct.

Someone might object that there's empirical research we can see yielding results, in terms of interpretability/steering or demonstrating deception-like behavior and the like. While you can observe an outcome there, it's not the outcome we really care about, namely aligning superintelligent AI, and the relevance of this work is still just a prediction. It's like succeeding at a particular kind of cell signaling modeling before we're confident that approach is useful.

More like "good" = "our community PageRank Eigen-evaluation of research rates this research highly"

It's a little bit interesting to unpack "agreeing that some research is good". Obviously, not everyone's opinion matters equally. Alignment research has new recruits and it has its leading figures. When leading figures evaluate research and researchers positively, others will tend to trust them. 

Yet the leading figures are only leading figures because other people agreed their work was good, including before they were leading figures with extra vote strength. But now that they're leading figures, their votes count extra.

This isn't that much of a problem though. I think the way this operates in practice is like an "Eigen" system such as Google's PageRank and the proposed ideas of Eigenmorality and Eigenkarma[3].

Imagine everyone starts out with equal voting strength in the communal research evaluation. At t1, people evaluate research and the researchers gain or lose respect. This in turn raises or lowers their vote strength in the communal assessment. With further timesteps, research-respect accrues to certain individuals who are deemed good or leading figures, and whose evaluations of other research and researchers are deemed especially trustworthy.
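To make the "Eigen" idea concrete, here is a minimal sketch of how vote strength and reputation could co-determine each other via power iteration, in the spirit of PageRank. The researchers and rating numbers are invented purely for illustration; nothing here is anyone's actual evaluation process.

```python
# Toy "Eigen-evaluation": researchers rate each other's work, and a
# researcher's vote strength is itself determined by how highly the
# (weighted) community rates them. Values are made up for illustration.

import numpy as np

researchers = ["A", "B", "C", "D"]

# ratings[i][j] = how highly researcher i rates researcher j's work (0..1).
ratings = np.array([
    [0.0, 0.9, 0.3, 0.1],
    [0.8, 0.0, 0.4, 0.2],
    [0.7, 0.8, 0.0, 0.3],
    [0.2, 0.3, 0.9, 0.0],
])

def eigen_reputation(R, iterations=100):
    """Power iteration: everyone starts with equal vote strength, then
    reputation flows along the rating matrix until it stabilizes."""
    weights = np.ones(R.shape[0]) / R.shape[0]
    for _ in range(iterations):
        weights = weights @ R      # incoming ratings, weighted by raters' reputation
        weights /= weights.sum()   # renormalize so strengths sum to 1
    return weights

for name, w in zip(researchers, eigen_reputation(ratings)):
    print(f"{name}: {w:.3f}")
```

The fixed point is the dominant eigenvector of the rating matrix: your weight ends up high exactly when people who themselves have high weight rate you highly.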

Name recognition in a rapidly growing field, where there isn't time for everyone to read everything, likely functions to entrench leading figures and canonize their views.

In the absence of the ability to objectively evaluate research against the outcome we care about, I think this is a fine way, maybe the best way, for things to operate. But it admits a lot more room for error.

Four reasons why tracking this distinction is important

Remembering that we don't have good feedback here

Operating without feedback loops is pretty terrifying. I intend to elaborate on this in future posts, but my general feeling is that humans are generally poor at making predictions several steps out from what we can empirically test. Modern science is largely the realization that to understand the world, we have to test empirically and carefully[4]. I think it's important not to forget that's what we're doing in AI alignment research, and recognizing that good alignment research means predicted to be useful rather than concretely evaluated as useful is part of that.

Staying alert to degradations of the communal Eigen-evaluation of research

While this system makes sense in the absence of direct feedback, I think it works better when everyone contributes their own judgments, and it starts to degrade when it becomes overwhelmingly about popularity and who defers to whom. We want the field to be more like a prediction market and less like a fashion subculture.
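As a toy illustration of that degradation, assume (purely for the sake of the sketch) that each evaluator has a noisy private read on how useful each piece of research will turn out to be. Aggregating independent judgments tracks the simulated underlying quality much better than an aggregate in which most evaluators simply echo a leading figure:

```python
# Toy simulation: independent noisy judgments vs. deference to one leader.
# All numbers are invented; "true quality" stands in for the unobservable
# long-run usefulness of each research item.

import numpy as np

rng = np.random.default_rng(0)

true_quality = rng.uniform(0, 1, size=20)                   # hidden usefulness of 20 items
independent = true_quality + rng.normal(0, 0.3, (50, 20))   # 50 evaluators' noisy private views

leader_view = independent[0]                                # one leading figure's judgment

# Case 1: everyone reports their own noisy judgment.
aggregate_independent = independent.mean(axis=0)

# Case 2: 90% of evaluators just echo the leader.
deferring = independent.copy()
deferring[5:] = leader_view
aggregate_deferring = deferring.mean(axis=0)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("correlation with true quality, independent voting:",
      round(corr(aggregate_independent, true_quality), 2))
print("correlation with true quality, deferential voting:",
      round(corr(aggregate_deferring, true_quality), 2))
```

Independent noise mostly averages out across fifty evaluators, while the deferential aggregate is only about as accurate as the single leader it copies.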

Recognizing and compensating for the fact that a domain where feedback comes exclusively from other people has stronger incentives toward whatever is currently popular

There's less incentive to try very different ideas, since even if those ideas would work eventually, you won't be able to prove it. Consider how a no-name could come along and prove their ideas about heavier-than-air flight simply by building a contraption that clearly flies, whereas convincing people that your novel conceptual alignment ideas are any good is a much longer uphill battle.

Maintaining methods for top new work to gain recognition

Those early on the scene had the advantage that there was less to read back then, so it was easier to get name recognition for your contributions. Over time there's more competition, and I can see work of equal or greater caliber having a much harder time getting broadly noticed. Ideally, we've got curation processes in place that mean someone could become an equally-respected leading figure as those of yore, even now, for about equal goodness (as judged by the eigen-collective, of course).

Some final points of clarification

 

  1. ^

    Arguably most scientific work is simply about being able to model things and make accurate predictions, regardless of whether those predictions are useful for anything else. In contrast to that, alignment research is more of an engineering discipline, and the research isn't just about predicting some event, but being able to successfully build some system. Accordingly, I'm choosing examples here that also sit at the juncture between science and engineering.

  2. ^

    Yes, I've had a very diverse and extensive research career.

  3. ^

    I also model social status as operating similarly.

  4. ^

Raemon's recent post provides a cute illustration of this.

  5. ^

A concrete decision that I would make differently: in a world where we are very optimistic about alignment research, we might put more effort into getting those research results put to use in frontier labs. In contrast, in pessimistic worlds where we don't think we have good solutions, effort should go overwhelmingly into pauses and moratoriums.



