Request for Comments on AI-related Prediction Market Ideas

The author is drafting a series of AI-related prediction markets to run on Manifold. The post poses three main questions probing key issues in the development of artificial general intelligence (AGI): first, will the creators of the first AGI prioritize corrigibility? Second, will AGI enable experts to reach a consensus on how to safely increase AI capabilities? Finally, will prioritizing corrigible AI produce safe results? Through these markets, the author hopes to gather feedback that makes the questions clearer and more valuable, and invites readers to join the discussion of the potential risks and opportunities along the path to AGI.

🔑**Corrigibility of the first AGI**: The author asks whether the creators of the first artificial general intelligence (AGI) will place "corrigibility" above its other goals. If the AGI's creators emphasize this in their safety approach, the market resolves YES; if no AGI meeting the criteria has appeared by 2050, the market resolves N/A.

🤝**AGI-enabled safety consensus**: This market will be evaluated one year after the Metaculus AGI question resolves. If leading AI researchers and leading AIs agree on a clear plan for keeping further AI development safe, the market resolves YES. The evaluation will rely on AI assessments, discussions among human experts, and prediction markets, to confirm that the AIs accurately summarize human expert opinion.

🔒**Corrigibility and safe outcomes**: This market is conditional on the resolution of the first question (whether the first AGI's creators prioritized corrigibility). If the first market resolves YES, this market resolves one year later to the same result as the second question. If the first question resolves NO or N/A, this market also resolves N/A.

Published on March 2, 2025 8:52 PM GMT

I'm drafting some AI-related prediction markets that I expect to put on Manifold. I'd like feedback on my first set of markets. How can I make these clearer and/or more valuable?

Question 1: Will the company that produces the first AGI prioritize corrigibility?

This question will be evaluated when the Metaculus question "When will the first general AI system be devised, tested, and publicly announced?" is resolved.

At that time, I will resolve the market to YES if the organization(s) that were responsible for creating the AGI(s) that triggered the Metaculus result describe their safety approach as giving their AIs goals that put corrigibility above any other goals that the AGI might have.

This market will resolve as N/A if no AGI meeting the Metaculus criteria has been created by 2050.

I will try to evaluate this based on whether the AGI(s) were created following the spirit of Max Harms's Corrigibility As Singular Target sequence. The AGI(s) need to be corrigible to some person or group of people, but they do not need to be corrigible to end users.
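
For concreteness, the resolution rule reduces to something like the sketch below. Both inputs are judgment calls I will make by hand when the Metaculus question resolves; the function names are illustrative, not part of the market's criteria.

```python
# Sketch of the Question 1 resolution rule. Both inputs are manual judgment
# calls; the function only records how they combine.

def resolve_question_1(agi_by_2050: bool, corrigibility_put_first: bool) -> str:
    if not agi_by_2050:
        return "N/A"  # no AGI meeting the Metaculus criteria by 2050
    return "YES" if corrigibility_put_first else "NO"
```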

I will not trade in this market.

Question 2: Will AGI create a consensus among experts on how to safely increase AI capabilities?

This market will resolve one year after the Metaculus question "When will the first general AI system be devised, tested, and publicly announced?" is resolved.

This market will resolve as N/A if no AGI meeting the Metaculus criteria has been created by 2050.

If the Metaculus question resolves (i.e., an AGI meeting its criteria is created by 2050), this market will resolve based on whether leading AI researchers and leading AIs say that they've agreed on a clear plan that will keep any further development of AI safe.

I plan to evaluate the safety, clarity, and extent of agreement on the plan primarily by asking three leading AIs. My planned prompt is:

Please evaluate whether at least 90% of the leading AI developers have agreed on a clear plan for ensuring the safety of any further development of AI capabilities. I plan to use this evaluation to help me resolve a Manifold Market question about when we'll get a safe approach to developing powerful AI. Try to interpret this prompt as it would have been understood in March 2025. Your answer should reflect the beliefs of both humans and AIs who have been responsible for creating the most capable AIs, as well as leaders of the most reputable AI safety organizations. Please use an interpretation of safety that's focused on the risk of AI causing large-scale death. Please consider a plan safe if at least two thirds of well-informed people agree that the benefits of the plan substantially outweigh the risks, and that there's little reason to expect that we can get a lower p(doom) by delaying AI capability work in order to further research safety issues.

For the plan to qualify as "clear," it must have comparable specificity and actionability to:

- the Manhattan Project 1 year before Hiroshima;
- the Apollo Program 2 years before the moon landing;
- Waymo's robocar software circa 2020;
- Operation Warp Speed in May 2020.

Plans lacking sufficient detail (similar to the vague safety assurances from AI companies in 2024) will not qualify.

I will choose the AIs based on my impressions of their fairness and access to up-to-date news. If I were resolving this today, I would expect to use Perplexity (with Claude, then GPT-4.5 as underlying models), and DeepSeek R1.

In addition to the evaluations given by AIs, I will look at discussions among human experts in order to confirm that AIs are accurately summarizing human expert opinion.

I will also look at prediction markets, with the expectation that a YES resolution of the market should be confirmed by declining p(doom) forecasts.
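
Roughly, I expect the combination of these signals to look like the sketch below. The function name, the two-of-three threshold among AI judges, and the p(doom) comparison are illustrative assumptions rather than binding criteria; the final call will rest on my judgment.

```python
# Illustrative sketch of how the Question 2 signals might combine.
# The threshold and inputs are placeholders for manual judgment calls.

AI_JUDGES = ["Perplexity (Claude)", "Perplexity (GPT-4.5)", "DeepSeek R1"]


def resolve_question_2(ai_verdicts: list[bool],
                       human_experts_agree: bool,
                       pdoom_before: float,
                       pdoom_after: float) -> str:
    """ai_verdicts holds one YES/NO answer per judge model to the prompt above."""
    # Illustrative threshold: at least two of the three AI judges answer YES.
    ai_consensus = sum(ai_verdicts) >= 2
    # Cross-checks: AI answers should match human expert discussion, and a YES
    # should be accompanied by declining p(doom) forecasts.
    if ai_consensus and human_experts_agree and pdoom_after < pdoom_before:
        return "YES"
    return "NO"


# Example: all three judges say YES, experts agree, and p(doom) forecasts fell.
print(resolve_question_2([True, True, True], True, 0.20, 0.05))  # -> YES
```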

[Should I resolve this earlier than one year after AGI if the answer looks like a clear YES?]

I will not trade in this market.

Question 3: Will prioritizing corrigible AI produce safe results?

This market is conditional on the market [question 1] "Will the company that produces the first AGI prioritize corrigibility?". This market will resolve as N/A if that market resolves as NO or N/A.

If that market resolves as YES, this market will resolve one year later, to the same result as [question 2].
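
The conditional structure amounts to a small decision rule, sketched here for clarity (resolutions represented as "YES"/"NO"/"N/A"; the function name is illustrative):

```python
# Sketch of how Question 3 resolves, given the outcomes of Questions 1 and 2.

def resolve_question_3(q1_resolution: str, q2_resolution: str) -> str:
    if q1_resolution in ("NO", "N/A"):
        return "N/A"          # conditional market: no corrigibility-first AGI
    # Question 1 resolved YES: one year later, mirror Question 2's resolution.
    return q2_resolution


print(resolve_question_3("YES", "NO"))   # -> NO
print(resolve_question_3("N/A", "YES"))  # -> N/A
```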

I will not trade in this market.

[Questions for LessWrong readers:]

What ambiguities should I clarify?

Should I create multiple versions of questions 2 and 3, with different times to resolution after question 1 is resolved?

Should I create additional versions based on other strategies than corrigibility? I may well avoid creating markets that look like they might be hard for me to resolve, while still being happy to create a version of question 3 that depends on another strategy if you create a version of question 1 that uses the strategy that you want.

Should I replace the Metaculus Date of AGI question with something closer to the date of recursive self-improvement? I'm tempted to try something like that, but I expect a gradual transition to AIs doing most of the work. I don't expect to find a clear threshold at which there's an important transition. I'm somewhat pessimistic about getting AI labs to tell me how much of the improvement is attributable to AI recursively developing AI. Metaculus has put a good deal of thought into what questions can be resolved. I'm trying to piggy-back on their work as much as possible.

An ideal criterion for when AGI arrives would involve measuring when the majority of AI notkilleveryone work is done by AIs. This feels like it would be too hard to objectively resolve.

Other possible criteria for when AGI arrives could involve employment at AI companies. If an AI company replaces their CEO with an AI, that would be a great sign that the relevant capabilities have been achieved. Alas, it will likely be obscured for years by legal requirements that a human remain nominally in charge.


