COT Scaling implies slower takeoff speeds

🤔 The COT/o1 models have changed expectations about AGI development. Previously, people believed AGI would develop rapidly through recursive self-improvement, but GPT-3 showed that the first AGI would in fact be built in a large lab on a multi-billion-dollar supercomputer.

💰 The COT/o1 models reveal that AGI will be very expensive both to train and to run. This means the first AGI will not be able to simulate millions of humans as previously expected, because both training and inference demand substantial resources.

🚀 The COT/o1 models also illustrate the "faster is safer" idea: using COT means reaching the AGI milestone sooner, but it also means more time to test, evaluate, and improve AGI before reaching the far more dangerous milestone of "everyone has AGI on their phone."

💡 The COT/o1 models show that we can predict a larger model's capabilities by scaling up training and inference, which reduces risk during AGI development because we can understand a model's capabilities in advance and test and evaluate it accordingly.

🚫 The COT/o1 models also highlight the downside of regulating AI before understanding it. Existing AI regulations focus mainly on limiting training compute, but with inference compute mattering as much as training compute, these laws are outdated before they even take effect.

Published on September 28, 2024 4:20 PM GMT

This graph (OpenAI's o1 result showing accuracy improving smoothly with both train-time and test-time compute) is the biggest update to the AI alignment discourse since GPT-3.

For those of you unfamiliar with the lore: prior to GPT-3, the prevailing assumption was that AGI would rapidly "foom" via recursive self-improvement.

After GPT-3, it became clear that the first AGI would in reality be built in a large lab using a multi-billion-dollar supercomputer, and any idea that it would simply "copy itself to the internet" is nonsense.

Under the GPT-3 regime, however, it was still plausible to assume that the first AGI would be able to simulate millions of human beings, because for models like GPT-3/4 the training cost is much higher than the inference cost.

However, COT/o1 reveals this is not true. Because we can scale both training and inference, the first AGI will not only cost billions of dollars to train, it will also cost millions of dollars to run. (I doubt people will go for exact equality, spending $1B each on training and inference, but we should expect them to spend some non-trivial fraction of training compute on inference.)
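As a rough illustration, here is a back-of-the-envelope sketch of why this matters; every constant is a made-up assumption for illustration, not a number from the post:

```python
# Back-of-the-envelope sketch with made-up numbers (none of these constants
# come from the post): how many full-time "agent copies" a fixed inference
# budget buys, before and after COT-style inference scaling.

TRAIN_COST = 1e9                  # $1B training run (assumption)
INFERENCE_BUDGET = TRAIN_COST     # spend a matching $1B/year on running it

cheap_agent_year = 1_000      # $/agent-year when a forward pass is cheap (assumed)
cot_agent_year = 1_000_000    # $/agent-year with long COT reasoning traces (assumed)

print(INFERENCE_BUDGET / cheap_agent_year)  # 1,000,000 copies: "millions of humans"
print(INFERENCE_BUDGET / cot_agent_year)    # 1,000 copies: nowhere near millions
```

Under the cheap-inference assumption, one training run fans out into millions of simulated workers; under the COT assumption, the same budget buys three orders of magnitude fewer.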

This is also yet another example of "faster is safer." Using COT (versus not using it) means we will reach the milestone of AGI sooner, but it also means we will have more time to test, evaluate, and improve that AGI before we reach the much more dangerous milestone of "everyone has AGI on their phone."

Scaling working equally well with COT also means that "we don't know what the model is capable of until we train it" is no longer true. Want to know what GPT-5 (trained on 100x the compute) will be capable of? Just test GPT-4 with 100x the inference compute. This means there is far less danger of a critical first try, since newer, larger models will deliver efficiency improvements more so than capability improvements.
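To make that prediction recipe concrete, here is a minimal sketch assuming a simple log-linear exchange rate between train-time and test-time compute; the `exchange_rate` of 1.0 and all FLOP counts are illustrative assumptions, not measured values:

```python
import math

# Hypothetical sketch: fold train-time and test-time compute into one
# "effective compute" score, assuming a log-linear exchange rate between
# them. exchange_rate and the FLOP counts below are illustrative
# assumptions, not measured values.

def effective_compute(train_flops: float, inference_flops: float,
                      exchange_rate: float = 1.0) -> float:
    """10x inference compute is assumed to buy as much capability as
    (10 ** exchange_rate)x training compute."""
    return math.log10(train_flops) + exchange_rate * math.log10(inference_flops)

base         = effective_compute(train_flops=1e25, inference_flops=1e12)
boosted      = effective_compute(train_flops=1e25, inference_flops=1e14)  # 100x inference
scaled_train = effective_compute(train_flops=1e27, inference_flops=1e12)  # 100x training

print(boosted - base)       # 2.0: gain from 100x inference compute
print(scaled_train - base)  # 2.0: gain from 100x training compute
```

Under the assumed exchange rate of 1.0 the two gains match, which is exactly the claim above: testing today's model at 100x the inference budget forecasts what the next 100x training run will deliver. A real forecast would need an empirically fitted exchange rate.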

Finally, this is yet another example of why regulating things before you understand them is a bad idea. Most current AI regulations focus on limiting training compute, but with inference compute mattering just as much as training compute, such laws will be out of date before they even take effect.


