LessWrong · 10 hours ago
How to Update If Pre-Training is Dead

This article examines the growing view that progress from AI pre-training has slowed. The author notes that while early AI progress leaned heavily on pre-training scaling laws, recent models such as GPT-4.5 and Grok 3 appear to show diminishing marginal returns from pre-training. Those who came to believe AGI was near largely because of pre-training scaling should, on the current evidence, revisit their timelines and threat models. Inference and algorithmic progress remain promising, but pre-training is no longer the sole driver, and AI timelines deserve more careful scrutiny.

⏳ **The limits of pre-training scaling laws are showing**: Much of AI's progress has come from pre-training scaling laws, but recent models (such as GPT-4.5 and Grok 3) have underperformed expectations despite consuming far more compute. This suggests diminishing marginal returns from pre-training and a weakening of its dominant role in AI progress.

⚖️ **Updating beliefs on the evidence**: Drawing an analogy to Newton's third law, the author stresses the symmetry of evidential updates. If earlier evidence supported rapid AI progress and near-term AGI, then new evidence (such as a pre-training plateau) should prompt a corresponding update in the opposite direction, especially for those whose AGI timelines rested mainly on pre-training scaling.

💡 **Reassessing AI timelines**: Even if pre-training is no longer the main accelerator, inference scaling and algorithmic progress remain important drivers. Still, signals of a pre-training plateau call for a serious rethink of AI timelines rather than clinging to past forecasts.

💬 **Community discussion**: The author invites readers to discuss the state of AI pre-training and its implications for development forecasts, gathering more perspectives on the next phase of AI progress.

Published on July 28, 2025 2:47 PM GMT

Note: This piece will not spend much time arguing that pre-training is dead—others have done that elsewhere. Instead, the point here is to explore how people ought to update if they believe pre-training is dead. I’m also setting aside questions of degrees-of-deadness and how confident we should be.

Newton’s third law of motion says that for every action, there is an equal and opposite reaction. Something similar applies to Bayesianism: if observing some piece of evidence E would update you toward a hypothesis, then observing ~E must update you away from it. More precisely, by conservation of expected evidence, the probability-weighted magnitudes of the two possible updates cancel exactly, so you cannot expect evidence to move you in only one direction. This symmetry matters—especially when thinking about the apparent plateauing of progress from AI pre-training.
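A toy calculation makes the symmetry precise. The numbers below are made up for illustration; the point is structural: the probability-weighted average of your possible posteriors must equal your prior, so a strong update on "scaling keeps working" commits you in advance to some update back if it stops.

```python
# Conservation of expected evidence, with hypothetical numbers.
# H: "AGI timelines are short"; E: "pre-training scaling keeps delivering".
prior_h = 0.30          # P(H): prior credence in short timelines
p_e_given_h = 0.90      # P(E | H): scaling likely keeps working if H is true
p_e_given_not_h = 0.40  # P(E | ~H)

# Total probability of observing E, then Bayes' rule for each outcome.
p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
post_if_e = p_e_given_h * prior_h / p_e                    # P(H | E)
post_if_not_e = (1 - p_e_given_h) * prior_h / (1 - p_e)    # P(H | ~E)

# The probability-weighted posteriors recover the prior exactly:
expected_post = p_e * post_if_e + (1 - p_e) * post_if_not_e
assert abs(expected_post - prior_h) < 1e-9
```

Here observing E pushes credence in H up (to about 0.49), while observing ~E pushes it down (to about 0.07); weighted by how likely each observation was, the expected movement is zero. If you updated hard toward short timelines on scaling evidence, the plateau evidence owes you an update back.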

A lot of AI excitement over the past few years has been driven by scaling laws—and for good reason. Pre-training progress kept beating expectations. Every time people predicted slowdowns, they were wrong. Reasonably, this led to strong updates toward short AI timelines and fast capability growth.

Later, other forms of scaling (e.g., inference scaling and algorithmic progress) added further weight to these forecasts. It looked like the scaling train had no brakes.

But now, the story’s shifting. Some experts in the field believe the pre-training scaling regime that powered GPT-3, GPT-4, and others is reaching diminishing returns and that we’re now leaning mostly on post-training. The clearest signals are the latest frontier runs: models such as GPT-4.5 and Grok 3 consumed far more compute than their predecessors yet delivered only modest gains over expectations.

To be clear, I’m not offering a rigorous, quantitative update here. I’m describing a vibe. My sense is that people who now believe pre-training is mostly exhausted haven’t updated their timelines or threat models nearly as much as they should have.

If pre-training was the main reason you believed AGI was close—and now you believe pre-training has stalled—then you should update pretty strongly away from short timelines. That doesn’t mean updating all the way back to your pre-scaling beliefs: inference scaling and algorithmic improvements seem to be more powerful than we initially thought. But I think people need to rethink timelines more seriously than I’m currently seeing, especially in light of the very evidence that once brought them to their high-confidence positions.

I’d be curious to hear in the comments whether people agree or disagree, and why.
