Published on July 28, 2025 2:47 PM GMT
Note: This piece will not spend much time arguing that pre-training is dead—others have done that elsewhere. Instead, the point here is to explore how people ought to update if they believe pre-training is dead. I’m also setting aside questions of degrees-of-deadness and how confident we should be.
Newton’s third law of motion says that for every action, there is an equal and opposite reaction. Something similar holds in Bayesian reasoning: if observing evidence E would shift your credence in one direction, then observing ~E must shift it in the other direction, with the shifts weighted by how likely you thought each outcome was, so that the expected update is zero (conservation of expected evidence). The less likely you thought ~E was, the larger the reverse update when it actually arrives. This symmetry matters, especially when thinking about the apparent plateauing of progress from AI pre-training.
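As a minimal illustration of that symmetry, here is a short Python sketch with made-up numbers (placeholders, not anyone's actual credences). It checks that the probability-weighted updates from E and ~E cancel exactly, and that the reverse update is larger when ~E was judged unlikely:

```python
# Conservation of expected evidence: a toy numerical check.
# All numbers below are illustrative placeholders, not real forecasts.

p_h = 0.30              # prior P(H), e.g. "short AI timelines"
p_e_given_h = 0.90      # P(E | H): pre-training scaling keeps working if H is true
p_e_given_not_h = 0.40  # P(E | ~H)

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # P(E) by total probability

p_h_given_e = p_e_given_h * p_h / p_e                   # posterior if E is observed
p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)   # posterior if ~E is observed

update_on_e = p_h_given_e - p_h          # positive: E pushes credence up
update_on_not_e = p_h_given_not_e - p_h  # negative: ~E pushes credence down

# The expected update is zero: the two shifts cancel when weighted by P(E) and P(~E).
expected_update = p_e * update_on_e + (1 - p_e) * update_on_not_e

print(f"P(H)      = {p_h:.3f}")
print(f"P(H | E)  = {p_h_given_e:.3f}  (update {update_on_e:+.3f})")
print(f"P(H | ~E) = {p_h_given_not_e:.3f}  (update {update_on_not_e:+.3f})")
print(f"Expected update = {expected_update:+.6f}")  # ~0 up to float error
```

With these numbers the downward shift on ~E (about -0.23) is larger in magnitude than the upward shift on E (about +0.19), precisely because E was judged the more likely outcome. That is the relevant case here: the more confidently you expected scaling to keep working, the harder you should update when it stops.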
A lot of AI excitement over the past few years has been driven by scaling laws—and for good reason. Pre-training progress kept beating expectations. Every time people predicted slowdowns, they were wrong. Reasonably, this led to strong updates toward short AI timelines and fast capability growth.
Later, other forms of scaling (e.g., inference-time compute and algorithmic progress) added further weight to these forecasts. It looked like the scaling train had no brakes.
But now, the story’s shifting. Some experts in the field believe the pre-training scaling regime that powered GPT-3, GPT-4, and others is reaching diminishing returns and that we’re now leaning mostly on post-training. Some of the signals for this are:
- GPT-4.5 used nearly an order of magnitude more compute than GPT-4 but was widely seen as underwhelming, possibly even worse in some cases. OpenAI avoided branding it a "frontier" model and announced that GPT-4.5 would be removed from the API just a few months after launch.
- Grok 3 reportedly used another OOM of compute but also disappointed.
To be clear, I’m not offering a rigorous, quantitative update here. I’m describing a vibe. My sense is that people who now believe pre-training is mostly exhausted haven’t updated their timelines or threat models nearly as much as they should have.
If pre-training was the main reason you believed AGI was close—and now you believe pre-training has stalled—then you should update pretty strongly away from short timelines. That doesn’t mean updating all the way back to your pre-scaling beliefs: inference scaling and algorithmic improvements seem to be more powerful than we initially thought. But I think people need to rethink timelines more seriously than I’m currently seeing, especially in light of the very evidence that once brought them to their high-confidence positions.
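To make the "strongly away, but not all the way back" point concrete, here is a toy log-odds sketch. All the numbers are made up for illustration, and it assumes (simplistically) that the pre-training evidence and the other evidence are independent:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# Purely illustrative log Bayes factors, not actual estimates.
prior = 0.10                     # pre-scaling-era credence in short timelines
lbf_pretraining = math.log(6.0)  # boost once attributed to pre-training scaling
lbf_other = math.log(2.0)        # boost from inference scaling + algorithmic progress

peak = sigmoid(logit(prior) + lbf_pretraining + lbf_other)

# If the pre-training premise is withdrawn, its contribution comes back out,
# but the other evidence still stands.
revised = sigmoid(logit(prior) + lbf_other)

print(f"pre-scaling prior: {prior:.2f}")
print(f"peak credence:     {peak:.2f}")
print(f"revised credence:  {revised:.2f}")  # lands between prior and peak
```

In this sketch the revised credence ends up well below its peak but still above the pre-scaling prior, which is the shape of update the argument calls for.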
I'd be curious to hear in the comments whether people agree or disagree, and why.