Published on July 30, 2025 12:50 AM GMT
Lots of phenomena turn out to have logarithmic returns: to get an improvement, you double the effort or resources put in; then, to get the same improvement again, you have to double the inputs again, and so on. Equivalently, input costs are exponential in output quality[1]. You can probably think of some examples.
I want to know: is 'extra reasoning compute' like this? (Or, under what conditions and by what means can you beat this?) I'm especially interested in this question as applied to deliberate exploration and experiment design.
Said another way, from a given decision-context, without making extra observations or gathering extra data, what are the optimal marginal returns to 'thinking harder'[2] about what to do next?
Intuitively, if I have a second to come up with a plan, it might be weak, five minutes and it might be somewhat reasonable, a day and it'll be better, a year (full time!) and I've reached very diminishing returns. Presumably a century in my ivory tower would be barely better. I'd usually do better trying to get more data.
Is this even a sensible question, or is 'improvement in reasoning output' far too vague to get traction here?
That's the question; below are some first thoughts toward an answer.
Simple model: repeated sampling/best of k
If you have a proposal generator, and you can choose between proposals, a simple approach to getting better generations is:
- sample a large number, $k$, of proposals
- (try to) evaluate and pick the best one
(This is actually the best strategy you could take if you can only add parallel compute, but there might be strictly better approaches if you can add serial[3].)
Even assuming you can unerringly pick the best one, this strategy turns out to have logarithmically-bounded expected value for many underlying distributions of proposals[4]. In fact, for a normally-distributed proposal generator, you even get the slightly worse square root of logarithmic growth[5].
You can in principle sidestep this if your proposal generator has a sufficiently heavy-tailed proposal distribution, and you can reliably distinguish, ex ante, better from worse at the tails.
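To make that contrast concrete, here's a minimal simulation sketch (mine, not from any source; the distributions, shape parameter, and trial counts are purely illustrative). It estimates the expected best-of-$k$ quality for a standard normal proposal generator, which tracks $\sqrt{2\ln k}$, and for a heavy-tailed Pareto-type generator, where best-of-$k$ grows polynomially instead.

```python
import numpy as np

rng = np.random.default_rng(0)


def expected_best_of_k(sampler, k, trials=500):
    """Monte Carlo estimate of E[max of k i.i.d. proposal qualities]."""
    samples = sampler((trials, k))
    return samples.max(axis=1).mean()


for k in [1, 10, 100, 1_000, 10_000]:
    normal_best = expected_best_of_k(lambda size: rng.normal(0.0, 1.0, size), k)
    # numpy's pareto(2.0) (a Lomax distribution) is heavy-tailed with no MGF;
    # its best-of-k grows roughly polynomially (~ k**0.5), not logarithmically.
    pareto_best = expected_best_of_k(lambda size: rng.pareto(2.0, size), k)
    print(f"k={k:>6}  normal best ~ {normal_best:5.2f}"
          f"  (sqrt(2 ln k) = {np.sqrt(2 * np.log(k)):.2f})"
          f"  pareto best ~ {pareto_best:8.2f}")
```

The contrast is the point of the previous paragraph: with a heavy-tailed generator (and reliable selection at the tails), repeated sampling is no longer stuck at logarithmic returns.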
Other rougher gestures
Search depth
Often to find approximate solutions to problems, we might employ search over a tree-like structure. This emerges very naturally for planning over time, for example, where branching options (whether choice or chance) at each chosen time interval give rise to a tree of possible plans.
If gains are roughly uniform in depth, this gives rise to logarithmic returns to further search. With excellent heuristics, you might be able to prune large fractions of the tree - this gives you a smaller effective branching factor, but still an exponential space to search.
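As a rough sketch of the arithmetic (my notation, not anything canonical): searching a tree with branching factor $b$ to depth $d$ costs on the order of $b^d$ node expansions, so a compute budget of $N$ expansions reaches depth $d \approx \log_b N$. If the value of the best plan found grows roughly linearly in the depth searched, value grows like $\log N$ in compute; pruning replaces $b$ with a smaller effective branching factor, improving the base of the logarithm but not its shape.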
But why/when are gains roughly uniform in depth?
Modelling chaos
Chaotic systems are characterised by sensitivity to initial conditions: dynamics where measurement or specification errors compound exponentially.
So, to forecast at a given precision and probabilistic resolution, each marginal increment of forecast horizon demands exponentially tighter precision in the initial specification. (This is why, in practice, we only ever successfully forecast chaotic systems like weather at quite coarse precision or over short horizons.)
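A standard caricature of this (not from the post): if $\lambda$ is the leading Lyapunov exponent, an initial error $\delta_0$ grows roughly as $\delta(t) \approx \delta_0 e^{\lambda t}$, so forecasting to horizon $T$ within tolerance $\varepsilon$ needs $\delta_0 \lesssim \varepsilon e^{-\lambda T}$. The bits of initial-condition precision required grow linearly in $T$; equivalently, the horizon you can reach grows only logarithmically in the precision (or modelling effort) you can afford.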
Specification precision doesn't exactly map to having extra compute, but it feels close. And marginal incremental time depth doesn't necessarily correspond linearly to 'goodness of plan'.
Combinatorial search
If there's some space of ingredients or components which can be combined for possible insight, the size of the search space is exponential[6] in the number of components in the plan. So if gains are proportional to the number of components in a good plan, you get logarithmic returns to searching longer.
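Roughly (my numbers, purely illustrative): with $n$ candidate components, there are on the order of $n^m$ plans built from $m$ components (more if order matters), so a budget of $N$ evaluations only exhausts plans of up to about $m \approx \log_n N$ components; if value scales with $m$, value again grows like $\log N$ in search effort.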
Something similar applies if the design possibilities benefit from combining already-discovered structures in a hierarchy.
Notably this means that, unless you have an exponentially growing source of inputs to counteract it, there's a practical upper limit to growing the output, because you can only double so many times. And with an exponentially-growing input, you can get a modest, linear improvement to output. ↩︎
i.e. computing for longer or computing more in parallel. Parallel can't be better than serial in returns to total compute, so I'm mainly interested in the more generous serial case. For parallel, it's easier to bound the returns because the algorithm space is more constrained ('sample many in parallel, choose the best' is asymptotically the best you can do). ↩︎
Intuitively you can 'reason deeper' with extra serial compute, which might look like recursing further down a search tree. You can also take proposals and try to refine or improve rather than just throwing them out and trying again from scratch. ↩︎
Proof. Suppose the generator produces proposals with quality $X$. All we assume is that the distribution of $X$ has a moment-generating function $M(\lambda) = \mathbb{E}[e^{\lambda X}]$ (this is not true of all distributions; in particular, heavy-tailed distributions may not have an MGF). Denote individual samples as $X_1, \dots, X_k$. Note first, by Jensen's inequality applied to the convex function $e^{\lambda x}$ (for any $\lambda > 0$), that:

$$e^{\lambda\, \mathbb{E}[\max_i X_i]} \le \mathbb{E}\left[e^{\lambda \max_i X_i}\right] = \mathbb{E}\left[\max_i e^{\lambda X_i}\right]$$

i.e. the exponential of the expected maximum in question is bounded by the expected maximum of the exponentials. But a max of positive terms is bounded by the sum:

$$\mathbb{E}\left[\max_i e^{\lambda X_i}\right] \le \mathbb{E}\left[\sum_{i=1}^k e^{\lambda X_i}\right] = k\, \mathbb{E}\left[e^{\lambda X}\right]$$

(writing $X$ for a representative single sample). But that's just $k$ times the moment-generating function (which we assumed exists). So, for all positive $\lambda$,

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log k + \log M(\lambda)}{\lambda}$$

So (fixing any $\lambda > 0$, or minimising over $\lambda$,) we see at most logarithmic growth in $k$. ↩︎
Take the proof of the general case for an arbitrary distribution with a moment-generating function. Substitute the normal moment-generating function $M(\lambda) = e^{\mu\lambda + \sigma^2\lambda^2/2}$:

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log k}{\lambda} + \mu + \frac{\sigma^2 \lambda}{2}$$

Minimising over (positive) $\lambda$, at $\lambda = \sqrt{2\log k}/\sigma$:

$$\mathbb{E}\left[\max_i X_i\right] \le \mu + \sigma\sqrt{2\log k}$$

i.e. square-root-of-logarithmic growth in $k$. ↩︎
Or more than exponential if the order or configuration matters! ↩︎