LLM Daydreaming (gwern.net)

This article examines the practical difficulty current large language models (LLMs) have in generating novel insights that go beyond their training data. Gwern argues that LLMs struggle to produce breakthrough discoveries because they are "frozen": lacking continual learning and continual thinking, they cannot run the background, "day-dreaming" style combinatorial search over information that humans do. The essay introduces the concept of a "Day-Dreaming Loop" (DDL), describing it as the brain's process of randomly combining and exploring stored knowledge while idle, which is taken to be key to how humans generate new ideas. The author also sketches how a DDL could be applied to LLMs: retrieve and combine data points, then use prompts to steer the model toward creative association and evaluation, simulating human "eureka" moments. However, this mechanism is expensive, which may mean the most capable future AI systems are used mainly for internal research rather than served to ordinary users, with far-reaching implications for AI governance and the direction of AI development.

💡 LLMs' "frozen" state limits their capacity for innovation: unlike humans, large language models are frozen after training and cannot keep learning and thinking. This "amnesiac" state makes it hard for them to push past the boundaries of existing knowledge and generate genuinely novel insights, because anything already known or nearly known cannot count as impressively novel.

💭 The "day-dreaming loop" is how humans produce new insights: the essay proposes that humans generate unexpected insights through a "day-dreaming loop" (DDL), unconsciously combining and mulling over randomly retrieved information. This mechanism does not rely on deliberate thought; it is computation the brain carries out spontaneously while idle, a major source of human creativity, and it continues even during sleep.

⚙️ Simulating the DDL to improve AI creativity: the author imagines applying the DDL to AI by having it retrieve random data points, explore their potential connections with "brainstorming" prompts, and then evaluate the results. For example, purpose-built prompts can steer the AI to look for analogies, metaphors, new research questions, or candidate solutions linking two concepts, simulating the human "eureka" process.

💰 High cost is the bottleneck for AI "day-dreaming": an AI running a DDL-like mechanism would consume enormous compute, potentially costing far more than the LLM call itself. This steep "daydreaming tax" means only a few power users, researchers, or autonomous AI agents could afford it. The most advanced AI capabilities may therefore be used mostly for companies' internal research rather than the broad consumer market, forming a technical moat and competitive advantage.

🌐 Internalized AI development and governance challenges: because of the high cost and the value of new insights, AI capabilities will increasingly be developed inside companies, creating a significant capability gap between consumer AI and in-house research AI. This internalization makes AI governance harder: ordinary users may not grasp what AI can really do, and governments may struggle to craft effective policy without transparency. It also raises the likelihood of autonomous AI development and "takeover".

Published on July 21, 2025 4:50 PM GMT

This post from Gwern tackles a question that I suspect could become very relevant for AI automating AI research (and jobs more generally): why don't current AIs semi-reliably produce frontier-expanding insights beyond their training data, and what might be necessary for AI to create insights at least semi-reliably?

My takeaways are at the end.

An important point is that I frame this as creating insights at least semi-reliably, rather than being able to create any insights at all, because LLMs have already created insights, so it can't be a fundamental incapacity, but rather a practical incapability.

To be clear, practical incapability can be nearly as bad/good as fundamental incapability, so this nitpick doesn't matter too much.

Links below:

https://www.lesswrong.com/posts/GADJFwHzNZKg2Ndti/have-llms-generated-novel-insights#H8a4ub3vura8ttuPN (not too impressive, but definitely counts here)

https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms?commentId=jLahLy4SRyA4Fuyc2 (an anecdote in which an LLM managed to solve a step in synthesizing a chemical and gave a plausible causal story for why it worked; notably, even after searching the internet and asking around, there still wasn't any discussion of it, implying that this wasn't in its training data)

(Fun fact: this post/idea by Gwern was itself born out of an eruption of insight.)

Quotes below:

Continual Learning:

Frozen NNs are amnesiacs. One salient difference is that LLMs are ‘frozen’, and are not allowed to change; they don’t have to be, and could be trained on the fly (eg by the longstanding technique of dynamic evaluation), but they aren’t.

So perhaps that’s a reason they struggle to move beyond their initial guesses or obvious answers, and come up with truly novel insights—in a very real sense, LLMs are unable to learn. They are truly amnesiac. And there are no cases anywhere in human history, as far as I am aware, of a human with anterograde amnesia producing major novelties.

That may be an adequate answer all on its own: they are trapped in their prior knowledge, and cannot move far beyond their known knowledge; but by definition, all that is either known or almost known, and cannot be impressively novel.

Continual Thinking:

But another notable difference is that human researchers never stop thinking. We are doing our continual learning on not just observations, but on our own thoughts—even when asleep, a human is still computing and processing. (This helps account for the shocking metabolic demands of even a brain which is ‘doing nothing’—it’s actually still doing a lot! As difficult as it may feel to think hard, from a biological perspective, it’s trivial.)

Research on science & creativity emphasizes the benefit of time & sleep in creating effects like the incubation effect, and some researchers have famously had sudden insights from dreams. And we have all had the experience of a thought erupting into consciousness, whether it’s just an inane pun (“you can buy kohl at Kohl’s, LOL”), a clever retort hours too late, a frustrating word finally coming to mind, suddenly recalling anxious worries (“did I really turn off the stove?”) like intrusive thoughts, or, once in a lifetime, a brilliant idea. (Try meditating for the first time and writing down all the thoughts that pop up until they finally stop coming, and one may be amazed & frustrated!)

Often these eruptions have nothing at all to do with anything we have been thinking about, or have thought about in decades (“wait—back at that college party, when that girl looked at my hand—she was hitting on me, wasn’t she?”). Indeed, this essay is itself the product of such an eruption—“what is the LLM equivalent of a default mode network? Well, it could look something like Jones 2021, couldn’t it?”—and had nothing to do with what I had been writing about (the esthetics of video games).

Hypothesis: Day-Dreaming Loop

So… where & when & how does this thinking happen?

It is clearly not happening in the conscious mind. It is also involuntary: you have no idea some arcane random topic is bubbling up in your mind until it does, and then it is too late.

And it is a universal phenomenon: they can happen spontaneously on seemingly any topic you have learned about. It seems difficult to exhaust—after a lifetime, I still have the same rate, and few people report ever having no such thoughts (except perhaps after highly unusual experiences like psychedelics or meditative enlightenment).

It is also probably expensive, given the cost of the brain and the implication that nontrivial thought goes into each connection. It is hard to tell, but my guess is that almost all animals do not have ‘eureka!’ moments. We can further guess that it is probably parallelizable, because the connections are between such ‘distant’ pairs of concepts that it is hard to imagine that the brain has a very large prior on them being related and is only doing a handful of serial computations in between each ‘hit’; they are probably extremely unlikely to be related, hence, many of them are being done, hence, they are being done in parallel to fit into a human lifetime.

It is presumably only partially related to the experience replay done by the hippocampus during sleep, because that is for long-term memory while we have these thoughts about things in Working memory or short-term memory (eg about things during the day, before any sleep); there may well be connections, but they are not the same thing. And it is likely related to the default mode network, which activates when we are not thinking anything explicitly, because that is strongly associated with daydreaming or ‘woolgathering’ or ‘zoning out’, which is when such thoughts are especially likely to erupt. (The default mode network is especially surprising because there is no reason to expect the human brain to have such a thing, rather than go quiescent, and indeed, it took a long time for neuroscientists to accept its existence. And there is little evidence for a default mode network outside primates and possibly some mammals like rats.)

It further appears to be ‘crowded out’ and probably not happening when doing ‘focused’ learning or thinking: in my personal observation, when I have been intensively doing something (whether reading research, writing, coding, or anything else novel & intellectually demanding), the thoughts stop happening… but if I take a break, they may suddenly surge, as if there was a dam holding them back or my brain is making up for lost time.

So where is it?

Day-Dreaming Loop

I don’t know.

But to illustrate what I think answers here look like, here is an example of an answer, which satisfies our criteria, and is loosely inspired by wake-sleep algorithms & default mode network, and is not obviously wrong.

Let’s call this Day-dreaming loop (DDL): The brain is doing combinatorial search over its store of facts & skills. This is useful for sample efficiency by replaying old memories to extract new knowledge from them, or to do implicit planning (eg to patch up flaws in temporally-extended tasks, like a whole human lifetime). DDL does this in a simple way: it retrieves 2 random facts, ‘thinks about them’, and if the result is ‘interesting’, it is promoted to consciousness and possibly added to the store / trained on. (It is not obvious how important it is to do higher-order combinations of k > 2, because as long as useful combinations keep getting added, the higher-order combinations become implicitly encoded: as long as 1 of the possible 3 pairs gets stored as a new combination, then the other can be retrieved and combined afterwards. Higher-order combinations where all members are uninteresting in any lower-order combos may be too sparse to be worth caring about.) DDL happens in the background when the brain is otherwise unoccupied, for one’s entire lifetime. So an example like the Kohl’s example would have happened like ‘retrieve 2 loosely semantic-net-related concepts; think about just those two; is the result interesting? yes, because there’s a pun about an unexpected connection between the two. Promoted!’
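To make the loop described above concrete, here is a minimal illustrative sketch in Python; the fact store, think(), and is_interesting() are placeholders of my own, not anything specified in Gwern's post.

import random

# Minimal sketch of the hypothesized day-dreaming loop (DDL): sample two stored
# facts, "think" about the pair, and promote the result only if it clears an
# interestingness filter. All contents and helpers here are toy placeholders.

fact_store = [
    "kohl is an ancient eyeliner",
    "Kohl's is a department store",
    "the default mode network activates when the mind is idle",
    "wake-sleep algorithms alternate generative and recognition phases",
]

def think(a: str, b: str) -> str:
    # Stand-in for whatever generative process combines the two facts.
    return f"possible connection between ({a}) and ({b})"

def is_interesting(thought: str) -> bool:
    # Stand-in for the cheap verifier; here, a coin flip with a low hit rate.
    return random.random() < 0.01

def daydream_step(store: list[str]) -> str | None:
    a, b = random.sample(store, 2)   # retrieve 2 random facts
    thought = think(a, b)            # 'think about them'
    if is_interesting(thought):      # almost all combinations are discarded
        store.append(thought)        # promote to consciousness / add to the store
        return thought
    return None

# Runs in the background whenever the system is otherwise unoccupied:
# while idle: daydream_step(fact_store)

The k > 2 question Gwern raises corresponds to sampling more than two items here; as he notes, pairwise hits that get written back to the store implicitly make higher-order combinations reachable later.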

We can elaborate on DDL in various ways, like training on both interesting and uninteresting results, labeled as such, and then try to sample ‘interesting’-prefixed; or try to come up with a more efficient way of doing sampling (sampling-without-replacement? reservoir sampling? importance sampling approaches? anti-spaced repetition?), or fiddling with the verification step (do they need to be passed to oracles for review before being saved, because this search process is dangerously self-adversarial?).

But that’s unnecessary, as DDL already satisfies all the criteria, and so worth discussing:

It is plausible from a RL perspective that such a bootstrap can work, because we are exploiting the generator-verifier gap, where it is easier to discriminate than to generate (eg laughing at a pun is easier than making it). It is entirely unconscious. Since it is lightweight, it can happen in parallel, in independent modalities/tasks (eg verbal replay can happen separate from episodic memory replay). And by the nature of recombination, it is difficult to ‘exhaust’ this process because every ‘hit’ which gets added to the store will add many new combinations—surprisingly, in a statistical toy model of economic innovation, economist Charles I. Jones 2021 shows that even though we pick the low-hanging fruit first, we can still see a constant stream of innovation (or even an explosion of innovation). It is, however, highly expensive, because almost all combinations are useless. And it is difficult to optimize this too much because by the nature of online learning and the passage of time, the brain will change, and even if a pair has been checked before and was uninteresting, that might change at any time, and so it can be useful to recheck.
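The recombination argument can be put in back-of-the-envelope form. The sketch below is my own toy illustration, not the actual model in Jones 2021: checking a pair consumes one candidate from the unexplored pool, but each hit adds a new fact that can pair with every existing fact, so the pool keeps growing whenever the hit rate exceeds roughly 1/n.

def pool_growth_per_check(n_facts: int, hit_rate: float) -> float:
    # Expected net change in the unexplored-pair pool per pair checked:
    # a hit adds ~n_facts new pairs, while the check itself uses up one pair.
    return hit_rate * n_facts - 1

# Illustrative (assumed) store sizes and hit rates:
for n in (1_000, 100_000, 10_000_000):
    for p in (1e-2, 1e-4, 1e-6):
        print(f"n={n:>10,} hit_rate={p:.0e} net pairs per check={pool_growth_per_check(n, p):+.2f}")

With these assumed numbers the pool only shrinks where the hit rate falls below 1/n; once the store is large, even very rare hits keep it growing, which is the flavor of the "constant stream of innovation" result.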

LLM Analogy

Clearly, a LLM does nothing at all like this normally, nor does any LLM system do this. They are called with a specific prompt to do a task, and they do it. They do not simply sample random facts and speculatively roll out some inner-monologues about the facts to see if they can think of anything ‘interesting’.

But it wouldn’t be hard to do my proposed algorithm. For example, retrieval of random sets of datapoints from a vector database, then roll out a “brainstorm” prompt, then a judgment. Hypothetical prompts:

[SYSTEM] You are a creative synthesizer. Your task is to find deep, non-obvious, and potentially groundbreaking connections between the two following concepts. Do not state the obvious. Generate a hypothesis, a novel analogy, a potential research question, or a creative synthesis. Be speculative but ground your reasoning.

Concept 1: {Chunk A}
Concept 2: {Chunk B}

Think step-by-step to explore potential connections:

1. Are these concepts analogous in some abstract way?
2. Could one concept be a metaphor for the other?
3. Do they represent a similar problem or solution in different domains?
4. Could they be combined to create a new idea or solve a problem?
5. What revealing contradiction or tension exists between them?

Synthesize your most interesting finding below.
[ASSISTANT]

...

[SYSTEM] You are a discerning critic. Evaluate the following hypothesis on a scale of 1–10 for each of the following criteria:

- Novelty: Is this idea surprising and non-obvious? (1=obvious, 10=paradigm-shifting)
- Coherence: Is the reasoning logical and well-formed? (1=nonsense, 10=rigorous)
- Usefulness: Could this idea lead to a testable hypothesis, a new product, or a solution to a problem? (1=useless, 10=highly applicable)

Hypothesis: {Synthesizer Output}

Provide your scores and a brief justification.
[ASSISTANT]
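A rough sketch of how that retrieve/brainstorm/judge pipeline could be wired up, under heavy assumptions: complete() stands in for whatever LLM API is available, the chunk list stands in for the vector database, and the score parsing and threshold are arbitrary choices of mine, not anything from the post.

import random

SYNTHESIZER_PROMPT = (
    "You are a creative synthesizer. Find deep, non-obvious connections between "
    "the two following concepts and synthesize your most interesting finding.\n"
    "Concept 1: {a}\nConcept 2: {b}"
)

CRITIC_PROMPT = (
    "You are a discerning critic. Rate the following hypothesis 1-10 for novelty, "
    "coherence, and usefulness, then report the average as 'SCORE: <n>'.\n"
    "Hypothesis: {hypothesis}"
)

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def daydream_once(chunks: list[str], threshold: float = 7.0) -> str | None:
    a, b = random.sample(chunks, 2)                                    # random retrieval
    hypothesis = complete(SYNTHESIZER_PROMPT.format(a=a, b=b))         # brainstorm rollout
    critique = complete(CRITIC_PROMPT.format(hypothesis=hypothesis))   # judgment
    score = float(critique.rsplit("SCORE:", 1)[-1].strip())            # naive score parsing
    return hypothesis if score >= threshold else None                  # keep only rare 'hits'

Anything returned would then be logged for a human or a stronger model to review, or written back into the store, mirroring the promotion step in the DDL.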

Obstacles and Open Questions

…Just expensive. We could ballpark it as <20:1 based on the human example, as an upper bound, which would have severe implications for LLM-based research—a good LLM solution might be 2 OOMs more expensive than the LLM itself per task. Obvious optimizations like load shifting to the cheapest electricity region or running batch jobs can reduce the cost, but not by that much.
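To put the "2 OOMs" figure in concrete terms, a back-of-the-envelope with invented numbers (only the <20:1 and 2-OOM figures come from the post):

# All dollar figures and the hit rate below are assumptions for illustration.
cost_per_rollout = 0.002        # $ per synthesizer + critic call pair
hit_rate = 1 / 1_000            # fraction of rollouts judged interesting
cost_per_insight = cost_per_rollout / hit_rate
print(f"${cost_per_insight:.2f} per kept insight")          # $2.00 with these assumptions

ordinary_task_cost = 0.02       # $ per ordinary LLM task
print(f"~{cost_per_insight / ordinary_task_cost:.0f}x an ordinary task")  # ~100x, i.e. 2 OOMs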

Cheap, good, fast: pick 2. So LLMs may gain a lot of their economic efficiency over humans by making a severe tradeoff, in avoiding generating novelty or being long-duration agents. And if this is the case, few users will want to pay 20× more for their LLM uses, just because once in a while there may be a novel insight.

This will be especially true if there is no way to narrow down the retrieved facts to ‘just’ the user-relevant ones to save compute; it may be that the most far-flung and low-prior connections are the important ones, and so there is no easy way to improve, no matter how annoyed the user is at receiving random puns or interesting facts about the CIA faking vampire attacks.

Gwern's Implications

Only power-users, researchers, or autonomous agents will want to pay the ‘daydreaming tax’ (either in the form of higher upfront capital cost of training, or in paying for online daydreaming to specialize to the current problem for the asymptotic scaling improvements, see AI researcher Andy Jones 2021).

So this might become a major form of RL scaling, with billions of dollars of compute going into ‘daydreaming AIs’, to avoid the “data wall” and create proprietary training data for the next generation of small cheap LLMs. (And it is those which are served directly to most paying users, with the most expensive tiers reserved for the most valuable purposes, like R&D.) These daydreams serve as an interesting moat against naive data distillation from API transcripts and cheap cloning of frontier models—that kind of distillation works only for things that you know to ask about, but the point here is that you don’t know what to ask about. (And if you did, it wouldn’t be important to use any API, either.)

Given RL scaling laws and rising capital investments, it may be that LLMs will need to become slow & expensive so they can be fast & cheap.

My Takeaways

If we grant the assumption that an insight generator along the lines of the default mode network is necessary for AI research to be automated (or for jobs to be replaced by AI), I think a couple of very important implications follow:

Number 1 is that the most capable AIs will be used internally, and by default we should expect pretty large divergences in capability between consumer AIs and in-house/researcher AIs, because only AI companies would want to bear the expense of getting insights from AIs semi-reliably, and, as Gwern said, it's an excellent moat for profit and something that differentiates them from competitors.

This means that takeoff will be far more local and internal to the company than people like Robin Hanson/Nora Belrose thought, and the one big model will likely win out over many smaller models that can't run a default mode network.

This also means that AI governance is going to be far more difficult: even if AI-generated insights lead to automating AI research jobs, and then to automating everything else, consumers won't see it and average companies won't use it directly; instead they'll use models optimized to be cheap and fast, with the good solutions already expensively produced inside the AI companies.

This will make any discourse around AI very bad, because lots of people will keep arguing that AI can't make new insights or actually be AGI even if that has in fact happened, and with poor enough transparency and explanation this could easily keep governments in the dark, meaning AI governance never gets off the ground.

We should also expect a lot more internal than external tech development by default, meaning that the more classic AI/human takeover stories become more plausible (especially if we add in Drexlerian nanotech).

Number 2 is that this should increase your probability that something like General Purpose Search is going to exist in AIs, because internal models will already be paying a compute tax to generate insights for AI companies, so using more compute to run a General Purpose Search is already baked in.

This also means that mesa-optimizers, and AIs in general, will likely be more coherent/persistent than you might think.

Number 3 is that this makes the takeoff slower, because AIs will have to be more expensive in order to become cheap: you can't get the cheap, ultra-efficient ASI right at the start; you have to pay more compute to actually get to the cheap and fast ASIs of the future.

Number 4 is that the parallelizability of the insight generator means it's much more feasible for AI to speed up the process of getting insights than you may initially expect, and in domains that are strongly bottlenecked by insight, this can mean unusually fast progress.

How strongly you believe scientific fields are bottlenecked by insights relative to other variables like experimentation will determine a lot about how fast AI in general makes progress.

Number 5 is also related to the parallelizability point: it's probably feasible for an AI to have more insights in a shorter period than humans do, which makes progress even faster in areas where the strongest bottleneck is insight/simulation.

But those are just my thoughts on the matter, and I'd like to see more discussion of this point.


