What LLMs lack

Published on May 28, 2025 4:19 PM GMT

Introduction

I have long been very interested in the limitations of LLMs because understanding them seems to be the most important step to getting timelines right. 

Right now there seems to be great uncertainty about timelines, with very short timelines becoming plausible, but also staying hotly contested. 

This led me to revisit LLM limitations and I think I noticed a pattern that somehow escaped me before. 

Limitations

To recap, these seem to be the most salient limitations or relative cognitive weaknesses of current models: 

System 2 thinking: Planning. See the oddly persistent difficulty of getting models to play Tic-Tac-Toe perfectly, or to handle blocks world, chess, or anything else that has not been the subject of a lot of reasoning RL.

Dealing with new situations: Going out of distribution is a killer for all things DL. 

Knowledge integration: Models don't have automatic "access" to skills learned from separate modalities. Even within the same modality, skills are not robustly recallable, hence the need for prompting. Also related: Dwarkesh's question.

Learning while problem solving: Weights are frozen, and there is no way to slowly build up a representation of a complex problem unless the representations learned during training are already very close. This is basically knowledge integration during inference.

Memory: RAG is a hack. There is no obvious way to feed complex representations back into the model, mostly because such representations aren't built in the first place: the state of a transformer is spread over all the token and attention values, so recomputing those from the underlying text is the go-to solution (a toy sketch after this list illustrates the point).

Objectivity: See hallucinations. But also self-other/fact-fantasy distinction more generally.

Agency: Unexpectedly we got very smart models that are not very good at getting stuff done.

Cognitive control: The inability to completely ignore irrelevant information or, conversely, to treat certain tenets as absolute leads to jailbreaks and persistent trick-question failures, and is also a big part of the unreliability of models.
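To make the memory point concrete, here is a toy numpy sketch of a single attention step (not any particular model's API; all dimensions are made up for illustration). The per-token key/value activations are the closest thing the model has to a working memory of the context, and they are simply recomputed from the text whenever it is re-fed.

```python
# Toy sketch: a decoder-only transformer's "state" is spread over per-token
# key/value activations rather than held in one compact, integrated form.
# All sizes below are illustrative assumptions, not any real model's values.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64          # hidden size (tiny, for illustration)
N_TOKENS = 10         # context length so far

# Frozen projection weights (stand-ins for learned parameters).
W_q = rng.normal(size=(D_MODEL, D_MODEL))
W_k = rng.normal(size=(D_MODEL, D_MODEL))
W_v = rng.normal(size=(D_MODEL, D_MODEL))

# Token activations for everything read so far.
token_states = rng.normal(size=(N_TOKENS, D_MODEL))

# The "memory" of the context is this per-token K/V cache: it grows linearly
# with the number of tokens and is recomputed from the underlying text
# whenever the conversation is fed to the model again.
k_cache = token_states @ W_k          # (N_TOKENS, D_MODEL)
v_cache = token_states @ W_v          # (N_TOKENS, D_MODEL)

# Predicting the next token integrates that memory into a single small vector.
query = token_states[-1] @ W_q        # (D_MODEL,)
scores = k_cache @ query / np.sqrt(D_MODEL)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
integrated = weights @ v_cache        # (D_MODEL,)

print(k_cache.shape, v_cache.shape)   # state spread over all tokens: (10, 64) each
print(integrated.shape)               # what the next-token decision sees: (64,)
```

The only "integrated" quantity here is the single attention output used for the next-token decision; everything else stays splintered across tokens, which is why there is no compact representation of the problem so far that could be stored and fed back in.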

One category

These seem like a mixed bag of quite different things, but I recently realised that they all belong to the same class of cognitive abilities: These are all abilities that in humans are enabled by and in fact require consciousness. 

Is "cognitive abilities enabled by consciousness" maybe a bit tautological? Unconscious people show little cognitive ability after all? 

But humans can do many cognitively demanding things without being conscious of them at that moment. The simplest example is driving a well-known route and arriving without any memory of the drive, which has probably happened to most of us.

Not having a memory of the drive is a tell that we weren't conscious of it but were probably attending consciously to something else, because conscious experience is necessary for memory formation.

Does this make sense?

Integrated Information Theory (IIT) and global workspace theory both tell us that consciousness is about information integration. Different sensory information and the results of subconscious processing are integrated into the coherent whole of what we are conscious of. The coherence of our experience tells us that the information is integrated and not just made available.

Knowledge integration, learning while problem solving and memory are all about integrating information into one coherent whole, while the rest of the limitations touch upon abilities that are based on the manipulation of the integrated information. 

Transformers, as they are currently trained, are limited when it comes to information integration for two reasons: 

1. The space into which information is integrated is comparatively small. While the brain subnetworks that hold the information we are conscious of probably contain at least hundreds of millions of neurons, the final token activation used to make a decision, i.e. the next-token prediction, contains only a couple of thousand entries (a few tens of thousands for the largest models). See the back-of-envelope sketch after this list.

2. Information is splintered into tokens. Why this matters becomes clear when we notice that there are cases where models manage impressive information integration during learning: already GPT-2 was able to translate between major languages despite seeing very little translation data. This is possible because all languages split into comparable tokens, so models can learn shared representation spaces for these tokens (think very similar representations of "dog", "chien", "hund", etc.). This breaks down across modalities, where tokens don't naturally share representation spaces, which is why a model might beat you at chess but cannot teach you chess: what it says about a game is mostly nonsense.
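To put rough numbers on the first point, here is a back-of-envelope comparison of the two integration spaces. The specific figures are illustrative assumptions in the spirit of the estimates above, not measurements.

```python
# Back-of-envelope sketch of the size gap described above.
# All numbers are assumed for illustration, not measured values.
d_model_small = 4_096        # final-token activation width, mid-sized LLM (assumed)
d_model_large = 20_000       # "a few tens of thousands" of entries, largest models (assumed)
workspace_neurons = 300e6    # "at least hundreds of millions" of neurons (assumed lower bound)

for d in (d_model_small, d_model_large):
    print(f"d_model={d:>6}: conscious workspace is ~{workspace_neurons / d:,.0f}x larger")
# d_model=  4096: conscious workspace is ~73,242x larger
# d_model= 20000: conscious workspace is ~15,000x larger
```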

The correspondence between "stuff LLMs tend to be comparatively bad at" and "stuff humans need conscious processing for" therefore seems to make sense based on the transformer architecture + data + training. (For what it's worth, I don't think state-space models come out much ahead here, because they are also trained on next-token prediction and integrate into a comparatively tiny vector.)
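To make that parenthetical concrete, here is a minimal sketch of the fixed-size state a linear state-space model carries (all dimensions made up for illustration): no matter how long the input gets, everything it has read is squeezed into one small vector.

```python
# Minimal sketch of the state-space-model point: a discretized linear SSM
# carries its entire memory of the sequence in a single fixed-size state.
# Sizes are illustrative assumptions, not any real model's values.
import numpy as np

rng = np.random.default_rng(1)

D_STATE = 16      # size of the recurrent state (tiny, for illustration)
D_INPUT = 8       # per-token input width
N_TOKENS = 1000   # length of the sequence being read

A = 0.95 * np.eye(D_STATE)               # state transition (toy, stable)
B = rng.normal(size=(D_STATE, D_INPUT))  # input projection

h = np.zeros(D_STATE)                    # the model's entire "memory"
for x in rng.normal(size=(N_TOKENS, D_INPUT)):
    h = A @ h + B @ x                    # 1000 tokens folded into D_STATE numbers

print(h.shape)  # (16,) -- however long the input, the state stays this small
```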

Conclusion

To my mind this satisfyingly delineates the dimensions along which LLMs are still lagging from those where they forge ahead. I don't think this is a very actionable insight, neither in terms of achieving AGI nor in terms of getting a clearer picture of timelines. 

However, it does make it clearer to me that there really is a qualitative algorithmic gap to AGI, and it also convinces me that LLMs are probably not (very) conscious.



