Published on July 31, 2025 1:15 PM GMT
TL;DR
I believe that:
- There exists a parallel track of AI research which has been largely ignored by the AI safety community. This agenda aims to implement human-like online learning in ML models, and it is now close to maturity. Keywords: Hierarchical Reasoning Model, Energy-based Model, Test time training.
- Within 6 months this line of research will produce a small natural-language capable model that will perform at the level of a model like GPT-3, but with improved persistence and effectively no "context limit" since it is constantly learning and updating weights.
- Further development of this research will produce models that fulfill most of the criteria we associate with "AGI".
Overview
Almost all frontier models today share two major features in their training regime: they are trained offline and out of sequence.
- By offline, I mean that there is a distinct "pretraining" phase, followed by a separate "post-training", "character/safety training", or "fine-tuning" phase, and then finally a "deployment" phase. New iterations of the models must also go through these stages. In almost all cases, deployed models do not receive weight updates when they run and perform "inference".
- By out of sequence, I mean that models receive random samples from the training set, instead of continuous sequences of tokens, characters, images, or audio samples. This is a necessary result of batched SGD, which attempts both to simulate an ideal gradient descent algorithm and to prevent problems like catastrophic forgetting. As a result, models cannot use past inputs to predict future inputs; they can only learn a general solution to the task they are being trained on. (A minimal sketch contrasting the two regimes follows this list.)
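To make the distinction concrete, here is a minimal sketch of the two regimes. It assumes PyTorch, uses placeholder random data and a dummy reconstruction objective, and is only meant to show where the weight updates happen, not any lab's actual pipeline.

```python
# Minimal sketch (PyTorch, toy data, dummy reconstruction objective) of the two regimes.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(32, 32)                      # stand-in for any network
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Offline, out of sequence: shuffled i.i.d. batches from a fixed corpus,
# then the weights are frozen for deployment.
corpus = torch.randn(10_000, 32)               # placeholder training set
loader = DataLoader(TensorDataset(corpus), batch_size=64, shuffle=True)
for (batch,) in loader:
    opt.zero_grad()
    loss_fn(model(batch), batch).backward()    # dummy objective: reconstruct the input
    opt.step()
# Deployment: inference only, no further opt.step() calls.

# Online, in sequence: one observation at a time, in order, with a weight
# update after every step -- the regime this post is about.
def stream(steps=1_000):
    for _ in range(steps):
        yield torch.randn(32)                  # placeholder for a live data stream

for x in stream():
    opt.zero_grad()
    loss_fn(model(x), x).backward()
    opt.step()                                 # weights keep changing during "deployment"
```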
These features of training regimes make sense if you believe the classic function approximation or statistical approximation explanation of machine learning. In this story the model is meant to learn some fixed "target distribution" or "target function" by sampling i.i.d. data points from the training set. The model is then tested on a holdout "test set" which contains new input-output pairs from the same target distribution or function. If the model generalises across the train set and test set, it is considered a good model of the target.
For many reasons, this story makes no sense when applied to the idea of AGI, or to trying to develop an ML model that is good at navigating the real world. Humans do not spend the first few years of their lives in a sensory deprivation tank, getting random webcam footage from different places on earth, before they stop learning forever and are "deployed" into reality. Furthermore, if your plan is to learn the fixed distribution of all possible English sentences, you will naturally need a representative sample of... all possible English sentences. This is not how humans acquire language skills either, and it helps explain why current ML approaches to natural language generation are becoming prohibitively expensive.
Most of us would agree that humans learn continuously, meaning that they learn online and in sequence. Instead of seeing a wide "context" made up of randomly sampled data from all across the internet, we have a very narrow "context" focused on the here and now. To make up for this, we are able to leverage our memories of the immediate and distant past to predict the future. In effect we live in one continuous learning "episode" that lasts from the moment we are born to the moment we die. Naturally, AI researchers have tried to find ways to replicate this in ML models.
The Agenda I am Worried About
I think that the AI safety community has seriously overindexed on LLMs and ChatGPT-style model safety. This is a reasonable choice, because LLMs are a novel, truly impressive, and promising line of AI development. However, in my opinion, research into online in-sequence learning has the potential to lead to human-level AGI much more quickly. A single human brain has the energy demands of a lightbulb, not the energy demands of all the humans in Wyoming.
I am not alone in this belief. Research into online in-sequence learning has focused on small-model, RNN-like approaches that do not use backpropagation through time. Instead, models must update their weights online and generalise based on only one input at a time, forcing them to learn to leverage their memory/hidden state to predict future data points if they wish to be effective. By contrast, transformers are explicitly encouraged to memorise surface-level patterns to blindly apply to large blocks of context, instead of internalising the content and using it to predict future inputs.
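As a rough illustration of that setup (a toy PyTorch sketch with random data and a next-step prediction loss; the specific modules and objective are my assumptions, not any particular paper's): detaching the hidden state at every step means no gradients flow back through time, so each update sees only the current input plus whatever the model has already compressed into the state it carries forward.

```python
# Toy sketch of online, in-sequence learning without backpropagation through time.
import torch
from torch import nn

rnn = nn.GRUCell(32, 128)                      # recurrent core
readout = nn.Linear(128, 32)                   # predicts the next observation
opt = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=1e-3)

stream = torch.randn(1_000, 1, 32)             # placeholder sequential data (batch of 1)
h = torch.zeros(1, 128)                        # hidden state / "memory"

for t in range(len(stream) - 1):
    h = rnn(stream[t], h)                      # fold the current input into the state
    loss = ((readout(h) - stream[t + 1]) ** 2).mean()  # predict the next input
    opt.zero_grad()
    loss.backward()                            # gradient from this one step only
    opt.step()                                 # weights are updated online
    h = h.detach()                             # no backpropagation through time
```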
Some notable papers applying this research include the Hierarchical Reasoning Model, Energy-based Models, ARC-AGI without pretraining, and Test Time Training. Some of these techniques (like Test Time Training or Energy-based Models) augment existing transformer architectures, while others represent entirely novel architectures like the ARC-AGI with no pretraining model and the Hierarchical Reasoning Model. These models share the same idea of getting more information out of each data point than a single backpropagation pass can extract. For example, Test Time Training uses a neural network as its hidden state. It also has an internal update step where key information contained in any incoming data point is compressed into the weights of the hidden state network. ARC-AGI without pretraining trains a new network on each data point (a single puzzle in the ARC-AGI corpus), again aiming to get some compressed representation of the key structural information contained in that puzzle. The Hierarchical Reasoning Model and Energy-based Model iterate on their internal representations either for some fixed number of cycles or until some convergence threshold is reached. That way they can extract maximum information from each data point and give themselves more "thinking time" compared to transformers, which must output the next token immediately after one forward pass. The Hierarchical Reasoning Model also uses higher and lower level recurrent modules to separate cognition into low level execution steps and high level planning steps.
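As a very rough sketch of the Test Time Training idea described above (my own toy rendering, not the paper's formulation: the plain weight-matrix hidden state, squared-error inner objective, and inner learning rate are all illustrative assumptions), the hidden state is itself a small model's weights, and every incoming data point takes one gradient step on those weights before the output is read out:

```python
# Toy sketch: the "hidden state" is the weight matrix W of a small inner model,
# and each incoming data point is compressed into W via one inner gradient step.
import torch

d = 64
W = torch.zeros(d, d)                          # hidden state = inner model's weights

def ttt_step(W, x, inner_lr=0.1):
    W = W.detach().requires_grad_(True)
    inner_loss = ((x @ W - x) ** 2).mean()     # toy self-supervised objective on this data point
    grad, = torch.autograd.grad(inner_loss, W)
    W = (W - inner_lr * grad).detach()         # compress the data point's information into the state
    return W, x @ W                            # read out with the freshly updated state

stream = torch.randn(100, d)                   # toy sequence of token embeddings
for x in stream:
    W, y = ttt_step(W, x)
```

The Hierarchical Reasoning Model and Energy-based Models get at the same idea by a different route: rather than an inner gradient step on a hidden-state network, they iterate on their internal representations over multiple cycles per data point.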
So far, research within this track has produced strong/claimed-to-be-SOTA results for ARC-AGI 1, ARC-AGI 2 (along with Sudoku and maze-solving), and long-context video generation. These models excel at tasks that current frontier LLMs or reasoning models struggle with, or radically improve the otherwise lacklustre performance of standard LLMs. Despite differences in implementation, models in this line of research learn online, take less data to train, have smaller parameter counts, and have better bootstrapping performance (generalising from a limited number of data points). Many of them also claim to be inspired by brain-like architectures or by how humans learn. I think that, of the approaches listed above, the Hierarchical Reasoning Model is the most promising candidate to come out of this line of research so far.
Concrete Predictions
I believe that within 6 months this line of research will produce a small natural-language capable model that will perform at the level of a model like GPT-3, but with improved persistence and effectively no "context limit", since it is constantly learning and updating weights. It is likely that this model will not come from an existing major frontier lab, but rather from a smaller lab focused on this line of research, such as Sapient (who developed the Hierarchical Reasoning Model). The simplest case would be something like "We have adapted the HRM for natural language tasks and scaled it up, and it just works".
I believe that further development of this research will produce models that fulfill most of the criteria we associate with "AGI". In general I define this as a model that learns continuously and online from new data, generalises efficiently to new domains while avoiding catastrophic forgetting, and is skilled in a wide variety of tasks associated with human intelligence: natural language generation and understanding, pattern matching, problem solving, planning, playing games, scientific research, narrative writing etc.
Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026):
- I expect a new model to be released, one which does not rely on adapting pretrained transformers.
- It will be inspired by the line of research I have outlined above, or be a direct continuation of one of the listed architectures.
- It will have language capabilities equal to or surpassing GPT-4.
- It will have a smaller parameter count (by 1-2+ OOMs) compared to GPT-4.
Bonus points:
- It will not be from a major lab (OpenAI, Google, Anthropic, Facebook).
- It will feature continuous learning prominently as a selling point.
What I think we should do
- Move some resources away from LLM-centric safety efforts and investigate these new architectures.
- Examine the possibility of aligning continuous learning models or facilitating value transfer.
  - What does it mean for a model that is constantly updating its weights to be "aligned" or "safe"?
  - Is there overlap between how continuous learning (or learning in general) might work in models vs. humans? (This is my current research project; if you have ideas, please reach out.)
    - For example, one approach I find promising is to train/align a continuous learning model by interacting with it instead of using a fixed training corpus, like how we raise humans. If these models can learn at a near-human rate with human levels of training data, this becomes a possibility.