Published on May 20, 2025 11:35 PM GMT
Neuralese recurrence, which lets a model carry thousands of tokens' worth of information forward from a single pass, is thought to be the key to building high-capability models. Unfortunately, this key lacks interpretability, which means we cannot check the model's alignment by reading its thoughts.
Back on April 3, a user without a formal computer science education proposed the idea of a non-neuralese architecture, something along the lines of "maybe to solve this information bottleneck problem, we should solve the generate-many-tokens-in-one-pass problem." Unfortunately, he didn't propose any technical solution. In this post I aim to propose the solution I came up with.
Word embeddings represent words as vectors and even support sense-faithful arithmetic like "queen ≈ king - man + woman". The most basic such representations have dimension 100-300, so a word can be specified using only about 300 neurons, while a layer of a large model consists of tens of thousands of neurons. Consider a setup where the first 300 neurons of each layer generate a word to be placed into a memo whose length equals the number of layers, and where the model is trained so that the memo resembles a text about a subject related to the problem. The memo could then be interpreted, paraphrased[1] to strip out euphemisms and attempts to communicate in an alien language[2], and loaded back into the model along with the CoT for the next step.
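To make the read-out step concrete, here is a minimal sketch in Python. It assumes per-layer hidden states are available at the current token position and that a 300-dimensional word-embedding table exists; all names (`EMBED_DIM`, `vocab`, `extract_memo`, the toy random activations) are illustrative stand-ins, not part of any existing library or model.

```python
# Minimal sketch of the "memo" read-out described above, assuming a transformer
# whose per-layer hidden states are available at the current token position.
# Everything here is illustrative: a real system would use the model's own
# embedding table and actual activations rather than random vectors.

import numpy as np

EMBED_DIM = 300      # dimensionality reserved for the memo word
NUM_LAYERS = 48      # one memo word per transformer layer
HIDDEN_DIM = 12288   # full width of a layer; only the first 300 neurons are read

# Toy word-embedding table: word -> 300-dimensional vector.
rng = np.random.default_rng(0)
vocab = ["plan", "verify", "code", "refactor", "test", "deploy"]
embedding_table = {w: rng.standard_normal(EMBED_DIM) for w in vocab}

def nearest_word(vec):
    """Map a 300-dim slice of a hidden state to its closest vocabulary word."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return max(vocab, key=lambda w: cos(vec, embedding_table[w]))

def extract_memo(layer_hidden_states):
    """Read the first EMBED_DIM neurons of every layer and decode one word each.

    `layer_hidden_states` has shape (NUM_LAYERS, HIDDEN_DIM); in a real model
    these would be the per-layer activations at the current token position.
    """
    return [nearest_word(h[:EMBED_DIM]) for h in layer_hidden_states]

# Stand-in for activations from an actual forward pass.
fake_activations = rng.standard_normal((NUM_LAYERS, HIDDEN_DIM))
memo = extract_memo(fake_activations)   # one word per layer, NUM_LAYERS words in total
print(" ".join(memo))

# The memo would then be paraphrased by a small trusted model and fed back in
# alongside the chain of thought for the next reasoning step (not shown here).
```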
Alternatively, one could have the words generated by several groups of neurons in each layer and placed into parallel quasi-CoTs; the model would then be given access to all the quasi-CoTs, trained to keep them coherent, and given the option to purge them when the time comes.
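For the multi-group variant, the read-out changes only in that each layer contributes one word to each of several parallel quasi-CoTs, taken from disjoint 300-neuron slices. This sketch reuses `nearest_word`, `EMBED_DIM`, and `fake_activations` from the block above; `NUM_GROUPS` and the slicing scheme are assumptions for illustration.

```python
NUM_GROUPS = 4   # number of quasi-CoTs maintained in parallel (illustrative)

def extract_quasi_cots(layer_hidden_states):
    """Return NUM_GROUPS word lists, one per quasi-CoT.

    Group g reads neurons [g*EMBED_DIM, (g+1)*EMBED_DIM) of every layer.
    """
    cots = [[] for _ in range(NUM_GROUPS)]
    for h in layer_hidden_states:
        for g in range(NUM_GROUPS):
            segment = h[g * EMBED_DIM : (g + 1) * EMBED_DIM]
            cots[g].append(nearest_word(segment))
    return cots

quasi_cots = extract_quasi_cots(fake_activations)
for g, cot in enumerate(quasi_cots):
    print(f"quasi-CoT {g}: {' '.join(cot)}")
```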
- ^
GPT-4o estimates the compute needed for GPT-2 to paraphrase a 100-token memo at roughly 5 billion floating-point operations. The main model, with far more parameters, requires vastly more compute per step than the paraphraser.
- ^
Although the model may still develop a language of its own that is immune to the paraphraser, such a language would be easier to decipher than neuralese, given access to the memos and the contexts in which they were produced.