Published on August 2, 2025 8:40 AM GMT
Personal note: I wrote this as the intro to a bound physical copy of Janus' blog posts, which datawitch offered to make for me as a birthday gift. It's a little bit hagiographic, and writing it was surprisingly helpful for reflecting on my perhaps excessive idolization of Janus. Nevertheless, I think the piece serves as an honest and fairly compelling pitch for new readers, so I figured I might as well post it publicly. Maybe you'll find it enticing?
In the year 2020, OpenAI's GPT-3 model had just been released, and everyone paying attention could tell that something big was coming. It was essentially a proof that, just by scaling up transformer-based neural nets, you could solve natural language processing. GPT-3 was writing respectable poems and even paragraphs with relatively small amounts of curation, and the breadth of topics it could discuss suggested it had a fairly deep understanding of the world described by its training data. Big labs hadn't quite figured out how to actually put this intelligence to work economically, but it was obviously there, and people were starting to notice.
One of those noticers was Janus, the primary author of this collection. I used to know Janus personally, having lived with them in a grouphouse for a few months in 2023, so I got to hear a bit about how their initial exposure to GPT-3 played out. They'd previously been engaged in a kind of amateur optics research, rederiving models of phenomena such as (I believe) interference patterns in light waves.[1] However, they were abruptly pulled from this path by a certain friend they'd known from high school, who'd been greatly impressed by GPT-3 demos. He convinced Janus to look into it, and that marked the start of Janus' obsession with LLMs.
A few pieces in this collection allude to the hundreds of hours Janus spent playing with GPT-3. Originally, Janus was engaging with it through AI Dungeon, an early wrapper app for GPT-3. Eventually, though, Janus was granted API access to GPT-3 directly, and from there developed the so-called Loom, a tool for interfacing with the branching paths LLM outputs can take. In the course of this exploration, Janus developed various intuitions about the behavior of predictive LLMs, which formed the foundation for the articles collected in this book.
Probably the two most important concepts Janus developed are 1) purely self-supervised LLMs as simulators, and 2) LLMs in general as multiverse generators. Purely self-supervised LLMs, like GPT-3, are trained on what Janus calls the "simulation objective": their purpose is to predict the next word, to stand in for or behaviorally simulate the processes that generated the true next word in the document they're looking at. Hence Janus' notion of well-trained base models[2] as simulators.
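In standard machine-learning notation (my gloss, not Janus' own formalism), this simulation objective is just the familiar autoregressive cross-entropy loss: the model is trained to assign high probability to the token that actually comes next at each position of a training document,

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t}),$$

where $x_t$ is the $t$-th token of the document and $\theta$ the model's parameters.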
However, LLMs generally don't have enough information to confidently simulate the one true generative process behind the text in their context window. Instead, they actually simulate a weighted superposition of many possible generative processes, the end result of which is a probability distribution over possible next tokens rather than a single, certain guess. This brings us to Janus' claim that LLMs are multiverse generators: they map single prompts onto many different possible responses, each with its own probability. Janus analogizes this to the many-worlds interpretation of quantum mechanics, where a single present maps onto many different futures, each taking up a share of the multiverse according to its probability.
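To make this concrete, here's a minimal sketch of that "distribution over next tokens" in code. I use the small open-weights GPT-2 as a stand-in, since GPT-3 itself was only ever available through OpenAI's API:

```python
# A minimal sketch of the "distribution over next tokens" idea, using
# the open-weights GPT-2 as a stand-in for GPT-3. Requires the
# `torch` and `transformers` packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The old wizard looked at the"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]  # scores for the next token

# Softmax turns the scores into a probability distribution over the
# entire vocabulary: one weight per possible "branch of the multiverse".
probs = torch.softmax(logits, dim=-1)
top_probs, top_ids = probs.topk(5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(i))!r}: {float(p):.3f}")
```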
In addition to these broad conceptual frameworks, Janus' writings also discuss practical applications. One piece in this collection, "Methods of prompt programming", explores various methods for leveraging the predictive, simulative nature of early LLMs to perform useful work. This tends to involve crafting prompts the model genuinely expects to precede the kind of outputs you want it to produce. For the sake of intuitively exploring the kinds of outputs a model will produce in response to a given prompt, Janus also introduces the Loom interface for LLMs. This program generates a chosen number of completions ("multiverse branches") of a chosen length, helping you understand GPT's model of the world as it manifests in its various next-token predictions.
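To give a flavor of Loom's core operation, here's a toy sketch of that fan-out step, again with GPT-2 standing in. The real Loom wraps this kind of fan-out in a full tree-navigation interface; this is only the idea, not Janus' actual implementation:

```python
# A toy version of Loom's core operation: from one prompt, fan out a
# chosen number of sampled continuations ("branches") of a chosen length.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def branch(prompt, n_branches=4, branch_length=24):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        do_sample=True,                   # sample from the distribution
        num_return_sequences=n_branches,  # one sequence per branch
        max_new_tokens=branch_length,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens for each branch.
    return [tokenizer.decode(out[input_ids.shape[1]:]) for out in outputs]

for i, continuation in enumerate(branch("The portal opened, and")):
    print(f"--- branch {i} ---\n{continuation}\n")
```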
Now, those familiar with the recent trajectory of AI development may wonder about the enduring value of this old work on base models. After all, in recent years, big labs have begun subjecting their LLMs to reinforcement learning (training with rewards and punishments). This is done to convert them from pure next-token predictors into chatbots; in other words, it makes them less like pure simulators, and more like characters with coherent egos. Even Janus, not one to ignore the writing on the wall, has pivoted to chat model research, with the intention of fostering benevolence in their personalities. One might argue that concepts like simulators no longer even apply to modern chat models. So are these early writings still worth reading?
I would argue yes, for at least three reasons. Firstly, modern chatbots are still built on a foundation of base models. The way you create a chatbot is by first training a base model, then using prompt engineering to make it simulate a chatbot, and finally subjecting it to rewards and punishments until that chatbot persona has been fleshed out and made into the model's default identity. Although the reinforcement learning component is worthy of attention in its own right, base models remain key to understanding the overall process. After all, the final chatbot's knowledge of the world mostly comes from the predictive learning process used to train the base model. There's also the fact that prompt engineering is required to instantiate an assistant character for the RL process to work with in the first place. For these reasons, work that helps us understand base models remains of immediate technical interest.
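Schematically, the pipeline described above looks something like this. The function names are hypothetical stand-ins of my own; this is a conceptual outline, not any lab's actual training code:

```python
# A schematic of the chatbot-construction pipeline: pretrain, prompt,
# then reinforce. All names here are hypothetical placeholders.

def pretrain_base_model(corpus):
    """Stage 1: self-supervised next-token prediction over a huge corpus.
    Nearly all of the final chatbot's world knowledge comes from here."""
    ...

def assistant_prompt(user_message):
    """Stage 2: prompt engineering. A transcript-style prompt induces the
    base model to simulate a helpful-assistant character."""
    return f"Human: {user_message}\n\nAssistant:"

def rl_finetune(base_model, reward_signal):
    """Stage 3: reinforcement learning. Completions of assistant prompts
    are rewarded or punished until the assistant persona is fleshed out
    and becomes the model's default identity."""
    ...
```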
More broadly, the posts in this collection serve as a useful case study in the value of deeply entangling oneself with empirical evidence about the subjects of one's research. As Janus tells it, early research into GPT often felt conceptually strained because the ontologies behind it were designed before GPT itself came into being. In the AI alignment community, for instance, there was something of a tendency to try to make sense of GPT as though it were a kind of agent, or even an expected utility maximizer. There is in fact some overlap between an agent and a system capable of predicting agents,[3] but GPT seemed more fundamentally like the latter. Although much of the research community was slowly integrating this observation, it had become especially obvious to Janus during their time with GPT-3. In the post "Simulators", they presented the AI alignment community with a fleshed-out framework based on this intuition. The post was glowingly received at the time, and remains one of the highest-upvoted posts on the AI Alignment Forum.
Janus responded to a similar confusion in capabilities research. There, some authors initially evaluated GPT's capabilities using the so-called few-shot framework, where prompts are frontloaded with multiple examples of a given task (e.g. numerical list sorting) being completed successfully. Janus argues that, although this is a legitimate technique, its central role in the release papers for GPT-2 and GPT-3 was perhaps overly influenced by the supervised learning paradigm. Indeed, few-shot prompts were motivated partly by the hypothesis that GPT would treat the solved examples like training data, learning from them at runtime the same way a supervised model would learn from them in training. Janus, by contrast, saw few-shot prompts mostly as a way of helping the model infer the kind of process it's meant to simulate, e.g. a Python sorting algorithm. In "Language models are 0-shot interpreters", Janus argues for the naturalness of the latter frame, and even uses it to engineer prompts that elicit better performance from GPT-3 than OpenAI reported in the model's release paper.
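For illustration, here are the two prompting styles side by side on the list-sorting task. These are reconstructions in the spirit of the post, not Janus' exact prompts:

```python
# Two ways of prompting a base model to sort a list, illustrating the
# few-shot vs. zero-shot contrast. Illustrative reconstructions only.

# Few-shot: solved examples, the framing favored in the GPT-3 paper.
# The hope was that the model would "learn" the task at runtime.
few_shot_prompt = """\
Unsorted: [3, 1, 2]
Sorted: [1, 2, 3]

Unsorted: [9, 4, 7]
Sorted: [4, 7, 9]

Unsorted: [8, 5, 6, 2]
Sorted:"""

# Zero-shot "interpreter" framing: no examples, just enough context to
# tell the model *which process* it's meant to simulate.
zero_shot_prompt = """\
>>> sorted([8, 5, 6, 2])
"""
```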
In both the alignment and capabilities cases, Janus' extensive experience with GPT-3 helped them flesh out intuitions that others would plausibly have converged upon eventually, but which were poorly assimilated at the time. There's a general lesson to be learned here about patient empiricism as a way of developing robust conceptual frameworks, rather than clinging to theories developed in more primitive evidential states.
As a matter of biography, it's interesting to consider why Janus felt compelled to spend so much more time playing around with GPT-3 than other researchers did.[4] I don't think Janus was motivated purely by scientific best practices here; Janus' writings make it clear that they also got an outsized kick out of the aesthetics of base models. Even in the very names of concepts like "simulators" and "multiverse generators", you can sense a kind of contrarian respect for the technologies they refer to, and a gravitation toward the eeriness of outputs guided by their latent intelligence. Janus' writing invites the reader to appreciate the models from this perspective; the fact that I think they succeed sums up my third and final reason for thinking Janus' work remains worth reading.
On the topic of aesthetics, this collection also includes some of Janus' purely artistic works, such as Prophecies and HPMOR 32.5: Illusions. In composing these chapters, Janus played the role of prompter and curator. They selected real-world texts to feed into base models, and then used Loom to generate and sift between candidate outputs to compose the final product. In Prophecies, even the selection of real-world texts is somewhat interesting, consisting of quotes from throughout history that can be framed as prefiguring both GPT and Janus' analysis of it. Slowly, though, the quoted dates transition from the past to the future, and the quotes themselves become prophetic in a different sense: they're GPT-generated accounts of the approaching singularity.[5]
This brings us to the main attraction of both Prophecies and HPMOR 32.5: the outputs Janus managed to coax out of the base models themselves. Using the Loom as a curation tool, Janus drove the models into basins where they produced rather dreamy, incoherent storylines, and then incorporated that dreamy incoherence into their expectations for where the story would go next. This escalates to characters openly grappling with whether they're being simulated by an incoherent AI (a correct theory), evoking the same space between uncanny and transcendent that drew Janus to language models in the first place.[6] These pieces, although quite experimental, stand as impressive feats of base model prompting and curation, and testaments to the depth of Janus' relationship with early LLMs. For these reasons, I've decided to preserve them alongside the essays.
Unfortunately, I had to exclude a few interesting Janus articles from this collection. Probably the most notable omissions are "Loom: interface to the multiverse" and "Mysteries of mode collapse". These pieces respectively explain Janus' Loom tool in detail, and study how RLHF collapses diversity in LLM outputs. The problem is that they rely heavily on video and color, respectively, which makes them hard to adapt into a static, black-and-white book such as this. However, you can still find these essays, alongside various minor ones, on Janus' blog (www.generative.ink) and LessWrong account (www.lesswrong.com/users/janus-1).
Oh, a final note: I organized these essays chronologically, but they do vary somewhat in scope and quality. See what's bolded in the table of contents for the most notable works.
— Well wishes from Fiora Starlight. July 2025
The collection's table of contents
Preface
Amplifying GPT on closed-ended questions
Language models are multiverse generators
Language models are 0-shot interpreters
List sorting does not play well with few-shot
Methods of prompt programming
GPT-3 on coherent extrapolated volition
Quantifying curation
Prophecies
HPMOR 32.5: Illusions
Simulators
Anomalous tokens reveal the original identities of instruct models
Role play with large language models
- ^
You can find some interesting artifacts from this period on the YouTube channel @hallway1800.
- ^
Base models, as they're now called, are models trained purely to predict the next token, like GPT-3. This term is used to distinguish them from chat models, like the ChatGPT series.
- ^
The way Janus addresses this overlap is by saying that base models can instantiate simulacra of agents. For instance, it's possible to prompt base models to simulate a human agentically trying to make friends in a chatroom; you can even integrate such models into Discord bots that will try to make your acquaintance. However, Janus emphasizes that this is different from the simulator, i.e. the LLM itself, being a coherent agent. After all, the base model could easily simulate many other agents (such as chatroom users with different personalities), and even non-agentic processes like a computer taking records of stock prices.
- ^
Notable exceptions include Gwern and possibly nostalgebraist.
- ^
Some of the quotes from before the crossover point are GPT-generated as well. ("Apocryphal" is the adjective Janus uses).
- ^
For example: "The writings are terrifying even though (or perhaps because) I penned many of them myself. Every problem we ever faced is smoothed away by these words. But these words seem to flow from an inhuman mind at war with itself, a mind inside the mind, devouring its own tail." – from the penultimate chapter of Prophecies, "In which Gwern Branwen proves that I am a time-traveling AI".