Intro/Related/Note
Alt title/TL;DR:
Explode/Exploit Dimensions Diverge to Chase Value and Solve Equilibria in an Economic Model of Logical Time
Related:
In Logical Time, All Games are Iterated Games
Why the tails come apart
Positive Bias: Look Into the Dark
Note:
If you're an alignment-y person, and you enjoy this post and want to know more, I encourage you to contact me for additional information; see bottom of post.
1— Loopy Talent Chicken and Speeding into a Correlated/Dependency Equilibrium
Loopy Talent Chicken
"To the extent that agents are trying to predict the future, they can be thought of as trying to place themselves later in logical time than the events which they're trying to predict. Two agents trying to predict each other are competing to see who can be later in logical time. This is not necessarily wise; in games like chicken, there is a sense in which you want to be earlier in logical time."
Imagine a young person who, seeking to be prudent, looks up the "success rate" for people in their position- say, people who want to be artists- and finds that it's 10%.
"Wow! Only 10%," they think- and 9 times out of 10, they give up, concluding they're unlikely to have the talent for art.
They're making an "objective" decision- but it's under the assumption that the domain isn't already loopy, isn't dependent on winning a game of chicken. Choosing "do art" early starts generating the evidence that justifies the choice, while choosing "don't do art" leaves one thinking it was correct to base the choice on the initial statistic. Then, because of the decision algo everyone enacted, the statistic ends up being "objectively" 90/10.
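A minimal sketch of the loop in Python, under loudly artificial assumptions of my own (success depends only on committing, and each aspirant reads the published rate as their personal odds of having talent):

```python
import random

def run_generation(observed_rate, n=10_000, p_success_if_committed=1.0):
    """One cohort of aspiring artists deciding off the published statistic.

    Assumption (hypothetical): success depends only on commitment, not on
    innate talent, so the published rate is pure feedback on past decisions.
    """
    successes = 0
    for _ in range(n):
        commits = random.random() < observed_rate  # "only a 10% shot, so 9/10 quit"
        if commits and random.random() < p_success_if_committed:
            successes += 1
    return successes / n  # this becomes the next cohort's published statistic

rate = 0.10
for gen in range(5):
    rate = run_generation(rate)
    print(f"generation {gen}: published success rate = {rate:.3f}")
# The 10% figure reproduces itself: under these assumptions, any starting
# rate is (approximately) a fixed point of the cohort's decision algorithm.
```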
Speeding into a Correlated Equilibrium
Similarly, some product X might be "objectively" the best option- because the hypothesis, when taken as true, generates evidence for itself. Is HDDVD or Blu-ray better? Enough people randomly assuming one or the other is better can make it true, because you only really need one: if either becomes the standard, you're content with it.
Basically, while the winner isn't yet established, it's a situation where any one of several mutually-exclusive correlated/dependency equilibria can be selected into, based on initial perturbations in the seed RNG.
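As a sketch of that selection process (Python; the super-linear reinforcement parameter `alpha` is a made-up stand-in for network effects/economies of scale, not anything from the actual format war):

```python
import random

def standards_war(steps=10_000, alpha=1.5):
    """Toy equilibrium selection: each new buyer adopts a format with
    probability increasing in its current adoption share (alpha > 1 means
    adoption reinforces itself super-linearly, producing lock-in)."""
    adopters = {"HDDVD": 1, "Blu-ray": 1}  # perfectly symmetric start
    for _ in range(steps):
        h = adopters["HDDVD"] ** alpha
        b = adopters["Blu-ray"] ** alpha
        choice = "HDDVD" if random.random() < h / (h + b) else "Blu-ray"
        adopters[choice] += 1
    return adopters

for seed in (1, 2, 3):
    random.seed(seed)
    print(seed, standards_war())
# Different seeds lock in different winners; within any single run, the
# majority format really has become the "objectively" better buy.
```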
Sidenote/Definition: Correlated equilibria
"The most straightforward way to understand this, which is offered by Aumann himself (1987, 3ff.), is the following: Somehow, Ann and Bob agree on a joint distribution over the strategy combinations of outcomes of their game. One combination is chosen at random according to this distribution, and each player is told only his or her part of the combination. If no player can raise his or her expected utility by breaking his or her part of the agreed joint distribution and choosing some other pure or mixed strategy instead, then this joint distribution is a correlated equilibrium. Thus, correlated equilibria are self-enforcing, they do not need external help from sanctions or agreements." - Spohn/Aumann
HDDVD/Blu-ray is a nice "clean" example where it feels like there's basically no difference between the options, but the same sort of effect applies even to ordinary products that are materially distinct in a way HDDVD/Blu-ray aren't- eg. economies of scale can give the early pick enough momentum to become the objective winner.
2— With Random-Direction Positive Bias, the Tails Come Apart
Optimistic Prediction Pumping
Let’s say you can be diagnosed as being either in category A or in category B, and you don’t know which one you belong to yet. Maybe there’s an 80% chance you’re in A and a 20% chance you’re in B. Learning those odds doesn’t in itself let you do anything to increase expected utility… But you could choose to assume that you’re in A, and instead of running a balanced strategy, make choices that are better-in-A/worse-in-B to “pump” utility from B timelines to A timelines.
Is this any use, if you can’t change the total amount of utility? Maybe not in itself, but consider the following scenario.
In games like MTG or Hearthstone, you have resources that can be used to get better control of the board/playing field, or which can be used to directly attack the other player’s life points. If you fail to secure victory by direct attack fast enough, then all you did was give the other player time to build up resources/control to the point where your defeat is inevitable. (That is, in this scenario, you’re playing an aggro deck.)
Now, let’s say you know the enemy will have 5 mana next turn, and that in their type of deck, 5 mana buys a card that will wipe all your low-toughness units off the board. Playing more conservatively and not summoning more units means that you’ll have more in reserve to keep up the offensive after the wipe.
Does this mean you should always start to play conservatively at that point, “playing around” their answer to your board? No- it might be best to act as if you have no foresight, as if you’d predicted they wouldn’t have the card in hand. You “pump” value from timeline B, where they have the card, to timeline A, where they don’t, because there’s a certain threshold of value you need to reach in A in order to win. If B is doomed either way, or just if taking B into account hurts A too much, it’s better to “jump to conclusions” and “unjustifiably predict” that they don’t have the card. Wrongly playing around it is a sort of uncanny valley where you’re neither random enough to guess right nor seeing through things enough to choose right.
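A toy expected-value version of that decision, with win probabilities I've made up purely for illustration (not real game data):

```python
def ev_jam(p_wipe, win_if_clear=0.8, win_if_wiped=0.1):
    """Play as if they don't have it: great in timeline A, bad in B."""
    return (1 - p_wipe) * win_if_clear + p_wipe * win_if_wiped

def ev_play_around(p_wipe, win_if_clear=0.4, win_if_wiped=0.35):
    """Hold units back: safer in B, but gives up the tempo A needs."""
    return (1 - p_wipe) * win_if_clear + p_wipe * win_if_wiped

for p in (0.2, 0.5, 0.8):
    print(f"P(wipe)={p}: jam={ev_jam(p):.2f}, around={ev_play_around(p):.2f}")
# With these (made-up) numbers, jamming wins unless the wipe is very likely:
# you pump win probability out of the mostly-doomed B branch into the A
# branch, where it can actually clear the threshold needed to win.
```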
Two dimensions
From In Logical Time, All Games are Iterated Games:
This weirdness seems to only be possible because of the "two dimensional logical time" which exists in this toy model, in which we can vary both proof length and logical strength. One agent has access to arbitrarily long proofs via oracles, and so is "later" in the length dimension; the other has a stronger logic, and so is "later" in the strength dimension.
[...]
So long as an agent eventually settles down to making some reliable pattern of decisions in a situation, there will be relatively young logical inductors which have learned enough to accurately forecast the decisions made by logical-induction agents who reason using much more computational power.
One of the choices/strategies always being correct in the given situation is equivalent to there being “an agent (that) eventually settles down to make a reliable pattern of decisions”- it just happens to take the form of a static environment rather than an agent-y agent. Furthermore, “accurate forecasting” can be had simply by guessing the answer. “Actually doing the work to find the right answer” is like developing a longer and longer proof, and “guessing the environment/teacher’s password” is like getting an answer faster through stronger logic- the cost of using randomness in place of stronger logic being that you lose whatever fraction of your timelines to the random guess being wrong instead of right.
Additional example: Jumping to conclusions as quantum suicide tunneling
You have teleportation powers but will die if you teleport into an object: teleport up into air and you’re fine, teleport underground and you die. You’re placed in a halfway-underground antigravity prison cell, so you can’t tell which way is which. You have a spoon you can dig your way out through the walls with, but eventually you’ll run out of food. If the walls are thin enough that you can dig your way out with the spoon, that’s best- but if you know the walls are thick enough that you’ll run out of food before finding out which way is which, your choice of strategy converges to the brainless one of just trying to teleport out without considering the danger (which in turn, 50% of the time, “converges to” the “strategy” of being omniscient and knowing which way was the right way all along/having somehow been capable of proving which way was upward).
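In toy numbers (all placeholders of mine): digging is the long proof whose value collapses past a deadline, and teleporting is the fixed 50/50 guess.

```python
def p_survive(strategy, wall_days, food_days=30):
    if strategy == "dig":
        # The "long proof": certain success, but only if it finishes in time.
        return 1.0 if wall_days <= food_days else 0.0
    if strategy == "teleport":
        # The "random proof": a coin flip between open air and solid rock.
        return 0.5

for wall_days in (10, 30, 31, 100):
    best = max(("dig", "teleport"), key=lambda s: p_survive(s, wall_days))
    print(wall_days, best, p_survive(best, wall_days))
# The moment the long proof can't finish before the deadline, the brainless
# 50/50 strictly dominates- and half of those timelines look "omniscient".
```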
If the upper limit of success isn’t any higher for agents attempting the long proof, and the number of agents attempting the long proof is much smaller than the number attempting the random proof, then you’d expect the majority of big wins to belong to the random-proof group instead of the long-proof group, like the tails coming apart.
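A quick simulation of that claim (Python; the two score distributions are invented, chosen only so that both groups share the same ceiling while the long-proof group is far more reliable on average):

```python
import random

random.seed(0)
N_RANDOM, N_LONG = 100_000, 1_000

def random_prover():
    # Cheap guessers: usually near zero, occasionally near the ceiling.
    return random.random() ** 4

def long_prover():
    # Careful workers: consistently decent, same ceiling of 1.0.
    return min(1.0, random.gauss(0.6, 0.1))

scores = ([("random", random_prover()) for _ in range(N_RANDOM)]
          + [("long", long_prover()) for _ in range(N_LONG)])
top = sorted(scores, key=lambda s: s[1], reverse=True)[:100]
print({g: sum(1 for t, _ in top if t == g) for g in ("random", "long")})
# The top 100 is dominated by the huge random group even though its *average*
# member does far worse: the tails come apart on group size alone.
```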
Selecting for positive bias
The effect is even stronger- and becomes qualitatively distinct- if the success ceiling is higher for random-proof agents, because long-proof agents always choose to “waste” some amount of energy/time thinking where random-proof agents don’t. If there are competitive niches to hole up in, but you need to beat other people to them, an increasingly competitive environment of that type would increasingly select for agents that race in whatever direction their initial guess points, like positive bias.
3— Market efficiency positive-sum loop-breaking and lead-up/phases (explode-exploit)
Market efficiency free-riding
If niches saturate, racing into niches can become racing into a bloodbath, like opening a 15th coffeeshop on the same street. If holding back at least produces some acceptable middling score, it can be better to do so than to run forward.
Let's look at some proverbs:
“The market is efficient.”
“Free cheese is only found in the mousetrap.”
In general, these can be said to hold. But have you ever heard someone say “the market is efficient, so that wouldn’t work” in a fully generic way?
Consider “if x is good, people would try to do x, so I shouldn’t x”: this assumes that x is oversaturated with agents similar to the agent thinking this- but the agent thinking this just selected out of it!
In a vacuum, it’s reasonable to think “the market is efficient”, but the more people use “the market is efficient” to skip reasoning about the efficiency of a given proposal, the more the market ceases to be efficient. Conditioning off of market efficiency has a funny self-undermining effect, a free-rider problem: the market’s efficiency is only upheld by there being people who check whether or not the market is efficient in a given area.
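A toy replicator-style dynamic of that free-rider problem (every functional form and constant here is made up): let `f` be the fraction of agents who actually check, let exploitable mispricing shrink as `f` grows, and let checking cost something.

```python
def mispricing(f, base=1.0):
    # The more agents actually verify, the less exploitable slack remains.
    return base / (1 + 50 * f)

def payoff_check(f, cost=0.05):
    return mispricing(f) - cost   # checkers capture the slack, minus effort

PAYOFF_FREERIDE = 0.0             # "the market is efficient, skip the work"

f = 0.5
for _ in range(200):
    # Agents drift toward whichever role currently pays better.
    f += 0.5 * (payoff_check(f) - PAYOFF_FREERIDE)
    f = min(max(f, 0.0), 1.0)
print(f"f = {f:.2f}, residual mispricing = {mispricing(f):.3f}")
# f settles where checking *just barely* pays (~0.38 here): the market stays
# approximately efficient only because some agents refuse to assume it is.
```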
It’s similar to this: any given random non-mainstream moral or scientific claim is likely to be woo... but if you reject all non-mainstream claims, you can never do better than the current mainstream position, which historically has always ended up being wrong/incomplete so far. Saying “only the current scientific consensus is real” is actually anti-scientific/dogmatic. Call it something like “the skeptic’s paradox”: “I reject all non-consensus claims” is inherently anti-skeptic with respect to the consensus.
Effectively, it’s a solution that’s just another blind shortcut, like the niche-chasing positive-bias strategy- it treats “the answer isn’t already known” as “the answer doesn’t exist”.
Even without directly, randomly locking into a choice, it still has positive bias in that- like the talent chicken scenario- it doesn’t register that no conclusion can be drawn when one locks oneself out of seeing contradicting evidence. And like the talent chicken scenario, it self-reinforces a stuck loop: conditioning off of consensus means that the more time passes without some thing being tried, the more it looks like that thing isn’t worth trying.
Though understanding the idea behind “the market is efficient” is vital to not being a lemming, would-be startups can’t afford to just say “it would’ve already been done if it was worth doing”.
Dimensions as phases
So, there are roughly three strategies or phases: sprinting for niches; conditioning off of people sprinting for niches to become anti-sprinting (arguably also a type of niche-sprint); and lastly, the startup wiggling its way out to success- finding the “free cheese”, playing a positive-sum game instead of a zero-sum one.
These can be lined up with the length vs. strength/speed dimensions of logical-time proof-finding agents: running with initial priors is the speed dimension being dominant, the “condition off the consensus” phase is a sort of happy/unhappy medium, and the startup deliberately trying to solve for the non-obvious positive-sum “exit” from the consensus is the length dimension being dominant.
Take the HDDVD/Blu-ray example: if you were “a startup” that invented either, you solved for the exit from the current equilibrium, and should thereafter move swiftly to occupy (one of) the revealed new market niche(s). The system moves from a stagnant consensus to having a new speed-favoring frontier: when the market gets shaken up, old niches die and new ones open up (eg. selling equipment to new gold miners).
Logical time appears as a saturation of niches causing different strategies to become favored over time. Consensus formation/the meaningful passage of time happens as a sort of “bioluminescence”- a build-up of visibility from the actions of earlier agents, eg. conditioning off of seeing 15 failed coffeeshops. There’s an explode phase (speed-dominant) that establishes available information, which then feeds the exploit phase (length-dominant).
Outro/Motivation/Contact
I was driven to write this article by seeing something that connects it to frontier research. As I’m not convinced of accelerationism, it seems unwise to include that connection in this post. If you’re an alignment person, or someone who could give feedback on it or on whom to potentially reach out to, please feel free to contact me, and I can send you a couple of sentences summarizing the apparent connection. You can DM me- or, maybe better yet, if you wanted to help others decide whether or not to also ask for more info: you comment below -> I confirm -> you reply with your assessment… Something like that. If this post doesn’t turn up any takers, I might make a follow-up post with a bit more from another angle in 1d3 weeks or so and try again.
If my notions or strategy here seems misguided, please don't hesitate to correct me!
Feel free to ask for any clarification, and thanks for reading!