What was so great about Move 37?

Published on May 29, 2025 7:00 AM GMT

I frequently use "Move 37" as a shorthand for "AI that comes up with creative, highly effective ideas that no human would ever consider." Often the implication is that reinforcement learning (as used in AlphaGo) has some "secret sauce" that could never be replicated by imitation learning.

But I realize that I don't know the details of Move 37 very well, other than secondhand accounts from Go experts of how "groundbreaking" it was. I've never played Go, and I have basically no knowledge of the rules or strategies beyond the most basic descriptions. Considering how much influence Move 37 has had on my views about AI, it seems like I'd better try to understand what was so special about it.

I'd be interested in an explanation that builds up the necessary understanding from the ground up. This could look like: "Read this tutorial on the rules of Go, study these wiki pages about specific concepts and strategies, look at these example games, and finally read my explanation of Move 37 which uses everything you've learned."

Extremely ambitiously, after reading this explanation, I'd be able to look at a series of superficially similar Go boards, distinguish whether it might be a good idea to do a Move-37-like play, identify where exactly to move if so, and explain my answer. That may be unrealistic to achieve in a short time, but I'd be interested in getting as close as possible. An easier version of that challenge would use heavily-annotated Go boards that abstract away some parts of the necessary cognition, with notes like "this section of the board is very important to control" or "this piece has property A" or "these pieces are in formation B."[1]

If part of the explanation is "when you do an extensive Monte Carlo Tree Search from this board state guided by XYZ heuristics, Move 37 turns out to be the best move," that seems like a pretty good explanation to me—as long as the search tree is small enough that it plausibly could have been explored by AlphaGo during its match with Lee Sedol. I'm mainly interested in trying to understand the intuition behind Move 37 in the way AlphaGo might have "understood" it. If the move couldn't be found by a human without using brute force search, that would be valuable to know.
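To make that kind of explanation concrete, here is a minimal, runnable sketch of policy-guided Monte Carlo Tree Search with PUCT-style selection, loosely following the scheme described in the AlphaGo paper. Everything Go-specific here (`legal_moves`, `apply_move`, `policy_prior`, `evaluate`) is a hypothetical stub; the real system used deep policy and value networks plus many engineering refinements this omits.

```python
import math
import random

# Hypothetical placeholders for the Go-specific machinery. A real system
# would supply a board representation, a policy network, and a value
# network; these stubs just make the sketch runnable.

def legal_moves(state):
    return list(state["moves"])

def apply_move(state, move):
    return {"moves": [m for m in state["moves"] if m != move]}

def policy_prior(state):
    # Uniform prior as a stand-in for AlphaGo's policy network, which
    # predicts how likely a human expert would be to play each move.
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def evaluate(state):
    # Stand-in for AlphaGo's position evaluation (value network plus
    # fast rollouts): estimated win probability for the player to move.
    return random.random()

class Node:
    def __init__(self, prior):
        self.prior = prior    # P(s, a): the heuristic's prior for this move
        self.visits = 0       # N(s, a)
        self.value_sum = 0.0  # W(s, a)
        self.children = {}    # move -> Node

    def q(self):              # Q(s, a): mean value of simulations through here
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # PUCT selection: exploit moves with high observed value, but also
    # explore moves the prior likes that haven't been visited much.
    sqrt_total = math.sqrt(sum(c.visits for c in node.children.values()) + 1)
    def score(item):
        child = item[1]
        return child.q() + c_puct * child.prior * sqrt_total / (1 + child.visits)
    return max(node.children.items(), key=score)

def search(root_state, num_simulations=10_000):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, state, path = root, root_state, []
        # 1. Selection: descend the tree along high-scoring edges.
        while node.children:
            move, node = select_child(node)
            state = apply_move(state, move)
            path.append(node)
        # 2. Expansion: create children weighted by the policy prior.
        for move, p in policy_prior(state).items():
            node.children[move] = Node(prior=p)
        # 3. Evaluation: estimate the leaf position's value.
        value = evaluate(state)
        # 4. Backup: credit each edge on the path, flipping perspective
        #    between the two players at every ply.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = 1.0 - value
    # The recommended move is the *most visited* child. A Move-37-like
    # play is one where the search overturns a low prior.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

if __name__ == "__main__":
    toy_state = {"moves": ["A", "B", "C", "D"]}
    print(search(toy_state, num_simulations=1_000))
```

The point of the sketch is the last line of `search`: the move that gets recommended is the most-visited one, which can differ sharply from the prior's favorite. DeepMind reportedly said AlphaGo's policy network estimated only about a 1-in-10,000 chance that a human would have played Move 37, which is exactly the case where search overturns the prior.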

I'm particularly interested in an explanation of Move 37 because I want to know whether such an explanation is even possible. When we have superintelligent AI solving real-world problems using strategies that no human would ever think of, those strategies should ideally be explainable, if not in practice, at least in principle, perhaps even to the point that a human could understand and replicate the strategies given enough time to study the explanation.[2]

Lee Sedol spent tens of thousands of hours studying Go, yet even he was flummoxed by Move 37 when he first saw it, spending nearly 15 minutes to come up with a response. Maybe it's hubris to hope that a complete novice like me could understand anything about it, but I'd be surprised if it weren't possible to get some intuition for why this move was important. I'm sure it's very difficult to become an expert in quantum computing, and even harder to discover it from scratch, but it's possible to get a (vague, no doubt flawed) understanding of Grover's algorithm from a 30-minute YouTube video. I generally expect the curve of understanding vs. effort spent to be relatively smooth, even in very difficult domains.

I think it's plausible that requiring AI strategies to meet some minimum bar for explainability won't necessarily incur a huge safety tax. So far, it seems like most AI-discovered strategies are not incomprehensible to humans, given a proper explanation.[3] Move 37 is the closest thing we have to a counterexample—a strategy that initially seemed alien even to top human experts—so learning more about it would help me evaluate this hypothesis.

I'd be willing to pay for a thorough written explanation of Move 37—likely $50, maybe up to $100 for an extremely high-quality explanation. I'd be willing to spend up to 8 hours studying, but ideally, the explanation would be accessible enough for a random LessWrong reader to glean something useful from it in 30 minutes.

Regardless of whether I can successfully understand Move 37 at a low level, I'd be interested in answering high-level questions like the following: 

  1. ^

    I don't necessarily want to actually take a test like this since it seems like it would be hard to make, but I hope this description gives you a better idea of what I'm going for.

  2. ^

    At this point, I started writing a footnote about two different types of explanations we might try to elicit from the AI. I ended up turning the footnote into a full post: Procedural vs. Causal Understanding.

  3. ^

    After some research, I found some more examples of "creative AI behavior" that are pretty similar to Move 37, involving novel solutions that no human had previously thought of. However, these examples have important differences, or are so similar to Move 37 that I don't think learning about them would teach me much more (e.g. novel chess strategies found by AI).

    AlphaFold's ability to predict protein folding is probably the best example of AI intuitions totally outstripping humans. However, it seems pretty different from AlphaGo in that there are no "expert human protein-folding predictors." It's plausible to me that humans who studied protein folding as diligently as Lee Sedol studied Go, learning from centuries of accumulated human knowledge, would be able to compete with AlphaFold. Even if AlphaFold beat these hypothetical humans, there likely exists some explanation that would let them understand the AI's solutions.

    Other AI-discovered strategies are likely pretty easy to understand.

    AlphaEvolve is a very recent example of AI coming up with new solutions to mathematical problems. However, AlphaEvolve's edge over human mathematicians seems to come from iterating on candidate solutions for a very long time, rather than from some special insight that only an AI could have. AlphaEvolve simply uses Gemini 2.0 to generate many variations of high-scoring solutions, without doing any specialized RL training. Since Gemini 2.0's training most likely doesn't involve any multi-step RL, the explanations for its solutions are probably entirely comprehensible to humans.

    OpenAI Five, a 2018 AI that played Dota 2, "deviated from current playstyle in a few areas, such as giving support heroes (which usually do not take priority for resources) lots of early experience and gold." I'm not sure what other strategies it used, but the single mentioned strategy seems very straightforward to understand.

    From The Verge in 2019, reporting on AlphaStar, a pretty similar AI that plays StarCraft:

    “AlphaStar is an intriguing and unorthodox player — one with the reflexes and speed of the best pros but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that’s unimaginably unusual; it really makes you question how much of StarCraft’s diverse possibilities pro players have really explored,” Diego “Kelazhur” Schwimer, a pro player for team Panda Global, said in a statement. “Though some of AlphaStar’s strategies may at first seem strange, I can’t help but wonder if combining all the different play styles it demonstrated could actually be the best way to play the game.”

    This explanation of AlphaStar's strategies is even more vague than the OpenAI Five explanation, though it sounds intriguing. If OpenAI Five or AlphaStar ever did come up with any truly incomprehensible superhuman strategies, it's probably very difficult to find out now.

  4. ^

    Apparently, chess grandmasters were able to learn some strategies from AlphaZero.


