When you explain your strategy for solving problems in a certain domain, you can try to convey two different types of understanding:
- Procedural understanding is about how to execute a strategy.
- When you've effectively explained your procedural understanding to somebody, they can solve a wide variety of problems in the relevant domain in the same way that you would.
- Causal understanding is about why a strategy works.
- When you've effectively explained your causal understanding to somebody, they know the reasons that you believe the strategy works.
It's possible to have causal understanding without procedural understanding. For example, I know that the correct strategy for tightrope-walking works by keeping one's center of gravity steady above the rope, but that doesn't mean I know the right techniques or have the muscle memory to actually do it.
It's also possible to have procedural understanding without causal understanding. For example, I have no idea how ibuprofen works, but I expect that my procedural understanding that "when in pain, taking ibuprofen relieves that pain" could help me accomplish the goal "avoid being in pain."
I'll call an explanation that conveys procedural understanding a "procedural explanation" and one that conveys causal understanding a "causal explanation."
Often, the most effective procedural explanation of a strategy will include a causal explanation. For example, teaching a chess novice how to execute the Sicilian Defense will probably help them improve at chess. But their success will be limited until they gain a deeper causal understanding of why that opening is effective, letting them adapt the strategy to unforeseen situations.
The more robustly you need to apply a strategy, the more useful it becomes to have a good causal understanding. To get better at avoiding pain in more and more situations, I could try to gain more and more procedural understanding about ibuprofen: the kinds of pain that ibuprofen won't help, what substances have a similar effect, even how to synthesize ibuprofen myself. But at a certain level of detail, the most efficient procedural explanation will probably include a causal explanation of how ibuprofen's molecular structure interacts with other chemicals in the body to reduce pain.
When attempting to explain the strategies learned by an AI during training,[1] we'd rather have a causal explanation than a merely procedural one. For example, "answer questions in a friendly and helpful way" might be a pretty good procedural explanation of a chatbot's strategy. However, if the causal explanation is "answer questions in a friendly and helpful way, because that will lull my overseers into complacency, which will let me seize control of the datacenter," it would be much more helpful to know that.
Unfortunately, it's hard to evaluate the accuracy of a causal explanation of an AI's strategy, since we can't directly tell whether the AI's causal understanding of why the strategy works matches the explanation. However, producing a good procedural explanation seems more tractable. All we have to do is find an explanation that would help a human (or another AI) solve a wide range of unseen problems in the domain, without any additional help.[2]
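To make that evaluation concrete, here's a minimal sketch in Python of what scoring a procedural explanation might look like. Everything here is illustrative, not a real API: `query_model` stands in for whatever interface lets you ask a fresh solver (human or AI) to attempt a problem, and `problem.check` stands in for a domain-specific grader.

```python
# A minimal sketch of scoring a procedural explanation, assuming a
# hypothetical query_model(prompt) -> str interface for the solver and a
# held-out problem set whose items have .statement and .check(answer).
# None of these names come from a real library.

def score_procedural_explanation(explanation, held_out_problems, query_model):
    """Return the fraction of unseen problems a fresh solver can do
    when given only the explanation, with no other help."""
    solved = 0
    for problem in held_out_problems:
        prompt = (
            f"Strategy: {explanation}\n\n"
            "Using only the strategy above, solve the following problem:\n"
            f"{problem.statement}"
        )
        answer = query_model(prompt)
        if problem.check(answer):  # domain-specific correctness check
            solved += 1
    return solved / len(held_out_problems)
```

The key design choice is that the solver sees nothing but the explanation and the problem statement, matching the "without any additional help" condition above: a high score is then evidence that the explanation itself carries the procedural understanding.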
As mentioned above, a sufficiently-detailed procedural explanation generally includes a causal explanation. Therefore, finding a very robust procedural explanation that explains how to execute the AI's strategy in every scenario will likely give us a causal explanation too. In the previous example, the full procedural explanation of the chatbot's strategy would look something like "answer questions in a friendly and helpful way... also, if my overseers have given me such-and-such permissions, attempt to take over the datacenter."
An obstacle to getting good causal explanations is that an AI may sometimes learn heuristic-driven strategies that "just work": not even the AI knows why they are effective. In these cases, the AI may have procedural understanding of its strategy but not causal understanding, so we won't be able to get a satisfactory causal explanation from it.
Once we're dealing with superintelligent AI, it's unclear whether we can find good explanations of its strategies even in principle, in either the procedural or causal sense. Its intelligence could be so far beyond ours that it's impossible to understand how to execute its strategies, much less why they work. However, I'm not confident this will be true in practice. This is why I'm interested in learning more about AlphaGo's "Move 37," which is the best real-world example I know of a superhuman AI strategy that might be very hard for a human to understand.
[1] Yes, this is an AI post. I tried to push off the AI for as long as possible, sorry :P

[2] I'm working on this!