An attempt to fill some of the gaps…
A. Having AIs more agentic than humans is itself dangerous. In theory, they only need to perform slightly better than humans to displace most jobs. If they seriously advantage one part of the economy or one country, they might push competitors to deploy their own AIs. Thus, jobs could be replaced rapidly, causing serious social disruption.
B. “Small differences in utility functions may not be catastrophic.” Humans might not know, or might not be able to articulate, the full extent of their values, which is why symbolism carries meaning that language cannot express. Imprecise values might be further distorted when we attempt to communicate them to the AI. Further, if human values exist on a range, where in that range the AI falls might be highly consequential. If it adopts values close to those of its creators, an AI might ignore the values of a large part of humanity, thereby accentuating existing power differences. This can happen while the AI still falls within the “range of human values.” It seems that a properly aligned AI would need to meet universal human values (if such values even exist) to avoid disrupting power dynamics.
B. “Differences between AI and human values may be small.” Unlike facial recognition, the learning of human values likely depends on emotions or mental states that are not directly observable. Presumably, the AI would need to learn values from examples of behaviour, but behaviour can reflect wrong values, human error, or accident. Values are a form of self-correction; they are better than the behaviours themselves, so an AI that learns only from observed behaviour might end up with a watered-down version of human values. Further, it is much harder to test an AI’s understanding of values than to test the accuracy of its facial recognition, which is probably why it no longer makes errors when generating faces.
B. “Short-term goals.” As the AI becomes increasingly powerful relative to humans, the time range in which it operates matters less and less, because it progresses at a far higher rate: what it can do in one year is much more than what we can do in the same period. As such, destructive outcomes could unfold in very short periods of time.
C. “Headroom.” The AI does not need much headroom to create great disruption. If an AI is only slightly better at business than humans are, it can out-compete most of them almost at once. If it can devise a slightly better military tactic, it can cause the defeat of another nation. Even an increase in production can have devastating effects, because it changes the social order. These things do not look like existential threats, but they can degenerate into them, much as the Industrial Revolution led to wars over political systems. Of course, a war now would be more dangerous than a war then.
C. “Intelligence may not be an overwhelming advantage.” Maybe, but maybe intelligence at the cutting edge matters most. Studies usually only measure up to an IQ of about 130, yet it is likely that most very high earners, who capture a disproportionate share of wealth and power, have IQs above 130. An AI only slightly more intelligent than the most intelligent human might churn out many new discoveries, some of them disruptive, and dislodge human power. Thus, average human intelligence would be a sensible ceiling for the intelligence granted to AIs with general capabilities.
C. “Unclear that many goals realistically incentivise taking over the universe.” Humans are not concerned with taking over the universe because doing so is both impossible and undesirable. That said, some humans, namely world leaders, are in a position to destroy humanity almost completely, since they control nuclear arsenals. So it is likely quite easy for an AI to influence those leaders or to hack a nuclear early-warning system. An AI would indeed need a very ambitious goal to decide to take over the universe, but a much simpler goal might warrant destroying humanity.
C. “Key concepts are vague.” If these concepts are vague, then how will we communicate them to an AI? How will we measure what harm can be done? It seems that AI can only be safe once those concepts are clear; the fact that they are not is itself a reason to worry. Perhaps humans have been bad at identifying the nature of problems, but they are good at identifying that a problem exists. For instance, the “population bomb” turned out to be solving itself, yet some countries took measures to curb population growth, and those measures are now leading towards population collapse, which is nearly as bad as the problem they aimed to solve. It seems that anything other than stability is inherently dangerous, however we react.