The last era of human mistakes

Published on July 24, 2024 9:58 AM GMT

Suppose we had to take moves in a high-stakes chess game, with thousands of lives at stake. We wouldn't just find a good chess player and ask them to play carefully. We would consult a computer. It would be deeply irresponsible to do otherwise. Computers are better than humans at chess, and more reliable.

We'd probably still keep some good chess players in the loop, to try to catch possible computer error. (Similarly we still have pilots for planes, even though the autopilot is often safer.) But by consulting the computer we'd remove the opportunity for humans to make a certain type of high stakes mistake.

A lot of the high stakes decisions people make today don't look like chess, or flying a plane. They happen in domains where computers are much worse than humans.

But that's a contingent fact about our technology level. If we had sufficiently good AI systems, they could catch and prevent significant human errors in whichever domains we wanted them to.

In such a world, I think that they would come to be employed for just about all suitable and important decisions. If some actors didn’t take advice from AI systems, I would expect them to lose power over time to actors who did. And if public institutions were making consequential decisions, I expect that it would (eventually) be seen as deeply irresponsible not to consult computers.

In this world, humans could still be responsible for taking decisions (with advice). And humans might keep closer to sole responsibility for some decisions. Perhaps deciding what, ultimately, is valued. And many less consequential decisions, but still potentially large at the scale of an individual’s life (such as who to marry, where to live, or whether to have children), might be deliberately kept under human control^[1].

Such a world might still collapse. It might face external challenges which were just too difficult. But it would not fail because of anything we would parse as foolish errors.

In many ways I’m not so interested in that era. It feels out of reach. Not that we won’t get there, but that there’s no prospect for us to help the people of that era to navigate it better.

My attention is drawn, instead, to the period before it. This is a time when AI will (I expect) be advancing rapidly. Important decisions may be made in a hurry. And while automation-of-advice will be on the up, it seems like wildly unprecedented situations will be among the hardest things to automate good advice for. We might think of it as the last era of consequential human mistakes^[2].

Can we do anything to help people navigate those? I honestly don’t know. It feels very difficult (given the difficulty at our remove in even identifying the challenges properly). But it doesn’t feel obviously impossible.

What will this era look like?

Perhaps AI progress is blisteringly fast and we move from something like the world of today straight to a world where human mistakes don’t matter. But I doubt it.

On my mainline picture of things, this era — the final one in which human incompetence (and hence human competence) really matters — might look something like this:

… since if one of the basic inputs to the economy is changed, the optimal arrangement of things is probably quite different (cf. the ecosystem of things built on the internet);… but that process hasn’t reached maturity.

Humans employing AI eat up most or all of the free energy of good automated power-seeking strategy/tactics, so this doesn't immediately create an instability where AI actors can amass large amounts of power

Doing the most interesting parts of their jobs and using AI tools to automate a lot of the rest Doing manual labour of various types, with AI providing on the job training and assistance

It may well be better than many humans, but lack of feedback loops mean it's hard to tell, and people's trust falls back a good amount on their priors

That's enough predictions that I'm probably wrong in some of the particulars. But I think the broad brush stroke picture is decently likely.

Central challenges to be borne by humans

What kind of challenges will people actually face at these times?

This is difficult to be particularly confident about. But here are some thoughts:

In addition to humans, it matters what AI systems, and what institutions, we create

Are there e.g. prohibitions on certain types of action? It’s unclear whether there’s a lot of path dependency here

What technologies are available could determine the strategic positionWhich research can be easily automated could determine what the future technological landscape looks likeThis might not have so much influence if there’s some kind of grand bargain between the players

The vibe of the intuition is like “power seeking has good feedback loops, and things with good feedback loops tend to get automated earlier”

cf. translation, artwork today

on a military front

Avoiding the accidental creation of systems with undesired values seems to more or less correspond with AI alignmentAvoiding accidentally ceding power to such systems seems to more or less correspond with AI control

An important component of this is conceptual research, getting clarity on what things would even be automated

Trying to help at far remove

Even if we have some sense of their challenges and desire to help — what can we do? A central difficulty is that, however much we can get a sense of their challenges, their own sense of the challenges will be much better. It is inefficient for us to focus too much on specific scenarios^[3]. A related issue is that they will have better tools than we do — some work we might want to do could by then be automated.

I don't know how to think about this systematically, so I may well be missing things. But for now, there are three strategies which seem to me to have some promise — one about helping the future players to act wisely, and two about helping to get the gameboard in a good position.

First, deepening understanding of foundational matters. Having a good grounding in the basics (both theoretical and empirical) seems like it's helpful for understanding all sorts of situations. We have some disadvantage from distance of not knowing which areas of foundations are most relevant, but the space of possible foundations is much much smaller than the space of possible applications, and we can make some educated guesses. In this case that means analysis of the nature of AI, of the senses in which different actors might have values, of the basic dynamics of game theory or bargaining in cases with partial information and partially defined preferences, and so forth. It seems to me like although we have models of all of these things, our models don't always feel like they're capturing all the important things. I wouldn't be surprised if improvements in these foundations were possible, were helpful, and were counterfactual (through the relevant moments).

Second, power seeking on behalf of values one likes. This can include trying to shape the values of various actors, or trying to empower actors with desirable values. Honestly I'm pretty nervous about this one, because (1) it's so common and human for people to delude themselves into thinking that their values are superior, even when they're not, and (2) society has good memetic immune responses against various types of power seeking, so it can be easy for this to backfire. But it definitely is a strategy which can work at this distance, and it has some types of robustness (it doesn’t rely on second-guessing future actors, but is just about setting the gameboard up well). I feel relatively less worried about versions of this which are focused on fundamental values like cooperativeness and a commitment to moral reflection and truth-seeking, and more worried about versions predicated on particular object-level views about which values are correct.

Third, differential technological development. It seems quite possible that the position people are in will depend in various ways on the state of technologies. Work which facilitates desirable technologically pathways coming sooner relative to less desirable ones seems like a good lever. This can include (as e.g. in the cases of AI alignment and control) work laying the groundwork for future automation of research, including conceptual work helping to inform what things, exactly, are good to automate. Differential technological development, as well as being a strategy in its own right (aiming to positively influence the tech available during the last era of human mistakes), can also be a tactic in service of the two other strategies above — e.g. perhaps differentially advancing research which helps us to think clearly about big novel issues.

What to make of this

Framing in terms of the last era of human mistakes feels to me like it’s capturing some important dynamics (although it may be confused about others). I feel glad to have found the perspective, and to get to interrogate it. It helps to remind me how strange the future will be. And it seems like it provides some seeds which I may later find helpful for my thinking.

At the same time, as of the time of writing I’m not sure how much this perspective will help. It shifts my view of things, but it doesn’t make it very transparent what to do. Still, I felt like there was enough here to be worth sharing. If other people find the perspective useful, or not-useful, I’d be interested to hear about that.

^{^}
Or not — there are possible futures where humans are removed from decision loops altogether.
^{^}
I've sometimes heard this period, or something close to it, called “crunch time”. I mildly dislike that name because although it points to the importance of the period it sort of obscures the mechanisms via which it's important.
^{^}
Although it often seems to be very productive to explore specific scenarios, to help keep general thinking grounded.

Discuss

What will this era look like?

Central challenges to be borne by humans

Trying to help at far remove

What to make of this

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签