AI safety content you could create

This article explores some key but overlooked gaps in AI safety that go beyond the traditional alignment problem. It points out that, besides alignment, there are other potentially catastrophic risks, such as the economic transition and the distribution of power. It also stresses the importance of learning from historical cases, such as nuclear weapons and the resource curse, in order to build more comprehensive AI safety plans. The author further emphasises the need for clear definitions of AI safety terminology and calls for more people to pay attention to these neglected problems, including economists, political scientists, and military strategists. The article reflects on plans for AI safety, points out the shortcomings of existing plans, and encourages the community to work together to improve them.

⚠️ AI safety problems are not limited to alignment: besides the AI alignment problem, there are other potentially catastrophic risks, such as the economic transition and the distribution of power, which equally require attention and solutions.

📚 Lessons from historical case studies: studying historical cases such as nuclear weapons and the resource curse can offer valuable experience and lessons for AI safety plans, helping us respond better to future challenges.

📝 Missing and incomplete AI safety plans: the article notes that there is currently no clear, comprehensive plan for AI safety, and calls on the community to work together to improve existing plans, including concrete action steps and ways to achieve the intended outcomes.

🔤 Clear definitions of AI safety terminology: the article stresses the importance of clearly defining AI safety terms so that the relevant problems can be better understood and discussed, and suggests creating easy-to-find resources that explain them.

Published on January 6, 2025 3:35 PM GMT

This is a (slightly chaotic and scrappy) list of gaps in the AI safety literature - content I think it would be useful and interesting for someone to create. I've broken it down into sections:

- Communication of catastrophic AI safety problems outside alignment
- Case studies for analogous problems
- Plans for AI safety
- Defining things

If you think there are articles that already cover the topics described, please verify that the articles you are thinking of do meet the criteria, and then tell me.

Communication of catastrophic AI safety problems outside alignment

I’ve previously written about how alignment is not all you need. And before me, others had written great things on parts of these problems. Friends have written up articles on parts of the economic transition, and specifically the intelligence curse.

Few people appear to be working on these problems, despite them seeming extremely important and neglected - and plausibly tractable? This might be because:

- there is little understanding of these problems in the community;
- the problems don't match the existing community's skills and experiences;
- few people have started, so there aren't obvious tractable in-roads to these problems; and
- there aren't organisations / structures for people to fit in to work on these problems.

The first two issues might be tackled by spreading these messages more clearly. The corresponding (semi-defined) audiences are:

- The existing community. Funders or decision makers at AI policy orgs might be particularly useful.
- People who would be useful to add to the community. These might be experts who could help by working on these problems, or at least by beginning to think about them (e.g. economists, politics and international relations scholars, military/field strategists). I suspect people from these fields will see obvious things we are missing.

There is downside risk here. We want to be particularly careful not to heat up race dynamics further, especially in messaging to the general public or to people likely to make race-y decisions. For this reason I'm more excited about spreading messages about the coordination problem and the economic transition problem than about the power distribution problem (see my problem naming for more context).

Case studies for analogous problems

Related to the plans discussed below, I think we could probably get a lot of insight into building better plans by looking at case studies from history.

Unfortunately, a lot of existing resources on case studies are:

I think people should be flexible about what they look into here (provided they expect it to have relevance to the key problems in AI safety). Some questions I brainstormed were:

Plans for AI safety

For the last few weeks, I've been working on trying to find plans for AI safety. They should cover the whole problem, including the major hurdles after intent alignment. Unfortunately, this has not gone well - my rough conclusion is that there aren't any very clear and well-publicised plans (or even very plausible stories) for making this go well. (More context on some of this work can be found in BlueDot Impact's AI safety strategist job posting.)

In short: what series of actions might get us to a state of existential security, ideally at a more granular level than 'pause' or 'regulate companies'?

Things that are kind of going in this direction:

However, many of them stop (or become very vague) after preventing misalignment, or don't describe how we will achieve the intended outcomes (e.g. bringing about a pause successfully). Additionally, while there has been criticism of some of the above plans, there is relatively little consensus-building around these plans, or further development from the community to improve them.

Building a better plan, or improving on one of these plans (not just criticising where it fails), would be really valuable.

Defining things

I run BlueDot Impact’s AI safety courses. This involves finding resources that explain what AI safety people are talking about.

There are useful concepts that people in AI safety take for granted, but there are ~no easy-to-find resources explaining them well. It'd be great to fix that.

I imagine these articles primarily as definitions. You might even want to create them as a LessWrong tag, AISafety.info article, or Arbital page (although I’m not certain Arbital is still maintained?). They could be a good match for a Wikipedia page, although I don’t know if they’re big enough for this.

I think these are slightly less important to write than the other ideas, but might be a good low-stakes entrypoint.


