Feedback wanted: Shortlist of AI safety ideas

Published on June 29, 2025 4:28 AM GMT


Background:

Hello everyone! 👋

At the start of the year, I was trying to figure out what my next goals regarding AI safety should be. I ended up making a list of 130 small, concrete ideas spanning all kinds of domains. Many of them are about gathering and summarizing information, skilling myself up, and helping others be more effective.

I needed a rough way to prioritize them, so I built a prioritization framework inspired by 80,000 Hours. I won't go into the details here, but it essentially let me filter the list down to the top 12 ideas - one for each month.
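To make the idea concrete, here is a minimal sketch of what such a weighted-scoring framework could look like, assuming factors loosely borrowed from 80,000 Hours (importance, tractability, neglectedness, personal fit). The factor names, weights, and example scores are illustrative assumptions, not the author's actual framework:

```python
# Hypothetical weighted-scoring prioritization, 80,000 Hours style.
# Each idea gets a 1-10 rating on four factors; ideas are ranked by
# the weighted sum and the top n survive to the shortlist.

IDEAS = [
    # (name, importance, tractability, neglectedness, personal_fit)
    ("Visualize the safety portfolio", 6, 8, 7, 8),
    ("Audit safety-earmarked budgets", 7, 5, 9, 6),
    ("Catalogue plans to deter RSI",   9, 4, 6, 5),
]

WEIGHTS = {"importance": 0.4, "tractability": 0.2,
           "neglectedness": 0.2, "personal_fit": 0.2}

def score(idea):
    """Weighted sum of the four factor ratings."""
    _, imp, tract, negl, fit = idea
    return (WEIGHTS["importance"] * imp
            + WEIGHTS["tractability"] * tract
            + WEIGHTS["neglectedness"] * negl
            + WEIGHTS["personal_fit"] * fit)

def shortlist(ideas, n=12):
    """Return the top-n ideas by score, highest first."""
    return sorted(ideas, key=score, reverse=True)[:n]

for name, *_ in shortlist(IDEAS, n=2):
    print(name)
```

The useful property of this shape is that adding a new idea later only requires rating it on the same factors and re-sorting, which is what makes new ideas easy to compare against old ones.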

In the first half of the year I did the most urgent ones, and now I'm working in quadrant 2 of the Eisenhower matrix (important but not urgent). The shortlist still has 12 items, since I've come up with new ideas during the first half of 2025. However, I need a bit more input to narrow down the next steps.
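For readers unfamiliar with the Eisenhower matrix: tasks are sorted into four quadrants by urgency and importance, and quadrant 2 is where the important-but-not-urgent work lives. A toy sketch (the example tasks are hypothetical, not from this post):

```python
# Eisenhower-matrix sorting: map (urgent, important) flags to a
# quadrant, then pull out quadrant 2 - the long-horizon work that
# gets scheduled rather than done immediately.

def quadrant(urgent, important):
    """Return the Eisenhower quadrant (1-4) for a task."""
    if urgent and important:
        return 1   # do first
    if important:
        return 2   # schedule (important, not urgent)
    if urgent:
        return 3   # delegate
    return 4       # drop

tasks = [
    ("Reply to workshop organizers", True,  True),
    ("Write LW post on evals",       False, True),
    ("Clear inbox backlog",          True,  False),
]

q2 = [name for name, urgent, important in tasks
      if quadrant(urgent, important) == 2]
print(q2)
```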

This is where you come in. I'd like to hear your thoughts on the shortlist and the relative importance of its tasks. It would be especially valuable to know if something is already being done by someone who can get it done - then I can skip it and move closer to the margin.

The shortlist (by category):


Investigating and summarizing knowledge

1. Visualize the distribution of people in research/governance fields; make the safety "portfolio" clearer and more tangible

2. Find out what % of Schmidt Futures or UK AISI's safety-earmarked budget actually gets used

3. Catalogue and evaluate practical plans to deter or influence recursive self-improvement (RSI)

4. Investigate and post about how crucial it is to get mainstream media out of the competition/race framing

 

Fieldbuilding and productivity

5. Come up with a response to a friend's argument over lack of agency

6. Talk about evaluations at Chiba University's AI safety workshop

7. Offer productivity tools and coaching to AI safety researchers

8. Do a calculation on whether it's net positive for Reaktor to work on AI safety

 

Writing and sharing thoughts

9. Write a LW/AF post convincing people in safety to take lower salaries

10. Write about how evals might actually push capabilities, as model developers take on new challenges

11. Write an article on how easy it should be to shut down AGI/ASI systems

12. Write a LW/AF post about why we should divest half of interpretability 

If you have thoughts or context to share about any of these, that would be much appreciated. Thank you!

A few personal reflections:

Doing the prioritization was a useful exercise in itself: I noticed that most of my ideas seemed fine when I had them, but turned out to be pretty bad on further inspection. Being able to prioritize and cut them down by 90% was valuable. It even helped me generate better ideas afterwards, since I can easily compare new ideas against the existing ones.

This also resolved another problem of mine: ideas and advice from other people, while useful, often fail to account for my personal situation (a strong desire to live in Japan) and unique opportunities (a broad network across cultural and language borders).

Putting numbers to all the ideas made it easier to understand and articulate what I value and am motivated by. It's also nice to have immediate feedback about an idea in an unambiguous form, and I like being able to forget about all the random details: I can simply add notes to my spreadsheet and refer to them when needed.

However, now I've finished the highest-priority tasks and need to figure out what to do in the latter half of the year. My aim has been to do one thing per month; that seems sustainable with my other commitments. I'm hoping that sharing this publicly will encourage others to seek more feedback about their priorities as well.


