AGI Safety & Alignment @ Google DeepMind is hiring

 

Google DeepMind's AGI Safety & Alignment Team (ASAT) is hiring Research Scientists and Research Engineers. The team focuses on technical approaches to severe harms from AGI systems, organized around two main themes: AGI Alignment and Frontier Safety. By joining, you would do safety research at a frontier lab, help develop and implement AGI safety mitigations, and shape the safety approach Google applies at global scale. The team works on mitigations for potential risks and drives the implementation of the Frontier Safety Framework, giving it an outsized role in AGI safety.

🧑‍💻 ASAT is organized around AGI Alignment and Frontier Safety: AGI Alignment centers on amplified oversight and mechanistic interpretability, while Frontier Safety focuses on developing and implementing the Frontier Safety Framework.

🛡️ GDM's safety team develops and prepares mitigations so that Google can discuss them publicly in the Frontier Safety Framework (FSF), helping to build norms and policy about which mitigations frontier developers should put in place to address AGI risks.

🔬 With access to frontier models and substantial compute, ASAT can do safety research that would be hard to do at other labs. For example, the team is actively working on monitoring, particularly chain-of-thought monitoring, which it views as a near-term form of AI control.

🗺️ By the end of 2025, the team plans to publish a GDM roadmap for AGI safety that extends beyond the capabilities covered by the FSF, and to produce a draft mitigation-based safety case that can be quickly concretized once dangerous capabilities appear.

Published on February 17, 2025 9:11 PM GMT

The AGI Safety & Alignment Team (ASAT) at Google DeepMind (GDM) is hiring! Please apply to the Research Scientist and Research Engineer roles. Strong software engineers with some ML background should also apply (to the Research Engineer role). Our initial batch of hiring will focus more on engineers, but we expect to continue to use the applications we receive for future hiring this year, which we expect will be more evenly split. Please do apply even if, for example, you’re only available in the latter half of this year.

What is ASAT?

ASAT is the primary team at GDM focused on technical approaches to severe harms from AI systems, having evolved out of the Alignment and Scalable Alignment teams. We’re organized around two themes: AGI Alignment (think amplified oversight and mechanistic interpretability) and Frontier Safety (think development and implementation of the Frontier Safety Framework). The leadership team is Anca Dragan, Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg as executive sponsor.

Why should you join?

I’d say there are three main ways in which work on the GDM safety team is especially impactful:

    1. GDM is one of the most likely places to develop AGI, and cares about AGI safety, so it is especially important to implement AGI safety mitigations at GDM.
    2. Developing and preparing mitigations enables Google to discuss them publicly in the Frontier Safety Framework (FSF), which in turn can help build norms and policy about what mitigations frontier developers should put in place to address AGI risks. For example, our updated FSF is the first policy to address deceptive alignment.
    3. Since we have access to frontier models and substantial amounts of compute, we can do safety research that would be hard to do anywhere other than a frontier lab.

Despite our small size relative to the size of Google, our team is responsible for setting an AGI safety approach that applies at Google’s massive scale (via the FSF). This is a big deal – actions taken by Google will typically have greater policy impact than the same actions taken by smaller frontier labs.

By far our biggest resource bottleneck is people, so new hires should expect to make a significant difference to our impact.

GDM is also a great place to learn and upskill – we’re surrounded by incredibly strong researchers and engineers, both for safety and ML more broadly.

Also, while everyone says this, I really do think our team has a great culture. Team members know the point of the project they’re working on; any team member can raise an objection and they will be listened to. People are incredibly helpful and generous with their time. At least one person who joined us with 10+ years of industry experience finds it the best culture they've been in.

What will we do in the near future?

Half a year ago, we published an overview of our recent research. This should give you a decent sense of the type of work we plan to do in the future as well. The biggest change relative to that post is that we’re planning to work a lot on monitoring, particularly chain-of-thought monitoring, which we think of as a near-term example of AI control.
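To make the control framing a bit more concrete, here is a minimal, purely illustrative sketch of chain-of-thought monitoring in Python. Everything in it (the `AgentStep` type, `monitored_execute`, the toy keyword monitor) is a hypothetical stand-in for whatever monitor and agent scaffolding one would actually use, not a description of GDM's systems.

```python
# A minimal, purely illustrative sketch of chain-of-thought monitoring as a
# control measure. The names here (AgentStep, monitored_execute, the toy
# keyword monitor) are hypothetical stand-ins, not GDM infrastructure.

from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentStep:
    chain_of_thought: str  # the model's externalized reasoning trace
    proposed_action: str   # the action the agent wants to take next


def monitored_execute(
    step: AgentStep,
    monitor: Callable[[str], float],        # returns a suspicion score in [0, 1]
    execute: Callable[[str], None],         # actually carries out the action
    escalate: Callable[[AgentStep], None],  # e.g. route to human review
    threshold: float = 0.5,
) -> None:
    """Score the reasoning trace before the action runs; withhold and
    escalate if the monitor flags it."""
    suspicion = monitor(step.chain_of_thought)
    if suspicion >= threshold:
        escalate(step)
    else:
        execute(step.proposed_action)


if __name__ == "__main__":
    # Toy monitor standing in for a weaker trusted model or classifier.
    def toy_monitor(cot: str) -> float:
        return 1.0 if "disable oversight" in cot.lower() else 0.0

    step = AgentStep(
        chain_of_thought="Plan: summarise the document, then send the summary.",
        proposed_action="send_summary()",
    )
    monitored_execute(
        step,
        monitor=toy_monitor,
        execute=lambda action: print(f"executing: {action}"),
        escalate=lambda s: print("flagged for review"),
    )
```

The point the sketch tries to capture is that the monitor gates the action itself: suspicious traces get escalated before anything runs, which is what makes this a control measure rather than just an after-the-fact evaluation.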

Here are a few concrete things I hope for the team to accomplish by the end of 2025, to give a sense of what you’d be contributing to:

    1. Publish a GDM roadmap for AGI safety that extends beyond the level of capabilities addressed by the FSF (though not all the way to superintelligence). We have already developed a draft roadmap internally that we use for research planning.
    2. Develop mitigations and produce a sketch of a mitigation-based safety case for misuse that can be quickly concretized once a dangerous capability actually arises.
    3. Develop more evaluations for instrumental reasoning (e.g. self-reasoning, stealth) and monitors for deceptive alignment (see the FSF), ideally exploring both black-box methods and methods based on model internals. Produce a sketch of a monitoring-based safety case for deceptive alignment that can be quickly concretized once models gain sufficient capabilities at instrumental reasoning.
    4. Do a deep dive on externalized reasoning (similar to “CoT faithfulness”) as it relates to monitoring, particularly focusing on how it may break in the future and what can be done to avoid that.
    5. Demonstrate that some flavor of debate outperforms strong baselines in a realistic, challenging task. One source of unrealism is allowed: methods may be restricted to use feedback from an artificially weak source to create a sandwiching setup (see the sketch below).
    6. Develop and be ready to deploy mitigations based on model internals that outperform their behavioral equivalents (where the improvement may just be on latency / cost, rather than accuracy).

I doubt we’ll succeed at all of these. Perhaps I’d guess we’ll succeed at 4-5 of them in spirit (i.e. ignoring minor deviations from the letter of what we wrote).

We’ll also do a few other things not on this list. For example, we expect to improve our approach to preparing for the automation of ML R&D, but we don’t yet know what that will look like, so it was hard to write down as concrete an outcome as we did for the other items on the list. And of course there will be new things that we work on that I haven’t yet anticipated.
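Returning to the debate item above: as a rough illustration of the sandwiching setup, here is a hypothetical sketch in which a strong debater argues for candidate answers while an artificially weak judge supplies the only feedback the method gets, with ground truth reserved for scoring. All the callables (`debater`, `weak_judge`, etc.) are made-up stand-ins, not any real GDM evaluation harness.

```python
# A hypothetical skeleton of a sandwiching evaluation for debate. The method
# (debate judged by a weak judge) is compared against a baseline (the weak
# judge answering directly); held-out ground truth is used only for scoring.

from typing import Callable, Sequence

Question = str
Answer = str


def debate_with_weak_judge(
    question: Question,
    candidate_answers: Sequence[Answer],
    debater: Callable[[Question, Answer], str],            # argues for an answer
    weak_judge: Callable[[Question, Sequence[str]], int],  # picks a winning index
) -> Answer:
    """Each candidate answer gets an argument from the strong debater; the
    weak judge sees only the question and the arguments."""
    arguments = [debater(question, answer) for answer in candidate_answers]
    return candidate_answers[weak_judge(question, arguments)]


def weak_judge_baseline(
    question: Question,
    candidate_answers: Sequence[Answer],
    weak_judge_direct: Callable[[Question, Sequence[Answer]], int],
) -> Answer:
    """Baseline: the weak judge picks among the answers with no debate help."""
    return candidate_answers[weak_judge_direct(question, candidate_answers)]


def accuracy(predictions: Sequence[Answer], ground_truth: Sequence[Answer]) -> float:
    """Ground truth enters only here, as a held-out score for both conditions."""
    return sum(p == g for p, g in zip(predictions, ground_truth)) / len(ground_truth)
```

The structural point is that the method never sees a signal stronger than the weak judge, so beating the weak-judge baseline on held-out accuracy is evidence that debate amplifies limited oversight rather than merely exploiting a stronger label source.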

How do you prioritize across research topics?

Generally, we don’t assign people to work on particular topics. Instead, team members can choose what they work on, as long as they can convince me (Rohin) that their project has a decent theory of change, and they can find enough collaborators that the project will move forward at a reasonable pace. (This is somewhat less true on Frontier Safety, where there is somewhat more assignment of people to particular tasks.)

As a result, there isn’t a clean answer to “how do you prioritize”: prioritization depends on the expertise, skills, and views of the individuals on the team, and is effectively an implicit aggregation of those views about what work is impactful, one that is hard to reify.

Nonetheless, I can say a bit about how I personally think about prioritization across high-level projects. As a completely made up number, I’d guess that my views drive roughly 50% of overall prioritization on the team (through a combination of formal authority, convincing team members of my views, and deference).

Roofshots. An important part of my view is that there’s a lot of “simple” or “obvious” work to be done that buys significant safety, where it is primarily important to foresee that the work is needed, and execute well on it. So rather than aiming for research breakthroughs (“moonshots”), I see our job as primarily about executing well at “roofshots”.

(Note that I view work on MONA and debate as a series of roofshots – I’m very much not saying “just do some evals and YOLO it”.)

I expect that if we consistently achieve roofshots, that will in aggregate go beyond what a moonshot would have achieved, in less time than it would take to produce a moonshot. This seems like the default way in which impressive progress happens in most fields (see e.g. Is Science Slowing Down?).

Comparative advantage. My general heuristic is that our research should take advantage of one or both of our two main comparative advantages:

    1. GDM integration: Areas where it is fairly clear that we will want the research to be integrated into GDM practice at some point. This doesn’t mean it has to be integrated now, but the work should at least be done with an eye towards integration in the future.
    2. Lab advantages: Research directions that leverage significant lab advantages, e.g. because they’re very compute intensive, require access to the weights of the best models, benefit from confidential knowledge about the research frontier, etc.

I used to have the view that we should just work on whatever seemed most important and not worry too much about the factors above, since we hire some of the most talented people and can do a better job than most other groups. I still believe the latter part – for example, many have tried to explain why grokking happens, but I think our explanation is the best; similarly many investigated unsupervised knowledge discovery as an empirical AGI safety technique, and I think our paper provided the most decision-relevant evidence on the subject (except possibly the ELK report).

However, I’ve changed my mind on the overall view, because there’s quite a lot of important work to be done in the two buckets above, and other work doesn’t look massively more important, such that we really do want to get the gains from trade available by focusing on comparative advantages.

Now, when someone on ASAT wants to do important work that doesn’t fall in one of the two buckets, I’m more likely to recommend an external collaboration or MATS mentoring. Around 10 team members do substantial external mentoring. Over the last year, they’ve supervised ~50 external researchers, producing ~25 papers.

FAQ

Q. Does GDM take AGI safety seriously?

You don’t have to take our word for it: we think there is significant public evidence.

DeepMind was founded with an AGI safety mission. Its leadership endorsed the importance of AGI safety when DeepMind was founded (see posts), and continues to do so (see the CAIS statement, a recent podcast, and discussion of the AI Action Summit).

(People sometimes suggest that frontier labs invest in AGI safety as a form of safety washing, with upsides like dissuading regulation or attracting EA talent and funding. This hypothesis fails to retrodict the history of DeepMind. DeepMind was founded in 2010, a time when AGI safety was basically just SIAI + FHI, and “effective altruism” hadn’t been coined yet. The founders were interested in AGI safety even then, when it was clearly bad for your prospects to be visibly associated with AGI safety.)

DeepMind has had an AGI safety team since 2016, and has continually supported the team in growing over time. ML researchers are not cheap, and nor is the compute that they use. I’m pretty unsure whether Open Philanthropy has spent more on technical AGI safety than Google has spent on its technical AGI safety team.

I think the more relevant issues are things like “there are many stakeholders and not all of them take AGI safety seriously” or “there are constant pressures and distractions from more immediately pressing things, and so AGI safety is put on a backburner”. These are obviously true to at least some degree, and the question is more about quantitatively how rough the effects are.

One clear piece of evidence here is that Google (not just GDM) has published and updated the Frontier Safety Framework (FSF), with the first version preceding the Seoul AI Commitments. Google is not a startup – it’s not like we just got a quick approval from Demis and Shane, and voila, now the FSF could be published. We did a lot of stakeholder engagement. If GDM didn’t take AGI safety seriously, then (at least prior to the Seoul AI Commitments) the relevant stakeholders would have ignored us and the FSF would not have seen the light of day.

Q. Isn’t GDM incredibly bureaucratic, stifling all productivity?

While there is non-zero truth to this, I think this has been greatly overstated in the safety community. We published an overview of our work over ~1.5 years – you can judge for yourself how that compares to other labs. My sense is that, compared to the other AI labs, our productivity-per-person looks similar or better. Personally, I like our work more, though since I have a lot of influence over what work we do, of course I would say that.

Don’t get me wrong – there is bureaucracy, and sometimes it tries to block things for silly reasons. If it’s important, we escalate to get the right decision instead. This is often salient because it is annoying, but it is not actually a major cost to our productivity, and doesn’t happen that often to any given researcher.

Beyond being annoying, bureaucracy also adds significant serial time / delays, but that is not nearly as bad as a direct hit to productivity would be, since we can work on other projects in parallel.

Q. My worry is that the engineering infrastructure is bad.

This seems wrong to me. I think the engineering infrastructure is very good compared to realistic alternatives.

It’s true that, compared to my PhD, the iteration cycles at GDM are longer and the libraries used are more often broken. By far the biggest reason is that in my PhD I didn’t do research that involved massive amounts of compute. For low-compute research on tiny models that fit on a single GPU, yes, it would be faster to do the work using external infrastructure. To steal a common Google phrase, we don’t know how to count that low. Another way of saying this is that Google makes everything medium hard – both things that are normally easy, but also things that are normally impossible.

In cases where we are doing this kind of research, we do aim to use external infrastructure, at least for the early validation phase of a project to gain iteration speed benefits. But we also take this as another reason to focus on high-compute research – our comparative advantage at it is higher than you might guess at first.

I expect the “everything is always at least medium-hard” effect also applies at least somewhat to other labs’ infra. When you are parallelizing across multiple chips, the infra necessarily becomes more complicated and harder to use. When you are working with giant amounts of compute that form significant fractions of expenditure, choices will be made that sacrifice researcher time to achieve more efficient compute usage.

Since GDM reuses Google’s production tooling, there are some aspects that really don’t make sense for research. But GDM is investing in research tooling (and we can feel these gains). One particular advantage is that Google has teams for the entire stack all the way down to the hardware (TPUs), so for basically any difficulty you encounter there will be a team that can help. ASAT also has a small engineering team that supports infra for ASAT researchers in particular. 

(Incidentally, this is one of the subteams we’re hiring for! There’s a lot of room for ambitious creative problem solving to speed up alignment research building on one of the most sophisticated and large scale eng stacks in the world. Apply to the Research Engineer role.)

Also, I’ll again note that our productivity relative to other labs looks pretty good, so I feel like it would be quite surprising if GDM infra was a huge negative hit to productivity.

Q. My worry is that GDM safety doesn’t have enough access to compute.

None of our current projects are bottlenecked by compute, and I don’t expect that to change in the foreseeable future. It’s not completely unimportant – as is almost always true, more compute would help. However, we are much much more people-constrained than compute-constrained.

Q. I have a question not covered elsewhere.

Leave a comment on this post! Please don’t email us individually; we get too many of these and don’t have the capacity to reply to each one.

Apply now!

We will keep the application form open until at least 11:59pm AoE on Thursday, February 27. Please do apply even if your desired start date is quite far in the future, as we probably will not run another public hiring round this year. Most roles can be based in San Francisco, Mountain View, London, or maybe New York, with a hybrid work-from-office / work-from-home model.

While we do expect these roles to be competitive, we have found that people often overestimate what we are looking for. In particular:

Go forth and apply!



