AGI Safety & Alignment @ Google DeepMind is hiring

 

Google DeepMind's AGI Safety & Alignment Team (ASAT) is hiring Research Scientists and Research Engineers. The team focuses on technical approaches to severe harms from AGI systems, organized around two main themes: AGI Alignment and Frontier Safety. By joining, you would do safety research at a frontier lab, help develop and implement AGI safety mitigations, and shape the safety approach Google applies at global scale. The team works on mitigations for potential risks and drives the implementation of the Frontier Safety Framework, giving it an outsized role in AGI safety.

🧑‍💻 ASAT is organized around AGI Alignment and Frontier Safety: AGI Alignment centers on amplified oversight and mechanistic interpretability, while Frontier Safety focuses on developing and implementing the Frontier Safety Framework.

🛡️ GDM's safety team develops and prepares mitigations so that Google can discuss them publicly in the Frontier Safety Framework (FSF), helping to build norms and policy about which mitigations frontier developers should put in place to address AGI risks.

🔬 With access to frontier models and substantial compute, ASAT can do safety research that would be hard to do at other labs. For example, the team is actively working on monitoring, particularly chain-of-thought monitoring, which it views as a near-term form of AI control.

🗺️ By the end of 2025, the team plans to publish a GDM roadmap for AGI safety that extends beyond the capabilities covered by the FSF, and to produce a draft mitigation-based safety case that can be quickly concretized once dangerous capabilities appear.

Published on February 17, 2025 9:11 PM GMT

The AGI Safety & Alignment Team (ASAT) at Google DeepMind (GDM) is hiring! Please apply to the Research Scientist and Research Engineer roles. Strong software engineers with some ML background should also apply (to the Research Engineer role). Our initial batch of hiring will focus more on engineers, but we expect to continue to use the applications we receive for future hiring this year, which we expect will be more evenly split. Please do apply even if, for example, you’re only available in the latter half of this year.

What is ASAT?

ASAT is the primary team at GDM focused on technical approaches to severe harms from AI systems, having evolved out of the Alignment and Scalable Alignment teams. We’re organized around two themes: AGI Alignment (think amplified oversight and mechanistic interpretability) and Frontier Safety (think development and implementation of the Frontier Safety Framework). The leadership team is Anca Dragan, Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg as executive sponsor.

Why should you join?

I’d say there are three main ways in which work on the GDM safety team is especially impactful:

    1. GDM is one of the most likely places to develop AGI, and cares about AGI safety, so it is especially important to implement AGI safety mitigations at GDM.
    2. Developing and preparing mitigations enables Google to discuss them publicly in the Frontier Safety Framework (FSF), which in turn can help build norms and policy about what mitigations frontier developers should put in place to address AGI risks. For example, our updated FSF is the first policy to address deceptive alignment.
    3. Since we have access to frontier models and substantial amounts of compute, we can do safety research that would be hard to do anywhere other than a frontier lab.

Despite our small size relative to the size of Google, our team is responsible for setting an AGI safety approach that applies at Google’s massive scale (via the FSF). This is a big deal – actions taken by Google will typically have greater policy impact than the same actions taken by smaller frontier labs.

By far our biggest resource bottleneck is people, so new hires should expect to make a significant difference to our impact.

GDM is also a great place to learn and upskill – we’re surrounded by incredibly strong researchers and engineers, both for safety and ML more broadly.

Also, while everyone says this, I really do think our team has a great culture. Team members know the point of the project they’re working on; any team member can raise an objection and they will be listened to. People are incredibly helpful and generous with their time. At least one person who joined us with 10+ years of industry experience finds it the best culture they've been in.

What will we do in the near future?

Half a year ago, we published an overview of our recent research. This should give you a decent sense of the type of work we plan to do in the future as well. The biggest change relative to that post is that we’re planning to work a lot on monitoring, particularly chain-of-thought monitoring, which we think of as a near-term example of AI control.
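To make the control framing a bit more concrete, here is a minimal, purely illustrative sketch of chain-of-thought monitoring in Python. Everything in it (the `AgentStep` type, `monitored_execute`, the toy keyword monitor) is a hypothetical stand-in for whatever monitor and agent scaffolding one would actually use, not a description of GDM's systems.

```python
# A minimal, purely illustrative sketch of chain-of-thought monitoring as a
# control measure. The names here (AgentStep, monitored_execute, the toy
# keyword monitor) are hypothetical stand-ins, not GDM infrastructure.

from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentStep:
    chain_of_thought: str  # the model's externalized reasoning trace
    proposed_action: str   # the action the agent wants to take next


def monitored_execute(
    step: AgentStep,
    monitor: Callable[[str], float],        # returns a suspicion score in [0, 1]
    execute: Callable[[str], None],         # actually carries out the action
    escalate: Callable[[AgentStep], None],  # e.g. route to human review
    threshold: float = 0.5,
) -> None:
    """Score the reasoning trace before the action runs; withhold and
    escalate if the monitor flags it."""
    suspicion = monitor(step.chain_of_thought)
    if suspicion >= threshold:
        escalate(step)
    else:
        execute(step.proposed_action)


if __name__ == "__main__":
    # Toy monitor standing in for a weaker trusted model or classifier.
    def toy_monitor(cot: str) -> float:
        return 1.0 if "disable oversight" in cot.lower() else 0.0

    step = AgentStep(
        chain_of_thought="Plan: summarise the document, then send the summary.",
        proposed_action="send_summary()",
    )
    monitored_execute(
        step,
        monitor=toy_monitor,
        execute=lambda action: print(f"executing: {action}"),
        escalate=lambda s: print("flagged for review"),
    )
```

The point the sketch tries to capture is that the monitor gates the action itself: suspicious traces get escalated before anything runs, which is what makes this a control measure rather than just an after-the-fact evaluation.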

Here are a few concrete things I hope for the team to accomplish by the end of 2025, to give a sense of what you’d be contributing to:

    1. Publish a GDM roadmap for AGI safety that extends beyond the level of capabilities addressed by the FSF (though not all the way to superintelligence). We have already developed a draft roadmap internally that we use for research planning.
    2. Develop mitigations and produce a sketch of a mitigation-based safety case for misuse that can be quickly concretized once a dangerous capability actually arises.
    3. Develop more evaluations for instrumental reasoning (e.g. self-reasoning, stealth) and monitors for deceptive alignment (see the FSF), ideally exploring both black-box methods and methods based on model internals. Produce a sketch of a monitoring-based safety case for deceptive alignment that can be quickly concretized once models gain sufficient capabilities at instrumental reasoning.
    4. Do a deep dive on externalized reasoning (similar to “CoT faithfulness”) as it relates to monitoring, particularly focusing on how it may break in the future and what can be done to avoid that.
    5. Demonstrate that some flavor of debate outperforms strong baselines in a realistic, challenging task. One source of unrealism is allowed: methods may be restricted to use feedback from an artificially weak source to create a sandwiching setup (see the sketch below).
    6. Develop and be ready to deploy mitigations based on model internals that outperform their behavioral equivalents (where the improvement may just be on latency / cost, rather than accuracy).

I doubt we’ll succeed at all of these. Perhaps I’d guess we’ll succeed at 4-5 of them in spirit (i.e. ignoring minor deviations from the letter of what we wrote).

We’ll also do a few other things not on this list. For example, we expect to improve our approach to preparing for the automation of ML R&D, but we don’t yet know what that will look like, so it was hard to write down as concrete an outcome as we did for the other items on the list. And of course there will be new things that we work on that I haven’t yet anticipated.
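Returning to the debate item above: as a rough illustration of the sandwiching setup, here is a hypothetical sketch in which a strong debater argues for candidate answers while an artificially weak judge supplies the only feedback the method gets, with ground truth reserved for scoring. All the callables (`debater`, `weak_judge`, etc.) are made-up stand-ins, not any real GDM evaluation harness.

```python
# A hypothetical skeleton of a sandwiching evaluation for debate. The method
# (debate judged by a weak judge) is compared against a baseline (the weak
# judge answering directly); held-out ground truth is used only for scoring.

from typing import Callable, Sequence

Question = str
Answer = str


def debate_with_weak_judge(
    question: Question,
    candidate_answers: Sequence[Answer],
    debater: Callable[[Question, Answer], str],            # argues for an answer
    weak_judge: Callable[[Question, Sequence[str]], int],  # picks a winning index
) -> Answer:
    """Each candidate answer gets an argument from the strong debater; the
    weak judge sees only the question and the arguments."""
    arguments = [debater(question, answer) for answer in candidate_answers]
    return candidate_answers[weak_judge(question, arguments)]


def weak_judge_baseline(
    question: Question,
    candidate_answers: Sequence[Answer],
    weak_judge_direct: Callable[[Question, Sequence[Answer]], int],
) -> Answer:
    """Baseline: the weak judge picks among the answers with no debate help."""
    return candidate_answers[weak_judge_direct(question, candidate_answers)]


def accuracy(predictions: Sequence[Answer], ground_truth: Sequence[Answer]) -> float:
    """Ground truth enters only here, as a held-out score for both conditions."""
    return sum(p == g for p, g in zip(predictions, ground_truth)) / len(ground_truth)
```

The structural point is that the method never sees a signal stronger than the weak judge, so beating the weak-judge baseline on held-out accuracy is evidence that debate amplifies limited oversight rather than merely exploiting a stronger label source.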

How do you prioritize across research topics?

Generally, we don’t assign people to work on particular topics. Instead, team members can choose what they work on, as long as they can convince me (Rohin) that their project has a decent theory of change, and they can find enough collaborators that the project will move forward at a reasonable pace. (This is somewhat less true on Frontier Safety, where there is somewhat more assignment of people to particular tasks.)

As a result, there isn’t a clean answer to “how do you prioritize”: prioritization depends on the expertise, skills, and views of the individuals on the team, and is effectively an implicit aggregation of those views about what work is impactful, one that is hard to reify.

Nonetheless, I can say a bit about how I personally think about prioritization across high-level projects. As a completely made up number, I’d guess that my views drive roughly 50% of overall prioritization on the team (through a combination of formal authority, convincing team members of my views, and deference).

Roofshots. An important part of my view is that there’s a lot of “simple” or “obvious” work to be done that buys significant safety, where it is primarily important to foresee that the work is needed, and execute well on it. So rather than aiming for research breakthroughs (“moonshots”), I see our job as primarily about executing well at “roofshots”.

(Note that I view work on MONA and debate as a series of roofshots – I’m very much not saying “just do some evals and YOLO it”.)

I expect that if we consistently achieve roofshots, that will in aggregate go beyond what a moonshot would have achieved, in less time than it would take to produce a moonshot. This seems like the default way in which impressive progress happens in most fields (see e.g. Is Science Slowing Down?).

Comparative advantage. My general heuristic is that our research should take advantage of one or both of our two main comparative advantages:

    1. GDM integration: Areas where it is fairly clear that we will want the research to be integrated into GDM practice at some point. This doesn’t mean it has to be integrated now, but the work should at least be done with an eye towards integration in the future.
    2. Lab advantages: Research directions that leverage significant lab advantages, e.g. because they’re very compute intensive, require access to the weights of the best models, benefit from confidential knowledge about the research frontier, etc.

I used to have the view that we should just work on whatever seemed most important and not worry too much about the factors above, since we hire some of the most talented people and can do a better job than most other groups. I still believe the latter part – for example, many have tried to explain why grokking happens, but I think our explanation is the best; similarly many investigated unsupervised knowledge discovery as an empirical AGI safety technique, and I think our paper provided the most decision-relevant evidence on the subject (except possibly the ELK report).

However, I’ve changed my mind on the overall view, because there’s quite a lot of important work to be done in the two buckets above, and other work doesn’t look massively more important, such that we really do want to get the gains from trade available by focusing on comparative advantages.

Now, when someone on ASAT wants to do important work that doesn’t fall in one of the two buckets, I’m more likely to recommend an external collaboration or MATS mentoring. Around 10 team members do substantial external mentoring. Over the last year, they’ve supervised ~50 external researchers, producing ~25 papers.

FAQ

Q. Does GDM take AGI safety seriously?

You don’t have to take our word for it: we think there is significant public evidence.

DeepMind was founded with an AGI safety mission. Its leadership endorsed the importance of AGI safety when DeepMind was founded (see posts), and continues to do so (see the CAIS statement, a recent podcast, and discussion of the AI Action Summit).

(People sometimes suggest that frontier labs invest in AGI safety as a form of safety washing, with upsides like dissuading regulation or attracting EA talent and funding. This hypothesis fails to retrodict the history of DeepMind. DeepMind was founded in 2010, a time when AGI safety was basically just SIAI + FHI, and “effective altruism” hadn’t been coined yet. The founders were interested in AGI safety even then, when it was clearly bad for your prospects to be visibly associated with AGI safety.)

DeepMind has had an AGI safety team since 2016, and has continually supported the team in growing over time. ML researchers are not cheap, and nor is the compute that they use. I’m pretty unsure whether Open Philanthropy has spent more on technical AGI safety than Google has spent on its technical AGI safety team.

I think the more relevant issues are things like “there are many stakeholders and not all of them take AGI safety seriously” or “there are constant pressures and distractions from more immediately pressing things, and so AGI safety is put on a backburner”. These are obviously true to at least some degree, and the question is more about quantitatively how rough the effects are.

One clear piece of evidence here is that Google (not just GDM) has published and updated the Frontier Safety Framework (FSF), with the first version preceding the Seoul AI Commitments. Google is not a startup – it’s not like we just got a quick approval from Demis and Shane, and voila, now the FSF could be published. We did a lot of stakeholder engagement. If GDM didn’t take AGI safety seriously, then (at least prior to the Seoul AI Commitments) the relevant stakeholders would have ignored us and the FSF would not have seen the light of day.

Q. Isn’t GDM incredibly bureaucratic, stifling all productivity?

While there is non-zero truth to this, I think this has been greatly overstated in the safety community. We published an overview of our work over ~1.5 years – you can judge for yourself how that compares to other labs. My sense is that, compared to the other AI labs, our productivity-per-person looks similar or better. Personally, I like our work more, though since I have a lot of influence over what work we do, of course I would say that.

Don’t get me wrong – there is bureaucracy, and sometimes it tries to block things for silly reasons. If it’s important, we escalate to get the right decision instead. This is often salient because it is annoying, but it is not actually a major cost to our productivity, and doesn’t happen that often to any given researcher.

Beyond being annoying, bureaucracy also adds significant serial time / delays, but that is not nearly as bad as a direct hit to productivity would be, since we can work on other projects in parallel.

Q. My worry is that the engineering infrastructure is bad.

This seems wrong to me. I think the engineering infrastructure is very good compared to realistic alternatives.

It’s true that, compared to my PhD, the iteration cycles at GDM are longer and the libraries used are more often broken. By far the biggest reason is that in my PhD I didn’t do research that involved massive amounts of compute. For low-compute research on tiny models that fit on a single GPU, yes, it would be faster to do the work using external infrastructure. To steal a common Google phrase, we don’t know how to count that low. Another way of saying this is that Google makes everything medium hard – both things that are normally easy, but also things that are normally impossible.

In cases where we are doing this kind of research, we do aim to use external infrastructure, at least for the early validation phase of a project to gain iteration speed benefits. But we also take this as another reason to focus on high-compute research – our comparative advantage at it is higher than you might guess at first.

I expect the “everything is always at least medium-hard” effect also applies at least somewhat to other labs’ infra. When you are parallelizing across multiple chips, the infra necessarily becomes more complicated and harder to use. When you are working with giant amounts of compute that form significant fractions of expenditure, choices will be made that sacrifice researcher time to achieve more efficient compute usage.

Since GDM reuses Google’s production tooling, there are some aspects that really don’t make sense for research. But GDM is investing in research tooling (and we can feel these gains). One particular advantage is that Google has teams for the entire stack all the way down to the hardware (TPUs), so for basically any difficulty you encounter there will be a team that can help. ASAT also has a small engineering team that supports infra for ASAT researchers in particular. 

(Incidentally, this is one of the subteams we’re hiring for! There’s a lot of room for ambitious creative problem solving to speed up alignment research building on one of the most sophisticated and large scale eng stacks in the world. Apply to the Research Engineer role.)

Also, I’ll again note that our productivity relative to other labs looks pretty good, so I feel like it would be quite surprising if GDM infra was a huge negative hit to productivity.

Q. My worry is that GDM safety doesn’t have enough access to compute.

None of our current projects are bottlenecked by compute, and I don’t expect that to change in the foreseeable future. It’s not completely unimportant – as is almost always true, more compute would help. However, we are much much more people-constrained than compute-constrained.

Q. I have a question not covered elsewhere.

Leave a comment on this post! Please don’t email us individually; we get too many of these and don’t have the capacity to reply to each one.

Apply now!

We will keep the application form open until at least 11:59pm AoE on Thursday, February 27. Please do apply even if your desired start date is quite far in the future, as we probably will not run another public hiring round this year. Most roles can be based in San Francisco, Mountain View, London, or maybe New York, with a hybrid work-from-office / work-from-home model.

While we do expect these roles to be competitive, we have found that people often overestimate what we are looking for. In particular:

Go forth and apply!



