Published on July 30, 2025 12:55 AM GMT
TLDR
- I am looking for people who want to be supervised by me to write a mech interp paper. Apply here now! Due Aug 29[1]
- Application task: Spend ~12 hours (max 20) working on a mechanistic interpretability research problem of your choice, and send me a write-up + executive summary of what you learned. (See advice, details, past examples and recommended problems in the doc)
- The top ~32 candidates will do a 5 week paid online exploration phase (Sept 29 - Oct 31) ending in a 2 week research sprint in pairs.
- Expect unstructured, self-driven learning
- I have 1.5 hr/week check-ins with each pair, supervising them as they write a paper.
- The typical scholar publishes at least one co-first author paper at a top ML venue.
- Past scholars include professors, undergrads with no mech interp experience, startup founders, and researchers with several great mech interp papers already
Table of Contents
The key details and FAQ are copied below for convenience; the rest is in the doc.
Advice on producing a good application in 20 hours
What does a good application look like?
Key Details
- Application task: Spend ~12 hours (max 20) trying to make research progress on a mechanistic interpretability problem of your choice
- Submit via this form, due Fri Aug 29th 11:59pm PT
- Please submit a write-up and executive summary showing me what progress you made and what you learned about the problem.
- I value communication skill, so don’t rush the write-up! The time limit has up to two additional hours for the executive summary.
- See examples of successful past write-ups here
- You can take as much time as you want beforehand for general learning.
- Applications due Aug 29
- Decisions released Sept 16
- Exploration phase Sept 29 - Oct 31 (5 week online program for top ~32 candidates)
- Research phase decisions Nov 6
- Research phase Jan 5 - March 27 (12 week in-person program for top ~8 candidates)
- In MATS 8.0, 5 of my 8 scholars had minimal prior mech interp experience, but have been doing fantastically - by halfway through the program, some of them had:
- Helped understand emergent misalignment (i.e. why training a model to write buggy code turns it into a Nazi) and been interviewed about it by MIT Tech Review
- Explored new paradigms for interpreting reasoning models
FAQ
Why might you want to apply?
My core goal is to teach you how to do great mechanistic interpretability research.
I run the Google DeepMind mechanistic interpretability team and I have a lot of experience supervising research. In the past 3 years, I have mentored 50 junior researchers and supervised 30+ MATS papers, and 15 top conference papers[3].
The program often helps scholars get into mech interp careers
- Seven now do interpretability research at frontier AGI labs, including Arthur Conmy, who works for me leading the GDM Applied Interpretability Team.
- Two alumni lead research teams at the UK government's AI Security Institute
Past scholars also do excellent research in the program itself, even those totally new to mech interp! Some highlights:
- Showing open source LLMs can be cheaply jailbroken with linear algebra, by ablating the refusal direction
- This inspired projects at multiple frontier labs, including a Meta paper on fixing it.
Why is this application so much effort?
- I care a lot about being meritocratic. This way lets me find the best applicants, not just those who look good on paper. I do my best to assess your potential, not just what you’ve already done (though it’s still super noisy!)
- I've also tried to design this application process so that spending time on it is useful whatever the outcome - I don’t want to waste 12+ hours of your time!
- I think it's a pretty realistic simulation of doing research, especially if you haven’t done interpretability research before. Candidates often learn a lot, and are surprised by how much they can get done.
- I've sometimes heard from unsuccessful applicants that they enjoyed the application so much it convinced them to pursue a research career!
- If you’re not sure if you’re interested in doing mech interp or not, I’d encourage you to try applying! I think you'll learn a lot from the application about whether it's a good fit.
What am I looking for in an application?
- My ideal application is one that teaches me something new.
- This looks like identifying an interpretability hypothesis, gathering evidence for and against it, and writing up the evidence and analysis clearly.
What happens in the program?
- The top ~32 candidates will do a 5 week online exploration phase Sept 29 - Oct 31
- The final two weeks (full time) are spent doing a research sprint in pairs. Admission to the research phase is largely based on sprint performance.
- The first three weeks (part time) are the preparation phase. This means preparing for the sprint: self-driven skilling up, doing several-day mini research projects with other scholars, going to talks/sessions, reading papers, etc. How you spend your time is up to you.
- More info here
- Scholars work in pairs to write a mech interp paper, with a 1.5 hr/week check-in from me and some Slack support
- All recent scholars have published this as a co-first author paper at a top ML venue (NeurIPS/ICLR/ICML) - see lists of past work below
What happens if I don’t get through to the research phase?
- While unfortunately most exploration phase candidates don’t make it to the research phase, I’ve designed the exploration phase to be a valuable experience in its own right, and to teach useful research skills.
- The median participant rates it as 1.5x-2.5x the counterfactual use of time.
- 5 exploration-phase only scholars found other MATS 8.0 mentors as a result of participating
- I helped 8-10 exploration phase-only scholars write papers based on their sprint projects
Why shouldn’t I apply?
- Obviously, the application takes a while! If it doesn’t sound fun, you probably shouldn’t do it.
- The exploration phase of the program is fairly competitive, which some people find very stressful
- Generally, participants seem to be nice and cooperative, especially since you want to form teams, but the awareness of your chances can be very stressful for some
How should I choose a problem?
- I'm open to any application that shows strong research skill, but will be more excited about those matching my research interests
- My research interests have changed a fair bit from some of my past work - more details below, but in brief I’m now fairly pessimistic about ambitious interpretability (i.e. complete reverse-engineering), and I’m excited about model biology (studying qualitative high-level properties of models) and applied interpretability (rigorously doing useful things with interp). I’m still interested in basic science, but have a higher bar.
- Applications that surprise me with something new and cool are fantastic!
Can I use LLMs?
Yes. In fact, I strongly recommend it! LLMs are a crucial research tool nowadays, and are especially useful for those getting into a new field.
- More advice on using LLMs well below
You're welcome to use them for coding, writing, etc, whatever you want - I want to gauge how well you’ll do as a researcher, which includes whatever tools you’d actually use.
- It is your responsibility to ensure your code and writing are high quality. Well-written write-ups are welcome. Docs that read like LLM slop will be rejected.
I recommend using Cursor for coding (replacing e.g. VS Code) and using Gemini 2.5 Pro[4] for browser-based tasks.
I've compiled a folder of useful text files for mech interp research, containing a bunch of relevant docs & source code of key libraries, tutorials from ARENA and key libraries, key papers and my relevant blog posts.
- By default, just put this 600k token file, which contains the most important documents[5], in Gemini’s context window.
How does a research supervisor add value?
- My model is that research requires a mix of skills. The day-to-day coding and execution is crucial. But there's also a set of harder-to-learn conceptual skills, collectively called research taste. These skills take a long time to gain because they have poor feedback loops, but they take very little time to use.
- My main role is to lend you my research taste and bootstrap your own. This looks like helping with:
- High-level Strategy: Choosing a good problem, knowing when to pivot away from a dead end, or prioritizing which of several promising directions to pursue.
- Experimental Design: Designing a clean experiment to conclusively test a hypothesis, thinking of alternative explanations for your results, or knowing when evidence is strong enough.
See a bunch more info and guidance in the other tabs of the doc
[1] Other MATS apps open in late August
[2] In my 4 most recent cohorts, I’ve had 3 independent researchers, 9 ML PhD students/recent PhD grads, 7 undergrads, 3 ML masters students, 5 former software engineers, 1 physics PhD student, 1 ML postdoc, 1 neuroscience postdoc, 2 quant traders, and 1 former entrepreneur
[3] Note that almost all scholars in recent cohorts have published at least one co-first author conference paper, and many of the 30 papers are too recent to have finished peer review - list here. But my top priority is to help you do great research; publishing is a bonus.
[4] I'm not just saying this because I work for Google! It's a frontier model, it's free, it's pretty fast, and it can take a million tokens of context. The best paid models from other providers are also great choices but can’t take as much context.
[5] It starts with a table of contents explaining what’s in it.