Published on June 3, 2025 5:38 PM GMT
I've recently completed the in-person ARENA program, which is a 5-week bootcamp teaching the basics of safety research engineering (with the 5th week being a capstone project). Sometimes, I talk to people who want to work through the program independently and who ask for advice. Even though I didn't attempt this, I think doing the program in-person gives me some insight into how to get the most out of the program when doing it independently, so here are my thoughts and tips:
On working speed
- Day 0.0 (prerequisites) takes the typical person more than one day to work through.
- Most other days are feasible to mostly finish within a day in the in-person program. In-person, participants spend around 6-7 hours per day on pair-programming. There are a few factors that will likely make you slower when you're on your own:
- If you don't find a working partner, then working deeply for 6-7 hours per day might be infeasible depending on your dispositions. In-person it gets feasible since you alternate with your pair-programming partner, which reduces the overall load on your attention.
- Often when you struggle in-person, your working partner knows how to move on. If both partners don't know what to do, you can ask a teaching assistant (TA). So you should expect to struggle more often, and for longer, if you're alone.
Should you do all the material?
If you're working independently, the answer is probably no. Probably not all material is equally valuable to everyone. In an in-person program it makes sense to have everyone work through the same material at the same pace since otherwise it becomes difficult to pair people up for pair-programming and for TAs to prepare for questions. But if you're alone, you probably want to skip more material that isn't as interesting to you personally.
How should you approach each day?
- Often there is reading material at the start of a day, e.g. in the form of well-known papers. In my experience, the ARENA material is often self-contained enough that I wouldn't recommend spending much time on actually reading these papers, unless it later turns out that you don't understand something.[1]
- Each day is largely structured into exercises that isolate a small concept. All exercises have provided solutions. When should you look at solutions?
- For things that are very important to learn (e.g., how to implement attention in transformers), you should probably avoid looking at solutions unless you're actually stuck. If you're stuck, try to figure out what piece of information you're missing, and extract precisely that piece from the solution. Then go back and try to solve the rest of the problem on your own.
- Some things are important to learn (like how to log with wandb), but there is nothing conceptually interesting about doing them. For such exercises, I think it's justified to look at the solution and simply remember how to do it.
- The importance ratings in the exercises are useful. If exercises are rated as less important, it's often fine to skip them or to look at the solutions. In contrast, I think you should mostly ignore the difficulty ratings when deciding whether to do an exercise or not. Some exercises are difficult, but if they're very important, it's still worth trying to do them on your own. (Though as mentioned before, if you're stuck, try to peek at the solution for the minimal amount of information that gets you unstuck.)
Which days/sections are valuable to do?
This is very subjective, but here I'll give you my assessment of which days or sections in days are valuable to do. Other people's opinions will probably differ. Also, note that the program keeps evolving, so it's possible that my opinions and advice are out of date by the time you read this post.
- Week 0: Fundamentals:
- [0.0] Prerequisites: Will take you more than one day. It's important, but don't spend endless time on it.
- The VSCode section contains lots of things you could try to learn. I think you can mostly move on once you manage to run code as cells.
- Numpy, einops, and einsum are all important, but in total, there is much more material linked and provided than you need for starting the program. It's not necessary to know every detail before moving on.
- The optimizers section is interesting, and I found it valuable to see that e.g. the Adam optimizer can be implemented in just a few lines of code. It's not strictly necessary to do it since one can largely treat optimizers as black boxes, so you could also skip this.
- Weights and biases (wandb): Do it, but simply look at the solutions of the exercises if you don't know how to do it.
- Distributed training: I recommend skipping this section. It's not very well explained, and the rest of the material can be done on just one GPU.
- [1.1] Transformers from scratch: Very important. I recommend doing it almost completely, except maybe some of the later sampling methods.
- [1.2] Intro to MechInterp: Might be worth doing completely, though note that it's more content than most people can do within a day.
- The rest of the week is optional. Do what you're interested in! I liked the sections on Toy models of superposition and OthelloGPT (though the latter felt somewhat difficult to me and is also quite long. Might be hard to get through without a working partner!)
- [2.1] Intro to RL: Conceptually very important, worth doing completely.
- [2.2] Q-Learning and DQN: I also recommend doing it completely, but I'm unsure. Even though DQN is mostly not used anymore today, the update of the Q-function is related to the update of the critic in actor-critic methods like PPO.
- [2.3] PPO: Definitely read through the section on the "Whirlwind tour of PPO". The rest of the material is interesting, too, but I wasn't entirely sure whether I trust the material to be fully correct, and some parts (especially the generalized advantage estimation) didn't seem to be well-explained. Since in [2.4] you will do PPO specifically for LLMs, it might be possible to get much of the value by skipping to the next day.
- [2.4] RLHF: Good day, I recommend doing it entirely. But to be clear, this day is only about the policy optimization stage of RLHF, i.e., about PPO with a KL penalty. If you are interested in the "human feedback" (HF) part of RLHF, you will only find it in the bonus material in the section "Learn a human preference reward model", which isn't yet very distilled.
- There is no day [2.5].
- [3.1] Intro to Evals: Do the sections on Intro to API calls and the alignment faking case study. The third section on threat modeling and eval design can be skipped/skimmed.
- [3.2] Dataset Generation: I found the first section very useful; struggling through it taught me a lot about prompt engineering. Note that these exercises can go different ways depending on what model property you're writing an eval for, and the difficulty of prompt engineering can vary dramatically. This might influence your experience.
- [3.3] Running Evals with Inspect: Very useful. Definitely do it.
- [3.4] LLM Agents: In principle interesting, but the exercises seemed somewhat "fuzzy" since there are so many ways to build an agent. It was a bit hard to know exactly how the exercises are supposed to be solved, and so it can be worthwhile to peek at more of the solutions.
- There is no day [3.5] (but [3.4] has enough material for two days).
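To make the remark about the optimizers section concrete, here is a minimal sketch of what "Adam in a few lines" can look like. This is not the ARENA solution: the function names `adam_step` and `minimize_quadratic` are hypothetical, and I use plain Python lists instead of PyTorch tensors to keep the example self-contained. The hyperparameter defaults are the commonly used ones.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on plain Python lists (hypothetical helper, not ARENA code).

    t is the 1-based step count. Returns the new (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)             # bias correction for the zero init
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=500, lr=0.1):
    """Toy problem: minimize f(x, y) = (x - 3)^2 + (y + 1)^2 starting from (0, 0)."""
    params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2 * (params[0] - 3), 2 * (params[1] + 1)]  # analytic gradient of f
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

Calling `minimize_quadratic()` should end up close to the minimum at (3, -1). In the actual exercises you would write this against PyTorch parameters and their `.grad` fields, but the update rule itself is the part inside the loop.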
What is missing?
There is some material that would be useful to learn but which is missing from the ARENA material. I hope some of this material will be added in the future.
Most importantly and maybe surprisingly, while the program is called "Alignment Research Engineering Accelerator", there is actually almost no content on how to align AI systems (except the policy-optimization stage of RLHF). There is no content on scalable oversight.
Should you work through the material alone?
I don't know, it depends on your ability to motivate yourself to work through a large amount of material on your own, and also on your prior skills. Here is a somewhat cheap test:
- Work through enough of the prerequisites (day [0.0]) to be able to move on.
- Do day [0.2] on CNNs and ResNets. If you can do this day, you probably can work through the material on your own.
- To be sure: Do day [1.1] on transformers.
This should be a reasonably cheap test. Notice how long you need to complete each of the days (remembering that [0.0] likely takes more than a day) and use this to assess whether it's worth it for you to work through a larger chunk of the material alone. If you want to do the program in-person instead, express your interest for the next iteration.
[1] One caveat here is that I was already somewhat familiar with much of the content on a conceptual/theoretical level. It's possible that other people gain more from actually reading the provided material.
[2] When I say "completely" here or later, I mean everything except the bonus material.