Published on July 2, 2024 6:04 PM GMT

TL;DR

In this post, I describe my methodology for building new material for ARENA. I'll mostly be referring to the exercises on IOI, Superposition and Function Vectors as case studies. I expect this to be useful for people who are interested in designing material for ARENA or ARENA-like courses, as well as people who are interested in pedagogy or ML paper replications.

The process has 3 steps:

Start with something concrete

First pass: replicate, and understand

Second pass: exercise-ify

Summary

I'm mostly basing this on the following 3 sets of exercises:

Indirect Object Identification

Superposition & SAEs

Function Vectors

The steps I go through are listed below. They assume you already have an idea of what exercises you want to create; in Appendix A1 you can read some thoughts on what makes for a good exercise set.

1. Start with something concrete

When creating material, you don't want to be starting from scratch. It's useful to have source code available to browse - bonus points if that takes the form of a Colab or something which is self-contained and has easily visible output.

IOI

Superposition

Function Vectors

2. First-pass: replicate, and understand

The first thing I did in each of these cases was go through the material I started with, and make sure I understood what was going on. Paper replication is a deep enough topic for its own series of blog posts (many already exist), although I'll emphasise that I'm not usually talking about full paper replication here, because ideally you'll be starting from something a bit further along, be that a Colab, a different tutorial, or something else. And even when you are just working directly from a paper, you shouldn't make the replication any harder for yourself than you need to. If there's code you can take from somewhere else, then do.

My replication usually takes the form of working through a notebook in VSCode. I'll either start from scratch, or from a downloaded Colab if I'm using one as a reference. This notebook will eventually become the exercises. My replication will include a lot of markdown cells explaining what's going on, between the code cells. I usually frame these as explanations to myself: if I don't understand something, I'll figure it out and write down the explanation. Mostly it's fine if these are written in shorthand; they'll go through a lot of polishing in subsequent steps. For example, here's a cell from an early version of the function vectors exercises, compared to what it ended up turning into:

*Markdown cell just before the* `display_model_completions_on_antonyms` *function, first-pass draft version*

*Markdown cell just before the* `display_model_completions_on_antonyms` *function, final version*

When it comes to actually writing code, I usually like everything to be packaged in easy-to-run functions. Ideally each cell of code that I run should only have ~1-4 lines of actual code outside of functions (although this isn't a strict rule). I try to keep my functions excessively annotated - this includes type hints, docstrings, and a large number of inline comments, along with plenty of space between lines. This will be helpful for the final exercises because students will need to understand what the code does when they look at the answers, but it's also helpful in the exercise-writing process because it helps me take a step back and distill the high-level things that are going on in each chunk of code. That in turn helps me pull out modular chunks of code to turn into exercises, so that students aren't being asked to do too many things at once (more discussion of this in the exercise-ify step). Here's an example of the kind of documentation I usually have:

Example of docstrings in the solution functions I write
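As a rough illustration of that annotation style, here is a minimal sketch of a heavily documented solution function. It is a hypothetical example written for this post's purposes, not code taken from the ARENA solutions: the function name `average_logit_diff` and its arguments are assumptions, and it uses plain Python lists rather than tensors so that it runs without any dependencies.

```python
def average_logit_diff(
    logits: list[list[float]],
    correct_ids: list[int],
    incorrect_ids: list[int],
) -> float:
    """
    Returns the mean difference between the correct and incorrect answer logits.

    (Hypothetical example; the name and signature are illustrative only.)

    Args:
        logits: one row of final-position logits per prompt, shape (batch, vocab)
        correct_ids: index of the correct answer token for each prompt
        incorrect_ids: index of the incorrect answer token for each prompt

    Returns:
        The logit difference (correct minus incorrect), averaged over the batch.
    """
    # Check that we have exactly one pair of answer tokens per prompt
    assert len(logits) == len(correct_ids) == len(incorrect_ids)

    # Per-prompt difference: logit of correct token minus logit of incorrect token
    diffs = [
        row[correct] - row[incorrect]
        for row, correct, incorrect in zip(logits, correct_ids, incorrect_ids)
    ]

    # Average over the batch dimension
    return sum(diffs) / len(diffs)


example_logits = [[0.5, 2.0, -1.0], [1.0, 0.0, 3.0]]
print(average_logit_diff(example_logits, correct_ids=[1, 2], incorrect_ids=[0, 0]))  # 1.75
```

Writing the comments as one line per conceptual step is what makes it easy to see later which steps could become separate exercises.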

While I'm doing this replication, I'm usually thinking about how to construct exercises at the same time. It helps that the position students will be in while going through the exercises isn't totally different to the position I'm in while writing the functions in the first place. I'll save the discussion of exercise-ification for the next section, but do bear in mind that I'm doing a lot of the exercise structuring as I go rather than all at once after the replication is complete.

IOI

Sparse Autoencoders

Function Vectors

One last note (your mileage may vary on this, because it's more of a productivity tip which helped me): with all 3 of these case studies, this first-pass replication was done (at least in an 80/20 sense) over one very intense weekend where I focused on nothing other than the replication. I find that framing these as exciting hackathon-type events is a pretty effective motivational tool (even though having them be an actual hackathon with multiple people would probably amplify this benefit).

3. Second-pass: exercise-ify

Once I've replicated the core results, I'll go back through the notebook and convert it into exercises. As I alluded to, some of this will already have been done in the process of the previous step, for example in notes to myself or in the docstrings I've given to functions. Here's an example, taken from an early draft of the function vector exercises:

With that said, in this section I'll still write as if it's a fully self-contained step.

When I go through my notebook in this stage, I'm trying to put myself in the mind of a student who is going through these exercises. As I've mentioned, it's helpful that the perspective of a student going through the exercises isn't totally different to the perspective I had while doing the initial replication. So often I'll be able to take a question about the exercises, and answer it by first translating it into a question about my own experience doing the replication. Some examples:

| Question about exercises | Question to myself |
| --- | --- |
| What are the most important takeaways I want students to have from each section, both in terms of what theory I want them to know and what kinds of code they should be able to write? | What theory did I need to know to perform this section of the replication, and what coding techniques or tools did I need to use? |
| How should the exercises be split up, and what order should they be put in? | What were the key ideas I needed to understand to write each bit of code, and how can I create exercises which test just one of these ideas at once? |
| What diagrams, analogies or other forms of explanation would be helpful to include for students? | Were there any diagrams I drew or had in mind while I was doing the replication? |

Here are some concrete examples of what this looked like, for each of the 3 exercise sets.

IOI

Superposition & SAEs

Function Vectors

I'll conclude this section with a bit of an unstructured brain dump. There's probably a lot more that I'm forgetting and I might end up editing this post, but hopefully this covers a good amount!

Each section should include learning objectives

Make the sections results-focused, especially in how they end

Use hints frequently!
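To make the exercise-ify step a little more concrete, here is a minimal sketch of how one solution function might be turned into an exercise: a stub for the student, a hint, and a check that compares the student's function against the reference solution. Everything here is an assumption made for illustration (the names `softmax`, `softmax_solution`, `HINT` and `check_softmax` are hypothetical, and this is not how the actual ARENA test utilities are written); softmax is chosen only because it tests a single idea.

```python
import math
from typing import Callable


def softmax_solution(logits: list[float]) -> list[float]:
    """Reference solution: numerically stable softmax over a list of logits."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax(logits: list[float]) -> list[float]:
    """Exercise stub: this is the version students see and fill in."""
    raise NotImplementedError("Fill in this function, then run check_softmax(softmax).")


# A hint the student can choose to reveal if they get stuck
HINT = "Subtract max(logits) from every logit before exponentiating."


def check_softmax(student_fn: Callable[[list[float]], list[float]]) -> None:
    """Compares the student's function to the reference solution on fixed inputs."""
    for logits in ([0.0, 0.0], [1.0, 2.0, 3.0], [1000.0, 1000.0]):
        expected = softmax_solution(logits)
        actual = student_fn(logits)
        assert len(actual) == len(expected), "Output has the wrong length."
        for a, e in zip(actual, expected):
            assert math.isclose(a, e, rel_tol=1e-6), f"Mismatch on input {logits}. Hint: {HINT}"
    print("All checks in `check_softmax` passed!")


# Sanity check: the reference solution should pass its own checks
check_softmax(softmax_solution)
```

Ending the exercise on a check that prints a clear success message is one simple way of keeping it results-focused, and surfacing the hint in the failure message keeps help close at hand.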

Call to action for ARENA exercises

The development of more ARENA chapters is underway! We'd love for you to contribute to the design of ARENA curriculums and suggest content you'd want to see in ARENA using this Content Suggestion Form. If you want to be actively involved as a contributor, you can reach out via this Collaborator Interest Form or email Chloe Li at chloeli561@gmail.com.

Appendix

A1 - what makes for good exercise sets?

Should be a currently active area of research - at least, if we're talking about things like the interpretability section rather than e.g. some sections of RL or the first chapter which are purely meant to establish theory and lay groundwork. For example, although the ideas behind causal tracing have been influential, it's not currently a particularly active area of research, which is one of the reasons I chose to not make exercises on it.^[1]

Combination of theory and coding takeaways.

Doesn't require excessive compute

A2 - what tools do I use?

Excalidraw

Acknowledgements

Note - although I've written or synthesized a lot of the ARENA material, I don't want to give the impression that I created all of it, since so much of it existed before I started adding to it. I've focused on examples where I wrote most of the core exercises, but I'd also like to thank the following people who have made invaluable contributions to the ARENA material, either directly or indirectly:

Slack group

^[1]
There were other reasons I chose not to, e.g. it didn't seem very satisfying for students to implement because it's high on rigour and careful execution of an exact algorithm, and doesn't really contain that many unique ideas. Also, you'd have to be doing causal tracing on some model. The obvious choice would be the bracket classifier model because Redwood has already done work on it, but their published work on it is very long and contingent on specific features of the bracket classifier model rather than generalizable ideas.
