In the fall I am planning to teach an AI safety graduate course at Harvard. The format is likely to be roughly similar to my "foundations of deep learning" course.
I am still not sure of the content, and would be happy to get suggestions.
Some (somewhat conflicting) desiderata:
- I would like to cover the various ways AI could go wrong: malfunction, misuse, societal upheaval, arms races, surveillance, bias, misalignment, loss of control, ... (and anything else I'm not thinking of right now). I talk about some of these issues here and here.
- I would like to see what we can learn from other fields, including software security, aviation and automotive safety, drug safety, nuclear arms control, etc. (and I'm happy to get other suggestions).
- Talk about policy as well: various frameworks inside companies, regulations, etc.
- Talk about predictions for the future, and methodologies for how to come up with them.
- All of the above said, I get antsy if I don't get my dosage of math and code, so I intend 80% of the course to be technical and to cover research papers and results. It should also involve some hands-on projects.
- Some technical components should include: evaluations, technical mitigations, attacks, white- and black-box interpretability methods, and model organisms.
Whenever I teach a course I always like to learn something from it, so I hope to cover state-of-the-art research results, especially ones that require some work to dig into and that I wouldn't get to without this excuse.
Anyway, since I haven't yet planned this course, I thought I would solicit comments on what should be covered in such a course. Links to other courses, blogs, etc. are also useful. (I do have a quirk that I've never been able to teach from someone else's material, and have often ended up writing a textbook whenever I teach a course; see here, here, and here. So I don't intend to adapt any curricula wholesale.)
Thanks in advance!