Published on July 23, 2025 3:46 AM GMT

[Epistemic Status: This is an artifact of my self-study. I am using it to remember links and help manage my focus. As such, I don't expect anyone to fully read it. If you have particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good luck". I'm hoping for a feeling of accountability and would like input from peers and mentors. This may also help to serve as a guide for others who wish to study in a similar way to me.]

List of acronyms: Mechanistic Interpretability (MI), AI Alignment (AIA), Outcome Influencing System (OIS), n-Dimensional Scatter Plot (NDSP), Vanessa Kosoy's Learning Theoretic Agenda (VK LTA), Machine Learning (ML), Large Language Model (LLM).

Review of 2nd Sprint

My goals for this sprint were:

my OIS article

Do some work on the "Prompting for Interdisciplinary Attention" section, with a focus on gathering definitions and conceptions for the "system" and "substrate" subsections of the definition section.

Finish writing the first draft of the definition section. Skip "system" and "substrate" for now.

VK LTA

Email some professors at UVic to see if I can have some conversations about my interests and other math topics that may be valuable.

Keep studying Topoi. Next sprint switch to Linear Algebra or Computational Mechanics.

Transformers From Scratch

The Interpretability Toolkit

Callum McDougall's guide for it

transformer-utils library

Learning Interpretability Tool

So how did I do?

Daily Worklog

Tu, July 8	Spent about 4 or 5 hours writing SSJ #2 and then started the document for SSJ #3. About 2 hours of that time was spent writing the section on Neel's MI guide transcribing from my handwritten notes. The other 2 hours was split between everything else.
Wd, July 9	No progress. Woke early to go jogging, but didn't get enough sleep so ended up tired and distracted and eventually napped instead of working on this.
Th, July 10	SSJ--2. Spent about an hour reading VK LTA while on the bus.
Fr, July 11	SSJ--2. Spent about 2 hours reading VK LTA.
Sa, July 12	No progress. Went for a hike :-)
Su, July 13	No progress.
Mo, July 14	No progress.
Tu, July 15	No progress.
Wd, July 16	No progress.
Th, July 17	No progress.
Fr, July 18	SSJ--1. About 3 hours researching and thinking about definition of a "system" in the context of OIS. I think I have a grasp on the idea I want to describe now, but just need to figure out how to write it down.
Sa, July 19	No progress.
Su, July 20	No progress.
Mo, July 21	SSJ--1. Worked on definition of "outcome", "influence", and "system" while on bus ride home from lecture.
Tu, July 22	SSJ--3. Spent 3 or 4 hours starting to draft an explanation of my research interests to reference while asking math profs at my university for help honing my math study plan.

Sprint Summary

Well, I'm glad I am now including a daily worklog. It is embarrassing that I failed to get any work done on so many days, and I do not wish to repeat this during the next sprint, but as the Litany of Gendlin says, "What is true is already so. Owning up to it doesn't make it worse." Another good one is the Litany of Tarski: "If I haven't been managing my time well, I desire to believe that I haven't been managing my time well." Or, a personal saying of my own: "The first step to influencing a variable is being able to read its current value."

How did I do with each of my goals?

SSJ--1 -- work on my OIS article

I did get some work done on this. I referenced definitions in other fields, but ended up using them to inform my thinking on the OIS definition. I think it makes more sense to get that fairly fleshed out before actually writing about other fields since the goal is to describe a mapping from the terminology of each field into OIS terminology. So it's still useful to study other fields, but not to start writing sections on them yet.

Still, I think it would be good to focus on something else for the next sprint. The OIS document is going to take me a good amount of time to complete.

I think next sprint I will switch to writing a literature review of AIA glossaries and terminology. This will be good in itself, and will help me verify my intuition that current AIA terminology is a mess and that we need a new paradigm such as OIS. Alternatively, if I disprove that intuition, I will save myself a lot of wasted effort!

SSJ--2 -- Read VK LTA and write a small summary with my thoughts.

I spent a good amount of time reading this, but not in a context where I was taking notes on it as I read, which I think is a mistake. For future reading I'm going to prioritize only reading when I can be active about it, not treating it like something I can passively do on my phone.

The thoughts I do have on VK's LTA are:

Sutton's Bitter Lesson

Also, a career advisor in an EA thread recommended I read Shallow Review of Technical AI Safety 2024, so I'm setting that as next sprint's reading. I will continue VK LTA some other time.

SSJ--3 -- Math

Didn't spend any time studying math, but I did start writing an email to send to math professors and immediately ended up yak shaving, writing a description of my current research directions and what math I am aware of relating to them. Oh well, that's probably a good thing to do anyway, so I've added it as an SSJ-1 writing task for the next sprint.

SSJ--4 -- Go through Transformers From Scratch.

Did not start this 😥 Adding it unchanged to the next sprint.

SSJ--5 -- Literature review on MI Tooling and Etc...

Did not start this 😥 Adding it unchanged to the next sprint.

Goals for 3rd Sprint

In addition to my 5 focuses, I'm adding a 6th! I realize a lot of the work I'm wanting to do is getting feedback from people on things and networking, so I'm making that more explicit, giving it its own category going forward.

Additionally, I want to put a focus on making things that are "feedback friendly". What do I mean by this?

They need to know I am looking for it.

It must be easy to find the things I'm looking for feedback on. IE, they are not mentioned briefly in the middle of long, otherwise irrelevant, journal articles. IE, there is an easy to navigate "map" of what work I've done that I'm looking for feedback on.

The things I am looking for feedback on are easy to understand and engage with. Ideally mentors can quickly get a sense of what I'm doing and whether the direction I'm going seems good or needs adjusting.

I want to keep some focus on the idea of "feedback ready work" going forward. Critiquing other agendas, pointing out things I think are flaws and how my work fits into the context of those flaws, seems like a valuable strategy. I shouldn't just be reading agendas I agree with, but also ones I disagree with.

The Goals:

AIA Terminology Lit Review

Math in my AI Alignment Goals

Shallow Review of Technical AI Safety 2024

Do some Linear Algebra reading and practice.