LessWrong
TT Self Study Journal # 1

 

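In this post, the author describes his move from a technology job to studying computer science and lays out his research plans in AI Alignment (AIA) and Mechanistic Interpretability (MI). Through self-study, engagement with peers, and a search for funding, he aims to build the relevant skills and lay the groundwork for future research. The post details his five areas of work: writing original articles, surveying AIA topics, studying math, carrying out small projects, and continuing the NDSP project. He also shares a two-week sprint plan.

💡 **Background and goals:** The author left his technology job to pursue a computer science degree and is committed to research in AI Alignment (AIA) and Mechanistic Interpretability (MI). He plans to build skills through self-study, peer engagement, and hands-on projects as a foundation for future research.

✍️ **Writing original articles:** The author plans to write up original ideas such as "Outcome Influencing Systems (OISs)" in order to prompt discussion of new topics, explore existing terminology, and reflect on the strengths and weaknesses of his ideas. OISs are presented as a key object of study for AIA, with the aim of establishing terminology that is independent of science fiction or other historical baggage.

📚 **Survey of AIA topics:** The author will study key AIA topics, including VK LTA and AIXI, along with MI tools and resources such as dimensionality reduction and clustering. He will share his understanding and opinions in order to broaden attention to these topics and to check his own understanding.

➕ **Math study and practice:** The author plans to study logic, category theory, computational mechanics, abstract algebra, linear algebra, probability and statistics, and topology in order to strengthen his theoretical foundation for MI research.

💻 **LLM projects:** The author will get familiar with Transformers and large language models through hands-on projects and, together with the NDSP project, explore the visualization and understanding of high-dimensional data distributions.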

Published on June 18, 2025 11:36 PM GMT

So, rough elevator pitch, what is this?

I quit my job as a technologist to get a CS degree because I want to work on AI Alignment (AIA) and Mechanistic Interpretability (MI). This summer I am taking the final class in my program, so I want to use a Self Study Journal (SSJ) to improve my AIA-relevant skills. I hope to get peer and mentor engagement to help me become a valuable researcher, and to network my way toward funding opportunities or paid fellowships. My convocation is in November. My goal is to have found a role by that time.

I want feedback, both for the value of other people's insight and for the extra accountability that helps me stay motivated, so please lower your inhibition to commenting here. If you would normally think "I don't have anything valuable to contribute" or "it would take too long to write up my thoughts", please leave a comment saying "Good Luck" instead. Thanks : )

I am planning a rough, overarching outline and then making more concrete plans for sprints of work, each of which will last one or two weeks. After each sprint I will publish its results along with the plans for the next sprint.

My overarching outline is divided into 5 categories:

 

SSJ--1. Articles to Write

I have a few original ideas that I’m not aware of other people working on. I’d like to write up the ideas to help me practice the development and communication of original ideas, as well as to explore whether any of these ideas have merit that I can communicate to others. A good outcome would be any of:

The following is a bullet point list of the articles I’m currently interested in writing. I don’t think they will be fully legible here, but if you are curious, please leave a comment asking about them.

 

SSJ--2. Survey of AIA ideas

I have been collecting topics I want to get a better understanding of for a long time, but now that my school curriculum is lighter, I will have time to actually dive into these topics. There’s too much here to write up the “what” and “why” of each item, but as I am working through them I will try to provide a summary of my understandings and opinions which I hope will be valuable both for expanding focus on the topics and for checking my own understanding.

 

SSJ--3. Math

I enjoy math, so I know I’m going beyond what is necessary for MI, but I also think having a rigorous definition of what you are talking about is very valuable in many contexts, so for those reasons, I want to learn some new math topics and to review and practice some old ones. The topics I’m interested in are:

I think I may start out by going through “Topoi, The Categorial Analysis of Logic” by Robert Goldblatt and “Linear Algebra Problem Book” by Paul R. Halmos.

The category theory book is because of my interest in logic and proof, and because I find the idea that category theory can help one understand the connections between various branches of math very satisfying. The linear algebra book is because I want to have good intuitions about $\mathbb{R}^n$, where Neural Net parameters and activations live.

 

SSJ--4. LLM Projects

In the pursuit of becoming an AIA and MI researcher it is important to actually research some AI models. I have worked with convolutional VAE and RL models, but have never worked with transformers. I need to get familiar with them and I also want to get some experience using cloud resources to work with larger models.

I think I’ll start out doing some mucking around which I may or may not write up in much detail before trying to choose some minor MI experiments to try. I will probably also want to combine these efforts with NDSP as I make progress on making a more general tool.

 

SSJ--5. NDSP

I’m very inspired by Mingwei Li’s work, especially “Toward Comparing DNNs with UMAP Tour”. I would like to build tools for working with and understanding data distributions in high-dimensional spaces. I have two major goals with this project.

(1) Develop some easy-to-use tools. This could look like a library, similar to matplotlib, that can be used from within Jupyter notebooks, or it may look more like a web-based data analysis tool. Ideally it would have both.

(2) Make high-dimensional structures more intuitively understandable. The first aspect of this is developing a visual language for displaying these structures, and the second aspect is making tutorials to help people generalize from simple objects such as hyper-cubes, simplexes, and hyper-spheres to more complicated scenes that may appear in actual high-dimensional data distributions. I might also be interested in writing some simple games like 4d pong, n-d maze, or n-d minesweeper. I think games are a great way for people to build intuitions.
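As a rough illustration of the kind of simple object mentioned in (2), here is a minimal Python sketch that builds the vertices and edges of an n-dimensional hyper-cube and flattens them to 2-d points that could be handed to a plotting library. It is not part of NDSP: the function names `hypercube` and `project_2d` are hypothetical, and the evenly spaced fan of projection directions is an arbitrary choice made only for this example.

```python
from itertools import product
import math


def hypercube(n):
    """Return (vertices, edges) of the unit n-cube.

    Vertices are 0/1 tuples of length n. Edges are pairs of vertex
    indices whose vertices differ in exactly one coordinate.
    """
    vertices = list(product((0, 1), repeat=n))
    index = {v: i for i, v in enumerate(vertices)}
    edges = []
    for v in vertices:
        for axis in range(n):
            # Only walk from 0 to 1 along each axis so every edge is listed once.
            if v[axis] == 0:
                w = v[:axis] + (1,) + v[axis + 1:]
                edges.append((index[v], index[w]))
    return vertices, edges


def project_2d(vertices):
    """Linearly project n-d points to 2-d.

    Axis k is sent to the unit vector at angle pi * k / n. This fan of
    directions is an illustrative assumption, not a recommended projection.
    """
    n = len(vertices[0])
    dirs = [(math.cos(math.pi * k / n), math.sin(math.pi * k / n)) for k in range(n)]
    return [
        (
            sum(c * d[0] for c, d in zip(v, dirs)),
            sum(c * d[1] for c, d in zip(v, dirs)),
        )
        for v in vertices
    ]


vertices, edges = hypercube(4)
points = project_2d(vertices)
print(len(vertices), len(edges))  # vertex and edge counts of the 4-cube
```

Drawing a line segment between `points[i]` and `points[j]` for each `(i, j)` in `edges` gives one flat picture of the 4-cube. Swapping in a different set of directions gives a different view of the same object, which is the sort of variation a tour-style tool would animate.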

I think the first tasks here are to write up some documentation of my ideas and to explore TensorFlow.js as a library to use in development.

 

Goals for my 1st Sprint

 

Wish me luck : )


