LessWrong, January 2
My January alignment theory Nanowrimo

 

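The author announces a plan to publish a series of technical posts and shortforms in January, covering several research directions: polytopes, mode connectivity, formalism, grammars, analogistic circuits, interpretability, QFT methods, and Bayesian vs. SGD learning. Some of these posts are joint work with collaborators, and they are meant to share research progress and prompt discussion. The author commits to publishing at least three posts per week, hoping to write more efficiently this way and to use reader feedback to improve the research.

💡 The author plans to publish opinionated takes on research directions such as polytopes, mode connectivity, and formalism, weighing the strengths and weaknesses of different research paths.

📝 The author will share notes on grammars and on how simpler rules combine into larger structures. This relates to a project on "analogistic circuits": mechanisms that generalize a complex rule by analogy, without encoding the structure itself.

🔬 Together with Lauren Greenspan and Lucas Teixeira, the author will publish work on interpretability, focusing on ways to think about assumptions and experiments, and will also discuss QFT methods in interpretability.

📊 The author will also examine the differences between Bayesian and SGD learning from several points of view, and extend the "Low-Hanging-Fruit" prior, with particular attention to the non-learnability of parity and a new notion of "training stories".

✍️ The author hopes to use this month's writing sprint to test the format of short technical posts, to improve the research through reader feedback, and to use the posts to prompt discussion so that errors are found and corrected faster.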

Published on January 2, 2025 12:07 AM GMT

This is a quick announcement/commitment post:

I've been working on the PIBBSS Horizon Scanning team (with Lauren Greenspan and Lucas Teixeira), where we have been reviewing some "basic-science-flavored" alignment and interpretability research and doing talent scouting (see the intro doc we have written so far, which we split off from an unfinished larger review). I have also been working on my own research. Aside from active projects, I've accumulated a bit of a backlog of technical writeups and shortforms in draft or "slack discussion"-level form, with various levels of publishability.

This January, I'm planning to edit and publish some of these drafts as posts and shortforms on LW/the alignment forum. To keep myself accountable, I'm committing to publish at least 3 posts per week. 

I'm planning to post about (a subset? superset? overlapping set? of) the following themes:

- Opinionated takes on a few research directions (I have drafts on polytopes, mode connectivity, and takes on proof vs. other kinds of "principled formalism without proofs").
- Notes on grammars and, more generally, how simpler rules and formal structures can combine into larger ones. This overlaps with a project I'm working on with collaborators, involving a notion of "analogistic circuits": mechanisms that learn to generalize a complex rule "by analogy", without ever encoding the structure itself.
- Joint with Lauren Greenspan and Lucas Teixeira: some additional bits of our review, with a focus on interpretability (and ways to think about assumptions and experiments).
- Joint with Lauren: some distillation and discussion of QFT methods in interpretability.
- Bayesian vs. SGD learning from various points of view. (Closely related to discussions with Kaarel Hänni, Lucius Bushnaq, and others.)
- Related to the above: extensions of the "Low-Hanging-Fruit" prior post with Nina Panickssery, specifically focusing on non-learnability of parity, and a new notion of "training stories" (this is closely related to some other work we've done with Nina, as well as joint work with Louis Jaburi).
- ???

I am generally resistant to making announcements before doing writeups. But in this case, I have thought for a while that these drafts might be useful to get out, but have been blocked by not wanting to post unpolished things. I'll be pointing at this announcement when posting this month for the following reasons:


