Workshop: Interpretability in LLMs Using Geometric and Statistical Methods

Published on February 22, 2025 9:39 AM GMT

Date: Around the last week of May 2025
Location: Science Park, University of Amsterdam (tentative)
Organizers: Jan Pieter van der Schaar (University of Amsterdam), Nabil Iqbal (Durham University) and Karthik Viswanathan (University of Amsterdam)

We are excited to announce a two-day workshop on "Interpretability in LLMs Using Geometric and Statistical Methods" during the last week of May (tentative), with around 20 participants from Amsterdam, AREA Science Park (Trieste), and our invited speakers.

Image credit: Mechanistic Interpretability for AI Safety -- A Review

This workshop explores recent developments in understanding the inner workings of Large Language Models (LLMs) by leveraging concepts from geometry and statistics. It aims to provide an accessible introduction to these approaches, focusing on their potential to address key challenges in AI alignment, safety, and efficiency, and to give an overview of current research problems in LLM interpretability. By bridging theoretical insights with practical applications, the workshop seeks to foster an exchange of ideas and motivate research at the intersection of computational geometry, statistical mechanics, and AI interpretability.

Overview

The workshop spans two days. Day 1 focuses on the geometric and statistical properties of internal representations in LLMs; the talks on this day are expected to take a physics-oriented perspective. On the second day, we broaden the scope to mechanistic interpretability and its applications to AI safety, exploring how the ideas from Day 1 can contribute to current research challenges in the field.

Day 1: Geometric and Statistical Methods for Interpretability

On the first day, we will explore how large language models process and represent information through their internal representations. The discussions will focus on the geometry of embeddings: how they evolve across model layers and what insights they provide. The talks on Day 1 are expected to align with the themes discussed in this blog post and paper.
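To make the Day 1 theme concrete, here is a minimal, illustrative sketch (not code from any speaker or from the referenced paper): it extracts hidden states from a small open model and estimates the intrinsic dimension of the token representations at each layer with the TwoNN estimator (Facco et al., 2017), one common tool in this line of work. The model, input text, and estimator choice are all assumptions made for illustration.

```python
# Illustrative sketch: intrinsic dimension of hidden representations,
# layer by layer, via the TwoNN estimator (Facco et al., 2017).
# Model ("gpt2") and input text are arbitrary choices for the demo.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors
from transformers import AutoModel, AutoTokenizer

def twonn_id(X: np.ndarray) -> float:
    """Maximum-likelihood TwoNN estimate of intrinsic dimension."""
    # For each point, take distances to its two nearest neighbours
    # (neighbour 0 is the point itself, hence n_neighbors=3).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]  # ratio of 2nd to 1st NN distance
    return len(mu) / np.log(mu).sum()

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

# A real analysis would pool thousands of tokens; a single sentence
# keeps the demo cheap but makes the estimate very noisy.
text = ("The geometry of a language model's internal representations "
        "changes layer by layer as linguistic structure is built up.")
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    # Tuple of (n_layers + 1) tensors, each of shape [1, seq, dim].
    hidden = model(**inputs).hidden_states

for layer, h in enumerate(hidden):
    X = h[0].numpy()  # token representations at this layer
    print(f"layer {layer:2d}: TwoNN ID ~ {twonn_id(X):.1f}")
```

How this curve rises and falls across depth is exactly the kind of geometric signature the Day 1 talks are expected to discuss.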

Day 2: Mechanistic Interpretability and Applications to AI Safety

On the second day, the focus will shift toward the mechanistic aspects of interpretability, examining how specific circuits in a model’s architecture can be identified and analyzed. The discussions will also explore how these insights can be applied to AI safety research. The talks on Day 2 are expected to align with the themes discussed in this blog post and paper.
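For a flavour of the mechanistic side, below is a minimal activation-patching sketch, a standard causal technique for locating which components carry a behaviour. It is illustrative only: the model, prompts, patched layer, and target token are arbitrary assumptions, not anything from the workshop programme.

```python
# Illustrative activation-patching sketch: copy one block's residual-
# stream activation (at the final token) from a "clean" run into a
# "corrupted" run, and check how much of the clean prediction returns.
# Model, prompts, layer index, and target token are demo assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
corrupt = tok("The Colosseum is located in the city of", return_tensors="pt")
paris_id = tok(" Paris")["input_ids"][0]  # token for the clean answer

LAYER = 6  # which transformer block to patch (arbitrary)
block = model.transformer.h[LAYER]
cache = {}

def save_hook(module, args, output):
    # GPT-2 blocks return a tuple; output[0] is the hidden states.
    cache["clean"] = output[0].detach()

def patch_hook(module, args, output):
    patched = output[0].clone()
    patched[:, -1, :] = cache["clean"][:, -1, :]  # patch final position only
    return (patched,) + output[1:]

with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)  # cache the clean activation
    handle.remove()

    base = model(**corrupt).logits[0, -1, paris_id].item()

    handle = block.register_forward_hook(patch_hook)
    patched = model(**corrupt).logits[0, -1, paris_id].item()
    handle.remove()

print(f"logit(' Paris') on corrupted input: {base:.2f} -> patched: {patched:.2f}")
```

Sweeping such a patch over layers and token positions localizes where the relevant information is carried; mechanistic work then tries to resolve these effects into specific attention heads and MLP circuits.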

Format

The workshop is still in its early planning stages, so the format may evolve. The current plan is to have 3-4 talks per day, with dedicated time for discussions and potential collaborations. The workshop is intended to be fully in-person, but this may be adjusted based on the level of interest from the online community. The speakers and the schedule are yet to be decided.

Questions?

Reach out to me at k.viswanathan@uva.nl or comment below. We look forward to seeing you. In the meantime, here’s a fun comic to keep you occupied!

Image credit: SMBC comics

