少点错误 07月02日 05:15
Aether July 2025 Update
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

Aether是一个独立的LLM智能体安全研究小组,成立于去年8月。本文介绍了Aether的最新进展,包括研究方向、团队构成以及未来计划。Aether专注于通过可监控性提升LLM智能体的安全性,特别是链式思考(CoT)的可监控性。他们认为,关注智能体在推理过程中的可解释性,对于高风险场景下的安全至关重要。Aether还探讨了改进评估指标、训练模型以生成更可靠的推理轨迹等方向。此外,文章还介绍了Aether的团队成员和资金情况,并鼓励感兴趣的人士参与。

💡 Aether的主要目标是进行技术研究,以深入了解LLM智能体在AI安全方面带来的风险与机遇。他们认为,LLM智能体通过自然语言指令接收目标、进行系统2推理以及监控推理过程,对AI对齐具有重要意义。

🔍 Aether特别关注链式思考(CoT)的可监控性,并认为其是提升LLM智能体安全性的关键。他们认为,AI安全领域应更多关注LLM推理轨迹在下游应用中的实用性,尤其是在高风险场景下的可解释性。

📈 Aether强调了推理时计算范式的重要性,认为这种范式使得模型的思考过程外化,从而可以通过监控CoT来获得更多信息。他们还致力于降低外化CoT的安全成本,以保持推理链的清晰可读性。

🤝 Aether鼓励对LLM智能体安全感兴趣的人士参与。他们正在寻找具有RL训练经验的合作者,并欢迎通过Discord进行讨论和项目合作。Aether的研究方向还包括更好地理解LLM中的目标和信念。

Published on July 1, 2025 9:08 PM GMT

Aether is an independent LLM agent safety research group that was announced last August, and a lot has changed in the past 10 months. Now that we’re funded and have officially kicked off, we’d like to share some information about how to get involved, our research, and our team!

Get Involved!

    Submit a short expression of interest here if you would like to contribute to Aether as a researcher, intern, external collaborator, advisor, operations person, or in any other role.
      We are especially looking for collaborators with experience running RL training on open-weight LLMs, since several projects we are excited about rely on this. If you have this experience and submit an EoI, we may reach out to you about a paid role.
    Join our discord with this invite link! We aim to cultivate valuable LLM agent safety research discussions here, and we’ll likely invite people to join for occasional sprints, hackathons, and project collaborations.Get in touch with Rohan at rs4126@columbia.edu with any questions.

Research

Aether's goal is to conduct technical research that yields valuable insights into the risks and opportunities that LLM agents present for AI safety. We believe that LLM agents have substantial implications for AI alignment priorities by enabling a natural language alignment paradigm—one where agents can receive goals via natural language instructions, engage in explicit system 2 reasoning about safety specifications, and have their reasoning processes monitored by other LLMs.

Within this paradigm, we believe chain-of-thought (CoT) monitorability is a key problem to focus on. There are three reasons why we think that this is a high priority for enhancing LLM agent safety:

    Shifting the field’s focus to monitorability in agentic settings: We believe that the field of AI safety should focus less on strict CoT faithfulness criteria, such as full causal faithfulness, and more on the usefulness of LLMs’ reasoning traces for downstream applications where interpretability matters most, such as monitoring in high-stakes situations.The inference-time compute paradigm: The recent shift to the inference-time compute paradigm has resulted in models that externalize more of their thinking, meaning that there is more to be gained from monitoring the CoT compared to only monitoring actions than before.Reducing the safety tax of externalized CoT: In the current paradigm, simply reading a model’s reasoning chain goes a long way in understanding its cognition. It’s highly important to preserve this state of affairs. Recently, Meta published a paper introducing COCONUT, a proposal to train models to perform most of their reasoning in a latent manner. Similarly, DeepSeek has claimed to be looking for alternatives to the transformer architecture, and people outside of labs have published novel recurrent architectures. By finding ways to keep the safety tax of preserving legible reasoning chains low, we hope to convince labs that it’s worth sticking with the current paradigm.

We are not unique in focusing on monitorability: all major AI companies are thinking about this and external groups have also published exciting work. However, there are several important directions for improving monitorability and not all of them will be covered by default. Two relatively neglected directions that we are particularly excited about are improving metrics and benchmarks, and training models to have more faithful and legible reasoning traces.

For more details on our thinking in this direction, see:

Other directions that we’ve been thinking about revolve around getting a better understanding of goals and beliefs in LLMs, e.g., by investigating the limitations and generalization properties of modifying LLM beliefs with synthetic document finetuning and developing model organisms of LLM misalignment that arises through reflective goal-formation. We are open to feedback that might convince us to focus on these directions instead of monitorability.

Team

Our core team is currently working full-time in-person in London.

We are advised by Seth Herd (Astera Institute), Marius Hobbhahn (Apollo Research), Erik Jenner (Google DeepMind), and Francis Rhys Ward (LawZero).

We have about $200,000 in funding to cover salaries and expenses through Dec 1, 2025. Our funder wishes to remain anonymous for now.



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Aether LLM 智能体安全 可监控性 链式思考
相关文章