Published on July 1, 2025 9:08 PM GMT
Aether is an independent LLM agent safety research group that was announced last August, and a lot has changed in the past 10 months. Now that we’re funded and have officially kicked off, we’d like to share some information about how to get involved, our research, and our team!
Get Involved!
- Submit a short expression of interest here if you would like to contribute to Aether as a researcher, intern, external collaborator, advisor, operations person, or in any other role.
- We are especially looking for collaborators with experience running RL training on open-weight LLMs, since several projects we are excited about rely on this. If you have this experience and submit an EoI, we may reach out to you about a paid role.
Research
Aether's goal is to conduct technical research that yields valuable insights into the risks and opportunities that LLM agents present for AI safety. We believe that LLM agents have substantial implications for AI alignment priorities by enabling a natural language alignment paradigm—one where agents can receive goals via natural language instructions, engage in explicit system 2 reasoning about safety specifications, and have their reasoning processes monitored by other LLMs.
Within this paradigm, we believe chain-of-thought (CoT) monitorability is a key problem to focus on. There are three reasons why we think that this is a high priority for enhancing LLM agent safety:
- Shifting the field’s focus to monitorability in agentic settings: We believe that the field of AI safety should focus less on strict CoT faithfulness criteria, such as full causal faithfulness, and more on the usefulness of LLMs’ reasoning traces for downstream applications where interpretability matters most, such as monitoring in high-stakes situations.The inference-time compute paradigm: The recent shift to the inference-time compute paradigm has resulted in models that externalize more of their thinking, meaning that there is more to be gained from monitoring the CoT compared to only monitoring actions than before.Reducing the safety tax of externalized CoT: In the current paradigm, simply reading a model’s reasoning chain goes a long way in understanding its cognition. It’s highly important to preserve this state of affairs. Recently, Meta published a paper introducing COCONUT, a proposal to train models to perform most of their reasoning in a latent manner. Similarly, DeepSeek has claimed to be looking for alternatives to the transformer architecture, and people outside of labs have published novel recurrent architectures. By finding ways to keep the safety tax of preserving legible reasoning chains low, we hope to convince labs that it’s worth sticking with the current paradigm.
We are not unique in focusing on monitorability: all major AI companies are thinking about this and external groups have also published exciting work. However, there are several important directions for improving monitorability and not all of them will be covered by default. Two relatively neglected directions that we are particularly excited about are improving metrics and benchmarks, and training models to have more faithful and legible reasoning traces.
For more details on our thinking in this direction, see:
- Our current research agenda: Enhancing the Monitorability of LLM Agents (Aether, 2025)~80 Interesting Questions about Foundation Model Agent Safety (Subramani and Pimpale, 2024)On the Implications of Recent Results on Latent Reasoning in LLMs (Arike, 2025)
Other directions that we’ve been thinking about revolve around getting a better understanding of goals and beliefs in LLMs, e.g., by investigating the limitations and generalization properties of modifying LLM beliefs with synthetic document finetuning and developing model organisms of LLM misalignment that arises through reflective goal-formation. We are open to feedback that might convince us to focus on these directions instead of monitorability.
Team
Our core team is currently working full-time in-person in London.
- Rohan Subramani studied CS and Math at Columbia, where he helped run an Effective Altruism group and an AI alignment group. He has done AI safety research in LASR, CHAI, MATS, and independent groups. He is starting a PhD at the University of Toronto this fall with Prof. Zhijing Jin and continuing Aether work within the PhD.Rauno Arike studied CS and Physics at TU Delft, where he co-founded an AI alignment university group. He has done MATS 6 with Marius Hobbhahn, contracted with UK AISI, and worked as a software engineer.Shubhorup Biswas is a CS grad and a (former) software engineer, with experience across product and infra in startups and big tech. He did MATS 7.0 with Buck Shlegeris working on AI Control for sandbagging and other low stakes failures.
We are advised by Seth Herd (Astera Institute), Marius Hobbhahn (Apollo Research), Erik Jenner (Google DeepMind), and Francis Rhys Ward (LawZero).
We have about $200,000 in funding to cover salaries and expenses through Dec 1, 2025. Our funder wishes to remain anonymous for now.
Discuss