MarkTechPost@AI, 13 hours ago
Allen Institute for AI-Ai2 Unveils AutoDS: A Bayesian Surprise-Driven Engine for Open-Ended Scientific Discovery

 

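The Allen Institute for Artificial Intelligence (AI2) has released AutoDS, a groundbreaking prototype engine for autonomous scientific discovery. Unlike conventional AI research assistants, which depend on human-specified goals or queries, AutoDS generates, tests, and iterates on hypotheses on its own, quantifying and seeking out “Bayesian surprise” to reach discoveries that may lie beyond what humans set out to look for. Modeled on the curiosity-driven exploration of human scientists, it decides which questions to ask, which hypotheses to pursue, and how to build on earlier results, with no predefined goal. It uses large language models (LLMs) to quantify Bayesian surprise and Monte Carlo Tree Search (MCTS) to explore the hypothesis space efficiently, and it shows higher discovery efficiency than baseline search methods across several scientific domains. AutoDS uses a modular multi-agent LLM architecture with specialized agents for hypothesis generation, experimental design, programming and execution, and results analysis, and it emphasizes alignment with human scientific intuition and interpretable results.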

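💡 AutoDS breaks from the conventional AI research assistant model, moving from goal-driven work to open-ended exploration: it generates, tests, and iterates on scientific hypotheses by itself. Its core idea is to quantify and pursue “Bayesian surprise,” a marked change in belief after updating on empirical evidence, which lets it surface insights that humans might overlook.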

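🚀 To explore the vast hypothesis space efficiently, AutoDS uses Monte Carlo Tree Search (MCTS), which balances exploring new directions against following up on promising findings. Experiments across several scientific domains show that it finds more “surprising” hypotheses than baseline search methods.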

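🧠 AutoDS is built as a modular multi-agent LLM architecture in which each agent handles one stage of the discovery workflow, such as hypothesis generation, experimental design, programming and execution, or results analysis. LLM text embeddings and semantic similarity checks ensure that the reported discoveries are distinct.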

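✅ In a human evaluation, 67% of the hypotheses AutoDS judged surprising were also judged surprising by domain experts. Its Bayesian surprise metric tracked human judgment more closely than proxy metrics such as “interestingness” or “utility,” indicating good alignment and interpretability.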

The Allen Institute for Artificial Intelligence (AI2) has introduced AutoDS (Autonomous Discovery via Surprisal), a groundbreaking prototype engine for open-ended autonomous scientific discovery. Distinct from conventional AI research assistants that depend on human-defined objectives or queries, AutoDS autonomously generates, tests, and iterates on hypotheses by quantifying and seeking out “Bayesian surprise”—a principled measure of genuine discovery, even beyond what humans specifically look for.

From Goal-Driven Inquiry to Open-Ended Exploration

Traditional approaches to autonomous scientific discovery (ASD) typically revolve around answering pre-specified research questions: generate hypotheses relevant to a given problem, then experimentally validate them. AutoDS departs fundamentally from this paradigm. Drawing inspiration from the curiosity-driven exploration of human scientists, AutoDS operates in an open-ended manner—it decides what questions to pose, which hypotheses to pursue, and how to build upon previous results, all without predefined goals.

Open-ended discovery is inherently challenging, requiring mechanisms for both traversing vast hypothesis spaces and prioritizing which hypotheses merit investigation. To address these challenges, AutoDS formalizes the concept of “surprisal”—a measurable shift in belief about a hypothesis before and after acquiring empirical evidence.

Quantifying Bayesian Surprise via Large Language Models

At the core of AutoDS is a novel framework for estimating Bayesian surprise. For each generated hypothesis, state-of-the-art large language models (LLMs)—such as GPT-4o—act as probabilistic observers, eliciting their “belief” about the hypothesis (in the form of probabilities) both before and after empirical testing. These belief distributions, constructed by sampling multiple judgments from the LLM, are modeled with Beta distributions.

To detect meaningful discovery, AutoDS calculates the Kullback-Leibler (KL) divergence between the posterior (after evidence) and prior (before evidence) Beta distributions—a formal measure of Bayesian surprise. Critically, only belief shifts that cross a threshold of evidential change (e.g., from likely true to likely false) are treated as genuinely surprising, focusing the system on substantive discoveries rather than trivial uncertainty updates.
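The surprise computation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the AutoDS implementation: it assumes each sampled LLM judgment is a binary true/false vote, that votes are turned into a Beta distribution by adding them to a uniform Beta(1, 1) as pseudo-counts, and that the "threshold of evidential change" means the mean belief crossing 0.5. The function names (`fit_beta`, `kl_beta`, `bayesian_surprise`) are hypothetical.

```python
import math

def digamma(x: float) -> float:
    """Digamma function for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def kl_beta(a_post: float, b_post: float, a_prior: float, b_prior: float) -> float:
    """KL divergence KL(Beta(a_post, b_post) || Beta(a_prior, b_prior)), in nats."""
    return (log_beta(a_prior, b_prior) - log_beta(a_post, b_post)
            + (a_post - a_prior) * digamma(a_post)
            + (b_post - b_prior) * digamma(b_post)
            + (a_prior - a_post + b_prior - b_post) * digamma(a_post + b_post))

def fit_beta(votes):
    """Assumption: binary votes become pseudo-counts on top of a uniform Beta(1, 1)."""
    true_votes = sum(1 for v in votes if v)
    return 1.0 + true_votes, 1.0 + (len(votes) - true_votes)

def bayesian_surprise(prior_votes, posterior_votes, threshold=0.5):
    """Return (KL divergence, whether the mean belief crossed the threshold)."""
    a0, b0 = fit_beta(prior_votes)
    a1, b1 = fit_beta(posterior_votes)
    mean_prior = a0 / (a0 + b0)
    mean_post = a1 / (a1 + b1)
    # Assumption: a shift counts only if the mean moves from one side of the threshold to the other.
    crossed = (mean_prior - threshold) * (mean_post - threshold) < 0
    return kl_beta(a1, b1, a0, b0), crossed

# Toy example: 8 of 10 samples believe the hypothesis before the experiment, 1 of 10 after.
kl, crossed = bayesian_surprise([True] * 8 + [False] * 2, [True] * 1 + [False] * 9)
print(f"KL = {kl:.3f} nats, crossed threshold: {crossed}")
```

Separating the KL value from the threshold check keeps the two ideas in the text distinct: the divergence measures how far belief moved, while the crossing test filters out updates that merely sharpen an existing belief.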

Efficient Hypothesis Search with MCTS

Exploring the vast hypothesis landscape efficiently requires more than naive sampling. AutoDS leverages Monte Carlo Tree Search (MCTS) with progressive widening to guide its search for surprising discoveries. Each node in the search tree represents a hypothesis, and branches correspond to new hypotheses conditioned on prior findings. This structure lets AutoDS maintain a balance between exploring novel avenues and following up on fruitful leads.

Unlike greedy or beam search methods that risk either overcommitting or prematurely pruning, MCTS sustains high discovery efficiency under fixed computation. Empirically, across 21 datasets from domains such as biology, economics, and behavioral science, AutoDS outperforms repeated sampling, greedy, and beam search baselines—discovering 5–29% more hypotheses judged surprising by the LLM.
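To make the search procedure concrete, here is a minimal sketch of MCTS with progressive widening over a tree of hypotheses. It is a toy under stated assumptions rather than the AutoDS code: the widening rule (a node may gain a child while its child count is below `k * visits ** alpha`), the UCT constant, and the `toy_propose` and random reward functions are all hypothetical stand-ins for the LLM agents and the surprise signal.

```python
import math
import random

class Node:
    """One hypothesis in the search tree."""
    def __init__(self, hypothesis, parent=None):
        self.hypothesis = hypothesis
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward = 0.0

def can_widen(node, k=1.0, alpha=0.5):
    # Progressive widening: the allowed number of children grows with the visit count.
    return len(node.children) < k * max(node.visits, 1) ** alpha

def best_child(node, c=1.4):
    # UCT: mean reward plus an exploration bonus for rarely visited children.
    def uct(child):
        return child.reward / child.visits + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=uct)

def search(root_hypothesis, propose, evaluate, iterations):
    root = Node(root_hypothesis)
    for _ in range(iterations):
        node = root
        # Selection: descend until reaching a node that is allowed a new child.
        while not can_widen(node):
            node = best_child(node)
        # Expansion: propose a new hypothesis conditioned on the selected node.
        child = Node(propose(node), parent=node)
        node.children.append(child)
        # Evaluation: 1.0 if the new hypothesis turned out surprising, else 0.0.
        reward = evaluate(child.hypothesis)
        # Backpropagation: update statistics from the new node up to the root.
        walker = child
        while walker is not None:
            walker.visits += 1
            walker.reward += reward
            walker = walker.parent
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)

def toy_propose(node):
    # Hypothetical stand-in for an LLM that proposes a follow-up hypothesis.
    return f"{node.hypothesis} > follow-up {len(node.children) + 1}"

rng = random.Random(0)
root = search("dataset overview", toy_propose,
              lambda h: 1.0 if rng.random() < 0.3 else 0.0, iterations=50)
print(root.visits, len(root.children), count_nodes(root))
```

Each iteration adds exactly one hypothesis, so the iteration count acts as the fixed compute budget, and the widening rule prevents the search from either spending everything on the root's first child or branching without limit.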

A Modular Multi-Agent LLM Architecture

AutoDS orchestrates a series of specialized LLM agents, each responsible for a distinct part of the autonomous scientific workflow: hypothesis generation, experimental design, programming and execution, and results analysis.
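As a rough illustration of how such a chain of agents could be wired together, the sketch below passes a shared record through four stub stages (hypothesis generation, experimental design, programming and execution, results analysis). The `Record` fields, function names, and placeholder strings are all hypothetical; in a real system each stub would wrap an LLM call or a code sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Record:
    """State handed from one agent to the next (hypothetical fields)."""
    hypothesis: str = ""
    experiment_plan: str = ""
    result: str = ""
    analysis: str = ""

Agent = Callable[[Record], Record]

def generate_hypothesis(rec: Record) -> Record:
    rec.hypothesis = "placeholder hypothesis"  # stub for an LLM call
    return rec

def design_experiment(rec: Record) -> Record:
    rec.experiment_plan = f"plan for: {rec.hypothesis}"
    return rec

def write_and_run_code(rec: Record) -> Record:
    rec.result = f"output of: {rec.experiment_plan}"  # stub for code generation and execution
    return rec

def analyze_result(rec: Record) -> Record:
    rec.analysis = f"analysis of: {rec.result}"
    return rec

def run_pipeline(agents: List[Agent], rec: Record) -> Record:
    # Each agent reads what earlier agents wrote and fills in its own field.
    for agent in agents:
        rec = agent(rec)
    return rec

final = run_pipeline(
    [generate_hypothesis, design_experiment, write_and_run_code, analyze_result],
    Record(),
)
print(final.analysis)
```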

Deduplication of semantically similar hypotheses uses a hierarchical clustering pipeline: LLM-based text embeddings combined with pairwise semantic equivalence checks ensure the final output set comprises only truly distinct discoveries.
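The two-stage idea (embedding similarity first, then a pairwise equivalence check) can be sketched as follows. This is a simplified stand-in, not the pipeline from the paper: it uses threshold-based single-linkage grouping via union-find, which corresponds to cutting a single-linkage dendrogram at a fixed similarity, and both the toy embeddings and the `equivalent` callback (which would be an LLM judgment in practice) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def deduplicate(hypotheses, embeddings, equivalent, sim_threshold=0.9):
    """Merge hypotheses that are close in embedding space AND pass the equivalence check."""
    parent = list(range(len(hypotheses)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(hypotheses)):
        for j in range(i + 1, len(hypotheses)):
            if find(i) == find(j):
                continue  # already in the same cluster
            # Cheap filter first; the (expensive) equivalence check runs only on close pairs.
            if cosine(embeddings[i], embeddings[j]) >= sim_threshold and \
                    equivalent(hypotheses[i], hypotheses[j]):
                parent[find(j)] = find(i)

    # Keep the first hypothesis encountered in each cluster.
    seen, kept = set(), []
    for i, hypothesis in enumerate(hypotheses):
        root = find(i)
        if root not in seen:
            seen.add(root)
            kept.append(hypothesis)
    return kept

# Toy data: the first two hypotheses are paraphrases; the third is close in
# embedding space but makes the opposite claim, so the equivalence check keeps it.
hyps = ["income rises with education", "education increases income",
        "income falls with education", "rainfall predicts crop yield"]
embs = [[1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.97, 0.1, 0.0], [0.0, 0.0, 1.0]]
paraphrases = {frozenset({hyps[0], hyps[1]})}
print(deduplicate(hyps, embs, lambda a, b: frozenset({a, b}) in paraphrases))
```

The third hypothesis shows why embeddings alone are not enough: opposite claims about the same variables can sit close together in embedding space, so a second semantic check is needed before merging.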

Human Alignment and Interpretability

Alignment with human scientific intuition is a key benchmark. In a structured human evaluation (with reviewers holding MS/PhD-level STEM backgrounds), 67% of the hypotheses AutoDS judged surprising were also seen as surprising by domain experts. Furthermore, AutoDS’s Bayesian surprise metric aligned more closely with human judgment than proxy metrics such as predicted “interestingness” or “utility.”

Interestingly, the nature and direction of surprising belief shifts varied by scientific field—highlighting, for example, that confirmatory claims often require stronger evidence to be convincingly surprising than do novel falsifications.

Practical Considerations and Future Outlook

AutoDS exhibits high implementation and experimental validity, with over 98% of evaluated discoveries deemed correctly implemented by human reviewers. While current pipelines depend on API-driven LLMs and thus face latency constraints, the team also explored a “programmatic search” implementation that delivers much faster, albeit less conceptually rich, results.

Although AutoDS is currently a research prototype (with an open-source release planned), its architecture and empirical results chart a compelling path for scalable, AI-driven science.

Conclusion

AutoDS represents a significant advance in autonomous scientific reasoning. By transitioning from goal-driven research to autonomous, curiosity-based exploration—and grounding its search in Bayesian surprise—it points the way toward future AI systems capable of complementing, accelerating, or even independently leading scientific discovery.



The post Allen Institute for AI-Ai2 Unveils AutoDS: A Bayesian Surprise-Driven Engine for Open-Ended Scientific Discovery appeared first on MarkTechPost.
