MarkTechPost@AI · July 18, 2024
Sibyl: An AI Agent Framework Designed to Enhance the Capabilities of LLMs in Complex Reasoning Tasks

Sibyl is an AI agent framework developed by researchers from Baichuan Inc. and the College of Intelligence and Computing at Tianjin University, designed to address the challenges large language models (LLMs) face in complex reasoning tasks. By introducing modules such as a tool planner, an external information acquisition channel, a multi-agent debate jury, and a global workspace, it improves LLM reasoning and manages context information effectively.

🤔 The core innovation of the Sibyl framework is its external information acquisition channel, which uses a custom representation language to compress and process information efficiently. This lets Sibyl focus on relevant details, conserve context length, and support longer reasoning chains.

🤖 Sibyl follows functional programming principles, emphasizing reusability and statelessness. It uses QA functions rather than dialogues for internal LLM inference requests, so each request runs independently without persistent state. This simplifies the framework's structure and eases debugging and enhancement.

🏆 Sibyl achieves state-of-the-art performance on the GAIA benchmark test set, especially in challenging scenarios. This demonstrates an improved ability to solve complex reasoning tasks and the potential to push LLM-based applications toward more deliberate, System-2 thinking.

🌐 Sibyl adopts a human-oriented browser interface instead of retrieval-augmented generation, preserving more context and depth in data access. Its stateless, reentrant QA functions, used in place of dialogues, simplify the system architecture and ease maintenance.

🧠 Sibyl emphasizes stronger long-term memory, planning, and error correction. A global workspace shared by all modules stores information in an incremental state representation language, selectively compressing past events so that only relevant information increments are added. The framework also includes planning and self-correction mechanisms that summarize tool results and plan subsequent steps based on an assessment of current progress.

🗣️ Sibyl's "jury" mechanism uses a multi-agent debate format for self-critique and correction, drawing efficiently on information stored in the global workspace to refine responses and ensure problems are solved accurately.

📈 Experiments show that Sibyl excels on the GAIA benchmark test set, particularly in the challenging Level 2 and Level 3 scenarios. Sibyl outperforms other models, including GPT-4 with and without plugins, AutoGPT-4, AutoGen, and FRIDAY. On the test set, Sibyl reaches 34.55% overall accuracy, versus 32.33% for AutoGen and 24.25% for FRIDAY. The gap widens in more complex scenarios, highlighting Sibyl's improved ability to curb error propagation during complex reasoning.

📈 Sibyl also generalizes better: its accuracy drops less from the validation set to the test set (40.00% to 34.55%) than AutoGen's (39.39% to 32.33%) or FRIDAY's (34.55% to 24.25%). On efficiency, Sibyl consistently outperforms humans when it solves problems correctly, using markedly fewer steps at every difficulty level. Even though Sibyl is limited to 20 reasoning steps, it shows high reasoning efficiency, indicating a strong capacity to avoid unnecessary reasoning and suppress error propagation. These results underscore Sibyl's potential to move LLM-based agents toward more deliberate and efficient problem-solving in complex scenarios.

🚀 Sibyl marks a significant advance in LLM-based agent frameworks aimed at strengthening complex reasoning. By combining a modular design, external information acquisition, and a jury mechanism, Sibyl promises to drive the adoption of AI agents in real-world scenarios and pave the way for more general and adaptable AI systems.

Large language models (LLMs) have revolutionized human-computer interaction but face challenges in complex real-world scenarios requiring extensive reasoning. LLM-based agents struggle with lengthy reasoning chains, leading to error propagation and reduced accuracy. Existing systems' complexity hinders practical deployment and scalability. In addition, long-context management poses a significant challenge, with a gap between the context lengths LLMs claim to support and those they can handle effectively. The "context dilution" problem further complicates integrating information from diverse sources. These challenges underscore the need for a simpler approach that enhances reasoning capabilities while improving context management, ensuring LLMs stay focused on relevant information without being overwhelmed by data volume.

Recent advancements in AI have led to the integration of LLMs into autonomous agents, pushing towards Artificial General Intelligence (AGI). These LLM-based agents have shown promise in various domains, including mathematical problem-solving, coding, role-playing, and social simulation. Open-source communities have developed frameworks like LangChain, BabyAGI, and AutoGPT to create more versatile agents capable of handling general tasks. While these agents perform well in straightforward scenarios, they struggle with complex real-world challenges. This limitation highlights the need for further improvements in general-purpose LLM-based agents to effectively address more intricate problems and bridge the gap between specialized and truly versatile AI systems.

Researchers from Baichuan Inc. and the College of Intelligence and Computing, Tianjin University, introduce Sibyl, a robust LLM-based agent framework designed to tackle complex reasoning tasks. It comprises four main modules: a tool planner, an external information acquisition channel, a multi-agent debate-based jury, and a global workspace. The key innovation lies in the external information acquisition channel, which efficiently compresses and processes information using a custom representation language. This approach allows Sibyl to focus on relevant details, conserve context length, and enable extended reasoning steps. The framework also incorporates a global workspace for seamless information sharing and a jury for self-refinement before final responses.
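The four-module layout described above can be sketched roughly as follows. All class and function names here are illustrative placeholders, not the paper's actual API; the real planner, compression language, and debate loop are far richer than these stubs:

```python
from dataclasses import dataclass, field

@dataclass
class GlobalWorkspace:
    """Shared store: every module reads from and appends to the same entries."""
    entries: list = field(default_factory=list)

    def add(self, increment: str) -> None:
        self.entries.append(increment)

class ToolPlanner:
    def plan(self, task: str, ws: GlobalWorkspace) -> dict:
        # Decide which tool (browser / Python environment) handles the next step.
        return {"tool": "browser", "query": task}

class InfoAcquisitionChannel:
    def fetch(self, plan: dict, ws: GlobalWorkspace) -> str:
        # Compress the raw tool output before it enters the workspace,
        # standing in for the custom representation language.
        result = f"result for {plan['query']}"
        ws.add(f"compressed({result})")
        return result

class Jury:
    def deliberate(self, draft: str, ws: GlobalWorkspace) -> str:
        # A multi-agent debate would refine the draft here.
        return draft

def run_agent(task: str) -> str:
    ws = GlobalWorkspace()
    planner, channel, jury = ToolPlanner(), InfoAcquisitionChannel(), Jury()
    plan = planner.plan(task, ws)
    channel.fetch(plan, ws)
    draft = f"answer based on {len(ws.entries)} workspace entries"
    return jury.deliberate(draft, ws)
```

The point of the sketch is the data flow: every module communicates only through the shared workspace, which is what lets the jury later review the full reasoning trace.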

Sibyl’s design is rooted in functional programming principles, emphasizing reusability and statelessness. It uses QA functions instead of dialogues in internal LLM inference requests, allowing independent operation without persistent states. This approach simplifies the framework’s structure and facilitates debugging and enhancement. Experimental results on the GAIA benchmark test set demonstrate Sibyl’s state-of-the-art performance, particularly in challenging scenarios. This underscores Sibyl’s improved capability in solving complex reasoning tasks and its potential to advance LLM-based applications towards more deliberate, System-2 thinking.
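The stateless QA-function idea can be illustrated with a minimal sketch. `llm_complete` is a hypothetical stand-in for a single-shot LLM call, not a real API:

```python
def qa(question: str, context: str) -> str:
    """Stateless and reentrant: everything the model needs arrives as
    arguments, so the same call can be retried or replayed when debugging,
    with no chat history to reconstruct."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm_complete(prompt)  # one-shot inference, no persistent state

def llm_complete(prompt: str) -> str:
    # Deterministic stand-in for a real LLM completion call.
    return f"answer<{len(prompt)} chars of prompt>"
```

Because `qa` carries no hidden state, calling it twice with the same arguments yields the same result, which is exactly the reusability and debuggability the functional design is after.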

The Sibyl framework is built on a design philosophy that aims to reduce complexity while enhancing the capabilities of LLM-based agents. It employs a human-oriented browser interface instead of Retrieval Augmented Generation, preserving more context and depth in data access. Sibyl uses a stateless, reentrant QA function rather than dialogues, simplifying the system architecture and facilitating easier maintenance. The framework centralizes its functionalities around two primary tools: a Web browser and Python environments, aligning the browser’s interface more closely with human interaction modes.
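Centralizing on two tools could look roughly like this. This is a sketch under stated assumptions: the real browser tool (page navigation, clicking, scrolling) and the sandboxing around code execution are only modeled as placeholders here:

```python
import contextlib
import io

class PythonTool:
    """Runs a code snippet and captures its stdout, mimicking the
    Python-environment tool (no sandboxing in this sketch)."""
    def run(self, code: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return buf.getvalue()

class BrowserTool:
    """Placeholder for the human-oriented browser interface; only the
    call shape is modeled, not actual page interaction."""
    def run(self, url: str) -> str:
        return f"[page text of {url}]"

# The agent routes every action through one of exactly two tools.
TOOLS = {"python": PythonTool(), "browser": BrowserTool()}
```

Keeping the tool surface this small is the design choice the paragraph describes: fewer, broader tools reduce framework complexity while the browser keeps data access close to how a human would see the page.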

Sibyl emphasizes enhancing capabilities for long-term memory, planning, and error correction. It incorporates a global workspace shared by all modules, storing information with an incremental state-based representation language. This selectively compresses past events, adding only relevant information increments. The framework also includes planning and self-correction mechanisms, summarizing tool outcomes and planning subsequent steps based on current progress assessment. A “Jury” mechanism utilizing a multi-agent debate format enables self-critique and correction, efficiently using information stored in the global workspace to refine responses and ensure accurate problem-solving.
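The delta-only workspace and the jury can be illustrated with toy stand-ins. The set-difference compression and the majority vote below are simplifying assumptions for exposition, not the paper's actual representation language or debate protocol:

```python
from collections import Counter

class IncrementalWorkspace:
    """Stores only increments: each event is reduced to the information
    it adds beyond what the workspace already holds."""
    def __init__(self):
        self.known = set()   # everything recorded so far
        self.log = []        # one entry per event, deltas only

    def record(self, facts: set) -> None:
        delta = facts - self.known   # drop what is already known
        if delta:
            self.log.append(sorted(delta))
            self.known |= delta

def jury_vote(candidates: list) -> str:
    """Toy jury: majority vote over answers proposed by independent
    critic agents (the real mechanism is a structured debate)."""
    return Counter(candidates).most_common(1)[0][0]
```

In this sketch, re-recording an already-known fact adds nothing to the log, which is the sense in which past events are "selectively compressed" rather than replayed in full.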

The experimental results demonstrate Sibyl’s superior performance on the GAIA benchmark test set, particularly in challenging Level 2 and Level 3 scenarios. Sibyl outperformed other models, including GPT-4 with and without plugins, AutoGPT-4, AutoGen, and FRIDAY. On the test set, Sibyl achieved an overall accuracy of 34.55%, compared to 32.33% for AutoGen and 24.25% for FRIDAY. The performance gap widened in more complex scenarios, highlighting Sibyl’s enhanced ability to mitigate error propagation in complex reasoning processes.

Sibyl also exhibited superior generalization capabilities, with a smaller decline in accuracy from validation to test set (40.00% to 34.55%) compared to AutoGen (39.39% to 32.33%) and FRIDAY (34.55% to 24.25%). In terms of efficiency, Sibyl consistently outperformed humans when solving problems correctly, using significantly fewer steps across all difficulty levels. Despite being limited to 20 reasoning steps, Sibyl demonstrated high reasoning efficiency, indicating a strong capability to mitigate unnecessary reasoning and suppress error propagation. These results underscore Sibyl’s potential in advancing LLM-based agents towards more deliberate and efficient problem-solving in complex scenarios.

Sibyl represents a significant advancement in LLM-based agent frameworks, designed to enhance complex reasoning capabilities. By incorporating a modular design and a global workspace for efficient information sharing and collaboration, Sibyl facilitates the transition from rapid, intuitive System-1 thinking to slower, more deliberate System-2 thinking in LLM-based agents. Experimental results on the GAIA benchmark demonstrate Sibyl’s superiority over existing state-of-the-art solutions, particularly when instantiated with GPT-4. This performance underscores the effectiveness of Sibyl’s innovative approach in addressing complex real-world tasks. As AI continues to evolve, Sibyl’s framework offers a promising path towards developing more capable and versatile LLM applications, potentially bridging the gap between current AI capabilities and the requirements of intricate, multi-step reasoning processes in real-world scenarios.


Check out the Paper. All credit for this research goes to the researchers of this project.
