Marek Rosa - GoodAI Blog, November 26, 2024
LTM Benchmark: Improvements and new reports

 

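GoodAI is committed to developing agents capable of continual, lifelong learning, and has open-sourced the GoodAI LTM Benchmark for that purpose. The benchmark evaluates an agent's long-term memory over a very long conversation, interleaving information and questions from different tasks to simulate realistic scenarios. GoodAI keeps improving the benchmark, adding new features and tasks to push agents' long-term memory abilities forward and to make the tests more realistic. The tests emphasize natural conversation and avoid wasting resources, with the aim of helping agents better understand and process information. The ultimate goal is general artificial intelligence that can learn autonomously and solve a wide range of problems.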

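🤔 GoodAI is committed to developing agents capable of continual, lifelong learning, and has open-sourced the GoodAI LTM Benchmark to evaluate agents' Long-Term Memory (LTM) abilities.

🔄 The benchmark runs as one very long conversation that interleaves information and questions from different tasks, simulating realistic scenarios and checking whether the agent remembers earlier information.

🚀 The GoodAI LTM team treats the benchmark as a moving target: by introducing new tasks and features, it keeps raising the difficulty and pushing agents' long-term memory abilities forward.

💡 The GoodAI team keeps refining the benchmark to make it more realistic and to minimize wasted resources, while keeping the conversation natural so that agents can better understand and process information.

🤖 GoodAI's ultimate goal is artificial general intelligence (AGI) that can learn autonomously and solve a wide range of problems. One example is the NPCs in the "AI People" game, which have long-term memory and developing personalities.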

At GoodAI, we are committed to developing agents that are capable of continual and life-long learning. As part of our efforts, we have previously open-sourced the GoodAI LTM Benchmark, a suite of tests aimed at evaluating the Long-Term Memory (LTM) abilities of any conversational agent. In this benchmark, all tasks take place as part of one single very long conversation between the agent and our virtual tester. The benchmark interleaves information and probing questions from different tasks, while taking special care to weave them together into a natural conversation.

LTM = Long-Term Memory

As a direct consequence of our research in agents with LTM, the GoodAI LTM Benchmark is in constant evolution. To us it represents an invaluable tool for evaluating our agents and validating our hypotheses. Additionally, it helps us characterize the ways in which different agents fail, and therefore it provides us with goals to aim for. In the GoodAI LTM team we regard the GoodAI LTM Benchmark as a moving goal post, and by introducing new tasks and features we are continuously pushing that goal post away, because what is a goal post worth if it is easy to reach?


New features

With every new feature, we try to make the GoodAI LTM Benchmark not only more and more challenging, but also more realistic. The thing about benchmarking LTM is that you need your tests to be long, very long. So you either introduce a ton of dummy interactions for the sole sake of filling up the conversation, and accept that all those tokens are wasted resources, or you start interleaving the tasks and weave them into a seamless and natural conversation (like we do). We are always doing our best to minimize the amount of wasted tokens, whilst keeping the conversation natural and making sure that the agent can follow along.
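To make the interleaving idea concrete, here is a minimal sketch in Python. It is not the benchmark's actual scheduler: the `Task` class, the `interleave` function, and the sample messages are all hypothetical and chosen only for illustration. The sketch merges several tasks into one conversation while preserving each task's internal order, so the messages of one task act as the "distance" between another task's statements and its probing question, instead of dummy filler.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical memory task: statements to plant, ending with a probing question."""
    name: str
    messages: list[str]  # delivered in order; by convention the last one is the question


def interleave(tasks: list[Task], seed: int = 0) -> list[tuple[str, str]]:
    """Merge tasks into one conversation, keeping each task's own message order."""
    rng = random.Random(seed)  # seeded so a given run is reproducible
    by_name = {t.name: t for t in tasks}
    # Position of the next undelivered message for every task that still has some.
    cursors = {t.name: 0 for t in tasks if t.messages}
    conversation = []
    while cursors:
        # Pick any unfinished task at random and deliver its next message.
        name = rng.choice(sorted(cursors))
        conversation.append((name, by_name[name].messages[cursors[name]]))
        cursors[name] += 1
        if cursors[name] == len(by_name[name].messages):
            del cursors[name]  # task finished; stop scheduling it
    return conversation


tasks = [
    Task("color", ["My favorite color is teal.", "What is my favorite color?"]),
    Task("shopping", ["Add milk to my list.", "Add eggs to my list.", "What is on my list?"]),
]

for name, message in interleave(tasks):
    print(f"[{name}] {message}")
```

A random merge like this guarantees only that ordering within each task is preserved. Making the result read as a natural conversation, as described above, would need additional logic that this sketch does not attempt.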

For more details, continue to GoodAI Blog Post


Thank you for reading this blog!

 

Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI

 

For more news:
GoodAI Discord: https://discord.gg/Pfzs7WWJwf
Space Engineers: www.SpaceEngineersGame.com
Keen Software House: www.keenswh.com
VRAGE Engine: www.keenswh.com/vrage/
GoodAI: www.GoodAI.com
Personal Blog: blog.marekrosa.org

 

Personal bio:

Marek Rosa is the founder and CEO of GoodAI, a general artificial intelligence R&D company, and Keen Software House, an independent game development studio, started in 2010, and best known for its best-seller Space Engineers (over 5 million copies sold). Space Engineers has the 4th largest Workshop on Steam with over 500K mods, ships, stations, worlds, and more!

Marek has been interested in game development and artificial intelligence since childhood. He started his career as a programmer and later transitioned to a leadership role. After the success of Keen Software House titles, Marek was able to fund GoodAI in 2014 with a $10 Million personal investment.

Both companies now have over 100 engineers, researchers, artists, and game developers.

Marek's primary focus includes Space Engineers, the VRAGE3 engine, the AI People game, long-term memory systems (LTM), an LLM-powered personal assistant with LTM named Charlie Mnemonic, and the Groundstation.

GoodAI's mission is to develop AGI - as fast as possible - to help humanity and understand the universe. One of the commercial stepping stones is the "AI People" game, which features LLM-driven AI NPCs. These NPCs are grounded in the game world, interacting dynamically with the game environment and with other NPCs, and they possess long-term memory and developing personalities. GoodAI also works on autonomous agents that can self-improve and solve any task that a human can.
