Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play

cs.AI updates on arXiv.org 7小时前

Board Game Arena: A Framework and Benchmark for Assessing Large Language Models via Strategic Play

本文介绍了Board Game Arena库，该库通过战略棋盘游戏评估大型语言模型的决策能力，实现不同类型代理的系统比较，并支持分布式执行，为LLM推理和博弈论行为实证评估做出贡献。

arXiv:2508.03368v1 Announce Type: new Abstract: The Board Game Arena library provides a framework for evaluating the decision making abilities of large language models (LLMs) through strategic board games implemented in Google OpenSpiel library. The framework enables systematic comparisons between LLM based agents and other agents (random, human, reinforcement learning agents, etc.) in various game scenarios by wrapping multiple board and matrix games and supporting different agent types. It integrates API access to models via LiteLLM, local model deployment via vLLM, and offers distributed execution through Ray. Additionally it provides extensive analysis tools for the LLM reasoning traces. This paper summarizes the structure, key characteristics, and motivation of the repository, highlighting how it contributes to the empirical evaluation of the reasoning of LLM and game-theoretic behavior

Fish AI Reader

AI辅助创作，多种专业模板，深度分析，高质量内容生成。从观点提取到深度思考，FishAI为您提供全方位的创作支持。新版本引入自定义参数，让您的创作更加个性化和精准。

FishAI

鱼阅，AI 时代的下一个智能信息助手，助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型决策能力 Board Game Arena 博弈论 LLM推理

相关文章

Is Claude 3 Outperforming GPT-4?

Harmonizing AI: Crafting Personalized Song Suggestions

AI News Weekly - Issue #377: Next in AI : Pioneers' Predictions! - Mar 21st 2024

COLLAGE: A New Machine Learning Approach to Deal with Floating-Point Errors in Low-Precision to Make LLM Training Accurate and Efficient

Leveraging Linguistic Expertise in NLP: A Deep Dive into RELIES and Its Impact on Large Language Models

Japanese Researchers Release “Fugaku-LLM” Trained on the Fugaku Supercomputer

Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - #680

Training Data Locality and Chain-of-Thought Reasoning in LLMs with Ben Prystawski - #673

Deep Learning, Transformers, and the Consequences of Scale with Oriol Vinyals - #546

AI Gateway Provider Portkey.ai Is In Partnership With F5