SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

cs.AI updates on arXiv.org 07月25日 12:28

SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

本文介绍了一种名为SCOPE的评估框架，旨在独立于数据集测量和缓解大型语言模型在多项选择题中的选择偏差，通过多次使用无语义内容的空提示，估计模型的位置偏差分布，并重新分配答案槽位，从而平衡运气率，并防止语义相似的干扰项相邻，提高了LLM评估的公平性和可靠性。

arXiv:2507.18182v1 Announce Type: cross Abstract: Large Language Models (LLMs) can achieve inflated scores on multiple-choice tasks by exploiting inherent biases in option positions or labels, rather than demonstrating genuine understanding. This study introduces SCOPE, an evaluation framework designed to measure and mitigate such selection bias in a dataset-independent manner. By repeatedly invoking a null prompt that lacks semantic content, SCOPE estimates each model's unique position-bias distribution. It then redistributes the answer slot according to the inverse-bias distribution, thereby equalizing the lucky-rate, the probability of selecting the correct answer by chance. Furthermore, it prevents semantically similar distractors from being placed adjacent to the answer, thereby blocking near-miss guesses based on superficial proximity cues. Across multiple benchmark experiments, SCOPE consistently outperformed existing debiasing methods in terms of stable performance improvements and showed clearer confidence distributions over correct options. This framework thus offers a new standard for enhancing the fairness and reliability of LLM evaluations.

Fish AI Reader

AI辅助创作，多种专业模板，深度分析，高质量内容生成。从观点提取到深度思考，FishAI为您提供全方位的创作支持。新版本引入自定义参数，让您的创作更加个性化和精准。

FishAI

鱼阅，AI 时代的下一个智能信息助手，助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型选择偏差评估框架 SCOPE LLM评估

相关文章

Is Claude 3 Outperforming GPT-4?

Harmonizing AI: Crafting Personalized Song Suggestions

AI News Weekly - Issue #377: Next in AI : Pioneers' Predictions! - Mar 21st 2024

COLLAGE: A New Machine Learning Approach to Deal with Floating-Point Errors in Low-Precision to Make LLM Training Accurate and Efficient

Leveraging Linguistic Expertise in NLP: A Deep Dive into RELIES and Its Impact on Large Language Models

Japanese Researchers Release “Fugaku-LLM” Trained on the Fugaku Supercomputer

Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - #680

Deep Learning, Transformers, and the Consequences of Scale with Oriol Vinyals - #546

AI Gateway Provider Portkey.ai Is In Partnership With F5

Powering AI with the World's Largest Computer Chip with Joel Hestness - #684