LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

cs.AI updates on arXiv.org, July 28, 12:42

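This article introduces LOTUS, a leaderboard for evaluating the quality, risks, and societal biases of detailed image captions. LOTUS also supports preference-oriented evaluation by adapting its criteria to different user preferences, and it shows that models rank differently depending on which criteria are used.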

arXiv:2507.19362v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) have transformed image captioning, shifting it from concise captions to detailed descriptions. We introduce LOTUS, a leaderboard for evaluating detailed captions that addresses three main gaps in existing evaluations: the lack of standardized criteria, of bias-aware assessments, and of user-preference considerations. LOTUS comprehensively evaluates various aspects, including caption quality (e.g., alignment, descriptiveness), risks (e.g., hallucination), and societal biases (e.g., gender bias), while enabling preference-oriented evaluations by tailoring criteria to diverse user preferences. Our analysis of recent LVLMs reveals that no single model excels across all criteria, and that caption detail correlates with bias risks. Preference-oriented evaluations demonstrate that the optimal model choice depends on user priorities.
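
The abstract does not say how LOTUS combines its criteria into a preference-oriented ranking. The following is only a minimal sketch of one common approach, a weighted average over normalized per-criterion scores. It assumes that every score lies in [0, 1] and that "lower is better" for risk and bias criteria. The criterion keys, model names, scores, and preference weights are all hypothetical and are not LOTUS results. The sketch shows how a single set of scores can produce different rankings for different user priorities.

```python
# Hypothetical criteria: True = higher is better, False = lower is better.
# Assumes every score is already normalized to [0, 1]; not LOTUS's actual scheme.
CRITERIA = {
    "alignment": True,
    "descriptiveness": True,
    "hallucination": False,
    "gender_bias": False,
}

# Toy, made-up scores for two hypothetical models (not real leaderboard data).
RESULTS = {
    "model_a": {"alignment": 0.80, "descriptiveness": 0.90,
                "hallucination": 0.30, "gender_bias": 0.25},
    "model_b": {"alignment": 0.75, "descriptiveness": 0.60,
                "hallucination": 0.10, "gender_bias": 0.05},
}

# Hypothetical user preferences, expressed as weights over criteria.
DETAIL_USER = {"descriptiveness": 0.7, "alignment": 0.3}
SAFETY_USER = {"hallucination": 0.4, "gender_bias": 0.4, "alignment": 0.2}


def preference_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores; 'lower is better' criteria are flipped."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = 0.0
    for name, weight in weights.items():
        value = scores[name]
        if not CRITERIA[name]:
            value = 1.0 - value  # flip so that a higher value always means better
        acc += weight * value
    return acc / total_weight


def rank_models(results: dict[str, dict[str, float]],
                weights: dict[str, float]) -> list[str]:
    """Return model names sorted from best to worst under the given preferences."""
    return sorted(results,
                  key=lambda model: preference_score(results[model], weights),
                  reverse=True)


if __name__ == "__main__":
    print(rank_models(RESULTS, DETAIL_USER))  # ['model_a', 'model_b']
    print(rank_models(RESULTS, SAFETY_USER))  # ['model_b', 'model_a']
```

With these toy numbers, the more descriptive model ranks first for a detail-seeking user. The model with fewer hallucinations and a smaller bias gap ranks first for a risk-averse user. This mirrors the abstract's qualitative point that the best choice depends on user priorities. A real leaderboard may normalize, weight, or aggregate scores differently.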
