arXiv:2412.15251v2 Announce Type: replace-cross Abstract: The advanced processing and reasoning capabilities of multimodal large language models (MLLMs) have driven substantial progress in vision-language (VL) understanding tasks. However, while effective for tasks governed by straightforward logic, MLLMs often struggle to reason over complex, detail-intensive logical structures. To address this limitation, we introduce AgentPS, a novel framework that integrates Agentic Process Supervision into MLLMs by sequentially reasoning over ancillary questions during fine-tuning. AgentPS achieves substantial improvements over baseline MLLMs on both public benchmarks and proprietary datasets. Notably, we show that using MLLM-generated ancillary labels in place of human annotations yields only minimal performance degradation, highlighting the method's scalability. These results establish AgentPS as a scalable and effective solution for complex multimodal classification in large-scale industrial applications.
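To make the idea of process supervision via ancillary questions concrete, the sketch below shows one plausible way such a fine-tuning example could be serialized. This is a minimal illustration, not the paper's actual data format: the class names, prompt template, and example content (image reference, questions, labels) are all hypothetical assumptions introduced here for illustration.

```python
# Illustrative sketch of an AgentPS-style process-supervised training
# instance: the target sequence walks through ordered ancillary
# question-answer steps before the final classification label, so the
# model is supervised on intermediate reasoning, not just the outcome.
# All field names and example content below are hypothetical.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProcessSupervisedExample:
    """One multimodal training instance with intermediate supervision."""
    image_ref: str                        # pointer to the visual input
    question: str                         # the final classification question
    ancillary_qas: List[Tuple[str, str]]  # ordered (question, answer) steps
    final_label: str                      # target classification label


def build_target_sequence(ex: ProcessSupervisedExample) -> str:
    """Serialize ancillary steps in order, ending with the final label,
    so each intermediate answer contributes to the training signal."""
    steps = []
    for i, (q, a) in enumerate(ex.ancillary_qas, start=1):
        steps.append(f"Step {i} Q: {q}\nStep {i} A: {a}")
    steps.append(f"Final answer: {ex.final_label}")
    return "\n".join(steps)


# Hypothetical instance: ancillary questions decompose a detail-intensive
# judgment into simpler sub-checks, answered sequentially. Per the
# abstract, these intermediate answers could come from human annotators
# or from an MLLM with only minimal performance degradation.
example = ProcessSupervisedExample(
    image_ref="frame_001.jpg",
    question="Does this content violate the posting policy?",
    ancillary_qas=[
        ("What objects are visible in the frame?", "A person holding a sign."),
        ("Does the sign contain prohibited text?", "No, the text is benign."),
    ],
    final_label="compliant",
)

print(build_target_sequence(example))
```

The design intent sketched here is that supervising each sub-answer in sequence gives the model a gradient signal on the reasoning chain itself, which is what distinguishes process supervision from outcome-only fine-tuning on the final label.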