Tiny Reward Models

cs.AI updates on arXiv.org, July 15, 12:27

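TinyRM is a small bidirectional masked language model with only 400 million parameters that rivals models more than 175 times its size on reasoning and safety preference modeling tasks. It combines FLAN-style prompting, Directional Low-Rank Adaptation (DoRA), and layer freezing to achieve strong results on RewardBench. The experiments indicate that small models, given domain-specific fine-tuning strategies, perform well on reasoning tasks.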

arXiv:2507.09973v1 Announce Type: cross

Abstract: Large decoder-based language models have become the dominant architecture for reward modeling in reinforcement learning from human feedback (RLHF). However, as reward models are increasingly deployed in test-time strategies, their inference costs become a growing concern. We present TinyRM, a family of small, bidirectional masked language models (MLMs) with as few as 400 million parameters, that rival the capabilities of models over 175 times larger on reasoning and safety preference modeling tasks. TinyRM combines FLAN-style prompting, Directional Low-Rank Adaptation (DoRA), and layer freezing to achieve strong performance on RewardBench, despite using significantly fewer resources. Our experiments suggest that small models benefit from domain-specific tuning strategies, particularly in reasoning, where lightweight finetuning methods are especially effective. While challenges remain in building generalist models and conversational preference modeling, our preliminary results highlight the promise of lightweight bidirectional architectures as efficient, scalable alternatives for preference modeling.
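As a rough illustration of what preference modeling measures, the sketch below computes pairwise accuracy: the fraction of (chosen, rejected) response pairs in which a reward model scores the chosen response higher. This is an assumption about the general shape of the evaluation, not the authors' code or the RewardBench implementation. The function name `pairwise_accuracy` and the example scores are hypothetical. The last lines only work out the scale comparison stated in the abstract (400 million parameters times 175).

```python
from typing import Sequence, Tuple


def pairwise_accuracy(score_pairs: Sequence[Tuple[float, float]]) -> float:
    """Return the fraction of (chosen, rejected) score pairs in which
    the chosen response receives a strictly higher reward score."""
    if not score_pairs:
        raise ValueError("need at least one (chosen, rejected) pair")
    correct = sum(1 for chosen, rejected in score_pairs if chosen > rejected)
    return correct / len(score_pairs)


# Hypothetical reward scores, invented purely for illustration.
example_pairs = [(2.1, 0.3), (0.5, 1.2), (1.7, 1.1), (0.9, -0.4)]
accuracy = pairwise_accuracy(example_pairs)  # 3 of the 4 pairs are ranked correctly

# Scale comparison from the abstract: a 400M-parameter model versus
# models "over 175 times larger", i.e. roughly 70 billion parameters or more.
small_params = 400_000_000
implied_large_params = small_params * 175
```

A strict inequality is used so that ties count as errors; an evaluation that treats ties differently would need a different comparison.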

Related tags

TinyRM, language models, preference modeling, fine-tuning, resource efficiency
