Principled Foundations for Preference Optimization

cs.AI updates on arXiv.org, July 15, 12:24

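This paper shows that direct preference optimization (DPO) connects two theories: loss functions (Savage) and stochastic choice (Doignon-Falmagne and Machina). The connection holds for all of Savage's losses and extends the DPO setting to cover margins and length corrections, which is key to understanding how DPO operates.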

arXiv:2507.07855v1 Announce Type: cross

Abstract: In this paper, we show that direct preference optimization (DPO) is a very specific form of a connection between two major theories in the ML context of learning from preferences: loss functions (Savage) and stochastic choice (Doignon-Falmagne and Machina). The connection is established for all of Savage's losses, and at this level of generality (i) it includes support for abstention on the choice theory side, (ii) it includes support for non-convex objectives on the ML side, and (iii) it allows framing, for free, some notable extensions of the DPO setting, including margins and corrections for length. Understanding how DPO operates from a general principled perspective is crucial because of the huge and diverse application landscape of models, because of the current momentum around DPO, and, importantly, because many state-of-the-art variations on DPO occupy only a small region of the map that we cover. It also helps to understand the pitfalls of departing from this map and to figure out workarounds.
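For orientation, the commonly used DPO objective can be sketched as the logistic loss applied to a scaled difference of policy-versus-reference log-ratios. The sketch below is not taken from the paper: the function names, the `margin` argument, and the per-token length normalization are illustrative assumptions showing one way the margin and length extensions mentioned in the abstract might enter the objective.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
             beta=0.1, margin=0.0, len_w=None, len_l=None):
    """Per-pair DPO loss with the logistic loss (hypothetical helper).

    logp_w, logp_l: policy log-probabilities of the preferred and
        dispreferred responses.
    ref_logp_w, ref_logp_l: the same quantities under the reference model.
    margin: assumed additive margin subtracted from the scaled gap.
    len_w, len_l: optional response lengths; when both are given, each
        log-ratio is divided by its length (an assumed length correction).
    """
    ratio_w = logp_w - ref_logp_w
    ratio_l = logp_l - ref_logp_l
    if len_w is not None and len_l is not None:
        ratio_w /= len_w
        ratio_l /= len_l
    z = beta * (ratio_w - ratio_l) - margin
    return -log_sigmoid(z)
```

With `margin=0.0` and no lengths, this reduces to the standard DPO loss. Swapping `log_sigmoid` for another loss is where the generality over Savage's losses described in the abstract would come in, though the paper's exact construction is not reproduced here.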
