MarkTechPost@AI · February 8
Unraveling Direct Alignment Algorithms: A Comparative Study on Optimization Strategies for LLM Alignment

This article examines optimization strategies for Direct Alignment Algorithms (DAAs) in large language model (LLM) alignment. Traditional LLM alignment methods involve many steps, depend on reward models, and carry high computational cost. DAAs aim to bypass reinforcement learning and reward modeling by optimizing the model directly on human preferences. By introducing a supervised fine-tuning (SFT) phase and a scaling parameter (β), the researchers improved single-stage DAAs such as ORPO and ASFT to the point where they are competitive with two-stage methods such as DPO. Experimental results show that DAAs based on pairwise comparisons outperform those based on pointwise preferences, and that tuning β can significantly improve performance. The study lays a foundation for future improvements to LLM alignment techniques.

🎯 Direct Alignment Algorithms (DAAs) aim to simplify large language model (LLM) alignment by optimizing the model directly toward human values, without relying on reward modeling or reinforcement learning.

📈 Adding a separate supervised fine-tuning (SFT) phase to DAAs and introducing a scaling parameter (β) significantly improves single-stage DAAs such as ORPO and ASFT, making them competitive with two-stage methods such as DPO. The scaling parameter β adjusts the strength of preference updates, giving finer control over the optimization process.

📊 Experimental results show that DAAs based on pairwise comparisons outperform those based on pointwise preferences, meaning that structured ranking signals are more effective for alignment quality. Tests with Llama 3.1 8B on the UltraChat and UF datasets show that ORPO performs on par with DPO and ASFT, and that tuning β can significantly improve performance.

🔬 The researchers modified the loss functions of ASFT and ORPO to include SFT implicitly, making them adaptable to both single-stage and two-stage configurations. This offers a structured approach to improving model alignment and lays the groundwork for future research, which can extend to larger models and more diverse datasets to further optimize alignment techniques.

Aligning large language models (LLMs) with human values remains difficult due to unclear goals, weak training signals, and the complexity of human intent. Direct Alignment Algorithms (DAAs) offer a way to simplify this process by optimizing models directly, without relying on reward modeling or reinforcement learning. These algorithms use different ranking methods, such as comparing pairs of outputs or scoring individual responses, and some versions require an extra fine-tuning step while others do not. Differences in how rewards are defined and applied make it harder still to judge how effective these methods are and which approach works best.
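The distinction between the two ranking styles can be made concrete with a short sketch. The functions below are illustrative only: the names, the placement of β, and the exact pointwise form are assumptions for exposition, not the formulations used in the paper. Both operate on sequence-level log-probabilities of a preferred (chosen) and a dispreferred (rejected) response.

```python
import torch
import torch.nn.functional as F

def pointwise_alignment_loss(chosen_logps, rejected_logps, beta=1.0):
    # Each response is scored on its own: push the preferred response's
    # log-probability up and the dispreferred one's down, without ever
    # comparing the two directly (generic sketch, not the exact ASFT loss).
    return (-F.logsigmoid(beta * chosen_logps)
            - F.logsigmoid(-beta * rejected_logps)).mean()

def pairwise_alignment_loss(chosen_logps, rejected_logps, beta=1.0):
    # The training signal is the margin between the preferred and the
    # dispreferred response: a structured ranking signal over the pair.
    return -F.logsigmoid(beta * (chosen_logps - rejected_logps)).mean()

# Toy usage with sequence-level log-probabilities for a batch of two pairs.
chosen = torch.tensor([-12.0, -9.5])
rejected = torch.tensor([-14.0, -9.0])
print(pointwise_alignment_loss(chosen, rejected),
      pairwise_alignment_loss(chosen, rejected))
```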

Current methods for aligning large language models (LLMs) follow multiple steps, including supervised fine-tuning (SFT), reward modeling, and reinforcement learning. These pipelines are challenging because of their complexity, their dependence on reward models, and their high computational cost. DAAs instead optimize models directly from human preferences, bypassing reinforcement learning and reward modeling. Individual DAAs differ in their optimization approach, loss functions, and fine-tuning requirements. Despite their potential to simplify alignment, inconsistencies in ranking methods, reward calculations, and training strategies make it difficult to evaluate their effectiveness.
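As a reference point for what "optimizing directly from preferences" means, the widely used DPO objective can be written as a single loss over a preference pair; the pair itself supplies the gradient, with no separate reward model or RL loop. The sketch below is the standard DPO loss over summed response log-probabilities; the variable names are ours.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of the preferred vs. dispreferred response under the policy
    # and under a frozen reference model (typically the SFT checkpoint).
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # beta controls how sharply the implicit reward separates the pair.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```

Because DPO needs the frozen reference model, it is a two-stage method: an SFT checkpoint is trained first and then used as the reference during preference optimization.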

To improve single-stage direct alignment algorithms (DAAs) such as ORPO and ASFT, the researchers proposed adding a separate supervised fine-tuning (SFT) phase and introducing a scaling parameter (β). In their original form, these methods had no β parameter and performed alignment directly, which limited their effectiveness. Adding an explicit SFT phase and letting β control preference scaling brings their performance in line with two-stage approaches such as DPO. The main distinction between different DAAs lies in whether they use an odds ratio or a reference-policy ratio, which affects how alignment is optimized.
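A rough sketch of what such a β-scaled, single-stage objective with an explicit SFT term can look like, using an ORPO-style odds-ratio term; the placement of β and the weight lam are illustrative assumptions, not the paper's exact reformulation.

```python
import torch
import torch.nn.functional as F

def single_stage_odds_ratio_loss(chosen_logps, rejected_logps,
                                 chosen_nll, beta=1.0, lam=1.0):
    # chosen_logps / rejected_logps: mean per-token log-probabilities of the
    # preferred / dispreferred responses under the policy (values < 0).
    # chosen_nll: negative log-likelihood of the preferred response, i.e.
    # the explicit SFT component paired with the alignment term.
    # Log-odds of generating each response: log(p / (1 - p)).
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Odds-ratio preference term; beta scales the strength of the update.
    align = -F.logsigmoid(beta * (log_odds_chosen - log_odds_rejected)).mean()
    # lam (an illustrative weight) balances the SFT and alignment terms.
    return chosen_nll + lam * align
```

A reference-policy-ratio variant would replace the log-odds difference with a DPO-style log-ratio against a frozen reference model; that is the distinction the paragraph above refers to.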

The framework modifies the loss functions of ASFT and ORPO to include SFT implicitly, making them adaptable to both single-stage and two-stage configurations. The scaling parameter β adjusts the strength of preference updates, giving better control over optimization. Experimental analysis suggests that DAAs relying on pairwise comparisons outperform those relying on pointwise preferences, indicating that structured ranking signals improve alignment quality.
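Written schematically, with notation introduced here for illustration rather than taken from the paper, the reformulated objectives take a two-term form in which the SFT signal is carried by the likelihood of the preferred response and β enters only through the alignment term:

```latex
\mathcal{L}^{\beta}_{\text{method}}(\theta)
  \;=\;
  \underbrace{-\log \pi_{\theta}(y_w \mid x)}_{\text{implicit SFT term}}
  \;+\;
  \underbrace{\mathcal{L}^{\beta}_{\text{Align}}(\theta;\, x, y_w, y_l)}_{\text{preference term, scaled by } \beta}
```

In a single-stage configuration both terms are optimized jointly from the start; in a two-stage configuration the SFT term is minimized first and the β-scaled alignment term afterwards.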

The researchers evaluated Direct Alignment Algorithms (DAAs) using Llama 3.1 8B trained on the UltraChat and UF datasets and tested on AlpacaEval 2 and ArenaHard, while Llama 3.2 3B was used for Reddit TL;DR. Supervised fine-tuning (SFT) on UF improved the alignment of ORPO and ASFT. ORPO performed on par with DPO, while ASFT achieved a +2.04% increase in ArenaHard win rate but still lagged behind ORPO. Tuning β significantly enhanced performance, yielding improvements of +7.0 and +43.4 in GPT-4 win rate on TL;DR and +3.46 and +8.27 in AlpacaEval 2 length-controlled win rate on UF. Comparative analysis against DPO, IPO, SimPO, and other alignment methods showed that β adjustments in the ASFT and ORPO alignment losses (L^β_Align) improved preference optimization, and that SFT-trained models performed best when the L_Align components were incorporated.

In the end, the proposed approach improved Direct Alignment Algorithms (DAAs) by incorporating a supervised fine-tuning (SFT) phase, leading to consistent performance gains and significantly enhancing ORPO and ASFT. Although the evaluation was limited to specific datasets and model sizes, the findings provide a structured approach to improving model alignment. The method serves as a foundation for future research and can be extended to larger models and more diverse datasets, refining alignment techniques through optimization strategies that identify the factors behind alignment quality.


Check out the Paper. All credit for this research goes to the researchers of this project.




Tags: LLM Alignment · Direct Alignment Algorithms · Supervised Fine-Tuning · Optimization Strategies