MarkTechPost@AI — July 7, 2024
This AI Paper from NYU and Meta AI Introduces LIFT: Length-Instruction Fine-Tuning for Enhanced Control and Quality in Instruction-Following LLMs

Researchers from New York University and Meta AI have proposed a new method called LIFT (Length-Instruction Fine-Tuning) to address the length bias found in instruction-following large language models. By augmenting training data with explicit length instructions, LIFT enables models to generate responses at inference time that respect a specified length limit. The researchers found that LIFT-DPO models outperform existing state-of-the-art models such as GPT-4 and Llama 3 at following length constraints while maintaining high response quality.

🤔 **The LIFT method:** To address the length bias in instruction-following large language models, the researchers propose LIFT (Length-Instruction Fine-Tuning). LIFT augments the training data with explicit length instructions so that, at inference time, the model can generate responses that respect a specified length limit. By folding length constraints into training, the method improves the model's control over length instructions and keeps generated responses within the expected length.

🤖 **Advantages of LIFT-DPO models:** The results show that LIFT-DPO models follow length constraints better than existing state-of-the-art models such as GPT-4 and Llama 3. GPT-4 Turbo, for example, violates length constraints almost 50% of the time, whereas LIFT-DPO models show far lower violation rates: on AlpacaEval-LI, the Llama-2-70B-Base model's violation rate drops from 65.8% under standard DPO training to 7.1% under LIFT-DPO training.

📈 **Maintaining response quality:** Beyond controlling response length, LIFT-DPO models also preserve high response quality. The researchers found that win rates improve noticeably while length constraints are respected, indicating that the models can generate high-quality responses within the specified limits. For example, the Llama-2-70B-Base model's win rate rises from 4.6% under standard DPO training to 13.6% under LIFT-DPO training.

🚀 **Future applications:** LIFT offers a new direction for developing instruction-following large language models and an effective remedy for length bias. The method should help produce AI models that generate more concise, high-quality responses and further broaden the range of scenarios in which such models can be applied.

🤝 **Collaboration and contribution:** This research is a joint effort between New York University and Meta AI, illustrating the value of collaboration across research institutions in AI. The results show that, by working together, researchers can build stronger, more reliable AI models that bring broader benefits to society.

Artificial intelligence (AI) has significantly advanced with the development of large language models (LLMs) that follow user instructions. These models aim to provide accurate and relevant responses to human queries, often requiring fine-tuning to enhance their performance in various applications, such as customer service, information retrieval, and content generation. The ability to instruct these models precisely has become a cornerstone of modern AI, pushing the boundaries of what these systems can achieve in practical scenarios.

One of the challenges in developing and evaluating instruction-following models is an inherent length bias. This bias arises because human evaluators and training algorithms tend to favor longer responses, producing models that generate unnecessarily lengthy outputs. The preference complicates the assessment of model quality and effectiveness, since longer responses are not necessarily more informative or accurate. The challenge, then, is to build models that understand instructions and can also generate responses of appropriate length.

Current methods to address the length bias include incorporating length penalties into evaluation benchmarks. For instance, AlpacaEval and MT-Bench have integrated these penalties to counteract the models’ tendency to produce longer responses. Furthermore, various fine-tuning techniques, such as reinforcement learning with human feedback (RLHF), are employed to optimize models for better instruction-following capabilities. These methods aim to refine the models’ ability to generate concise yet comprehensive responses, balancing the length and quality of the output.
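As a rough, hypothetical illustration of the idea (not the actual correction used by AlpacaEval or MT-Bench), a length penalty can discount a judge's preference score in proportion to how much a response exceeds a reference length:

```python
def length_penalized_score(raw_score: float,
                           response_len: int,
                           reference_len: int,
                           penalty: float = 0.1) -> float:
    """Discount a judge's raw preference score when the response is longer
    than a reference answer. Illustrative only: the real benchmarks apply
    their own, more principled length corrections."""
    excess = max(0, response_len - reference_len)
    # Penalize in proportion to the relative excess length.
    return raw_score - penalty * (excess / max(reference_len, 1))


# Example: a verbose answer (900 tokens vs. a 300-token reference)
# loses part of its score advantage; a concise answer is untouched.
print(length_penalized_score(0.80, 900, 300))   # 0.60
print(length_penalized_score(0.75, 280, 300))   # 0.75
```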

Researchers from Meta FAIR and New York University have introduced a novel approach called Length-Instruction Fine-Tuning (LIFT), which augments training data with explicit length instructions. This enables models to be controlled at inference time so that they adhere to specified length constraints. The approach is designed to mitigate length bias and improve adherence to length-specific instructions: because detailed length instructions appear in the training data, the models learn to respect those constraints in real-world applications.

The LIFT method incorporates Direct Preference Optimization (DPO) to fine-tune models using datasets enhanced with length instructions. This process starts with augmenting a conventional instruction-following dataset by inserting length constraints into the prompts. The method constructs preference pairs that reflect both length constraints and response quality. These augmented datasets are then used to fine-tune models, such as Llama 2 and Llama 3, ensuring they can handle prompts with and without length instructions. This systematic approach allows the models to learn from various instructions, enhancing their ability to generate accurate and appropriately concise responses.
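The paper describes this augmentation at a high level; the following is a minimal sketch of what such a step could look like, assuming word-count limits sampled around the two responses' lengths and a rule that a response violating its limit loses the preference (the paper's exact construction may differ):

```python
import random
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str      # originally preferred response
    rejected: str    # originally dispreferred response


def word_count(text: str) -> int:
    return len(text.split())


def add_length_instruction(pair: PreferencePair) -> PreferencePair:
    """Augment one preference pair with an explicit length instruction.

    Two illustrative cases:
      * generous limit -> both responses fit, so the original quality
        ranking is kept;
      * tight limit -> only the shorter response fits, so it becomes the
        preferred one and the pair also teaches length compliance.
    Illustrative sketch only; the paper's construction may differ.
    """
    len_chosen, len_rejected = word_count(pair.chosen), word_count(pair.rejected)
    lo, hi = sorted((len_chosen, len_rejected))

    if random.random() < 0.5 or lo == hi:
        # Case 1: limit above both lengths -- keep the original ranking.
        limit = hi + random.randint(10, 50)
        chosen, rejected = pair.chosen, pair.rejected
    else:
        # Case 2: limit between the two lengths -- the fitting response wins.
        limit = random.randint(lo, hi - 1)
        if len_chosen <= limit:
            chosen, rejected = pair.chosen, pair.rejected
        else:
            chosen, rejected = pair.rejected, pair.chosen

    prompt = f"Answer the following in at most {limit} words.\n\n{pair.prompt}"
    return PreferencePair(prompt=prompt, chosen=chosen, rejected=rejected)
```

In the paper's setup, such length-instructed pairs are mixed with the original unconstrained pairs before running DPO, which is why the fine-tuned models handle prompts both with and without length instructions.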

The proposed LIFT-DPO models demonstrated superior performance in adhering to length constraints compared to existing state-of-the-art models like GPT-4 and Llama 3. For example, the researchers found that the GPT-4 Turbo model violated length constraints almost 50% of the time, highlighting a significant flaw in its design. In contrast, the LIFT-DPO models exhibited far lower violation rates. Specifically, the Llama-2-70B-Base model, when subjected to standard DPO training, showed a violation rate of 65.8% on AlpacaEval-LI, which dropped to 7.1% with LIFT-DPO training. Similarly, the Llama-2-70B-Chat model's violation rate fell from 15.1% with standard DPO to 2.7% with LIFT-DPO, demonstrating the method's effectiveness in controlling response length.
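Assuming the reported violation rate is simply the fraction of benchmark responses whose length exceeds the instructed limit (the benchmark may count tokens rather than words), it can be computed as follows:

```python
def violation_rate(responses, limits):
    """Fraction of responses whose length exceeds the instructed limit.
    Lengths are measured in words here; the benchmark may use tokens."""
    assert len(responses) == len(limits)
    violations = sum(len(r.split()) > lim for r, lim in zip(responses, limits))
    return violations / len(responses)


# Example: 1 of 3 responses breaks its limit -> ~33% violation rate.
print(violation_rate(["a b c", "word " * 40, "short answer"], [5, 30, 10]))
```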

Moreover, the LIFT-DPO models maintained high response quality while adhering to length constraints. Win rates improved significantly, indicating that the models could generate high-quality responses within the specified length limits. For instance, the win rate for the Llama-2-70B-Base model increased from 4.6% with standard DPO to 13.6% with LIFT-DPO. These results underscore the method's success in balancing length control with response quality, offering a robust answer to length bias in evaluations.

In conclusion, the research addresses the problem of length bias in instruction-following models by introducing the LIFT method. The approach enhances the controllability and quality of model responses by integrating length constraints into the training process. The results indicate that LIFT-DPO models outperform those trained with standard methods, providing a more reliable and effective solution for length-constrained instruction following. The collaboration between Meta FAIR and New York University advances the development of AI models that generate concise, high-quality responses, setting a new standard for instruction-following capabilities in AI research.


Check out the Paper. All credit for this research goes to the researchers of this project.


