Trending
Articles related to "Preference Optimization"
Reinforcement Learning Without Rewards? Combinatorial Optimization Gets a New "Preference-Driven" Framework
掘金 AI 2025-05-28T09:28:03.000000Z
Snowflake Proposes ExCoT: A Novel AI Framework that Iteratively Optimizes Open-Source LLMs by Combining CoT Reasoning with Off-Policy and On-Policy DPO, Relying Solely on Execution Accuracy as Feedback
MarkTechPost@AI 2025-04-03T07:40:26.000000Z
Giving Speech Models "Glasses": Error Rate Cut by 12.5% in the Latest Open-Source Release from RUC and CMU
36kr 2025-03-24T09:28:46.000000Z
DPO-Shift: Controllably Shifting the DPO Distribution with a Single Parameter to Mitigate Likelihood Displacement
机器之心 2025-03-04T05:11:52.000000Z
This AI Paper from Meta Introduces Diverse Preference Optimization (DivPO): A Novel Optimization Method for Enhancing Diversity in Large Language Models
MarkTechPost@AI 2025-02-03T18:04:58.000000Z
Meta AI Proposes EvalPlanner: A Preference Optimization Algorithm for Thinking-LLM-as-a-Judge
MarkTechPost@AI 2025-01-31T06:32:59.000000Z
Test-Time Preference Optimization: A Novel AI Framework that Optimizes LLM Outputs During Inference with an Iterative Textual Reward Policy
MarkTechPost@AI 2025-01-28T06:35:09.000000Z
Does o1 Also "Overthink"? Tencent AI Lab and Shanghai Jiao Tong University Reveal the Overthinking Problem in o1-Style Models
36kr-Tech 2025-01-08T11:22:15.000000Z
Community Contribution | Sun Yat-sen University Team Releases FuseChat-3.0, Introducing Implicit Model Fusion
ModelScope Community 2024-12-19T13:24:17.000000Z
[NLP] A Long-Form Overview Tracing the Development of LLM + RL(HF)
机器学习初学者 2024-10-23T07:12:51.000000Z
LongAlign: A Segment-Level Encoding Method to Enhance Long-Text to Image Generation
MarkTechPost@AI 2024-10-22T05:51:02.000000Z
HyPO: A Hybrid Reinforcement Learning Algorithm that Uses Offline Data for Contrastive-based Preference Optimization and Online Unlabeled Data for KL Regularization
MarkTechPost@AI 2024-07-29T11:04:28.000000Z
Preference Optimization for Vision-Language Multimodal Models
智源社区 2024-07-17T05:06:39.000000Z
This AI Paper from Cohere for AI Presents a Comprehensive Study on Multilingual Preference Optimization
MarkTechPost@AI 2024-07-08T16:46:21.000000Z
MaPO: The Memory-Friendly Maestro – A New Standard for Aligning Generative Models with Diverse Preferences
MarkTechPost@AI 2024-06-22T12:01:45.000000Z