Trending
Articles related to "Preference Optimization"
Reinforcement Learning Without Rewards? Combinatorial Optimization Gets a New "Preference-Driven" Framework
掘金 AI 2025-05-28T09:28:03.000000Z
Snowflake Proposes ExCoT: A Novel AI Framework that Iteratively Optimizes Open-Source LLMs by Combining CoT Reasoning with Off-Policy and On-Policy DPO, Relying Solely on Execution Accuracy as Feedback
MarkTechPost@AI 2025-04-03T07:40:26.000000Z
Giving Speech Models "Glasses": Error Rate Cut by 12.5% in the Latest Open-Source Release from RUC and CMU
36kr 2025-03-24T09:28:46.000000Z
DPO-Shift: Controllably Shifting the DPO Distribution with a Single Parameter to Mitigate Likelihood Displacement
机器之心 2025-03-04T05:11:52.000000Z
This AI Paper from Meta Introduces Diverse Preference Optimization (DivPO): A Novel Optimization Method for Enhancing Diversity in Large Language Models
MarkTechPost@AI 2025-02-03T18:04:58.000000Z
Meta AI Proposes EvalPlanner: A Preference Optimization Algorithm for Thinking-LLM-as-a-Judge
MarkTechPost@AI 2025-01-31T06:32:59.000000Z
Test-Time Preference Optimization: A Novel AI Framework that Optimizes LLM Outputs During Inference with an Iterative Textual Reward Policy
MarkTechPost@AI 2025-01-28T06:35:09.000000Z
Does o1 Also "Overthink"? Tencent AI Lab and Shanghai Jiao Tong University Reveal the Overthinking Problem in o1-Style Models
36kr-Tech 2025-01-08T11:22:15.000000Z
Community Contribution | Sun Yat-sen University Team Releases FuseChat-3.0, Introducing Implicit Model Fusion
ModelScope Community 2024-12-19T13:24:17.000000Z
[NLP] A Long-Form Overview Tracing the Development of LLM + RL(HF)
机器学习初学者 2024-10-23T07:12:51.000000Z
LongAlign: A Segment-Level Encoding Method to Enhance Long-Text to Image Generation
MarkTechPost@AI 2024-10-22T05:51:02.000000Z
HyPO: A Hybrid Reinforcement Learning Algorithm that Uses Offline Data for Contrastive-based Preference Optimization and Online Unlabeled Data for KL Regularization
MarkTechPost@AI 2024-07-29T11:04:28.000000Z
Preference Optimization for Vision-Language Multimodal Models
智源社区 2024-07-17T05:06:39.000000Z
This AI Paper from Cohere for AI Presents a Comprehensive Study on Multilingual Preference Optimization
MarkTechPost@AI 2024-07-08T16:46:21.000000Z
MaPO: The Memory-Friendly Maestro – A New Standard for Aligning Generative Models with Diverse Preferences
MarkTechPost@AI 2024-06-22T12:01:45.000000Z