Trending
Articles related to "Textual Feedback"
Preference optimization can also be done at inference time, with no additional retraining, from Shanghai AI Lab, CUHK, and others
量子位 2025-02-11T16:25:01.000000Z
Preference optimization can also be done at inference time, with no additional retraining, from Shanghai AI Lab, CUHK, and others
智源社区 2025-02-11T12:37:17.000000Z
Test-Time Preference Optimization: A Novel AI Framework that Optimizes LLM Outputs During Inference with an Iterative Textual Reward Policy
MarkTechPost@AI 2025-01-28T06:35:09.000000Z