Trending
Articles related to "Meta-Rewarding"
Meta-Rewarding LLMs: A Self-Improving Alignment Technique Where the LLM Judges Its Own Judgements and Uses the Feedback to Improve Its Judgment Skills
MarkTechPost@AI 2024-08-08T06:34:49.000000Z
After 4 rounds of intensive training, Llama 7B beats GPT-4! Meta and others have an LLM "play three roles" to self-evaluate and self-evolve
BAAI Community (智源社区) 2024-08-01T08:07:00.000000Z
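After 4 rounds of intensive training, Llama 7B beats GPT-4, as Meta and others have an LLM "play three roles" to self-evaluate and self-evolve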
36kr 2024-08-01T00:18:04.000000Z