Trending
Articles related to "online distillation"
Researchers at Google DeepMind Introduce BOND: A Novel RLHF Method that Fine-Tunes the Policy via Online Distillation of the Best-of-N Sampling Distribution
MarkTechPost@AI 2024-07-24T17:19:11.000000Z