Trending
Articles related to "HyPO"
HyPO: A Hybrid Reinforcement Learning Algorithm that Uses Offline Data for Contrastive-based Preference Optimization and Online Unlabeled Data for KL Regularization
MarkTechPost@AI 2024-07-29T11:04:28.000000Z