cs.AI updates on arXiv.org 07月22日 12:34
HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍HEPPO-GAE,一款基于FPGA的加速器,用于优化PPO算法中的GAE阶段。通过并行、流水线架构和标准化技术,实现学习稳定性、性能提升和内存使用优化,大幅提高PPO训练效率。

arXiv:2501.12703v2 Announce Type: replace-cross Abstract: This paper introduces HEPPO-GAE, an FPGA-based accelerator designed to optimize the Generalized Advantage Estimation (GAE) stage in Proximal Policy Optimization (PPO). Unlike previous approaches that focused on trajectory collection and actor-critic updates, HEPPO-GAE addresses GAE's computational demands with a parallel, pipelined architecture implemented on a single System-on-Chip (SoC). This design allows for the adaptation of various hardware accelerators tailored for different PPO phases. A key innovation is our strategic standardization technique, which combines dynamic reward standardization and block standardization for values, followed by 8-bit uniform quantization. This method stabilizes learning, enhances performance, and manages memory bottlenecks, achieving a 4x reduction in memory usage and a 1.5x increase in cumulative rewards. We propose a solution on a single SoC device with programmable logic and embedded processors, delivering throughput orders of magnitude higher than traditional CPU-GPU systems. Our single-chip solution minimizes communication latency and throughput bottlenecks, significantly boosting PPO training efficiency. Experimental results show a 30% increase in PPO speed and a substantial reduction in memory access time, underscoring HEPPO-GAE's potential for broad applicability in hardware-efficient reinforcement learning algorithms.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

HEPPO-GAE FPGA加速器 PPO算法 GAE优化 硬件加速
相关文章