cs.AI updates on arXiv.org, the day before yesterday, 12:08
TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling

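This paper proposes TrIM, a dataflow for Systolic Arrays based on triangular input movement. It aims to solve the data redundancy problem in convolutional neural networks, improving data utilization, reducing memory accesses, and increasing computational efficiency.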

arXiv:2408.01254v3 Announce Type: replace

Abstract: To keep pace with the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim to achieve high energy efficiency by mitigating the Von Neumann bottleneck, that is, the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are susceptible to this bottleneck, given the massive amount of data they must manage. Systolic arrays (SAs) are promising architectures for mitigating data transmission cost, thanks to the high data utilization of their Processing Elements (PEs). These PEs continuously exchange and process data locally according to specific dataflows (such as weight stationary and row stationary), in turn reducing the number of accesses to the main memory. In SAs, convolutions are managed either as matrix multiplications or by exploiting the raster-order scan of sliding windows. However, data redundancy is a primary concern, affecting area, power, and energy. In this paper, we propose TrIM: a novel dataflow for SAs based on a Triangular Input Movement and compatible with CNN computing. TrIM maximizes local input utilization, minimizes weight data movement, and solves the data redundancy problem. Furthermore, TrIM does not incur the significant on-chip memory penalty introduced by the row stationary dataflow. Compared to state-of-the-art SA dataflows, the high data utilization offered by TrIM guarantees ~10× fewer memory accesses. Moreover, since PEs continuously overlap multiplications and accumulations, TrIM achieves high throughput (up to 81.8% higher than row stationary) while requiring a limited number of registers (up to 15.6× fewer than row stationary).
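The abstract names two baseline ways SAs handle convolutions (lowering to a matrix multiplication, or scanning sliding windows in raster order) and identifies data redundancy as the main cost. The sketch below is not TrIM. It is a back-of-the-envelope count, assuming a single input channel, stride 1, and no padding, of how many input values each baseline touches compared with the h × w unique inputs. The helper names `im2col_reads` and `sliding_window_reads` are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope input-traffic count for the two baseline ways the
# abstract says SAs handle convolutions. Assumptions (not from the paper):
# single channel, stride 1, no padding. This is NOT the TrIM dataflow.


def _out_dims(h: int, w: int, k: int) -> tuple[int, int]:
    """Output height/width of a valid (unpadded) stride-1 convolution."""
    if not (1 <= k <= h and k <= w):
        raise ValueError("kernel must fit inside the input")
    return h - k + 1, w - k + 1


def im2col_reads(h: int, w: int, k: int) -> int:
    """Values stored when the convolution is lowered to a matrix multiply:
    every output pixel gets its own private copy of a k x k patch."""
    out_h, out_w = _out_dims(h, w, k)
    return out_h * out_w * k * k


def sliding_window_reads(h: int, w: int, k: int) -> int:
    """Values fetched when windows are scanned in raster order and each
    horizontal step reuses the k*(k-1) values shared with the previous window."""
    out_h, out_w = _out_dims(h, w, k)
    # First window of each output row loads a full k x k patch; every later
    # step in that row loads only one new column of k values.
    return out_h * (k * k + (out_w - 1) * k)


if __name__ == "__main__":
    h, w, k = 32, 32, 3
    unique = h * w  # ideal: each input fetched exactly once
    for name, reads in (("im2col", im2col_reads(h, w, k)),
                        ("raster sliding window", sliding_window_reads(h, w, k))):
        print(f"{name}: {reads} reads = {reads / unique:.2f}x the {unique} unique inputs")
```

For a 32 × 32 input and a 3 × 3 kernel, the im2col matrix holds 8,100 values (about 7.9× the 1,024 unique inputs), while the raster-order window fetches 2,880 (about 2.8×). The latter ratio approaches k for large inputs because vertically adjacent windows still re-read the k − 1 rows they share. A dataflow that fetched each input exactly once would sit at 1.0×; the gap between that ideal and these baselines is the redundancy the abstract says TrIM is designed to eliminate.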
