MIT News - Machine Learning, February 3
User-friendly system can help developers build more efficient simulations and AI models

Researchers at MIT have developed an automated system called SySTeC that is designed to improve the efficiency of deep-learning algorithms. The system can simultaneously exploit two types of redundancy in data, sparsity and symmetry, reducing the computation, bandwidth, and memory storage required for machine-learning operations. Compared with existing techniques, the approach boosted computation speed by nearly 30 times in some experiments. The system uses a user-friendly programming language, suits a wide range of applications, and can help people who are not deep-learning experts optimize the AI algorithms they use to process data. SySTeC optimizes by identifying and exploiting symmetry, further optimizes for sparsity, and ultimately generates ready-to-use code.

🚀 The SySTeC system can simultaneously exploit two types of redundancy in deep-learning data structures, sparsity and symmetry, significantly reducing the required computation, bandwidth, and memory.

⚙️ The system works by identifying three key symmetry optimizations: if the output tensor is symmetric, only half of it needs to be computed; if an input tensor is symmetric, only half of it needs to be read; and redundant computation of symmetric intermediate results can be skipped.

⏱️ SySTeC first automatically optimizes code to exploit symmetry, then performs additional transformations to store only non-zero data values, further optimizing for sparsity, and ultimately generates ready-to-use code that achieved speedups of nearly 30 times in some experiments.

👨‍💻 The system uses a user-friendly programming language, suits a wide range of applications, can help people who are not deep-learning experts improve the efficiency of the AI algorithms they use to process data, and may also find uses in scientific computing.

The neural network artificial intelligence models used in applications like medical image processing and speech recognition perform operations on hugely complex data structures that require an enormous amount of computation to process. This is one reason deep-learning models consume so much energy.

To improve the efficiency of AI models, MIT researchers created an automated system that enables developers of deep learning algorithms to simultaneously take advantage of two types of data redundancy. This reduces the amount of computation, bandwidth, and memory storage needed for machine learning operations.

Existing techniques for optimizing algorithms can be cumbersome and typically only allow developers to capitalize on either sparsity or symmetry — two different types of redundancy that exist in deep learning data structures.

By enabling a developer to build an algorithm from scratch that takes advantage of both redundancies at once, the MIT researchers’ approach boosted the speed of computations by nearly 30 times in some experiments.

Because the system utilizes a user-friendly programming language, it could optimize machine-learning algorithms for a wide range of applications. The system could also help scientists who are not experts in deep learning but want to improve the efficiency of AI algorithms they use to process data. In addition, the system could have applications in scientific computing.

“For a long time, capturing these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what they would like to compute in a more abstract way, without telling the system exactly how to compute it,” says Willow Ahrens, an MIT postdoc and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.

She is joined on the paper by lead author Radha Patel ’23, SM ’24 and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Cutting out computation

In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, which is a rectangular array of values arranged on two axes, rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, making tensors more difficult to manipulate.
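To make the distinction concrete, here is a minimal sketch in Python with NumPy (used here purely for illustration; it is not part of the researchers' system) showing a two-axis matrix next to a three-axis tensor:

```python
import numpy as np

# A matrix is a two-dimensional array: values arranged on rows and columns.
matrix = np.arange(6).reshape(2, 3)
print(matrix.ndim, matrix.shape)   # 2 (2, 3)

# A tensor generalizes this to any number of axes. For example, a
# hypothetical 4-users x 3-products x 2-criteria ratings tensor:
tensor = np.arange(24).reshape(4, 3, 2)
print(tensor.ndim, tensor.shape)   # 3 (4, 3, 2)
```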

Deep-learning models perform operations on tensors using repeated matrix multiplication and addition — this process is how neural networks learn complex patterns in data. The sheer volume of calculations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
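As a rough illustration of why these operations dominate, a single fully connected layer boils down to one matrix multiplication and one addition; the sketch below is generic NumPy, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256))   # weight matrix
b = rng.standard_normal(128)          # bias vector
x = rng.standard_normal(256)          # input activations

# One layer of a neural network: multiply, add, then a nonlinearity.
y = np.maximum(W @ x + b, 0.0)        # 128 x 256 multiply-adds per layer
```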

But because of the way data in tensors are arranged, engineers can often boost the speed of a neural network by cutting out redundant computations.

For instance, if a tensor represents user review data from an e-commerce site, since not every user reviewed every product, most values in that tensor are likely zero. This type of data redundancy is called sparsity. A model can save time and computation by only storing and operating on non-zero values.
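A minimal sketch of this idea, using SciPy's sparse matrices purely for illustration (the article's compiler generates its own sparsity-aware code; SciPy is not part of it):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A user-by-product review matrix: most users never reviewed most
# products, so most entries are zero.
dense = np.array([
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
])

# A sparse format stores only the non-zero values and their positions.
sparse = csr_matrix(dense)
print(sparse.nnz)    # 3 stored values instead of 16

# Operations such as matrix-vector products then skip the zeros.
v = np.ones(4)
print(sparse @ v)    # same result as dense @ v, with far less work
```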

In addition, sometimes a tensor is symmetric, which means the top half and bottom half of the data structure are equal. In this case, the model only needs to operate on one half, reducing the amount of computation. This type of data redundancy is called symmetry.
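For instance, a matrix-vector product with a symmetric matrix can read just the upper half of the matrix and still produce the full result. The following hand-written Python sketch illustrates the idea; it is not output from the researchers' system:

```python
import numpy as np

def symv_upper(A_upper, x):
    """Symmetric matrix-vector product that reads only the upper
    triangle (including the diagonal) of a symmetric matrix."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        y[i] += A_upper[i, i] * x[i]
        for j in range(i + 1, n):
            a = A_upper[i, j]      # read one half of the data...
            y[i] += a * x[j]       # ...and use it for both A[i, j]
            y[j] += a * x[i]       # and its mirror entry A[j, i].
    return y

A = np.array([[2., 1., 0.],
              [1., 3., 4.],
              [0., 4., 5.]])       # symmetric: A == A.T
x = np.array([1., 2., 3.])
assert np.allclose(symv_upper(np.triu(A), x), A @ x)
```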

“But when you try to capture both of these optimizations, the situation becomes quite complex,” Ahrens says.

To simplify the process, she and her collaborators built a new compiler, which is a computer program that translates complex code into a simpler language that can be processed by a machine. Their compiler, called SySTeC, can optimize computations by automatically taking advantage of both sparsity and symmetry in tensors.

They began the process of building SySTeC by identifying three key optimizations they can perform using symmetry.

First, if the algorithm’s output tensor is symmetric, then it only needs to compute one half of it. Second, if the input tensor is symmetric, then the algorithm only needs to read one half of it. Finally, if intermediate results of tensor operations are symmetric, the algorithm can skip redundant computations.
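As an illustration of the first case, the product C = A·Aᵀ is always symmetric, so a routine can compute one triangle and mirror it. This is a hand-written sketch of the principle, not SySTeC's generated code:

```python
import numpy as np

def gram_upper(A):
    """Compute C = A @ A.T, exploiting the fact that the output is
    symmetric: only the upper triangle is computed, then mirrored."""
    m = A.shape[0]
    C = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):      # roughly half the dot products
            C[i, j] = A[i] @ A[j]
            C[j, i] = C[i, j]      # mirror instead of recompute
    return C

A = np.random.rand(4, 5)
assert np.allclose(gram_upper(A), A @ A.T)
```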

Simultaneous optimizations

To use SySTeC, a developer inputs their program and the system automatically optimizes their code for all three types of symmetry. Then the second phase of SySTeC performs additional transformations to only store non-zero data values, optimizing the program for sparsity.

In the end, SySTeC generates ready-to-use code.

“In this way, we get the benefits of both optimizations. And the interesting thing about symmetry is, as your tensor has more dimensions, you can get even more savings on computation,” Ahrens says.
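To show what combining the two redundancies can look like, here is a hand-written sketch of a matrix-vector product for a symmetric sparse matrix, assuming the matrix is stored as the non-zeros of its upper triangle only (SciPy is used just to set up the example; this is not SySTeC output):

```python
import numpy as np
from scipy.sparse import triu, random as sparse_random

def sym_sparse_mv(U, x):
    """Matrix-vector product for a sparse symmetric matrix, given only
    the non-zeros of its upper triangle U (in COO format): each stored
    value serves as both A[i, j] and A[j, i], so the loop exploits
    sparsity (skip zeros) and symmetry (read one half) at once."""
    y = np.zeros_like(x)
    for i, j, a in zip(U.row, U.col, U.data):
        y[i] += a * x[j]
        if i != j:                 # off-diagonal entries count twice
            y[j] += a * x[i]
    return y

A = sparse_random(6, 6, density=0.3, random_state=0)
A = A + A.T                        # make it symmetric
U = triu(A).tocoo()                # keep only the upper triangle
x = np.arange(6, dtype=float)
assert np.allclose(sym_sparse_mv(U, x), A @ x)
```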

The researchers demonstrated speedups of nearly a factor of 30 with code generated automatically by SySTeC.

Because the system is automated, it could be especially useful in situations where a scientist wants to process data using an algorithm they are writing from scratch.

In the future, the researchers want to integrate SySTeC into existing sparse tensor compiler systems to create a seamless interface for users. In addition, they would like to use it to optimize code for more complicated programs.

This work is funded, in part, by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.
