MIT News - Machine Learning, February 3
User-friendly system can help developers build more efficient simulations and AI models

Researchers at MIT have developed an automated system called SySTeC that is designed to improve the efficiency of deep-learning algorithms. The system can simultaneously exploit two types of redundancy in data, sparsity and symmetry, reducing the computation, bandwidth, and memory storage required for machine-learning operations. Compared with existing techniques, the approach boosted computation speed by nearly 30 times in some experiments. The system uses a user-friendly programming language, suits a wide range of applications, and can help people who are not deep-learning experts optimize the AI algorithms they use to process data. SySTeC optimizes by identifying and exploiting symmetry, further optimizes for sparsity, and ultimately generates ready-to-use code.

🚀 The SySTeC system can simultaneously exploit two types of redundancy in deep-learning data structures, sparsity and symmetry, significantly reducing the required computation, bandwidth, and memory.

⚙️ The system works by identifying three key symmetry optimizations: if the output tensor is symmetric, only half of it needs to be computed; if an input tensor is symmetric, only half of it needs to be read; and redundant computation of symmetric intermediate results can be skipped.

⏱️ SySTeC first automatically optimizes code to exploit symmetry, then performs additional transformations to store only non-zero data values, further optimizing for sparsity, and ultimately generates ready-to-use code that achieved speedups of nearly 30 times in some experiments.

👨‍💻 The system uses a user-friendly programming language, suits a wide range of applications, can help people who are not deep-learning experts improve the efficiency of the AI algorithms they use to process data, and may also find uses in scientific computing.

The neural network artificial intelligence models used in applications like medical image processing and speech recognition perform operations on hugely complex data structures that require an enormous amount of computation to process. This is one reason deep-learning models consume so much energy.

To improve the efficiency of AI models, MIT researchers created an automated system that enables developers of deep learning algorithms to simultaneously take advantage of two types of data redundancy. This reduces the amount of computation, bandwidth, and memory storage needed for machine learning operations.

Existing techniques for optimizing algorithms can be cumbersome and typically only allow developers to capitalize on either sparsity or symmetry — two different types of redundancy that exist in deep learning data structures.

By enabling a developer to build an algorithm from scratch that takes advantage of both redundancies at once, the MIT researchers’ approach boosted the speed of computations by nearly 30 times in some experiments.

Because the system utilizes a user-friendly programming language, it could optimize machine-learning algorithms for a wide range of applications. The system could also help scientists who are not experts in deep learning but want to improve the efficiency of AI algorithms they use to process data. In addition, the system could have applications in scientific computing.

“For a long time, capturing these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what they would like to compute in a more abstract way, without telling the system exactly how to compute it,” says Willow Ahrens, an MIT postdoc and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.

She is joined on the paper by lead author Radha Patel ’23, SM ’24 and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Cutting out computation

In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, which is a rectangular array of values arranged on two axes, rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, making tensors more difficult to manipulate.
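To make the distinction concrete, here is a minimal sketch in Python with NumPy (used here purely for illustration; it is not part of the researchers' system) showing a two-axis matrix next to a three-axis tensor:

```python
import numpy as np

# A matrix is a two-dimensional array: values arranged on rows and columns.
matrix = np.arange(6).reshape(2, 3)
print(matrix.ndim, matrix.shape)   # 2 (2, 3)

# A tensor generalizes this to any number of axes. For example, a
# hypothetical 4-users x 3-products x 2-criteria ratings tensor:
tensor = np.arange(24).reshape(4, 3, 2)
print(tensor.ndim, tensor.shape)   # 3 (4, 3, 2)
```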

Deep-learning models perform operations on tensors using repeated matrix multiplication and addition — this process is how neural networks learn complex patterns in data. The sheer volume of calculations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
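As a rough illustration of why these operations dominate, a single fully connected layer boils down to one matrix multiplication and one addition; the sketch below is generic NumPy, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 256))   # weight matrix
b = rng.standard_normal(128)          # bias vector
x = rng.standard_normal(256)          # input activations

# One layer of a neural network: multiply, add, then a nonlinearity.
y = np.maximum(W @ x + b, 0.0)        # 128 x 256 multiply-adds per layer
```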

But because of the way data in tensors are arranged, engineers can often boost the speed of a neural network by cutting out redundant computations.

For instance, if a tensor represents user review data from an e-commerce site, since not every user reviewed every product, most values in that tensor are likely zero. This type of data redundancy is called sparsity. A model can save time and computation by only storing and operating on non-zero values.
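A minimal sketch of this idea, using SciPy's sparse matrices purely for illustration (the article's compiler generates its own sparsity-aware code; SciPy is not part of it):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A user-by-product review matrix: most users never reviewed most
# products, so most entries are zero.
dense = np.array([
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
])

# A sparse format stores only the non-zero values and their positions.
sparse = csr_matrix(dense)
print(sparse.nnz)    # 3 stored values instead of 16

# Operations such as matrix-vector products then skip the zeros.
v = np.ones(4)
print(sparse @ v)    # same result as dense @ v, with far less work
```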

In addition, sometimes a tensor is symmetric, which means the top half and bottom half of the data structure are equal. In this case, the model only needs to operate on one half, reducing the amount of computation. This type of data redundancy is called symmetry.
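For instance, a matrix-vector product with a symmetric matrix can read just the upper half of the matrix and still produce the full result. The following hand-written Python sketch illustrates the idea; it is not output from the researchers' system:

```python
import numpy as np

def symv_upper(A_upper, x):
    """Symmetric matrix-vector product that reads only the upper
    triangle (including the diagonal) of a symmetric matrix."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        y[i] += A_upper[i, i] * x[i]
        for j in range(i + 1, n):
            a = A_upper[i, j]      # read one half of the data...
            y[i] += a * x[j]       # ...and use it for both A[i, j]
            y[j] += a * x[i]       # and its mirror entry A[j, i].
    return y

A = np.array([[2., 1., 0.],
              [1., 3., 4.],
              [0., 4., 5.]])       # symmetric: A == A.T
x = np.array([1., 2., 3.])
assert np.allclose(symv_upper(np.triu(A), x), A @ x)
```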

“But when you try to capture both of these optimizations, the situation becomes quite complex,” Ahrens says.

To simplify the process, she and her collaborators built a new compiler, which is a computer program that translates complex code into a simpler language that can be processed by a machine. Their compiler, called SySTeC, can optimize computations by automatically taking advantage of both sparsity and symmetry in tensors.

They began the process of building SySTeC by identifying three key optimizations they can perform using symmetry.

First, if the algorithm’s output tensor is symmetric, then it only needs to compute one half of it. Second, if the input tensor is symmetric, then the algorithm only needs to read one half of it. Finally, if intermediate results of tensor operations are symmetric, the algorithm can skip redundant computations.
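As an illustration of the first case, the product C = A·Aᵀ is always symmetric, so a routine can compute one triangle and mirror it. This is a hand-written sketch of the principle, not SySTeC's generated code:

```python
import numpy as np

def gram_upper(A):
    """Compute C = A @ A.T, exploiting the fact that the output is
    symmetric: only the upper triangle is computed, then mirrored."""
    m = A.shape[0]
    C = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):      # roughly half the dot products
            C[i, j] = A[i] @ A[j]
            C[j, i] = C[i, j]      # mirror instead of recompute
    return C

A = np.random.rand(4, 5)
assert np.allclose(gram_upper(A), A @ A.T)
```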

Simultaneous optimizations

To use SySTeC, a developer inputs their program and the system automatically optimizes their code for all three types of symmetry. Then the second phase of SySTeC performs additional transformations to only store non-zero data values, optimizing the program for sparsity.

In the end, SySTeC generates ready-to-use code.

“In this way, we get the benefits of both optimizations. And the interesting thing about symmetry is, as your tensor has more dimensions, you can get even more savings on computation,” Ahrens says.
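To show what combining the two redundancies can look like, here is a hand-written sketch of a matrix-vector product for a symmetric sparse matrix, assuming the matrix is stored as the non-zeros of its upper triangle only (SciPy is used just to set up the example; this is not SySTeC output):

```python
import numpy as np
from scipy.sparse import triu, random as sparse_random

def sym_sparse_mv(U, x):
    """Matrix-vector product for a sparse symmetric matrix, given only
    the non-zeros of its upper triangle U (in COO format): each stored
    value serves as both A[i, j] and A[j, i], so the loop exploits
    sparsity (skip zeros) and symmetry (read one half) at once."""
    y = np.zeros_like(x)
    for i, j, a in zip(U.row, U.col, U.data):
        y[i] += a * x[j]
        if i != j:                 # off-diagonal entries count twice
            y[j] += a * x[i]
    return y

A = sparse_random(6, 6, density=0.3, random_state=0)
A = A + A.T                        # make it symmetric
U = triu(A).tocoo()                # keep only the upper triangle
x = np.arange(6, dtype=float)
assert np.allclose(sym_sparse_mv(U, x), A @ x)
```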

The researchers demonstrated speedups of nearly a factor of 30 with code generated automatically by SySTeC.

Because the system is automated, it could be especially useful in situations where a scientist wants to process data using an algorithm they are writing from scratch.

In the future, the researchers want to integrate SySTeC into existing sparse tensor compiler systems to create a seamless interface for users. In addition, they would like to use it to optimize code for more complicated programs.

This work is funded, in part, by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.
