TechCrunch News · March 20
Pruna AI open sources its AI model optimization framework

Pruna AI is a European startup focused on compression algorithms for AI models. The company has announced it is open sourcing its optimization framework, which combines several efficiency methods such as caching, pruning, quantization and distillation, standardizes saving and loading compressed models, and evaluates the quality and performance gains of models after compression. Pruna AI's goal is to provide standardized tooling for AI efficiency methods, much as Hugging Face standardized transformers and diffusers. The company currently focuses on image and video generation models and offers an enterprise edition that includes an optimization agent and an upcoming compression agent, designed to make models faster without sacrificing accuracy. Pruna AI has raised $6.5 million in seed funding.

💡Pruna AI is open sourcing its AI model compression framework, which combines several efficiency methods such as caching, pruning, quantization and distillation to optimize AI models.

⚙️The Pruna AI framework standardizes saving and loading compressed models and evaluates quality and performance gains after compression, playing a standardizing role similar to the one Hugging Face played for transformers and diffusers.

🖼️Pruna AI currently focuses on image and video generation models and offers an enterprise edition, including an optimization agent and an upcoming compression agent: users specify speed and accuracy requirements, and the agent automatically finds the best compression setup.

💰Pruna AI's pricing model resembles renting a GPU from a cloud service; by optimizing models it helps users cut inference costs. For example, Pruna AI has made a Llama model eight times smaller without much loss.

Pruna AI, a European startup that has been working on compression algorithms for AI models, is making its optimization framework open source on Thursday.

Pruna AI has been creating a framework that applies several efficiency methods, such as caching, pruning, quantization and distillation, to a given AI model.

“We also standardize saving and loading the compressed models, applying combinations of these compression methods, and also evaluating your compressed model after you compress it,” Pruna AI co-founder and CTO John Rachwan told TechCrunch.

In particular, Pruna AI’s framework can evaluate whether there is significant quality loss after compressing a model, and measure the performance gains you get in return.

“If I were to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers — how to call them, how to save them, load them, etc. We are doing the same, but for efficiency methods,” he added.

Big AI labs have already been using various compression methods. For instance, OpenAI has been relying on distillation to create faster versions of its flagship models.

This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. Similarly, the Flux.1-schnell image generation model is a distilled version of the Flux.1 model from Black Forest Labs.

Distillation is a technique used to extract knowledge from a large AI model with a “teacher-student” model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior.
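The teacher-student loop described above can be sketched in a toy form. This is purely illustrative (pure NumPy, not Pruna's or OpenAI's actual pipeline); `teacher`, `w_student` and the linear models are stand-ins for real neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in "teacher model": a fixed linear map whose behavior we want to copy.
    return x @ np.array([[2.0], [-1.0]])

# 1. Send requests to the teacher model and record its outputs.
X = rng.normal(size=(256, 2))
y_teacher = teacher(X)

# 2. (In practice, the recorded answers may be checked against a dataset
#    for accuracy before being used as training targets.)

# 3. Train a smaller "student" to approximate the teacher's behavior
#    (a least-squares fit here, standing in for gradient-based training).
w_student, *_ = np.linalg.lstsq(X, y_teacher, rcond=None)

# The student now reproduces the teacher closely on the recorded inputs.
max_gap = float(np.abs(X @ w_student - y_teacher).max())
```

In a real setting the student is a smaller network trained on the teacher's (possibly soft) outputs; the key idea is the same — the training signal comes from the teacher's recorded responses rather than the original labels.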

“For big companies, what they usually do is that they build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, let’s say one quantization method for LLMs, or one caching method for diffusion models,” Rachwan said. “But you cannot find a tool that aggregates all of them, makes them all easy to use and combine together. And this is the big value that Pruna is bringing right now.”
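The "combine methods together" idea Rachwan describes can be illustrated with a toy chain of two methods — magnitude pruning followed by int8 quantization — on a raw weight matrix. This is a minimal sketch of the general techniques, not Pruna's API; `prune` and `quantize_int8` are hypothetical helpers:

```python
import numpy as np

def prune(w, sparsity=0.5):
    # Magnitude pruning: zero out the smallest-magnitude fraction of weights.
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w):
    # Symmetric linear quantization of float weights to int8.
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

# Chain the two methods: prune first, then quantize the surviving weights.
w_pruned = prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
recovered = q.astype(np.float32) * scale

sparsity_out = float((q == 0).mean())                # at least half the weights zeroed
max_err = float(np.abs(recovered - w_pruned).max())  # bounded by scale / 2
```

The point of a unified framework is that each method exposes a common interface, so chains like prune-then-quantize (plus caching, distillation, etc.) can be composed and then evaluated with the same tooling.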

Left to right: Rayan Nait Mazi, Bertrand Charpentier, John Rachwan, Stephan Günnemann. Image credits: Pruna AI

While Pruna AI supports any kind of models, from large language models to diffusion models, speech-to-text models and computer vision models, the company is focusing more specifically on image and video generation models right now.

Some of Pruna AI’s existing users include Scenario and PhotoRoom. In addition to the open source edition, Pruna AI has an enterprise offering with advanced optimization features including an optimization agent.

“The most exciting feature that we are releasing soon will be a compression agent,” Rachwan said. “Basically, you give it your model, you say: ‘I want more speed but don’t drop my accuracy by more than 2%.’ And then, the agent will just do its magic. It will find the best combination for you, return it for you. You don’t have to do anything as a developer.”

Pruna AI charges by the hour for its pro version. “It’s similar to how you would think of a GPU when you rent a GPU on AWS or any cloud service,” Rachwan said.

And if your model is a critical part of your AI infrastructure, you’ll end up saving a lot of money on inference with the optimized model. For example, Pruna AI has made a Llama model eight times smaller without too much loss using its compression framework. Pruna AI hopes its customers will think about its compression framework as an investment that pays for itself.

Pruna AI raised a $6.5 million seed funding round a few months ago. Investors in the startup include EQT Ventures, Daphni, Motier Ventures and Kima Ventures.
