MarkTechPost@AI · January 17
Sakana AI Introduces Transformer²: A Machine Learning System that Dynamically Adjusts Its Weights for Various Tasks

 

The Sakana AI team has introduced Transformer², a new self-adaptive machine learning framework. Using a Singular Value Fine-tuning (SVF) method, it lets large language models (LLMs) adapt to new tasks in real time without extensive retraining. By selectively modifying the singular components of the model's weight matrices, the method enables dynamic, task-specific adjustments and significantly reduces the computational burden of fine-tuning. Transformer² performs strongly across multiple benchmarks, particularly in visual question answering and mathematical problem solving. SVF not only improves training efficiency but also reduces the compute required, achieving higher performance with less than 10% of the parameters needed by LoRA. The model also exhibits good compositionality: vectors trained for one task can be reused and combined with vectors from other tasks, improving the model's generality and scalability.

💡 Transformer² introduces Singular Value Fine-tuning (SVF), which adjusts the singular values of weight matrices to give LLMs real-time self-adaptation without extensive retraining.

🚀 SVF uses reinforcement learning to build compact, task-specific "expert" vectors and a two-pass inference mechanism, sharply reducing the number of trainable parameters and improving efficiency.

🏆 Transformer² performs strongly across benchmarks, improving visual question answering by over 39% and GSM8K math problem solving by about 4%, with notable accuracy gains on programming tasks as well.

⚙️ SVF needs less than 10% of the parameters required by LoRA, trains faster, and uses less compute, while also exhibiting good compositionality that supports reuse and combination of expert vectors.

LLMs are essential in industries such as education, healthcare, and customer service, where natural language understanding plays a crucial role. Although highly versatile, LLMs struggle to adapt to new tasks: most fine-tuning methods are resource- and time-intensive, and they often lead to overfitting or trade general adaptability for task-specific performance. This makes it hard for LLMs to handle dynamic, unforeseen tasks and creates a bottleneck in practical applications.

One of the most prominent methods to address these challenges is Low-Rank Adaptation (LoRA), which updates small, task-specific matrices while freezing the rest of the model’s parameters. Although this reduces the computational cost of fine-tuning, it has limitations, such as increased sensitivity to overfitting and the inability to scale efficiently across tasks. Moreover, LoRA’s design lacks inherent compositionality, limiting its ability to integrate multiple domain-specific skills.
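For context, a LoRA-style adapter looks roughly like the minimal PyTorch sketch below: the pretrained weight stays frozen and only a low-rank pair of matrices A and B is trained. This is an illustrative sketch, not any specific library's implementation; the class name and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: the frozen weight W is augmented with a
    low-rank update B @ A; only A and B are trained."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze the pretrained weights
        out_features, in_features = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```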

Researchers at Sakana AI and the Institute of Science Tokyo introduced Transformer², a novel self-adaptive machine learning framework for large language models. Transformer² employs a method called Singular Value Fine-tuning (SVF), which adapts LLMs in real time to new tasks without extensive retraining. By selectively modifying the singular components of the model's weight matrices, Transformer² enables dynamic, task-specific adjustments. This reduces the computational burden associated with fine-tuning and offers a scalable, efficient route to self-adaptation.

At the heart of Transformer² is the SVF method, which fine-tunes the singular values of weight matrices and thereby drastically reduces the number of trainable parameters compared with traditional fine-tuning. Instead of altering the entire model, SVF leverages reinforcement learning to train compact "expert" vectors specialized for specific tasks. At inference time, Transformer² operates in two passes: the first pass analyzes the incoming prompt to identify the task and its requirements, and the second dynamically combines the relevant expert vectors to produce suitable behavior. This modular design lets Transformer² handle a wide array of tasks efficiently.
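The core idea can be sketched in a few lines of PyTorch: decompose a frozen weight matrix once with SVD and train only a per-singular-value scaling vector, which plays the role of the compact expert vector. This is a minimal illustration under that assumption, not the authors' implementation; the class name SVFLinear and the variable z are placeholders, and the reinforcement-learning training loop is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVFLinear(nn.Module):
    """Sketch of Singular Value Fine-tuning: W = U diag(S) Vh is computed once,
    then only the scaling vector z is trained, giving W' = U diag(S * z) Vh."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        U, S, Vh = torch.linalg.svd(base.weight.detach(), full_matrices=False)
        self.register_buffer("U", U)       # frozen singular directions
        self.register_buffer("S", S)       # frozen singular values
        self.register_buffer("Vh", Vh)
        bias = base.bias.detach().clone() if base.bias is not None else None
        self.register_buffer("bias", bias)
        self.z = nn.Parameter(torch.ones_like(S))   # compact "expert" vector

    def forward(self, x):
        # Rebuild the adapted weight from the frozen SVD and the learned scales.
        W_adapted = self.U @ torch.diag(self.S * self.z) @ self.Vh
        return F.linear(x, W_adapted, self.bias)
```

Because only z is learned, the trainable parameter count per layer equals the number of singular values rather than the full weight matrix size, which is where the efficiency gain comes from.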

Transformer² delivered outstanding performance in extensive benchmark evaluations. For instance, the framework shows improvements of over 39% over baselines on visual question-answering tasks. In mathematical problem solving on the GSM8K dataset, it outperformed the compared fine-tuning methods, reaching roughly a 4% improvement. On programming tasks under the MBPP-pro benchmark, Transformer² showed considerable accuracy gains on domain-specific tasks while maintaining strong general performance across domains. It also adapted efficiently to unseen tasks such as ARC-Challenge and HumanEval, matching or exceeding baseline performance.

An important overall outcome was the SVF method's efficiency. It shortened training times and reduced compute requirements, using fewer than 10% of the parameters required by LoRA. For example, on the GSM8K dataset, SVF needed only 0.39 million trainable parameters versus 6.82 million for LoRA, while achieving higher performance. In addition, the model demonstrated good compositionality: expert vectors trained for one task could be reused and combined with vectors from other, unrelated tasks, indicating that the Transformer² framework can scale.
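To illustrate the compositionality claim, a weighted blend of expert vectors might look like the sketch below. The function name and the assumption that experts are combined by simple linear interpolation of their scaling vectors are illustrative; in the described framework the combination coefficients would be chosen after the first inference pass has identified the task.

```python
import torch

def compose_expert_vectors(expert_vectors, weights):
    """Blend task-specific SVF expert vectors into a single vector.

    expert_vectors: list of 1-D tensors, each a per-singular-value scaling
    vector learned for one task.
    weights: interpolation coefficients for the current input.
    """
    stacked = torch.stack(expert_vectors)                # (num_experts, dim)
    coeffs = torch.tensor(weights, dtype=stacked.dtype)  # (num_experts,)
    return (coeffs[:, None] * stacked).sum(dim=0)        # blended vector z'

# Hypothetical usage: reuse a math expert and a coding expert for a mixed task.
# z_mixed = compose_expert_vectors([z_math, z_code], [0.6, 0.4])
```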

The researchers achieved this leap forward by addressing core limitations in existing methods, such as overfitting and inefficiency. By leveraging reinforcement learning, the SVF method provided principled regularization, preventing performance collapse on small datasets or narrow task domains. This allowed Transformer² to excel despite limited training data while maintaining task adaptability.

Conclusion: the Sakana AI team provides a scalable and efficient solution to task-specific adaptation in LLMs. Transformer², with its SVF method, is a significant advance that paves the way for computationally efficient, highly versatile self-adaptive AI systems, addressing present challenges and laying a foundation for future adaptive AI technologies.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.



