MarkTechPost@AI · 2 days ago, 13:55
Tiny Models, Big Reasoning Gains: USC Researchers Introduce Tina for Cost-Effective Reinforcement Learning with LoRA

Researchers at the University of Southern California have introduced Tina, a family of reasoning models that attains strong reasoning ability by applying LoRA-based reinforcement learning to a 1.5B-parameter base model. Tina reaches 43.33% Pass@1 on AIME24, improving reasoning performance by more than 20%, at a post-training cost of only $9. The work emphasizes tiny models, minimal parameter updates, and low hardware and budget requirements, and all resources are open-sourced, providing a low-cost, high-efficiency platform for reasoning research.

💡At the core of Tina is LoRA (Low-Rank Adaptation), used to run reinforcement learning on the 1.5B-parameter DeepSeek-R1-Distill-Qwen-1.5B model. This approach updates only a small fraction of the model's parameters, preserving reasoning ability while sharply reducing compute cost (see the parameter-count sketch after these highlights).

🚀Tina performs strongly across multiple benchmarks, most notably reaching 43.33% Pass@1 on AIME24, with reasoning performance improved by more than 20%. These results show that even small models can make significant progress on reasoning tasks when LoRA is combined with reinforcement learning.

💰Tina is extremely cheap to train: each experiment averages under $100, and the post-training cost of the best model is just $9. This low cost makes Tina a highly accessible research platform for advancing work on reasoning.

🛠️The researchers trained Tina using public datasets and the OpenR1 codebase, with minimal hyperparameter tuning. They also re-evaluated the baseline reasoning models to ensure fair comparisons, using the LightEval framework and the vLLM engine to eliminate discrepancies introduced by prior studies.
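
To make the "small fraction of parameters" point concrete, here is a back-of-the-envelope sketch of how few weights LoRA actually trains on a ~1.5B-parameter model. The hidden size, layer count, rank, and adapter placement below are illustrative assumptions, not Tina's published configuration.

```python
# Back-of-the-envelope LoRA parameter count (illustrative assumptions,
# not Tina's exact configuration).
hidden = 1536   # hidden size of a ~1.5B Qwen-style model (assumed)
layers = 28     # number of transformer blocks (assumed)
rank = 32       # a "moderate" LoRA rank, in the spirit of the paper's ablations

# LoRA freezes each weight matrix W and learns a low-rank update B @ A,
# where A is (rank x hidden) and B is (hidden x rank).
params_per_adapter = 2 * rank * hidden

# Assume adapters on the four attention projections (q, k, v, o) per layer.
trainable = params_per_adapter * 4 * layers
total = 1.5e9

print(f"trainable LoRA params: {trainable:,} "
      f"({100 * trainable / total:.3f}% of 1.5B)")
# -> trainable LoRA params: 11,010,048 (0.734% of 1.5B)
```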

Achieving strong, multi-step reasoning in language models (LMs) remains a major challenge, despite notable progress in general task performance. Such reasoning is crucial for complex problem-solving domains, such as scientific research and strategic planning. Traditionally, enhancing reasoning skills involves supervised fine-tuning (SFT), where models learn by imitating step-by-step reasoning demonstrations from more advanced models, such as o1. While effective, this method depends heavily on the availability of high-quality reasoning traces, which are costly to obtain and risk promoting shallow mimicry over genuine logical exploration. Reinforcement learning (RL) offers an alternative by enabling models to learn directly from reward signals, encouraging broader reasoning exploration. However, RL approaches are often resource-heavy and complex, raising the question of how to build reasoning-capable models cost-effectively.

Following the release of strong models like o1-preview, several open-source efforts such as STILL, Sky-T1, SimpleRL, PRIME, and DeepScaleR have explored efficient strategies to replicate or surpass o1’s reasoning capabilities. Techniques include lightweight imitation learning, scalable instruction tuning, and simplified RL methods. Meanwhile, newer innovations, such as Group Relative Policy Optimization (GRPO), enhance RL training efficiency by eliminating the need for separate value networks, as seen in models like DeepSeek-R1. To further lower training costs, researchers are also investigating Low-Rank Adaptation (LoRA) methods, which update only a small subset of model parameters, maintaining modularity while preserving reasoning ability. This approach enables efficient fine-tuning without the computational demands of full-parameter updates.
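
To make the GRPO idea concrete: instead of training a separate value network as a baseline, GRPO samples a group of completions per prompt and normalizes each reward against the group's own statistics. A minimal sketch, assuming simple binary correctness rewards:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages in the style of GRPO (sketch).

    `rewards` holds scalar rewards for a group of completions sampled
    from the same prompt; normalizing against the group mean and std
    removes the need for a separate value network.
    """
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: four sampled completions for one prompt, scored 1 if correct.
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
# -> approximately [ 1. -1. -1.  1.]
```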

Researchers from the University of Southern California introduce Tina, a family of compact reasoning models that achieve strong performance with minimal cost. Using RL enhanced by LoRA on a 1.5B parameter base model, Tina models outperform or match state-of-the-art models at a fraction of the computational expense. Their best model improves reasoning performance by over 20% and achieves 43.33% Pass@1 on AIME24, with a post-training cost of just $9. By leveraging LoRA’s efficiency to adapt reasoning formats while preserving base knowledge, Tina highlights a highly accessible, cost-effective approach, with all resources fully open-sourced.

Tina is a family of tiny reasoning models built by post-training the DeepSeek-R1-Distill-Qwen-1.5B model with LoRA during reinforcement learning, using a GRPO-style approach. The framework emphasizes minimalism: tiny models, small parameter updates, and a low hardware and budget footprint. Tina models were trained on public datasets, replicating the setups of models like STILL-3, DeepScaleR, and Open-RS. Training leveraged the OpenR1 codebase, minimal hyperparameter tuning, and just two NVIDIA L40S GPUs (occasionally RTX 6000 Ada GPUs). Training and evaluation costs were low, averaging well under $100 per experiment, making Tina a highly accessible platform for reasoning research.
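
For readers who want to reproduce the recipe in spirit, below is a minimal sketch of LoRA-based GRPO post-training using Hugging Face TRL and PEFT (OpenR1 builds on TRL). The dataset, reward function, rank, and hyperparameters are illustrative placeholders, not Tina's exact OpenR1 settings:

```python
# Minimal LoRA + GRPO sketch with Hugging Face TRL and PEFT.
# Dataset, reward, and hyperparameters are placeholders, not Tina's settings.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # stand-in dataset

def toy_reward(completions, **kwargs):
    # Toy length-based reward; Tina-style training would instead use a
    # verifiable correctness reward on math problems.
    return [-abs(200 - len(c)) for c in completions]

peft_config = LoraConfig(
    r=32,                      # "moderate" rank (assumed value)
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=toy_reward,
    args=GRPOConfig(output_dir="tina-sketch", num_generations=8),
    train_dataset=dataset,
    peft_config=peft_config,   # only the LoRA adapters are trained
)
trainer.train()
```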

To ensure fair comparisons, the authors reevaluated baseline reasoning models using a consistent setup with the LightEval framework and vLLM engine, thereby eliminating variations introduced by previous studies. Six reasoning benchmarks, including AIME 24/25, AMC 23, MATH 500, GPQA, and Minerva, were utilized. They then evaluated Tina models—small, LoRA-trained versions of baseline models—showing that Tina models often outperformed their full-parameter counterparts despite using minimal training (19–57% of an epoch). Further ablation studies revealed that smaller, high-quality datasets, appropriate learning rates, moderate LoRA ranks, and careful choice of RL algorithm significantly impacted performance, confirming the efficiency and robustness of their LoRA-based reasoning approach.
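
Since the headline metric throughout is Pass@1, here is the standard unbiased pass@k estimator (Chen et al., 2021) for reference; this is the common formula, not necessarily LightEval's exact implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples drawn per problem; c: of those, how many are correct;
    k: evaluation budget. Pass@1 reduces to the per-sample accuracy c / n.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per problem, 7 correct.
print(pass_at_k(16, 7, 1))  # 0.4375, i.e. 43.75% Pass@1
```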

In conclusion, Tina is a series of lightweight reasoning models that achieve strong performance using minimal computational resources. By applying LoRA during RL to a 1.5B-parameter base model, the researchers achieve reasoning abilities competitive with larger state-of-the-art models at a post-training cost of just $9. Tina models show over a 20% improvement in reasoning and 43.33% Pass@1 accuracy on AIME24. While showcasing impressive cost-performance efficiency, limitations remain, including the smaller model scale, limited diversity in reasoning tasks, and minimal hyperparameter tuning. All code, logs, and model checkpoints are open-sourced to promote accessible research and further exploration.


Check out the Paper and GitHub Page.


Related tags

Tina model · LoRA · Reinforcement learning · Reasoning · Artificial intelligence