MarkTechPost@AI May 9, 12:20
OpenAI Releases Reinforcement Fine-Tuning (RFT) on o4-mini: A Step Forward in Custom Model Optimization

OpenAI has rolled out Reinforcement Fine-Tuning (RFT) on its o4-mini reasoning model, offering a powerful new way to customize foundation models for specific tasks. Built on reinforcement learning principles, RFT lets organizations define custom objectives and reward functions, giving them fine-grained control over how models improve, well beyond standard supervised fine-tuning. With RFT, developers can push models toward the behavior desired in real-world applications, teaching them not just what to output but why that output is preferred in a particular domain. Early adopters have already demonstrated RFT's potential in legal reasoning, medical understanding, code synthesis, and policy enforcement, with notable gains in accuracy and efficiency in those domains.

🚀 OpenAI has launched Reinforcement Fine-Tuning (RFT) on the o4-mini model. Built on reinforcement learning principles, it lets developers tune models more precisely by defining custom objectives and reward functions.

💡 RFT works by supplying a task-specific grader that evaluates model outputs and scores them against custom criteria. The model is trained to optimize against this reward signal, gradually learning to generate responses that match the desired behavior.

🎯 Early use cases show that RFT can significantly improve model performance in specific domains: for example, Accordance AI used RFT to raise the accuracy of a tax-analysis model by 39%, and Ambience Healthcare improved medical coding accuracy by 12 percentage points.

⚙️ The key steps for using RFT are designing a grading function, preparing a dataset, launching a training job, and evaluating and iterating. OpenAI provides comprehensive documentation and examples to help users get started.

OpenAI has launched Reinforcement Fine-Tuning (RFT) on its o4-mini reasoning model, introducing a powerful new technique for tailoring foundation models to specialized tasks. Built on principles of reinforcement learning, RFT allows organizations to define custom objectives and reward functions, enabling fine-grained control over how models improve—far beyond what standard supervised fine-tuning offers.

At its core, RFT is designed to help developers push models closer to ideal behavior for real-world applications by teaching them not just what to output, but why that output is preferred in a particular domain.

What is Reinforcement Fine-Tuning?

Reinforcement Fine-Tuning applies reinforcement learning principles to language model fine-tuning. Rather than relying solely on labeled examples, developers provide a task-specific grader—a function that evaluates and scores model outputs based on custom criteria. The model is then trained to optimize against this reward signal, gradually learning to generate responses that align with the desired behavior.

This approach is particularly valuable for nuanced or subjective tasks where ground truth is difficult to define. For instance, you might not have labeled data for “the best way to phrase a medical explanation,” but you can write a program that assesses clarity, correctness, and completeness—and let the model learn accordingly.
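
To illustrate what such a grader might look like, here is a minimal sketch in Python. The function name, inputs, and scoring heuristics are hypothetical and only demonstrate the idea of encoding correctness, clarity, and completeness as a score between 0 and 1; OpenAI's RFT guide specifies the actual grader interface.

```python
# Hypothetical grader sketch: scores a model's medical explanation on a 0-1 scale.
# The function name, inputs, and heuristics below are illustrative assumptions,
# not OpenAI's official grader interface.

def grade_medical_explanation(prompt: str, response: str, reference_facts: list[str]) -> float:
    # "prompt" is unused in this sketch but could support prompt-conditioned checks.
    score = 0.0
    text = response.lower()

    # Correctness: reward coverage of key reference facts (weight 0.5).
    if reference_facts:
        covered = sum(1 for fact in reference_facts if fact.lower() in text)
        score += 0.5 * (covered / len(reference_facts))

    # Clarity: prefer moderate sentence length (weight 0.25).
    sentences = [s for s in response.split(".") if s.strip()]
    avg_words = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    score += 0.25 if avg_words <= 25 else 0.1

    # Completeness: expect a brief caveat or next-steps note (weight 0.25).
    if any(kw in text for kw in ("consult", "follow up", "see a doctor")):
        score += 0.25

    return min(score, 1.0)
```

During training, the model is rewarded for responses that score higher under this kind of function, so the quality of the grader largely determines the quality of the fine-tuned behavior.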

Why o4-mini?

OpenAI’s o4-mini is a compact reasoning model released in April 2025, optimized for both text and image inputs. It’s part of OpenAI’s new generation of multitask-capable models and is particularly strong at structured reasoning and chain-of-thought prompts.

By enabling RFT on o4-mini, OpenAI gives developers access to a lightweight yet capable foundation that can be precisely tuned for high-stakes, domain-specific reasoning tasks—while remaining computationally efficient and fast enough for real-time applications.

Applied Use Cases: What Developers Are Building with RFT

Several early adopters have demonstrated the practical potential of RFT on o4-mini. Accordance AI, for instance, used RFT to raise the accuracy of its tax-analysis model by 39%, and Ambience Healthcare improved medical coding accuracy by 12 percentage points.

These examples underscore RFT’s strength in aligning models with use-case-specific requirements—whether those involve legal reasoning, medical understanding, code synthesis, or policy enforcement.

How to Use RFT on o4-mini

Getting started with Reinforcement Fine-Tuning involves four key components:

1. Design a Grading Function: Developers define a Python function that evaluates model outputs. This function returns a score from 0 to 1 and can encode task-specific preferences, such as correctness, format, or tone.
2. Prepare a Dataset: A high-quality prompt dataset is essential. OpenAI recommends using diverse and challenging examples that reflect the target task.
3. Launch a Training Job: Via OpenAI's fine-tuning API or dashboard, users can launch RFT runs with adjustable configurations and performance tracking (see the sketch after this list).
4. Evaluate and Iterate: Developers monitor reward progression, evaluate checkpoints, and refine grading logic to maximize performance over time.
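
To make the dataset and job-launch steps concrete, below is a rough sketch using the openai Python SDK. The upload and job-creation calls (client.files.create, client.fine_tuning.jobs.create) exist in the SDK, but the reinforcement method payload, the grader configuration, and the model snapshot name shown here are assumptions based on this article's description; OpenAI's RFT guide documents the exact schema.

```python
# Hedged sketch of launching an RFT run with the openai Python SDK.
# The "method" payload (reinforcement/grader fields) and the model snapshot name
# are assumptions; consult OpenAI's RFT guide for the exact schema.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 2: upload a JSONL dataset of prompts (one JSON object per line).
train_file = client.files.create(
    file=open("rft_prompts.jsonl", "rb"),
    purpose="fine-tune",
)

# Step 3: launch the RFT job against o4-mini with a custom grader config (assumed fields).
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",          # assumed snapshot name
    training_file=train_file.id,
    method={
        "type": "reinforcement",         # assumed RFT method type
        "reinforcement": {
            "grader": {
                "type": "score_model",   # e.g. a hosted model such as GPT-4o as grader
                "name": "clarity_grader",
                "model": "gpt-4o",
                "input": [{
                    "role": "user",
                    "content": "Score this answer from 0 to 1: {{sample.output_text}}",
                }],
            },
            "hyperparameters": {"n_epochs": 2},
        },
    },
)

# Step 4: poll the job and inspect reward progression before iterating on the grader.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```

From there, the workflow is iterative: inspect checkpoints and reward curves, refine the grading logic, and relaunch until the model's behavior matches the target.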

Comprehensive documentation and examples are available through OpenAI’s RFT guide.

Access and Pricing

RFT is currently available to verified organizations. Training costs are billed at $100/hour for active training time. If a hosted OpenAI model is used to run the grader (e.g., GPT-4o), token usage for those calls is charged separately at standard inference rates.

As an incentive, OpenAI is offering a 50% training cost discount for organizations that agree to share their datasets for research and model improvement purposes.

A Technical Leap for Model Customization

Reinforcement Fine-Tuning represents a shift in how we adapt foundation models to specific needs. Rather than merely replicating labeled outputs, RFT enables models to internalize feedback loops that reflect the goals and constraints of real-world applications. For organizations working on complex workflows where precision and alignment matter, this new capability opens a critical path to reliable and efficient AI deployment.

With RFT now available on the o4-mini reasoning model, OpenAI is equipping developers with tools not just to fine-tune language—but to fine-tune reasoning itself.


Check out the Detailed Documentation here. Also, don’t forget to follow us on Twitter.

