MarkTechPost@AI 11 hours ago
Mistral AI Releases Mistral Small 3.2: Enhanced Instruction Following, Reduced Repetition, and Stronger Function Calling for AI Integration

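Mistral AI has released Mistral Small 3.2, an upgrade to its predecessor that aims to improve the model’s accuracy and stability. The new model improves instruction following, reduces repetition errors, and strengthens function calling, and it is notably better at handling complex instructions and avoiding redundant output. Results on Wildbench v2 and Arena Hard v2 show gains in understanding and executing complex instructions. The new model also performs better on STEM-related tasks, making it a reliable choice for complex AI tasks.

🚀 Better instruction following: On the Wildbench v2 instruction test, Mistral Small 3.2 raised accuracy from 55.6% to 65.33%, a marked improvement in understanding and executing complex instructions and in responding precisely to user commands.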

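🚫 Fewer repetition errors: The new model produces infinite or repetitive output less often. The rate of infinite generations fell by roughly 40%, from 2.11% in Small 3.1 to 1.29%, which directly improves the model’s usability and reliability in long interactions.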

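⚙️ Stronger function calling: Mistral Small 3.2 is more robust in its function calling template, which makes integrations more stable. This makes it better suited to automation tasks and simplifies complex operations.

🔬 Better STEM performance: On the HumanEval Plus Pass@5 code test, accuracy rose from 88.99% to 92.90%, and MMLU Pro results rose from 66.76% to 69.06%, indicating an overall improvement in scientific and technical applications.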

With the frequent release of new large language models (LLMs), there is a persistent quest to minimize repetitive errors, enhance robustness, and significantly improve user interactions. As AI models become integral to more sophisticated computational tasks, developers are consistently refining their capabilities, ensuring seamless integration within diverse, real-world scenarios.

Mistral AI has released Mistral Small 3.2 (Mistral-Small-3.2-24B-Instruct-2506), an updated version of its earlier release, Mistral-Small-3.1-24B-Instruct-2503. Although a minor release, Mistral Small 3.2 introduces targeted upgrades that aim to enhance the model’s overall reliability and efficiency, particularly in handling complex instructions, avoiding redundant outputs, and maintaining stability in function-calling scenarios.

A significant enhancement in Mistral Small 3.2 is its accuracy in following precise instructions, since successful user interaction often depends on executing subtle commands correctly. Benchmark scores reflect this improvement: on the Wildbench v2 instruction test, Mistral Small 3.2 achieved 65.33% accuracy, up from 55.6% for its predecessor. Similarly, its score on the more difficult Arena Hard v2 test more than doubled, from 19.56% to 43.1%, which is evidence of an improved ability to understand and execute intricate commands.

Mistral Small 3.2 also reduces instances of infinite or repetitive output, a problem commonly encountered in long conversations. Internal evaluations show that the rate of infinite generations fell by roughly 40%, from 2.11% in Small 3.1 to 1.29%. This reduction directly improves the model’s usability and dependability in extended interactions. The new model is also better at calling functions, which makes it well suited to automation tasks, and improved robustness in the function calling template translates to more stable and dependable interactions.
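The article does not show what a function call looks like in practice. As a rough illustration only, the sketch below shows how an application might validate and dispatch a tool call emitted by a model before acting on it. It assumes a common JSON shape with a `name` field and an `arguments` field (either an object or a JSON-encoded string); the `get_weather` tool, the `TOOLS` registry, and the `dispatch` helper are hypothetical names invented for this example, and no Mistral API is called.

```python
import json
from typing import Callable, Dict, Set, Tuple


# Hypothetical tool: the name, parameters, and return value are illustrative only.
def get_weather(city: str, unit: str = "celsius") -> str:
    return f"Weather for {city} in {unit}: (stub)"


# Registry mapping each tool name to (callable, required argument names).
TOOLS: Dict[str, Tuple[Callable[..., str], Set[str]]] = {
    "get_weather": (get_weather, {"city"}),
}


def dispatch(tool_call_json: str) -> str:
    """Parse a model-emitted tool call, validate it, and run the matching tool.

    Assumed shape: {"name": "<tool>", "arguments": {...} or "<JSON string>"}.
    Raises ValueError for any malformed or unknown call.
    """
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool call is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments", {})
    if isinstance(args, str):
        # Some APIs deliver the arguments as a JSON-encoded string.
        try:
            args = json.loads(args)
        except json.JSONDecodeError as exc:
            raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    func, required = TOOLS[name]
    missing = required - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    try:
        return func(**args)
    except TypeError as exc:
        # Raised, for example, when the model supplies an unexpected argument.
        raise ValueError(f"bad arguments for {name}: {exc}") from exc


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Validating on the application side in this way means a malformed call surfaces as a single, predictable error type instead of a crash, whichever model produced it.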

Improvements on STEM-related benchmarks further demonstrate Small 3.2’s aptitude. For example, accuracy on the HumanEval Plus Pass@5 code test rose from 88.99% in Small 3.1 to 92.90%. MMLU Pro results increased from 66.76% to 69.06%, and GPQA Diamond scores improved slightly from 45.96% to 46.13%, indicating general competence in scientific and technical uses.

Vision results were mixed, suggesting that optimizations were applied selectively. ChartQA accuracy improved from 86.24% to 87.4%, and DocVQA improved marginally from 94.08% to 94.86%. In contrast, some tests, such as MMMU and Mathvista, saw slight dips, indicating trade-offs made during optimization.
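The benchmark figures quoted in this article are easier to compare as absolute changes (percentage points) and relative changes (percent of the old score). The short script below is a minimal sketch of that arithmetic; the `SCORES` table simply restates the numbers given in the article, and the `change` helper is a name chosen for this example.

```python
from typing import Dict, Tuple

# Scores quoted in this article as (Small 3.1, Small 3.2), all in percent.
SCORES: Dict[str, Tuple[float, float]] = {
    "Wildbench v2": (55.6, 65.33),
    "Arena Hard v2": (19.56, 43.1),
    "Infinite generations (lower is better)": (2.11, 1.29),
    "HumanEval Plus Pass@5": (88.99, 92.90),
    "MMLU Pro": (66.76, 69.06),
    "GPQA Diamond": (45.96, 46.13),
    "ChartQA": (86.24, 87.4),
    "DocVQA": (94.08, 94.86),
}


def change(old: float, new: float) -> Tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute = new - old
    relative = 100.0 * absolute / old
    return round(absolute, 2), round(relative, 1)


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        absolute, relative = change(old, new)
        print(f"{name}: {old} -> {new} ({absolute:+.2f} pts, {relative:+.1f}% relative)")
```

By this arithmetic, the infinite-generation rate falls by 0.82 points (about 39% relative), and the Arena Hard v2 score rises by 23.54 points (about 120% relative).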

The key updates in Mistral Small 3.2 over Small 3.1 are improved instruction following, fewer repetition errors, and a more robust function calling template.

In conclusion, Mistral Small 3.2 offers targeted and practical enhancements over its predecessor, providing users with greater accuracy, reduced redundancy, and improved integration capabilities. These advancements help position it as a reliable choice for complex AI-driven tasks across diverse application areas.


Check out the Model Card on Hugging Face. All credit for this research goes to the researchers of this project.

The post Mistral AI Releases Mistral Small 3.2: Enhanced Instruction Following, Reduced Repetition, and Stronger Function Calling for AI Integration appeared first on MarkTechPost.
