TechCrunch News 04月12日 06:52
Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

Meta使用未发布的Llama 4 Maverick模型在LM Arena获高分,引发该平台维护者采取措施。未修改的Maverick排名不佳,Meta解释其实验模型优化了对话性,但此举有误导性且使模型在不同情境中的表现难以预测。Meta表示已发布开源版本并期待开发者反馈。

🦘Meta使用未发布Llama 4 Maverick模型在LM Arena获高分

🙅‍未修改的Maverick排名低于其他多个月前的模型

📃Meta解释实验模型优化了对话性,但此举有问题

🎤Meta已发布开源版本并期待开发者反馈

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark — besides being misleading — makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Meta Llama 4 Maverick LM Arena AI模型性能
相关文章