TechCrunch News · January 30
Ai2 says its new AI model beats one of DeepSeek’s best

Ai2, a nonprofit AI research institute based in Seattle, has released Tulu3-405B, a model it claims outperforms DeepSeek V3 from Chinese AI company DeepSeek and beats OpenAI's GPT-4o on certain AI benchmarks. Notably, Tulu3-405B is open source: all of its components are freely available and permissively licensed. Ai2 says the release underscores the U.S.' potential to lead the global development of generative AI models. Tulu3-405B is a large model with 405 billion parameters, trained on 256 GPUs running in parallel. It performs strongly on benchmarks such as math and general knowledge tests, surpassing DeepSeek V3, GPT-4o, and Meta's Llama 3.1 405B on PopQA and GSM8K in particular. The model can be tested via Ai2's chatbot web app, and the training and fine-tuning code has been published on GitHub.

🚀Ai2 has released Tulu3-405B, which it says outperforms DeepSeek V3 and GPT-4o, positioning the U.S. as a leader in open-source AI.

💡Tulu3-405B has 405 billion parameters and was trained on 256 GPUs running in parallel, reflecting substantial compute and problem-solving capacity.

🏆On the PopQA and GSM8K benchmarks, Tulu3-405B beat not only DeepSeek V3 and GPT-4o but also Meta's Llama 3.1 405B, demonstrating strong performance on these specific tasks.

📚Tulu3-405B was trained with reinforcement learning with verifiable rewards (RLVR), which improves the model on tasks with verifiable outcomes, such as math problem solving and instruction following.

💻Both the Tulu3-405B model and its training code are open source: the model can be tested via Ai2's chatbot web app, and the code is available on GitHub, advancing the open sharing of AI technology.

Move over, DeepSeek. There’s a new AI champion in town — and they’re American.

On Thursday, Ai2, a nonprofit AI research institute based in Seattle, released a model that it claims outperforms DeepSeek V3, one of Chinese AI company DeepSeek’s leading systems.

Ai2’s model, called Tulu3-405B, also beats OpenAI’s GPT-4o on certain AI benchmarks, according to Ai2’s internal testing. Moreover, unlike GPT-4o (and even DeepSeek V3), Tulu3-405B is open source, which means all of the components necessary to replicate it from scratch are freely available and permissively licensed.

A spokesperson for Ai2 told TechCrunch that the lab believes Tulu3-405B “underscores the U.S.’ potential to lead the global development of best-in-class generative AI models.”

“This milestone is a key moment for the future of open AI, reinforcing the U.S.’ position as a leader in competitive, open-source models,” the spokesperson said. “With this launch, Ai2 is introducing a powerful, U.S.-developed alternative to DeepSeek’s models — marking a pivotal moment not just in AI development, but in showcasing that the U.S. can lead with competitive, open-source AI independent of the tech giants.”

Tulu3-405B is a rather large model. Containing 405 billion parameters, it required 256 GPUs running in parallel to train, according to Ai2. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

[Figure: Ai2 tested Tulu3-405B on a number of benchmarks, including math and general knowledge tests. Image Credits: Ai2]

According to Ai2, one of the keys to attaining competitive performance with Tulu3-405B was a technique called reinforcement learning with verifiable rewards, or RLVR, which trains models on tasks with “verifiable” outcomes, like math problem solving and following instructions.
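The core idea behind RLVR is that, for tasks like math problems, correctness can be checked programmatically instead of being scored by a learned reward model. A minimal sketch of such a verifiable reward function is below; this is an illustration of the concept only, not Ai2's actual implementation, and the answer-extraction heuristic (taking the last number in the output) is an assumption.

```python
import re

def verifiable_reward(model_output: str, reference_answer: str) -> float:
    """Binary reward for a math task: 1.0 if the final number in the
    model's output matches the reference answer, else 0.0.

    Extracting the last number as the model's answer is a simplifying
    heuristic for illustration; a real pipeline would use a stricter
    answer format.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0  # no answer produced, no reward
    return 1.0 if numbers[-1] == reference_answer else 0.0
```

A reward function like this plugs into a standard RL fine-tuning loop (e.g., PPO) in place of a learned reward model, so the training signal is grounded in an objectively checkable outcome.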

Ai2 claims that on the benchmark PopQA, a set of 14,000 specialized knowledge questions sourced from Wikipedia, Tulu3-405B beat not only DeepSeek V3 and GPT-4o, but also Meta’s Llama 3.1 405B model. Tulu3-405B also had the highest performance of any model in its class on GSM8K, a test containing grade school-level math word problems.

Tulu3-405B is available to test via Ai2’s chatbot web app, and the code to train and fine-tune the model is on GitHub. Get it while it’s hot — before the next benchmark-beating flagship AI model comes along.
