TechCrunch News, January 12
Researchers open source Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450

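NovaSky, a research team at UC Berkeley's Sky Computing Lab, has released a reasoning model called Sky-T1-32B-Preview that is competitive with an earlier version of OpenAI's o1 on several key benchmarks. Sky-T1 appears to be the first truly open source reasoning model: its training data and code are both public, and it was trained for less than $450. Reasoning models effectively fact-check themselves, which helps them avoid common mistakes; they take somewhat longer to answer but tend to be more reliable in domains such as physics, science, and mathematics. The NovaSky team used Alibaba's QwQ-32B-Preview to generate the initial training data and GPT-4o-mini to refactor that data into a more workable format. Sky-T1 was trained for about 19 hours on 8 Nvidia H100 GPUs and outperforms an early preview version of o1 on MATH500 and LiveCodeBench. It falls short of that preview on GPQA-Diamond, and the team describes the release as only the start of its work on open source reasoning models.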

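🚀 Sky-T1-32B-Preview appears to be the first truly open source reasoning model: its training data and code are public, so it can be replicated from scratch.

💰 Sky-T1 was trained for less than $450, which the team presents as evidence that high-level reasoning capabilities can be replicated affordably and efficiently.

🤔 Reasoning models effectively fact-check themselves, which helps them avoid some common model errors and makes them more reliable in domains such as physics, science, and mathematics.

📊 Sky-T1 outperforms an early preview version of OpenAI's o1 on MATH500 and LiveCodeBench but falls short of it on GPQA-Diamond, so its performance varies by domain.

🛠️ The NovaSky team says it will keep developing more efficient open source reasoning models and explore advanced techniques that improve efficiency and accuracy at test time.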

So-called reasoning AI models are becoming easier — and cheaper — to develop.

On Friday, NovaSky, a team of researchers based out of UC Berkeley’s Sky Computing Lab, released Sky-T1-32B-Preview, a reasoning model that’s competitive with an earlier version of OpenAI’s o1 on a number of key benchmarks. Sky-T1 appears to be the first truly open source reasoning model in the sense that it can be replicated from scratch; the team released the data set they used to train it as well as the necessary training code.

“Remarkably, Sky-T1-32B-Preview was trained for less than $450,” the team wrote in a blog post, “demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently.”

Unlike most AI, reasoning models effectively fact-check themselves, which helps them to avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer — usually seconds to minutes longer — to arrive at solutions compared to a typical non-reasoning model. The upside is, they tend to be more reliable in domains such as physics, science, and mathematics.

The NovaSky team says it used another reasoning model, Alibaba’s QwQ-32B-Preview, to generate the initial training data for Sky-T1, then “curated” the data mixture and leveraged OpenAI’s GPT-4o-mini to refactor the data into a more workable format. Training the 32-billion-parameter Sky-T1 took about 19 hours using a rack of 8 Nvidia H100 GPUs. (Parameters roughly correspond to a model’s problem-solving skills.)
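The article's own figures allow a rough sanity check on the price tag. The sketch below multiplies the reported 8 GPUs by the reported 19 hours and divides the $450 ceiling by the result. It assumes, purely for illustration, that the whole $450 went to GPU time for this one training run; the article gives no cost breakdown, and the function names are hypothetical.

```python
# Figures as reported in the article.
NUM_GPUS = 8               # "a rack of 8 Nvidia H100 GPUs"
TRAINING_HOURS = 19        # "about 19 hours"
COST_CEILING_USD = 450.0   # "less than $450"


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total GPU-hours consumed by a run on num_gpus devices."""
    return num_gpus * hours


def implied_hourly_rate(total_cost: float, num_gpus: int, hours: float) -> float:
    """Cost per GPU-hour, ASSUMING the full cost is GPU time (not stated in the article)."""
    return total_cost / gpu_hours(num_gpus, hours)


total = gpu_hours(NUM_GPUS, TRAINING_HOURS)
rate = implied_hourly_rate(COST_CEILING_USD, NUM_GPUS, TRAINING_HOURS)
print(f"{total} GPU-hours, at most ${rate:.2f} per GPU-hour")
```

Under that assumption the run comes to 152 GPU-hours, which puts the implied price at no more than about $2.96 per H100-hour. Because both the duration and the cost are approximate in the article, this is an order-of-magnitude check rather than an accounting of the team's actual spend.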

According to the NovaSky team, Sky-T1 performs better than an early preview version of o1 on MATH500, a collection of “competition-level” math challenges. The model also beats the preview of o1 on a set of difficult problems from LiveCodeBench, a coding evaluation.

However, Sky-T1 falls short of the o1 preview on GPQA-Diamond, which contains physics, biology, and chemistry-related questions a PhD graduate would be expected to know.

It is also worth noting that OpenAI’s generally available (GA) release of o1 is a stronger model than the preview version of o1, and that OpenAI is expected to release an even better-performing reasoning model, o3, in the weeks ahead.

But the NovaSky team says that Sky-T1 only marks the start of their journey to develop open source models with advanced reasoning capabilities.

“Moving forward, we will focus on developing more efficient models that maintain strong reasoning performance and exploring advanced techniques that further enhance the models’ efficiency and accuracy at test time,” the team wrote in the post. “Stay tuned as we make progress on these exciting initiatives.”

