少点错误 (LessWrong) · February 25
Technical comparison of Deepseek, Novasky, S1, Helix, P0

 


Published on February 25, 2025 4:20 AM GMT

Comparing NovaSky with S1:

NovaSky (a Berkeley team) and S1 (from Fei-Fei Li's group, arXiv:2501.19393) are the players without large capital or compute; both focus on finetuning a large language model with small, curated reasoning datasets. NovaSky's Sky-T1 trained an entire model on 17K training examples drawn from diverse domains. S1 instead focuses on test-time scaling: it extends reasoning time when needed by repeatedly appending a "Wait" token whenever the model tries to terminate its reasoning. But its finetuning dataset is much smaller, only 1,000 chosen questions with detailed reasoning traces. One method is more versatile; the other focuses on math problems, trading extra inference time for better performance.
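The "Wait"-token trick can be sketched as a small control loop. This is a minimal illustration, not the authors' code: the `generate` function is a hypothetical stand-in for an LLM call, and the stop token is assumed.

```python
# Sketch of S1-style "budget forcing" (test-time scaling). `generate` is a
# hypothetical stub standing in for an LLM call; a real system would query
# the model with a stop sequence.

END_OF_THINKING = "</think>"

def generate(prompt, stop):
    """Stub: returns a chunk of reasoning text up to the stop token."""
    return "...some reasoning..."

def budget_forced_reasoning(question, min_continuations=2):
    """Suppress early termination: each time the model tries to close its
    reasoning, append 'Wait' to force another reasoning pass."""
    trace = question
    for _ in range(min_continuations):
        chunk = generate(trace, stop=END_OF_THINKING)
        trace += chunk + "\nWait"          # override termination, keep thinking
    trace += generate(trace, stop=END_OF_THINKING) + END_OF_THINKING
    return trace

trace = budget_forced_reasoning("<think>Question goes here.")
```

The point is that extra capability comes from spending more inference-time compute, not from more training data.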

Then there are the players with the strongest tech teams, operating in stealth mode and doing only frontier experimentation.

These are the most terrifying players, such as SSI and Keen Technologies, who hide everything behind closed doors.

Looking into Deepseek:

Deepseek: A true technical breakthrough has emerged in which reasoning ability is achieved entirely through reinforcement learning (RL), without any reliance on supervised finetuning. The approach is grounded in a novel mathematical framework, Group Relative Policy Optimization (GRPO), whose objective function keeps the learning process stable and gradual rather than chaotic, preventing sudden changes in behavior at each step. For every question, the old policy generates a batch of answers, and rewards are assigned based on each answer's standing relative to the whole group rather than on an absolute right-or-wrong measure; the policy is then updated in favor of answers with a positive advantage.

Training follows a simple template, typically formatted as a "<think>" section followed by an "<answer>," and an "aha moment" emerges as the system develops self-reflection, re-evaluating its own reasoning and monitoring its own thought process. Further innovations, such as the creation of the R1 finetuning dataset, provide concrete examples of reasoning, while guided reflection and verification through RL optimize reasoning for specific tasks using reward signals that combine language consistency with general accuracy. This method extends reasoning to diverse tasks, such as writing, and ultimately employs RL with a neural reward model to favor helpfulness and harmlessness.
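The group-relative scoring at the heart of GRPO can be shown in a few lines. This is a minimal sketch of the advantage computation only; the full objective also includes a PPO-style clipped probability ratio and a KL penalty against a reference policy, which are omitted here.

```python
# Sketch of GRPO's group-relative advantage: each sampled answer to a
# question is scored relative to the whole group of sampled answers,
# advantage_i = (r_i - mean(r)) / std(r), rather than against an absolute
# right-or-wrong baseline.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize a group of scalar rewards for one question."""
    mu, sigma = mean(rewards), stdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # all answers equal: no learning signal
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to the same question, rewarded 0/1 for correctness.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers receive positive advantage, wrong ones negative; the
# policy update then shifts probability toward the positive-advantage answers.
```

Because the baseline is the group mean, no separate value network is needed, which is part of what makes the method cheap and stable.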

Moreover, the approach enables the transfer of reasoning ability from large models to smaller ones, distilling the large model's reasoning into smaller LLaMA- and Qwen-based architectures. Future application models are expected to be smaller and more ubiquitous, and breakthroughs in intelligence will likely require more computational power; DeepSeek has demonstrated that reasoning ability can be directly learned rather than emerging solely as a consequence of scaling laws.

 

Physical Intelligence:

The training method behind it: first, they didn't start from scratch but began with a vision-language model called PaliGemma. This lets the system understand both visual cues and language commands without learning them from zero. For data, they collected a very large and varied dataset (about 10,000 hours) across seven different robot configurations performing 68 tasks. The effort behind data collection is what makes this model stand out, and it is where the major cost went.

The team adapted ideas from diffusion models to the domain of continuous action generation for robotics. Action generation is viewed not as a direct regression or discrete prediction problem but as a continuous denoising process: the method adds controlled noise to target actions, then trains the model to find the right direction to denoise back to the target actions. The technique was originally used for image generation in diffusion models but is creatively applied here to robotics.

Another innovation: to achieve continuous control of robots, the gradual refining process is trained with a time-dependent parameter that controls the amount of noise added. Training examples thus range from very noisy to nearly clean, teaching the model to gradually correct from randomness toward the desired control command.
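The two paragraphs above can be condensed into one toy training-data step. This is a sketch under stated assumptions: a simple linear noise blend and a 7-dimensional action vector are illustrative choices, not the published noise schedule or architecture.

```python
# Sketch of denoising-style training data for continuous actions: corrupt a
# clean target action with a time-dependent amount of noise, and define the
# regression target as the direction pointing back toward the clean action.
import numpy as np

rng = np.random.default_rng(0)

def corrupt(action, t):
    """Blend a clean action with Gaussian noise; t in [0, 1] controls the
    noise level (t near 1: almost pure noise, t near 0: nearly clean)."""
    noise = rng.standard_normal(action.shape)
    return (1 - t) * action + t * noise, noise

def denoising_target(action, noise):
    """Direction from the noise toward the clean action; a network
    conditioned on (noisy_action, t, observation) would regress this."""
    return action - noise

# One training example: a 7-DoF target action at a random noise level.
target_action = rng.standard_normal(7)
t = rng.uniform()
noisy_action, noise = corrupt(target_action, t)
direction = denoising_target(target_action, noise)
```

At inference time the model runs this correction repeatedly, walking a random action toward a valid one.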

Helix Vision-Language-Action (VLA) model:

One neural network with no need to finetune for specific tasks, multi-robot collaboration, and running on a pair of low-power embedded GPUs: together these make a shocking breakthrough in robotic foundation models. The dataset they used is a 500-hour, high-quality, multi-robot, multi-operator dataset of diverse teleoperated behaviors. 500 hours is an extremely small dataset compared to Physical Intelligence and other robotic-model players, which makes me guess they used very little compute to train the Helix model.

From the blog, they explain that currently, teaching robots a single new behavior requires a huge number of demonstrations or expert manual programming. And they dropped the answer I had been searching for for months: a new scaling law that doesn't demand more data. With Helix, new skills can be specified with language. This feels to me like a major breakthrough in robotics, because researchers can simply prompt the robot AI with the experiment they want to conduct, and the robot can check the manual and learn to do it without previous training on the procedures.

There was a hackathon at AGI House where a man posed the challenge of using unsupervised learning to learn movement trajectories from video data. With Helix's innovation of using language directly, that need to learn from videos is skipped. I believe the key innovation of Helix isn't just running on edge devices, but its solution to the "System 1/System 2" problem: a unified architecture that combines slow, deliberate reasoning with fast, reactive control. This decoupled architecture may be why they can achieve strong results with much less data.

The fact that they created a system that runs entirely on embedded GPUs while maintaining sophisticated capabilities suggests they've made significant optimizations in model efficiency.

Usually, when we merge models from distinct architectures or domains, performance drops. I wonder how Helix delivers by having two architectures work together under one system.

They mention building matching pairs from prompts to movements in videos. These are natural-language-conditioned training pairs: an auto-labeling VLM generates hindsight instructions. The VLM processes segmented video clips from the onboard robot cameras, prompted with: "What instruction would you have given the robot to get the action seen in this video?"
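The auto-labeling loop described above can be sketched as a small pipeline. This is a hypothetical illustration: `vlm_caption` and its return value are stand-ins, since Figure has not published the model or code behind this step, only the prompt quoted above.

```python
# Sketch of hindsight auto-labeling: turn raw teleoperation video clips into
# (instruction, clip) training pairs by asking a VLM what command would have
# produced the observed behavior.

HINDSIGHT_PROMPT = (
    "What instruction would you have given the robot "
    "to get the action seen in this video?"
)

def vlm_caption(clip, prompt):
    """Stub standing in for a vision-language model call on one clip."""
    return "pick up the cup and place it on the shelf"

def auto_label(clips):
    """Build natural-language-conditioned training pairs from raw clips."""
    return [(vlm_caption(clip, HINDSIGHT_PROMPT), clip) for clip in clips]

pairs = auto_label(["clip_001.mp4", "clip_002.mp4"])
```

The appeal is that the instruction labels come for free from data the robots already generated, instead of from human annotators.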
 

Comparing the two models

Physical Intelligence adapts diffusion models (typically used for image generation) to robot control, viewing actions as noise that needs to be gradually reduced to reach target behaviors.

Helix uses a two-part system: a slow "System 2" vision-language model handles scene understanding and language comprehension, while a fast "System 1" visuomotor policy translates its outputs into continuous low-level actions in real time.
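A minimal sketch of such a two-rate loop follows; the update cadence, function names, and data shapes are all illustrative assumptions, not Figure's published implementation.

```python
# Sketch of a System 1 / System 2 control loop: a slow deliberate module
# refreshes a latent goal occasionally, while a fast reactive module emits
# an action on every observation tick.

def system2_latent(observation, instruction):
    """Slow path: a VLM digests the scene and language into a latent goal."""
    return ("latent", instruction)

def system1_action(observation, latent):
    """Fast path: a small visuomotor policy turns the latest latent into a
    low-level action for this tick."""
    return {"goal": latent, "obs": observation}

def run(observations, instruction, slow_every=10):
    """Act on every observation; update the latent only every `slow_every`."""
    actions, latent = [], None
    for i, obs in enumerate(observations):
        if i % slow_every == 0:                      # slow, deliberate reasoning
            latent = system2_latent(obs, instruction)
        actions.append(system1_action(obs, latent))  # fast, reactive control
    return actions

acts = run(range(30), "hand the bag to the other robot")
```

Decoupling the rates is what lets one network family provide both deep understanding and millisecond-level responsiveness.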

Conclusion: The Evolving Landscape of AI and Robotics

The recent advances in AI reasoning and robotic control systems reveal distinct approaches to solving complex challenges in artificial intelligence. From specialized reasoning models like NovaSky and S1 to breakthrough robotic systems like Physical Intelligence and Helix, we're witnessing a paradigm shift in how AI systems learn and operate in the physical world.

Key Insights

    Efficiency vs. Scale: While traditional approaches relied heavily on massive datasets and computational resources, newer models like Helix demonstrate that architectural innovation can dramatically reduce data requirements. This shift from "more data" to "smarter architecture" represents a fundamental change in AI development strategy.

    Specialized Architecture: The System 1/System 2 approach employed by Helix elegantly solves the dual challenges of high-level reasoning and real-time control by separating concerns while maintaining a unified training process. This decoupling enables both deep understanding and millisecond-level responsiveness without sacrificing either.

    Novel Training Methodologies: The adaptation of techniques from other domains, like using diffusion models for robot control in Physical Intelligence or reinforcement learning for reasoning in Deepseek, shows that cross-pollination of ideas continues to drive the field forward.

    Edge Computing: Running sophisticated AI directly on robots with embedded GPUs, as demonstrated by Helix, marks a crucial step toward autonomous systems that don't rely on cloud connectivity, potentially democratizing access to advanced robotics.

Future Implications

These developments suggest we're entering a new era for AI systems.

The convergence of advanced reasoning capabilities with dexterous physical control systems brings us closer to general-purpose robots that can understand, learn, and adapt to the world in ways previously limited to science fiction. Rather than following pre-programmed routines, these systems can generate novel behaviors on demand and refine them through experience.

As these technologies mature, the focus will likely shift from raw performance metrics to usability, safety, and integration into human environments. The ultimate success of these systems will depend not just on their technical capabilities, but on how effectively they can augment human potential and address real-world challenges. If you want to chat with me for more insights, or you are looking to hire an AI researcher, please send me a DM or email me.


