Gemini Diffusion: watch this space

LessWrong, May 21, 03:37


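Google DeepMind has released Gemini Diffusion, a technology that is fundamentally different from large language models (LLMs). Where an LLM predicts the next token, Gemini Diffusion iteratively denoises all the output tokens until they form a coherent result, much as image diffusion models do. In testing it was extremely fast, averaging nearly 1000 tokens per second, and it did well on a Google interview question. It falls short of Gemini 2.5 Pro but outperforms ChatGPT 3. The technology marks the arrival of a third form of intelligence, after humans and LLMs.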

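🚀 Gemini Diffusion is a new kind of technology, distinct from LLMs. LLMs predict the next token, while Gemini Diffusion iteratively denoises all the output tokens to produce a coherent result, similar to image diffusion models.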

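⚡️ Gemini Diffusion is very fast, averaging nearly 1000 tokens per second. In testing it quickly gave a perfect answer to the question, though it fell a little short on the follow-ups.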

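💡 Gemini Diffusion beats ChatGPT 3 in some respects but does not match Gemini 2.5 Pro. It represents a third form of intelligence, after humans and LLMs. It may also open up new possibilities for text editing, such as natively editing content in the middle of a text.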

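🤔 Gemini Diffusion is still at the demo stage and has a lot of room for optimisation. As performance improves, diffusion models could become a new direction for the technology.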

Published on May 20, 2025 7:29 PM GMT

Google DeepMind has announced Gemini Diffusion. Though buried under a host of other I/O announcements, it's possible that this is actually the most important one!

This is significant because diffusion models are entirely different to LLMs. Instead of predicting the next token, they start from noise and iteratively denoise all the output tokens at once until they form a coherent result. This is similar to how image diffusion models work.
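To make "denoise all the output tokens" a bit more concrete, here is a toy sketch of one common flavour of text diffusion, where every position starts out masked and the most confident guesses are committed on each pass. Everything in it is an assumption for illustration: `toy_denoiser` is a hypothetical stand-in that simply knows the answer and attaches random confidences, and nothing here is claimed to match how Gemini Diffusion actually works internally.

```python
import random

MASK = "_"  # placeholder for a position that is still "noise"


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for a trained model.

    For every masked position it proposes a token plus a confidence score.
    A real model would predict these from context; this toy just looks up
    the target and draws a random confidence.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def diffusion_generate(target, steps=4, seed=0):
    """Start fully masked and commit the most confident guesses each step."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(steps):
        proposals = toy_denoiser(tokens, target, rng)  # one pass over ALL positions
        if not proposals:
            break
        remaining_steps = steps - step
        # Ceiling division, so that everything is filled in by the final step.
        k = -(-len(proposals) // remaining_steps)
        most_confident = sorted(
            proposals, key=lambda i: proposals[i][1], reverse=True
        )[:k]
        for i in most_confident:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    _, snapshots = diffusion_generate(sentence, steps=4)
    for snapshot in snapshots:
        print(" ".join(snapshot))
```

Note that the loop calls the denoiser `steps` times however long the output is, whereas a next-token loop would need one call per token.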

I've tried it and the results are surprisingly good! It's incredibly fast, averaging nearly 1000 tokens a second. And it one-shotted my Google interview question, giving a perfect response in 2 seconds (though it struggled a bit on the follow-ups).

It's nowhere near as good as Gemini 2.5 Pro, but it knocks ChatGPT 3 out of the water. If we'd seen this 3 years ago we'd have been mind blown.

Now this is wild for two reasons:

1. We now have a third species of intelligence, after humans and LLMs. That's pretty significant in and of itself.

2. This is the worst it'll ever be. This is a demo, presumably from a relatively cheap training run and with way less optimisation than has gone into LLMs. Diffusion models have a different set of trade-offs to LLMs, and once benchmark performance is competitive it's entirely possible we'll choose to focus on them instead.

For an example of the kind of capabilities diffusion models offer that LLMs don't: you aren't limited to predicting tokens after a piece of text, you can natively edit somewhere in the middle. Also, since the entire block is produced at once, you don't get that weird behaviour where an LLM says one thing then immediately contradicts itself.
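Editing in the middle can be sketched the same way: mask only the span you want rewritten, keep everything else fixed, and let each masked position look at context on both sides. This is a toy under stated assumptions: `toy_propose` and its two lookup tables are hypothetical hand-written rules standing in for a trained model, not anything from Gemini Diffusion.

```python
MASK = "_"

# Hypothetical hand-written rules standing in for a trained model.
FROM_LEFT = {"cat": "slept"}   # suggestion based on the word to the left
FROM_RIGHT = {"the": "under"}  # suggestion based on the word to the right


def toy_propose(tokens, i):
    """Propose a token for position i using both neighbours, or None."""
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    if left in FROM_LEFT:
        return FROM_LEFT[left]
    if right in FROM_RIGHT:
        return FROM_RIGHT[right]
    return None  # not confident yet; try again on a later pass


def infill(tokens, start, end, propose, max_steps=10):
    """Regenerate tokens[start:end] while leaving the rest untouched."""
    work = list(tokens)
    for i in range(start, end):
        work[i] = MASK
    for _ in range(max_steps):
        masked = [i for i in range(start, end) if work[i] == MASK]
        if not masked:
            break
        # All proposals come from the same snapshot, then are applied together.
        updates = {i: propose(work, i) for i in masked}
        for i, tok in updates.items():
            if tok is not None:
                work[i] = tok
    return work


if __name__ == "__main__":
    original = "the cat sat on the mat".split()
    print(" ".join(infill(original, 2, 4, toy_propose)))
```

The second masked word is chosen from the word to its right, which is exactly the context a model that only appends tokens to the end never gets to see.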

So this isn't something you'd use just yet (probably), but watch this space!
