AI News · 9 April, 16:08
Deep Cogito open LLMs use IDA to outperform same size models

Deep Cogito has released a series of open-source large language models (LLMs) that it claims outperform their competitors and mark a step towards general superintelligence. The San Francisco-based company has launched preview versions of LLMs at 3B, 8B, 14B, 32B, and 70B parameters. Deep Cogito emphasises that the models outperform the best available open models of the same size, including counterparts from LLAMA, DeepSeek, and Qwen, across most standard benchmarks. In particular, Deep Cogito's 70B model even surpasses the recently released Llama 4 109B Mixture-of-Experts (MoE) model.

💡 Deep Cogito has released several open-source large language models (LLMs) that outperform same-size peers, and claims they mark a step towards general superintelligence.

⚙️ The models were trained with a novel methodology called Iterated Distillation and Amplification (IDA). IDA aims to overcome the limitations of current LLM training paradigms, using iterative self-improvement to achieve a scalable and efficient alignment strategy.

⏫ The IDA process involves two key steps: amplification (using more compute to let the model reach better solutions or capabilities, akin to advanced reasoning techniques) and distillation (internalising those amplified capabilities into the model's parameters), forming a “positive feedback loop”.

📊 Deep Cogito's models are optimised for coding, function calling, and agentic use cases, and offer dual functionality: they can answer directly (standard LLM) or self-reflect before answering (like reasoning models).

📈 Across a range of benchmarks, the Cogito models generally show significant gains over counterparts such as Llama 3 and Qwen 2.5, particularly in reasoning mode; for example, the Cogito 70B model reaches 91.73% on MMLU in standard mode and 91.00% in reasoning mode.

Deep Cogito has released several open large language models (LLMs) that outperform competitors and claim to represent a step towards achieving general superintelligence.

The San Francisco-based company, which states its mission is “building general superintelligence,” has launched preview versions of LLMs in 3B, 8B, 14B, 32B, and 70B parameter sizes. Deep Cogito asserts that “each model outperforms the best available open models of the same size, including counterparts from LLAMA, DeepSeek, and Qwen, across most standard benchmarks”.

Impressively, the 70B model from Deep Cogito even surpasses the performance of the recently released Llama 4 109B Mixture-of-Experts (MoE) model.   

Iterated Distillation and Amplification (IDA)

Central to this release is a novel training methodology called Iterated Distillation and Amplification (IDA). 

Deep Cogito describes IDA as “a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement”. This technique aims to overcome the inherent limitations of current LLM training paradigms, where model intelligence is often capped by the capabilities of larger “overseer” models or human curators.

The IDA process involves two key steps, iterated repeatedly:

- Amplification: using more computation to let the model reach better solutions or capabilities, akin to advanced reasoning techniques.
- Distillation: internalising those amplified capabilities back into the model's parameters.

Deep Cogito says this creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process, rather than being strictly bounded by overseer intelligence.
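To make the loop's shape concrete, here is a toy, runnable sketch of an amplify-then-distill cycle. Everything in it is an illustrative assumption, not Deep Cogito's actual training code: the "model" is a single scalar skill level, amplification is best-of-N sampling, and distillation is a simple update towards the amplified result.

```python
import random

def direct_answer(skill: float) -> float:
    """One cheap attempt: answer quality fluctuates around the current skill."""
    return skill + random.gauss(0.0, 1.0)

def amplify(skill: float, n_samples: int = 16) -> float:
    """Amplification: spend more compute (best of N attempts) to get a
    better solution than the model produces directly."""
    return max(direct_answer(skill) for _ in range(n_samples))

def distill(skill: float, target: float, lr: float = 0.5) -> float:
    """Distillation: internalise the amplified capability into the
    'parameters' by moving the skill level towards the target."""
    return skill + lr * (target - skill)

skill = 0.0
for step in range(10):
    amplified = amplify(skill)          # amplification step
    skill = distill(skill, amplified)   # distillation step
print(f"skill after the IDA loop: {skill:.2f}")  # rises each iteration
```

Because each distillation step raises the baseline that the next amplification starts from, the toy skill level keeps climbing, which is the feedback-loop intuition behind IDA.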

“When we study superintelligent systems,” the research notes, referencing successes like AlphaGo, “we find two key ingredients enabled this breakthrough: Advanced Reasoning and Iterative Self-Improvement”. IDA is presented as a way to integrate both into LLM training.

Deep Cogito claims IDA is efficient, stating the new models were developed by a small team in approximately 75 days. They also highlight IDA’s potential scalability compared to methods like Reinforcement Learning from Human Feedback (RLHF) or standard distillation from larger models.

As evidence, the company points to their 70B model outperforming Llama 3.3 70B (distilled from a 405B model) and Llama 4 Scout 109B (distilled from a 2T parameter model).

Capabilities and performance of Deep Cogito models

The newly released Cogito models – based on Llama and Qwen checkpoints – are optimised for coding, function calling, and agentic use cases.

A key feature is their dual functionality: “Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models),” similar to capabilities seen in models like Claude 3.5. However, Deep Cogito notes they “have not optimised for very long reasoning chains,” citing user preference for faster answers and the efficiency of distilling shorter chains.
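As an illustration of how this dual mode might be driven in practice, here is a minimal sketch using Hugging Face Transformers. Both the checkpoint id and the system-prompt toggle are assumptions made for illustration, not confirmed API details of the released models.

```python
# Minimal sketch: toggling direct answers vs. self-reflection.
# MODEL_ID and the system-prompt string are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepcogito/cogito-v1-preview-llama-3B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def ask(question: str, reasoning: bool) -> str:
    messages = []
    if reasoning:
        # Assumed toggle: a system prompt switches the model into
        # self-reflection ("thinking") mode before it answers.
        messages.append({"role": "system",
                         "content": "Enable deep thinking subroutine."})
    messages.append({"role": "user", "content": question})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(out[0][inputs.shape[-1]:],
                            skip_special_tokens=True)

print(ask("What is 17 * 24?", reasoning=False))  # direct answer
print(ask("What is 17 * 24?", reasoning=True))   # reasons before answering
```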

Extensive benchmark results are provided, comparing Cogito models against size-equivalent state-of-the-art open models in both direct (standard) and reasoning modes.

Across various benchmarks (MMLU, MMLU-Pro, ARC, GSM8K, MATH, etc.) and model sizes (3B, 8B, 14B, 32B, and 70B), the Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.

For instance, the Cogito 70B model achieves 91.73% on MMLU in standard mode (+6.40% vs Llama 3.3 70B) and 91.00% in thinking mode (+4.40% vs DeepSeek R1 Distill 70B). LiveBench scores also show improvements.
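Working backwards from the quoted deltas: 91.73% minus 6.40 points puts Llama 3.3 70B at about 85.33% on MMLU in standard mode, and 91.00% minus 4.40 points puts DeepSeek R1 Distill 70B at about 86.60% in thinking mode (baselines derived from the deltas above, not separately reported).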

Deep Cogito also publishes benchmarks for the 14B models as a medium-sized comparison.

While acknowledging benchmarks don’t fully capture real-world utility, Deep Cogito expresses confidence in practical performance.

This release is labelled a preview, with Deep Cogito stating they are “still in the early stages of this scaling curve”. They plan to release improved checkpoints for the current sizes and introduce larger MoE models (109B, 400B, 671B) “in the coming weeks / months”. All future models will also be open-source.



