TechCrunch News · December 27, 2024
DeepSeek’s new AI model appears to be one of the best ‘open’ challengers yet

Chinese AI company DeepSeek has released DeepSeek V3, a powerful open-source AI model that performs well across a range of text-based tasks, including coding, translation, and essay writing. In DeepSeek’s internal tests, V3 outperformed other open and closed models, and it beat the likes of Meta’s Llama 3 and OpenAI’s GPT-4o in programming competitions. The model has 685 billion parameters and was trained on a dataset of 14.8 trillion tokens, yet its training cost was a comparatively modest $5.5 million. Even so, the model filters politically sensitive topics, reflecting the regulatory considerations Chinese AI systems face. DeepSeek’s open-source strategy has had a major impact on the industry, forcing other companies to cut the usage prices of their models.

🚀 DeepSeek V3 is a powerful open-source AI model that developers can download and use in commercial applications; it excels at tasks such as coding, translation, and text generation.

🏆 DeepSeek V3 outperformed other open and closed models in internal benchmark tests, beating competitors including Meta’s Llama 3 and OpenAI’s GPT-4o in programming competitions.

🧮 DeepSeek V3 has 685 billion parameters and was trained on a dataset of 14.8 trillion tokens, giving it a significant performance edge. Despite the model’s enormous size, training cost only $5.5 million.

🇨🇳 Powerful as it is, DeepSeek V3 filters its answers on politically sensitive topics, reflecting the constraints Chinese AI systems operate under; it won’t answer questions about Tiananmen Square, for example.

💰 DeepSeek’s open-source strategy has forced competitors such as ByteDance, Baidu, and Alibaba to cut the usage prices of some of their models, and even make others free, with far-reaching effects on the industry.

A Chinese lab has created what appears to be one of the most powerful “open” AI models to date.

The model, DeepSeek V3, was developed by the AI firm DeepSeek, and was released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based workloads and tasks, like coding, translating, and writing essays and emails from a descriptive prompt.
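To make that concrete, here is a minimal sketch of querying the model for one such task. It assumes DeepSeek’s hosted, OpenAI-compatible API and the “deepseek-chat” model name, neither of which is detailed in this article:

```python
# Minimal sketch: calling DeepSeek V3 through DeepSeek's OpenAI-compatible
# API. The base_url and model name are assumptions drawn from DeepSeek's
# public API docs, not from this article.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # assumed DeepSeek endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed name for the served DeepSeek V3 chat model
    messages=[
        {"role": "user", "content": "Translate to French: 'Open models move fast.'"}
    ],
)
print(response.choices[0].message.content)
```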

According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable, “openly” available models and “closed” AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 outperforms models including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.

DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.

DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data; 1 million tokens is equal to about 750,000 words.
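As a quick back-of-the-envelope check, applying that ratio to the reported training set:

```python
# Illustrative arithmetic only, using the ratio quoted above
# (1 million tokens ~= 750,000 words).
TOKENS = 14.8e12        # DeepSeek V3's reported training tokens
WORDS_PER_TOKEN = 0.75  # 750,000 words per 1,000,000 tokens

approx_words = TOKENS * WORDS_PER_TOKEN
print(f"~{approx_words:.3g} words")  # ~1.11e13, i.e. roughly 11 trillion words
```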

It’s not just the training set that’s massive. DeepSeek V3 is enormous in size: 685 billion parameters. (Parameters are the internal variables models use to make predictions or decisions.) That’s around 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
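For readers unfamiliar with parameter counts, here is a generic way to tally them in PyTorch, shown on a tiny stand-in network rather than DeepSeek V3 itself, which is far too large to instantiate casually; the principle is identical at 685-billion-parameter scale:

```python
# Counting a model's parameters in PyTorch. The toy network below is a
# stand-in for illustration; it has no connection to DeepSeek V3's
# actual architecture.
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # each Linear layer contributes weights + biases
```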

Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But large models also require beefier hardware in order to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer questions at reasonable speeds.
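To see why, here is a rough estimate of the memory needed just to hold the weights, assuming 80 GB cards (an H100/A100-class figure) and two common precisions; both assumptions are illustrative, not from this article:

```python
# Rough lower bound on GPUs needed to hold DeepSeek V3's weights alone,
# ignoring activations and KV cache. Card capacity and precisions are
# assumptions for illustration.
PARAMS = 685e9      # parameter count reported above
GPU_MEMORY_GB = 80  # assumed per-card memory (H100/A100-class)

for name, bytes_per_param in [("FP16", 2), ("FP8", 1)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    gpus = -(-weights_gb // GPU_MEMORY_GB)  # ceiling division
    print(f"{name}: ~{weights_gb:.0f} GB of weights -> at least {gpus:.0f} GPUs")
```

Even at 8-bit precision, the weights alone overflow a single node of eight 80 GB cards, which is what makes the model impractical for most deployments.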

While it’s not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model using a data center of Nvidia H800 GPUs in just around two months — GPUs that Chinese companies were recently restricted by the U.S. Commerce Department from procuring. The company also claims it only spent $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI’s GPT-4.
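The $5.5 million figure is at least plausible arithmetic. A rough sanity check, using cluster-size and hourly-rate assumptions drawn from DeepSeek’s technical report rather than from this article:

```python
# Sanity check on the ~$5.5M training cost. GPU count, duration, and
# rental rate are assumptions (DeepSeek's technical report cites ~2,048
# H800s and ~2.79M GPU-hours); only the ~two-month window and the total
# come from this article.
GPUS = 2048                # assumed cluster size
DAYS = 57                  # ~two months of training
PRICE_PER_GPU_HOUR = 2.00  # assumed H800 rental rate, USD

gpu_hours = GPUS * DAYS * 24
cost = gpu_hours * PRICE_PER_GPU_HOUR
print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.1f}M")  # ~2.8M GPU-hours, ~$5.6M
```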

The downside is, the model’s political views are a bit — filtered. Ask DeepSeek V3 about Tiananmen Square, for instance, and it won’t answer.

DeepSeek, being a Chinese company, is subject to benchmarking by China’s internet regulator to ensure its models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.

DeepSeek, which recently unveiled DeepSeek-R1, an answer to OpenAI’s o1 “reasoning” model, is a curious organization. It’s backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.

DeepSeek’s models have forced competitors like ByteDance, Baidu, and Alibaba to cut the usage prices for some of their models — and make others completely free.

High-Flyer builds its own server clusters for model training, one of the most recent of which reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek org.

In an interview earlier this year, Liang described open sourcing as a “cultural act,” and characterized closed-source AI like OpenAI’s as a “temporary” moat. “Even OpenAI’s closed-source approach hasn’t stopped others from catching up,” he noted.

Indeed.
