TechCrunch News, March 8, 00:46
DeepSeek: Everything you need to know about the AI chatbot app

 

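Chinese AI lab DeepSeek has risen rapidly on the strength of its chatbot app, topping app store charts and prompting industry debate about U.S. leadership in AI and demand for AI chips. DeepSeek is backed by the quantitative hedge fund High-Flyer Capital Management and focuses on developing AI tools. Despite U.S. export bans on hardware, DeepSeek has drawn on innovative techniques and its hiring strategy to release high-performing models such as DeepSeek-V2 and DeepSeek-V3, which scored well on AI benchmarks and forced competitors to cut prices. DeepSeek’s R1 reasoning model performs well in specific domains but, because of Chinese internet regulation, is subject to content restrictions. DeepSeek’s business model remains unclear, but its models have proven popular with developers and are widely used on Hugging Face.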

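🚀 Backed by High-Flyer Capital Management, DeepSeek started inside the hedge fund and became an independent AI research company focused on developing efficient AI models; it has built its own data center clusters for model training.

💡 DeepSeek-V2 performed well at general-purpose text and image analysis and was cost-effective to run, forcing domestic competitors such as ByteDance and Alibaba to cut their model usage prices and even offer some models for free.

🤖 DeepSeek’s R1 reasoning model performs well on key benchmarks; by effectively fact-checking itself, it is more reliable in domains such as physics, science, and math, but it is also constrained by Chinese internet regulation.

💰 DeepSeek prices its products below market value and even offers some services for free, and it attributes its cost competitiveness to technical breakthroughs, although some experts dispute its cost figures.

🌐 DeepSeek’s models have been widely adopted for commercial use; on Hugging Face, developers have created more than 500 derivative models based on R1, with more than 2.5 million downloads combined, a sign of its influence in the developer community.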

DeepSeek has gone viral.

Chinese AI lab DeepSeek broke into the mainstream consciousness this week after its chatbot app rose to the top of the Apple App Store charts (and Google Play, as well). DeepSeek’s AI models, which were trained using compute-efficient techniques, have led Wall Street analysts and technologists alike to question whether the U.S. can maintain its lead in the AI race and whether the demand for AI chips will hold up.

But where did DeepSeek come from, and how did it rise to international fame so quickly?

DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.

AI enthusiast Liang Wenfeng co-founded High-Flyer in 2015. Liang, who reportedly began dabbling in trading while a student at Zhejiang University, launched High-Flyer Capital Management in 2019 as a hedge fund focused on developing and deploying AI algorithms.

In 2023, High-Flyer started DeepSeek as a lab dedicated to researching AI tools separate from its financial business. With High-Flyer as one of its investors, the lab spun off into its own company, also called DeepSeek.

From day one, DeepSeek built its own data center clusters for model training. But like other AI companies in China, DeepSeek has been affected by U.S. export bans on hardware. To train one of its more recent models, the company was forced to use Nvidia H800 chips, a less powerful version of the H100 chip available to U.S. companies.

DeepSeek’s technical team is said to skew young. The company reportedly recruits aggressively among doctoral AI researchers at top Chinese universities. DeepSeek also hires people without any computer science background to help its tech better understand a wide range of subjects, per The New York Times.

DeepSeek unveiled its first set of models (DeepSeek Coder, DeepSeek LLM, and DeepSeek Chat) in November 2023. But it wasn’t until spring 2024, when the startup released its next-gen DeepSeek-V2 family of models, that the AI industry started to take notice.

DeepSeek-V2, a general-purpose text- and image-analyzing system, performed well in various AI benchmarks — and was far cheaper to run than comparable models at the time. It forced DeepSeek’s domestic competition, including ByteDance and Alibaba, to cut the usage prices for some of their models, and make others completely free.

DeepSeek-V3, launched in December 2024, only added to DeepSeek’s notoriety.

According to DeepSeek’s internal benchmark testing, DeepSeek-V3 outperforms both downloadable, openly available models like Meta’s Llama and “closed” models that can only be accessed through an API, like OpenAI’s GPT-4o.

Equally impressive is DeepSeek’s R1 “reasoning” model. DeepSeek claims that R1, released in January, performs as well as OpenAI’s o1 model on key benchmarks.

Being a reasoning model, R1 effectively fact-checks itself, which helps it to avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer — usually seconds to minutes longer — to arrive at solutions compared to a typical non-reasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math.

There is a downside to R1, DeepSeek-V3, and DeepSeek’s other models, however. Being Chinese-developed AI, they’re subject to benchmarking by China’s internet regulator to ensure that their responses “embody core socialist values.” In DeepSeek’s chatbot app, for example, R1 won’t answer questions about Tiananmen Square or Taiwan’s autonomy.

If DeepSeek has a business model, it’s not clear what that model is, exactly. The company prices its products and services well below market value — and gives others away for free.

The way DeepSeek tells it, efficiency breakthroughs have enabled it to maintain extreme cost competitiveness. Some experts dispute the figures the company has supplied, however.

Whatever the case may be, developers have taken to DeepSeek’s models, which aren’t open source as the phrase is commonly understood but are available under permissive licenses that allow for commercial use. According to Clem Delangue, the CEO of Hugging Face, one of the platforms hosting DeepSeek’s models, developers on Hugging Face have created over 500 “derivative” models of R1 that have racked up 2.5 million downloads combined.

DeepSeek’s success against larger and more established rivals has been described as “upending AI” and “over-hyped.” The company’s success was at least in part responsible for causing Nvidia’s stock price to drop by 18% in January, and for eliciting a public response from OpenAI CEO Sam Altman.

Microsoft announced that DeepSeek is available on its Azure AI Foundry service, Microsoft’s platform that brings together AI services for enterprises under a single banner. When asked about DeepSeek’s impact on Meta’s AI spending during its first-quarter earnings call, CEO Mark Zuckerberg said spending on AI infrastructure will continue to be a “strategic advantage” for Meta.

During Nvidia’s fourth-quarter earnings call, CEO Jensen Huang emphasized DeepSeek’s “excellent innovation,” saying that it and other “reasoning” models are great for Nvidia because they need so much more compute.

At the same time, some companies are banning DeepSeek, and so are entire countries and governments, including South Korea. New York state also banned DeepSeek from being used on government devices.

As for what DeepSeek’s future might hold, it’s not clear. Improved models are a given. But the U.S. government appears to be growing wary of what it perceives as harmful foreign influence. In March, The Wall Street Journal reported that the U.S. will likely ban DeepSeek on government devices.

This story was originally published January 28, 2025, and will be updated regularly.
