Artificial Fintelligence, October 22, 2024
The evolution of the LLM API market

The article examines how the LLM API market is evolving. As more companies enter, competition intensifies. GPT-4 currently faces no competition at the high end, while the low end is driven by the open-source community. The market will bifurcate, and as tooling improves, developers will pick the lowest-cost model that works. Meanwhile, some companies will train their own models to cut costs.

🎯 The LLM API market began with OpenAI holding a monopoly, but competition has since intensified. Every model except GPT-4 now faces competition, which limits what companies can charge.

💪 Companies enter a new market when expected profit clears their required threshold. As a company grows, optimization raises its margins, but competitors are doing the same, eroding that margin.

📈 The LLM market will bifurcate: expensive, high-performance models at the high end and cheaper models at the low end. The open-source community is pushing low-end models up in quality and down in cost.

🤔 Buyers of LLM APIs should weigh how much capability their task actually requires. Simple tasks can use low-cost models, and some successful companies will train their own models to cut costs.

Before I studied machine learning, I was an econ grad student banging out OLS problem sets (I see the OLS equation, (X'X)^-1 X'y, whenever I close my eyes; I derived it that many times). My research area was antitrust theory, and in particular vertical integration. That gives me a unique perspective on the question at hand: how will the LLM API market evolve as more companies enter the space?
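That closed-form estimator is easy to check numerically. A minimal sketch in NumPy, with synthetic data and made-up coefficients purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Design matrix: intercept column plus two random regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# The OLS estimator (X'X)^-1 X'y, computed via a linear solve
# rather than an explicit matrix inverse (better numerics).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```

With 1,000 observations and small noise, the estimate lands within a few thousandths of the true coefficients.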

The market began, famously, with OpenAI releasing ChatGPT and rapidly hitting $1.3B in revenue. At this time last year, however, there was basically no competition in the LLM API market. Bard was yet to be released, let alone Claude, and Gemini was a mere twinkle in Sundar’s eyes. OpenAI had a monopoly in the market, letting them capture basically all of the value.

Artificial Fintelligence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

In the year since, what we've seen is that there doesn't appear to be a moat in LLMs except at the highest end. GPT-4 is the only model without competition, and even there, competitors are sniffing around: Gemini Ultra, Llama 3, and the as-yet-unreleased mysterious Mistral model bigger than Medium. At the GPT-3.5 level, however, you have many hosting options, and you can even host a model yourself. This necessarily limits the prices any company can charge.

Generally speaking, companies enter a new market when they think they can make a profit above the minimum threshold they require, and the larger the company, the smaller that threshold. If I, an individual, were to start offering a service to finetune LLMs, I would need to charge a fairly high margin at first, as I would have a small customer base to spread my costs over. As my company grew, I would have a larger customer base to spread those costs over, and more money to spend on optimizations that let me serve LLMs more cheaply.

With each optimization that makes your own serving stack more efficient, you increase your margin. That's great! You make more money per token, right? Well, not quite. In a vacuum with a spherical cow, you do. But just as you invest in serving tokens more efficiently, your competitors are all doing the same, eroding your margins. To do a bad Ben Horowitz impersonation: you run this hard just to stay in place.

The necessary implication is that the undifferentiated LLM market will become a ruthless competition for efficiency, with companies competing to see who can demand the lowest return on invested capital.

The classic business-strategy book The Innovator's Dilemma contains the canonical example of how technological disruption happens (this account is taken from the New Yorker profile of the author, Clayton Christensen):

In the world of steel manufacturing, steel was historically made in massive integrated mills, which produced high-quality steel at reasonable margins. Then electric minimills came along, able to make the lowest-quality steel at a lower cost. The large steel manufacturers saw this, shrugged, and kept focusing on high-quality steel at (relatively) high margins. Over time, the minimill operators figured out how to make higher- and higher-quality steel, moved upmarket, and killed the massive integrated mills (US Steel, once the 16th-largest US corporation by market cap, was removed from the S&P 500 in 2014).


The analogy to LLMs is straightforward. The large labs focus on making the highest-performing models: expensive, but excellent, outperforming every other model. That expense is structural; you need margin to pay for all of those $900k engineers! Even then, however, we see competition on price, with Gemini Pro as a case in point.

At the low end, we have the open-source community, led by Meta and r/LocalLlama, which is cranking out high-quality models and figuring out how to serve them on ridiculously low-powered machines. We should expect the open-weight models to improve in quality and decrease in cost (on a quality-adjusted basis), putting pressure on the margins of the largest labs. As a real-time example, Together came out with a hosted version of Mixtral that is 70% cheaper than Mistral's own version.

We should thus expect a bifurcated market: expensive, higher-quality models at the high end, and cheaper, lower-quality models at the low end. For open-weight models, we should expect the price to converge to the price of GPUs plus electricity (and, as competition increases in the GPU market, perhaps to the price of electricity alone).
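A back-of-envelope sketch of that price floor. Every number here is a placeholder (hypothetical GPU rental rate, throughput, and power draw), not a measured figure; the point is only the shape of the calculation:

```python
# Hypothetical serving-cost floor for an open-weights model.
gpu_hourly_rate = 2.00       # $/GPU-hour, placeholder rental price
tokens_per_second = 2500     # throughput per GPU, placeholder
power_kw = 0.7               # GPU power draw in kW, placeholder
electricity_per_kwh = 0.10   # $/kWh, placeholder

tokens_per_hour = tokens_per_second * 3600

# Floor if you rent GPUs (rental price amortizes the hardware):
cost_with_gpu = gpu_hourly_rate / tokens_per_hour * 1e6  # $/1M tokens

# Floor if GPU capital is fully commoditized and only power remains:
cost_electricity_only = power_kw * electricity_per_kwh / tokens_per_hour * 1e6

print(f"GPU + electricity floor: ${cost_with_gpu:.3f} per 1M tokens")
print(f"Electricity-only floor:  ${cost_electricity_only:.4f} per 1M tokens")
```

Under these made-up numbers, the rental floor is about $0.22 per million tokens and the electricity-only floor is roughly 30x lower, which is the gap the "perhaps just electricity" scenario would close.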

The question, then, is what the buyer for these APIs looks like. If we ranked the economically valuable tasks that LLMs can perform from most complex to least complex, how many of them require high-end capability? At some point there's a threshold above which GPT-4 is required, but it's hard to imagine that threshold staying static. The open-weight models will continue their inexorable climb up the list, biting at the margins of the large labs. As tooling makes it easier to switch between model APIs, developers will switch to the lowest-cost model that accomplishes their task. If you're using an LLM for, say, short code completion, do you need the biggest and best model? Probably not!
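That switching logic is easy to sketch: pick the cheapest model that clears a per-task capability bar. The model names, prices, and capability scores below are entirely illustrative, not real quotes:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    price_per_1m_tokens: float  # $, illustrative
    capability: int             # higher = handles more complex tasks (illustrative scale)

CATALOG = [
    Model("open-7b-finetune", 0.20, 3),
    Model("open-70b", 0.90, 6),
    Model("frontier-api", 30.00, 9),
]

def cheapest_sufficient(required_capability: int) -> Model:
    """Return the lowest-cost model whose capability clears the task's bar."""
    candidates = [m for m in CATALOG if m.capability >= required_capability]
    if not candidates:
        raise ValueError("no model is capable enough for this task")
    return min(candidates, key=lambda m: m.price_per_1m_tokens)

print(cheapest_sufficient(2).name)  # short code completion -> "open-7b-finetune"
print(cheapest_sufficient(8).name)  # hardest tasks -> "frontier-api"
```

As open-weight models climb the capability scale, more tasks fall below the frontier threshold, and a router like this silently moves that traffic to the cheap end of the catalog.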

Moreover, the companies with the biggest success in the consumer marketplace will inevitably balk at paying a significant share of their profits to another company, and will start to train their own models. We already see companies like Harvey and Cursor, which were among the earliest to get access to GPT-4, hiring research scientists and engineers, giving them the talent required to train their own foundation models. As API fees are probably the biggest expense for these companies, it seems natural that they will do everything they can to drive those costs down.

If you're training your own models, you can raise a round of investment to fund them, trading a one-time capital expenditure for higher ongoing margins. This is the justification for Google's TPU program, for example: by spending billions of dollars on custom silicon, Google avoids paying Nvidia's Danegeld.

The conclusion, then, is that the market for LLM APIs will converge on lowest cost wherever the task is simple enough to be solved by open-weight models. If your task is so complex that it requires the best model, you're stuck paying OpenAI. For everyone else, there's finetuned Mistral 7B.
