Zeroth Principles of AI, December 7, 2024
Altman Upsetting Investors?

OpenAI CEO Sam Altman has recently hinted at a major announcement that may upend the expectations of many investors. Some speculate it concerns OpenAI open-sourcing something, stronger regulation, or a GPT-5 delay. The author of this piece offers a bolder guess: OpenAI may have discovered an entirely new, GPU-free natural language understanding (NLU) algorithm, which would drastically reduce the cost of training large language models (LLMs). The author grounds this speculation in the Small Syntax Model (SSM) they have researched for many years. An SSM needs no expensive GPUs, can be trained in a short time, and still achieves efficient language understanding. The author argues that current deep-learning NLU algorithms may rely too heavily on word vectors and overlook a more efficient syntax-first approach. If OpenAI were to switch to such algorithms, the impact on the whole industry would be profound.

🔮 OpenAI CEO Sam Altman has hinted at a major announcement, one that may unsettle many investors, and the hint has triggered widespread speculation across the industry.

🌌 Some speculate it concerns OpenAI open-sourcing something, stronger regulation, or a GPT-5 delay, but the author proposes a bolder guess: OpenAI may have discovered an entirely new, GPU-free natural language understanding (NLU) algorithm, which would drastically reduce the cost of training large language models (LLMs).

💡 The author grounds this speculation in the Small Syntax Model (SSM) they have researched since 2001. An SSM is a smaller, faster, cheaper kind of LLM that can be trained on a laptop, from a small corpus, in a few minutes, without a GPU.

⚙️ The author argues that current deep-learning NLU algorithms may rely too heavily on word vectors; this “semantics-first” approach demands large amounts of compute. An SSM instead takes a “syntax-first” approach, learning language directly at the character level, which is far more efficient.

🔬 The author points out that deep-learning algorithms process text with two-dimensional convolutions, whereas the correlations in text can be found more efficiently with indexing methods. SSMs use discrete neurons and set-theoretic methods, further reducing computational complexity.

Sam Altman (CEO of OpenAI, the company that made ChatGPT) has recently been saying that he is about to make a major announcement that will upset many investors.

Some people speculate this will be about OpenAI open-sourcing something. Some think he will ask for more regulation. Some think he will announce a long delay before GPT-5. In fact, OpenAI hasn’t even started training it yet.

My really wild speculation: "We don't actually need GPUs"

The GPT-5 delay could have been brought about by research at OpenAI leading to the discovery of much, much cheaper GPU-free LLM algorithms, algorithms that may not yet be quite ready for prime time.

The “disappointing investors” warning would be because cloud services involving GPUs would not be needed for language Understanding henceforth; they would still be critical for images, video, speech, sound, scientific applications, etc. This would upset many budgets and companies selling GPUs.

These GPU-free Natural Language Understanding (NLU) algorithms exist. I have been researching LLMs in my company Syntience Inc since 2001. Our product is a smaller, faster, cheaper kind of LLM that we now call an SSM – a Small Syntax Model. We can create a “useful” SSM on a laptop in under five minutes using a mere 5MB of corpus, and without using a GPU. We have a UM1 demo server in the cloud that loads a small SSM learned for just a few hours. Code to test this demo server is posted on GitHub.

Almost a year ago, on my main publishing site, I posted a summary of how my SSMs are created and how they are used. Chapter 8 discusses the “OL” learning algorithm and Chapter 9 the cloud-based “UM1” runtime service. Note that this year-old chapter does not use the term “SSM”, since I only started using it recently.

https://experimental-epistemology.ai/organic-learning/

How did we get here?

I have a “just-so” story that I made up from whole cloth because I wasn’t in the room when it happened. Consider this fictitious scenario:

Sometime between 2006 and 2014, people like Geoff Hinton get Deep Learning (DL) working well for Understanding images.

By that time, and probably independently, some NLP researcher(s) invent term vectors and word2vec. These ideas provide the functionality behind the famous equation KING - MAN + WOMAN = QUEEN by allowing Linear Algebra to work in a high-dimensional semantic concept space.
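A toy sketch of that arithmetic (the four-dimensional vectors below are invented for illustration; real word2vec embeddings have hundreds of dimensions learned from a corpus):

    # Toy term-vector arithmetic: KING - MAN + WOMAN lands nearest QUEEN.
    # The vectors are hand-made stand-ins, not real word2vec output.
    import numpy as np

    vectors = {
        "king":  np.array([0.9, 0.8, 0.1, 0.3]),
        "man":   np.array([0.1, 0.8, 0.1, 0.2]),
        "woman": np.array([0.1, 0.1, 0.9, 0.2]),
        "queen": np.array([0.9, 0.1, 0.9, 0.3]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Linear Algebra in the semantic concept space.
    target = vectors["king"] - vectors["man"] + vectors["woman"]

    # The nearest known vector to the computed point is "queen".
    print(max(vectors, key=lambda w: cosine(vectors[w], target)))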

It is now a natural step for DL researchers to attempt to Understand human language by converting the input text to a strange kind of “image” using term vector lookup for the translation from, well, terms to vectors. And then to use the Image Understanding algorithms they had already developed to Understand text.
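A minimal sketch of that conversion, with a toy vocabulary and an arbitrary embedding size (the numbers are placeholders, not anyone’s actual model):

    # Each token is replaced by its term vector, turning a sentence into a
    # 2-D array (tokens x embedding dimensions) that image-style
    # convolution filters can then slide over.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = {"the": 0, "cat": 1, "sat": 2}
    embedding_dim = 8
    embedding_table = rng.normal(size=(len(vocab), embedding_dim))

    tokens = ["the", "cat", "sat"]
    text_image = embedding_table[[vocab[t] for t in tokens]]  # shape (3, 8)

    # One filter spanning two tokens, applied exactly like an image filter.
    kernel = rng.normal(size=(2, embedding_dim))
    features = [float((text_image[i:i + 2] * kernel).sum())
                for i in range(len(tokens) - 1)]
    print(text_image.shape, features)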

And this worked really well, and was the basis for many years of rapid improvement in DNN based NLU.

But my theory (in this fictitious story) is that they got too lucky too early.

They went with term vectors because it worked. And never bothered searching for a cheaper alternative.

So these algorithms are starting from the semantics (of terms at the word level) imported from the outside (as gathered by word2vec) and they then attempt to learn the syntax of the language from the main learning corpus. I call these Semantics-First algorithms.

When learning syntax, they will be schlepping around these term vectors. Which is very expensive. Which is why they need to run on powerful and expensive GPUs.

The most important algorithm in a Deep Neural Network stack is Convolution. This is used for correlation discovery. In images, correlation discovery requires that multiple passes be made over the whole image, performing various matrix operations using Linear Algebra.
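As a rough illustration of the cost involved, here is one pass of a single 3x3 filter over a toy image; a real network does this for many filters, many layers, and many weight-update passes:

    # One correlation-discovery pass: slide a 3x3 filter over the image and
    # compute a dot product (a small matrix operation) at every position.
    import numpy as np

    rng = np.random.default_rng(1)
    image = rng.normal(size=(32, 32))   # toy grayscale image
    kernel = rng.normal(size=(3, 3))    # one learned filter

    out = np.zeros((30, 30))
    for y in range(30):
        for x in range(30):
            out[y, x] = float((image[y:y + 3, x:x + 3] * kernel).sum())

    print(out.shape)  # one correlation score per position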

In text, all possible correlations are in the (linear) past text that has already been read and they can be found using indexing methods such as those used for web search. A more effective indexing method capable of preserving much more context is a neural network using discrete neurons. This is what we use, and is discussed in Chapter 9.
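As a toy illustration of index-based correlation lookup (web-search style; this is not the discrete-neuron network discussed in Chapter 9):

    # Record every character trigram with the positions where it occurred,
    # so "where did this pattern appear earlier?" becomes a dictionary
    # lookup instead of a search over the whole text.
    from collections import defaultdict

    def build_index(text, n=3):
        index = defaultdict(list)
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].append(i)
        return index

    text = "the cat sat on the mat near the cat"
    index = build_index(text)

    print(index["cat"])   # [4, 32]
    print(index["the"])   # [0, 15, 28]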

So according to my just-so story, the ML community turned a 1-Dimensional indexed correlation lookup into a 2D convolution that required searching for these correlations. And there’s more: the convolution must be done repeatedly before it converges, because adjusting weights partially invalidates previous efforts.

And these DL algorithms operate in a Euclidean space, which means distance measurements involve squaring hundreds of floating-point numbers and taking square roots. In contrast, my SSMs use Jaccard distance in an even higher-dimensional boolean space. Most of my algorithms are based on set theory.
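A small sketch of the cost contrast (the feature sets are arbitrary toy data; the article does not spell out what the SSM’s boolean dimensions actually represent):

    import math

    # Euclidean distance: hundreds of subtractions, squarings, and a root.
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Jaccard distance: set intersection and union over discrete features.
    def jaccard_distance(a, b):
        return 1.0 - len(a & b) / len(a | b)

    dense_a = [0.12] * 300                # stand-in for a 300-dimensional term vector
    dense_b = [0.34] * 300

    sparse_a = {"ca", "at", "t ", " s"}   # toy sets of active boolean features
    sparse_b = {"ca", "at", "t ", " m"}

    print(euclidean(dense_a, dense_b))
    print(jaccard_distance(sparse_a, sparse_b))   # 1 - 3/5 = 0.4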

These are the reasons it costs OpenAI on the order of $Billions to train their LLMs. GPUs are expensive.

Learning language directly, character by character, is easily a million times faster than using term vectors. It produces SSMs instead of LLMs, because it doesn’t start with semantics. We know SSMs can handle classification. Can they handle dialog?

Nobody knows.

Are term vectors really necessary for dialog?

Nobody knows.

Or perhaps OpenAI knows.

My algorithm, Organic Learning, has been working since 2017, but I don’t have a machine big enough to learn beyond what we needed for classification. We are using ten-year-old Apple Mac Pro (Late 2013) machines for all our research.

OpenAI certainly has the funding, compute, and talent they would need in order to switch to Syntax-First algorithms like mine. There may well be others working on similar ideas, and I predict we will see more research activity in this area now that we know it’s possible to at least get this far.

My company needs a 4TB RAM server with about 220 threads for learning various release versions of classifiers in multiple languages and for experiments aimed at learning enough to be able to conduct a dialog in the style of ChatGPT on a millionth of their budget.

We are self-funded and cannot afford such experiments.
