Communications of the ACM - Artificial Intelligence, May 8, 00:13
Likewise! AI Presumption

 

This article examines the assumptions and potential problems in artificial intelligence (AI), particularly in the development of large language models (LLMs). Drawing an analogy with early expert systems, the author argues that current AI's grasp of language may rely too heavily on written text, neglecting speech and other forms of cognitive input. The article stresses that although language plays a key role in AI, mastering language cannot simply be equated with achieving artificial general intelligence (AGI). The author calls for deeper questioning of mechanisms such as tokenization, generation as prediction, and the constraints of the digital milieu itself, to avoid repeating the missteps of early AI research.

💡 Every wave of AI research has sought to achieve intelligence through computation, but how to get from human intelligence to artificial intelligence is the key question. Early AI focused on facts and reasoning; the current wave focuses on language, and each faces its own limitations.

📚 Current AI development relies on massive text corpora and assumes that language can be captured digitally and that written language holds intelligence. This heavy reliance on written language may overlook speech and other forms of cognitive input.

⚠️ The author questions trust in the intellectual completeness of written text, as well as trust in mechanisms such as tokenization and generation as prediction, and notes that the digital milieu itself may impose constraints. These assumptions call for further scrutiny.

🗣️ Gathering input from spoken data is worth exploring; it could replace the over-reliance on written language, though perhaps at the cost of new assumptions. Investigating the data potential of facts and reasoning versus language, and of written versus spoken language, would be worthwhile.

Those of us inclined to lob accusations of hubris at new forms of artificial intelligence, on the grounds that we observed (and possibly fomented) the same kind of affliction in the classical forms, should at least clarify the analogy. What are the similarities that advise caution, and what are the differences that might alleviate it?

We address primarily the large language model, and draw attention to its use of text. Consider a traditional linguistics class of some years ago. The first thing students learned was that written language is of lesser interest; the action is in spoken language.1 And “action” means humans’ use of language, its acquisition and development, its sharing and variation—and the implications for the study of human thinking. Written language, which involves planning and editing, is derivative, not spontaneous, not universal, and relies on arbitrary representation learned explicitly. So what, then, does this mean for generative AI?

Each wave of artificial intelligence research has claimed that intelligence via computation is within reach, positing a connection between human intelligence and computed, or artificial, intelligence. The question is how to get from the human form to the artificial form.

The early AI wave focused on Facts and Reasoning as the manifestation of intelligence. Efforts such as Dendral and MYCIN were successful in their limited domains. Computer intelligence was within our grasp—that was the attitude. With Facts and Reasoning implemented in the obvious way, as Database and Logic, the last step from there to computer intelligence would be a piece of cake, meaning “easy” or “suitable for graduate students,” just a matter of scaling up, filling out the details with the computing mechanisms at hand.
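To make the Database-and-Logic construal concrete, here is a minimal sketch, in Python, of the forward-chaining rule engine that sat at the heart of classical expert systems. The facts and rules are invented for illustration (they are not drawn from Dendral or MYCIN, which also carried certainty factors and much more); the point is only the shape of the assumption.

    # A toy "Facts and Reasoning" system: facts as a Database, rules as Logic.
    # The medical-sounding facts and rules are hypothetical, for illustration only.
    facts = {"fever", "stiff_neck"}                      # the Database of known facts
    rules = [                                            # the Logic: antecedents -> conclusion
        ({"fever", "stiff_neck"}, "suspect_meningitis"),
        ({"suspect_meningitis"}, "recommend_culture"),
    ]

    def forward_chain(facts, rules):
        """Fire every rule whose antecedents are all known, until nothing new is derived."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for antecedents, conclusion in rules:
                if antecedents <= derived and conclusion not in derived:
                    derived.add(conclusion)
                    changed = True
        return derived

    print(sorted(forward_chain(facts, rules)))
    # ['fever', 'recommend_culture', 'stiff_neck', 'suspect_meningitis']

Everything such a system can ever conclude is already latent in those two hand-coded structures, which is exactly where the sterility described below came from.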

But the approach, expert systems, yielded programs of unfortunate sterility. The piece-of-cake work had to augment them with probabilities, common-sense extensions, planning scripts, proceduralized grammars, and other heuristics,4 yielding systems that offer, yes, significant aspects of intelligence.

We can see that pushing the hard part farther down in the toolchain delayed the revelation of assumptions made along the way. Reliance on Facts and Reasoning to yield intelligence exposed the limitations of that approach (sterility), calling for various ad hoc mechanisms.

Now we are persuaded that intelligence is buried in our language, and can be released by pattern-matching across utterances in extremely voluminous and fine-grained contexts, with amazing results, and even in contexts of the contexts, with even more amazing results. And we have lots of Language—we have huge text corpora! With Language implemented in the obvious way, writing, the last step to computer intelligence is a piece of cake, just a matter of scaling up, filling out the details with the computing mechanisms at hand.
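As a caricature of that pattern-matching, here is a toy next-word predictor built from bigram counts over an invented two-sentence corpus. A transformer-based LLM conditions on vastly longer contexts over subword tokens, but the sketch shows the underlying bet: that regularities in text, given enough of it, supply the continuation we want.

    # Toy next-word prediction from bigram counts: a drastic simplification of
    # the pattern-matching that LLMs perform over far longer contexts.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    bigrams = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigrams[prev][nxt] += 1

    def predict_next(word):
        """Return the continuation most often seen after `word` in the corpus."""
        return bigrams[word].most_common(1)[0][0] if word in bigrams else None

    print(predict_next("sat"))   # 'on'  (both occurrences of "sat" are followed by "on")
    print(predict_next("on"))    # 'the'

Scale the corpus and the context window up by many orders of magnitude and the predictions become impressive, which is precisely the piece of cake being promised.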

The approach, massive statistical processing via machine learning with transformers, is good with prediction but weak on logic and subject to startling departures from truth. The piece-of-cake work requires tuning, re-modeling, and human feedback, yielding another set of systems that offer, yes, significant aspects of intelligence.

This sounds familiar… from just a few paragraphs back. Analysis of weaknesses in the classical expert-system approach revealed hidden assumptions. Can we drag some out here, too, for generative AI? Yes—we assume that language can be captured in digital data from text, and that language scraped exclusively from writing holds intelligence. Although the reliance on written language seems entirely appropriate for a service intended to produce written text, the processing there requires, again, ad hoc human intervention.

But does the analogy endure? Facts and Reasoning are not the same as Language, which may still hold depths of cognitive material. What about gathering inputs from spoken data? Clever minds in High Tech already are thinking about it,3 and that could obviate one assumption, perhaps replacing it with another. Investigation into the relevant differences in AGI data potential between Facts and Reasoning, and Language; or between written and spoken Language, would be worthwhile.

As an example of assumptions, I question trust in the intellectual completeness of writing (textual input). We should also question trust in other mechanisms, such as tokenization, generation as prediction, reliance on human assessment, and constraints of the digital milieu itself. (What would those constraints be?—Who knows?) A real risk is the human tendency to reverse the assumption, to think that rolling back, from the current state of the art, through the toolchain, will tell us what intelligence IS. Language is important, as McShane and Nirenburg state: “[E]nabling machines to emulate human-level language proficiency is well understood to be an AI-complete problem, one whose full solution requires solving the problem of artificial intelligence in general.”2 But note that, for them, language proficiency would be a manifestation of AGI, rather than the other way around.
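To see what an assumption about tokenization looks like in practice, consider one phrase segmented three ways; each segmentation is a different claim about what the atomic units of language are. The subword split below is hand-made for illustration, not the output of any real tokenizer's vocabulary.

    # One phrase, three tokenizations: each choice fixes what the model can treat
    # as a unit. The subword split is hypothetical, not from a real vocabulary.
    sentence = "unbelievable results"

    char_tokens = list(sentence)                            # character-level
    word_tokens = sentence.split()                          # whitespace words
    subword_tokens = ["un", "believ", "able", "results"]    # hand-made subword split

    for name, tokens in [("characters", char_tokens),
                         ("words", word_tokens),
                         ("subwords", subword_tokens)]:
        print(f"{name:10s} {len(tokens):2d} tokens: {tokens}")

Whatever intelligence the text holds must survive being chopped into one of these unit schemes before the model ever sees it.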

An argument from analogy is not deductive, of course. Just as well! These observations are about attitudes and discourse in science, not the science itself. All the commentary that has been devoted to warning modern AI enterprises about past hubris is not proof that current efforts will fail. The point is to raise questions about presumptions.

References

1. Cruz-Ferreira, M., and Abraham, S.A. 2011. The Language of Language. A Linguistics Course for Starters. Section 2.6. OERCommons.org.

2. McShane, M., and Nirenburg, S. 2021. Linguistics for the Age of AI. The MIT Press. Creative Commons license CC-BY-NC-ND.

3. Mehta, S., Jojic, N., and Gamper, H. 2025. Make Some Noise: Towards LLM audio reasoning and generation using sound tokens. 2025 International Conference on Acoustics, Speech, and Signal Processing. April 2025.

4. Schubert, L. 2020. Computational Linguistics. The Stanford Encyclopedia of Philosophy (Spring 2020 Edition). Edward N. Zalta (Ed.).

Robin K. Hill is a lecturer in the Department of Computer Science and an affiliate of both the Department of Philosophy and Religious Studies and the Wyoming Institute for Humanities Research at the University of Wyoming. She has been a member of ACM since 1978.
