LessWrong · July 29, 2024
Wittgenstein and Word2vec: Capturing Relational Meaning in Language and Thought

Published on July 28, 2024 7:55 PM GMT

One-line version of this post: What do Wittgensteinian language games and NLP word embeddings have in common?

 

Four-line version of this post: Relational, praxis-based connections between concepts, represented as “distances” in multidimensional space, capture meaning. The shorter the distance, the more related the concepts. This is how Word2vec works, what Wittgenstein was describing with “language games,” and also the way cell biologists are analyzing the peripheral blood these days. Are these relational maps the way to think about thinking?

 

Multi-line version of this post: This is my first post on LessWrong. (Hi!) I’d love to be less wrong about it.

I was sitting in a meeting that was 50% biologists and 50% computer scientists. The topic of the meeting was ways to process multi-parametric datasets, where each cell in the peripheral blood was tagged by multiple surface markers that related back to its phenotype and therefore its identity. (The algorithm in question was t-Distributed Stochastic Neighbor Embedding.) Immunologists used to think a T-cell was a T-cell. But in that meeting, we were considering a smear of T-cells in a 32-dimensional T-cell space, clustered by their properties and functional status (activated or exhausted; killer or memory, etc.).
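The workflow from that meeting can be sketched in a few lines. This is a minimal illustration assuming scikit-learn is available, with synthetic "marker intensities" standing in for real cytometry data:

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for multi-parametric cytometry data: 300 "cells",
# each described by 32 surface-marker intensities (synthetic values,
# drawn as two well-separated populations).
rng = np.random.default_rng(0)
activated = rng.normal(loc=2.0, scale=0.5, size=(150, 32))
resting = rng.normal(loc=-2.0, scale=0.5, size=(150, 32))
cells = np.vstack([activated, resting])

# Embed the 32-D marker space into 2-D, the kind of map you would
# then color by phenotype and eyeball for clusters.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(cells)
print(embedding.shape)  # (300, 2)
```

The two synthetic populations come out as two distinct blobs in the 2-D map, which is all the colored cluster plots in the meeting were: high-dimensional relational structure flattened into something a human can look at.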

In the meeting, as I was looking at colored 2D and 3D representations that abstracted features of that higher dimensional space (activated killer T cells on the bottom left in blue; resting memory cells on top in orange; what’s that weird purple cluster in the bottom left? and so on), it occurred to me that this technique was probably good at capturing meaning across the board.

Abstracting meaning from measured distances between mapped concepts isn’t a new idea. It’s described beautifully in The Cluster Structure of Thingspace. I just wonder if we can ride it a little further into the fog.

Wittgenstein is often quoted in the Venn diagram overlap region between meaning and computation. The strongest applicable Wittgensteinian concept to this particular space is his idea of a language game. A language game is a process in which words are used according to specific rules and contexts, shaping how we understand meaning and communicate. LessWrong has discussions on the relationship between language games and truth, such as in Parasitic Language Games: maintaining ambiguity to hide conflict while burning the commons, but searching the site reveals less content directly connecting Wittgenstein to phase space, vector space, or thingspace than I’d expect.

Clustering of things in thingspace isn’t a direct Wittgensteinian language game (I don’t think). It seems more like what you’d get if you took a Wittgensteinian approach (praxis-based, relational) and used it to build a vector space for topologies of concepts (i.e. for “chairness” and “birdness” and “Golden Gate Bridgeness”).

Word2vec, a natural language processing model, does a simple form of this when it represents words with similar meanings close together in vector space. It seems LLMs do a version of this, with Golden Gate Claude supporting the idea that within LLMs concepts can be topologically localized.
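The core mechanic is easy to make concrete. Here is a toy sketch with hand-picked 4-dimensional vectors (invented for illustration, not taken from any trained model) showing how cosine similarity encodes relatedness as proximity in vector space:

```python
import numpy as np

# Hypothetical 4-D "embeddings" for a handful of words. The values are
# hand-picked so that the animal words share directions and the vehicle
# words share different ones, mimicking what training would produce.
vecs = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
    "bus": np.array([0.0, 0.1, 0.8, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated ones."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related concepts sit closer (higher cosine similarity).
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["bus"]))  # True
```

A trained Word2vec model does the same thing at scale: it learns the coordinates from co-occurrence in a corpus rather than having them hand-assigned.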

I don’t think there’s enough understood about language processing in the brain to say with certainty that the brain also clusters concepts like this, but I’m guessing it's quite likely.

Short distances between conceptual nodes in a vast relational web seem like a good way to convey meaning. It works for an understanding of concrete words and literal T-cell properties, but it’s also a relational process that maps back to more abstract concepts. In a way, traversing such maps, building patterns within them, running patterns through them, operating on the data they contain, is probably the best operational definition of “thinking” that I can think of.
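As a playful gloss on "traversing such maps": a minimal sketch (hand-picked toy coordinates, no trained model) that builds a nearest-neighbor graph over concepts and walks it, so that relatedness becomes reachability:

```python
import numpy as np
from collections import deque

# Toy relational map: concepts as points, edges between near neighbors.
# Coordinates are invented for illustration only.
concepts = {"sparrow": [0, 0], "robin": [1, 0], "penguin": [3, 0],
            "chair": [10, 0], "stool": [11, 0]}
names = list(concepts)
pts = np.array([concepts[n] for n in names], dtype=float)

# Connect each concept to every other concept within a distance threshold.
edges = {n: [] for n in names}
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i != j and np.linalg.norm(pts[i] - pts[j]) <= 2.5:
            edges[a].append(b)

def hops(start, goal):
    """Breadth-first traversal: one crude 'operation' on the map."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

print(hops("sparrow", "penguin"))  # 2: reachable via robin
print(hops("sparrow", "chair"))   # None: bird and furniture clusters are disconnected
```

The interesting part is that "penguin" is reachable from "sparrow" only through an intermediate concept, while "chair" is not reachable at all, which is a crude but suggestive picture of inference versus category boundaries.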

…Thoughts?


