arXiv:2501.00226v2 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated a remarkable ability to capture extensive world knowledge, yet how this is achieved without direct sensorimotor experience remains a fundamental puzzle. This study proposes a novel theoretical solution by introducing the Collective World Model hypothesis. We argue that an LLM does not learn a world model from scratch; instead, it learns a statistical approximation of a collective world model that is already implicitly encoded in human language through a society-wide process of embodied, interactive sense-making. To formalize this process, we introduce generative emergent communication (Generative EmCom), a framework built on the Collective Predictive Coding (CPC). This framework models the emergence of language as a process of decentralized Bayesian inference over the internal states of multiple agents. We argue that this process effectively creates an encoder-decoder structure at a societal scale: human society collectively encodes its grounded, internal representations into language, and an LLM subsequently decodes these symbols to reconstruct a latent space that mirrors the structure of the original collective representations. This perspective provides a principled, mathematical explanation for how LLMs acquire their capabilities. The main contributions of this paper are: 1) the formalization of the Generative EmCom framework, clarifying its connection to world models and multi-agent reinforcement learning, and 2) its application to interpret LLMs, explaining phenomena such as distributional semantics as a natural consequence of representation reconstruction. This work provides a unified theory that bridges individual cognitive development, collective language evolution, and the foundations of large-scale AI.