Published on June 25, 2025 6:18 PM GMT
The reality, risks, & consequences of scaling consumer-entertainment technologies to a global population with little cognitive security. [mirrored from Substack]
Western, American culture will be in frontier language models’ training data.
Most other cultures will not.
If nobody steps in, ~90% of mankind's culture could die quite soon.
Globally, indigenous cultures face an unprecedented threat to their traditional heritage.
The Marubo, a 2,000-member tribe in the Brazilian Amazon, recently made headlines: they received Starlink in September 2023. Within 9 months, their youth stopped learning traditional body paint and jewelry making, young men began sharing pornographic videos in group chats (in a culture that frowns on public kissing), leaders observed "more aggressive sexual behavior," and children became addicted to short-form video content. One leader reported: "Everyone is so connected that sometimes they don't even talk to their own family."
Marubo people carrying a Starlink antenna stopped for a break to eat papaya.
Cultural diversity is already experiencing a mass extinction at rates comparable to our biological, Holocene extinction ֊֊ up to 10,000 times the natural background rate. Yet unlike species loss, which triggers conservation efforts, cultural extinction largely remains unnoticed.
Approximately 50% of the world's population was not online 10 years ago. In 2010, only 30% of the global population used the internet, while by 2020, that number had doubled to 60%.
The mechanism is straightforward: providing internet access to previously offline populations overwhelms users with enticing, addicting, and viral stimuli. The entertainment is far greater than anything indigenous, pre-digital lifestyles can offer.
This cultural erosion has profound implications beyond the communities themselves: the internet that's replacing indigenous knowledge systems is also feeding the artificial intelligence that will shape humanity's future. Large language models learn from text scraped across the web ֊֊֊ and if a culture doesn't exist online, it effectively doesn't exist to AI. Every traditional practice abandoned for TikTok, every oral tradition left unrecorded, every indigenous language that goes unwritten represents not just a loss to those communities, but a permanent gap in the knowledge systems that will guide our AI-mediated world. The internet homogenizes human culture toward whatever content generates the most engagement, and AI systems trained on that same internet amplify those patterns further. The Marubo children who stopped learning traditional body painting aren't just losing their heritage ֊֊֊ they're ensuring that AI systems will never learn to think in ways their ancestors did. What survives digitization becomes the foundation for machine intelligence; what doesn't survive gets erased from our computational future.
A North Korean soldier of the Ukrainian War (2025) lies in his barracks, scrolling.
The mechanism driving this convergence has deep historical precedent, and is an unsurprising consequence of how power operates in the digital age. The policing power of any central cultural authority weakens with distance ֊֊֊ a Chinese proverb, from the Yuan dynasty, applies: "Heaven is high and the emperor is far away.” Historically, in times of totalitarian oppression, such a reality enabled local customs to persist. In the post-industrial age, this protection no longer exists. The proverbial emperor is no longer a despot, requiring legislative & coercive mechanisms to homogenize culture. The emperor has been supplanted by market forces and economic competition, which carry out this responsibility instead.
Mature capitalist economies, such as the United States, necessitate firms pursue global user acquisition for survival ֊֊֊ firms must constantly expand their customer base to satisfy growth expectations and outcompete adversaries. This imperative drives firms to pursue a global audience, while delivering widely-appealing products which transcend cultural boundaries. Through algorithmically-delivered content, engineers develop low-friction behavioral pathways which reshape cultural habits, social practices, and leisure activities towards addiction.
Cultural diversity, much the same as biological diversity, preserves an irreplaceably robust repository of solutions to ancient problems. Darwinian evolution has operated as a massively parallel search algorithm in for hundreds of millions of years, testing chemical and behavioral solutions to ecological problems. Claude Lévi-Strauss, renowned anthropologist, believes culture to be "a unique experiment in organizing human life" ֊֊֊ encoding generations of knowledge systems adapted to localized environments. Much the same as biodiversity, such cultural solutions cannot be easily discovered, let alone, recreated once lost.
This embodied wisdom, refined through generations of trial and error, often reveals natural phenomena that laboratory science would never think to investigate. For every cultural extinction, we lose not just traditions, but entire systems of observation and experimentation that could unlock medical and scientific breakthroughs.
- Cytarabine, a leukemia drug which grants remission to thousands of children yearly, came from the Caribbean sponge Tectitethya crypta ֊֊֊ an organism coastal peoples knew but Western science discovered by accident.Artemisinin, a sweet, wormwood treatment for fevers, was first used in traditional Chinese medicine. It is now the WHO's first-line malaria treatment, saving 500,000 lives annually. Its discovery saved tens of millions of lives and earned Tu Youyou the 2015 Nobel Prize. The market value of this drug is $670+ million annually.Indigenous soil-pitting techniques reversed desertification in the Sahel, increasing crop yields by 500% in degraded land. This method is now relied upon in efforts from the WTO as a low-cost climate adaptation strategy.Maori honey producers' use of manuka led to a $400 million global industry after scientists confirmed its antimicrobial properties: methylglyoxal levels 100x higher than regular honey.
Tectitethya crypta
Recently, I had the privilege of speaking with Ayah Bdeir, a Lebanese entrepreneur and activist currently leading AI strategy at The Mozilla Foundation. She is reducing her Mozilla responsibilities to pursue what she sees as an existential challenge: economizing large-scale digitization of Arab cultural media, knowledge, and lifestyles. Her rationale was straightforward: "The Arab world is not in the training data." She explained how this absence manifests at every level - from the technical (Arabic OCR remains largely unresolved, making centuries of texts inaccessible) to the geopolitical (The Gulf States pouring billions into sovereign AI). Without intervention, she argued, AI will learn to think exclusively through a Western lens, while a fifth of humanity's historical frameworks vanish from our computational future.
Similarly, Rohan Pandey, a machine learning scientist, recently departed from OpenAI to address Sanskrit OCR, recognizing that millennia of Vedic texts ֊֊ containing alternative philosophical frameworks, governance models, and problem-solving methodologies ֊֊ exist neither in digital format nor in current training corpora. These initiatives are important beyond preservation efforts; they constitute strategic interventions to incorporate cognitive diversity into artificial intelligence systems before technological monoculture becomes irreversible.
See, large language models are prediction engines. LLMs learn implicit value functions from their training data. Cultural homogenization is as epistemological as it is behavioral: when AI systems train exclusively on internet-scale English text, they inherit a biased ontology. Claude Opus models “believe” problems have solutions, progress is linear, optimization is inherently good. The majority of human knowledge was developed under different assumptions. (For example, Islamic jurisprudence uses analogical reasoning, Qiyas, which treats precedent as living wisdom rather than fixed law). Now, as far as we are aware, language models cannot hold conscious “beliefs”, per se, but they do hold statistical patterns which function as such. In the training process, language models operate like anthropologists in reverse ֊֊ they must reconstruct entire worldviews from textual fragments. When indigenous cultural frameworks have no representation in the training data, models literally cannot develop the neural infrastructure to reconstruct those ways of thinking.
Including Vedic literature in the training data (or Arabic philosophy, or indigenous texts) fundamentally changes what the model treats as valid reasoning. The Bhagavad Gita presents duty-based ethics where Krishna argues for necessary violence; the Arthashastra details statecraft through deception; Tantric texts describe transformation through transgression. When repositories of cultures developed from these ideas comprise significant training data, language models develop different priors about acceptable solutions. It might propose strategies that Western-trained models would never generate ֊֊ not because they're "wrong" but because they violate implicit constraints learned from a narrow philosophical canon.
We're watching the construction of a new priesthood - machine learning engineers who shape how future systems will think - while the training data that could provide alternate reasoning patterns remains unlabeled, untranslated, or simply absent. Pandey's work on Sanskrit OCR isn't about preservation; it's about preventing an intellectual monoculture where every AI system inherits the same blind spots. The Marubo didn't just lose their customs to Starlink ֊֊֊ they lost their children's ability to think in ways that modernity cannot replicate. As one Marubo leader stated: "We can't live without the internet." The dependency crystallizes before understanding of the danger arrives.
We're writing a self-fulfilling prophecy: by training AI only on the worldviews that survived digitization, we ensure those become the only worldviews that can exist in our AI-mediated future. The models will complete our civilizational story the only way they know how - by extrapolating from the fragments we've given it. In doing so, we're eliminating entire branches of human problem-solving that took millennia to develop and might be impossible to rediscover.
Thank you Dunya Baradari, Cassandra Melax, & Izzy for helpful insights & review.
Discuss