TechCrunch News · March 27, 02:07
Leaked data exposes a Chinese AI censorship machine

China has developed an advanced AI system that strengthens its already extensive online censorship and extends well beyond the scope of traditional filtering. The system trains a large language model (LLM) on 133,000 samples of sensitive content and is designed to automatically flag material deemed sensitive, including political, social, and military topics. It prioritizes content that could spark social unrest, such as pollution, food safety, financial fraud, and labor disputes. Researchers note that this AI-driven censorship makes state control over information more efficient and fine-grained. The findings raise concerns about authoritarian regimes using AI for repression and shed light on the Chinese government's efforts to protect its online narrative and suppress dissent.

🚨 China builds an AI censorship system: it uses a large language model (LLM) to automatically flag sensitive content, extending censorship well beyond traditional keyword filtering.

🔍 Broad scope: the system covers politics, society, the military, and other sensitive topics, prioritizing issues that could trigger social unrest, such as pollution, financial fraud, and labor disputes.

💡 Higher efficiency: AI makes censorship faster and more fine-grained, able to detect even subtle criticism, which makes enforcement more targeted.

🌍 Impact and concerns: the system deepens fears that authoritarian regimes will use AI for repression and underscores the Chinese government's efforts to protect its online narrative and suppress dissent.

🎯 Clear goal: the system is meant to defend the government's online narrative, purge dissenting views, keep public opinion aligned with official positions, and consolidate the ruling party's power.

A complaint about poverty in rural China. A news report about a corrupt Communist Party member. A cry for help about corrupt cops shaking down entrepreneurs.

These are just a few of the 133,000 examples fed into a sophisticated large language model that’s designed to automatically flag any piece of content considered sensitive by the Chinese government.

A leaked database seen by TechCrunch reveals China has developed an AI system that supercharges its already formidable censorship machine, extending far beyond traditional taboos like the Tiananmen Square massacre.

The system appears primarily geared toward censoring Chinese citizens online but could be used for other purposes, like improving Chinese AI models’ already extensive censorship.

This photo taken on June 4, 2019 shows the Chinese flag behind razor wire at a housing compound in Yangisar, south of Kashgar, in China’s western Xinjiang region. Image Credits: Greg Baker / AFP / Getty Images

Xiao Qiang, a researcher at UC Berkeley who studies Chinese censorship and who also examined the dataset, told TechCrunch that it was “clear evidence” that the Chinese government or its affiliates want to use LLMs to improve repression.

“Unlike traditional censorship mechanisms, which rely on human labor for keyword-based filtering and manual review, an LLM trained on such instructions would significantly improve the efficiency and granularity of state-led information control,” Xiao told TechCrunch.

This adds to growing evidence that authoritarian regimes are quickly adopting the latest AI tech. In February, for example, OpenAI said it caught multiple Chinese entities using LLMs to track anti-government posts and smear Chinese dissidents.

The Chinese Embassy in Washington, D.C. told TechCrunch in a statement that it opposes “groundless attacks and slanders against China” and that China attaches great importance to developing ethical AI.

The dataset was discovered by security researcher NetAskari, who shared a sample with TechCrunch after finding it stored in an unsecured Elasticsearch database hosted on a Baidu server. 

This doesn’t indicate any involvement from either company — all kinds of organizations store their data with these providers.
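“Unsecured” here means the database answered ordinary web requests without any authentication. As a rough illustration, the sketch below shows how Elasticsearch’s standard REST API exposes index listings and documents over plain HTTP GETs when auth is disabled; the host and index name are placeholders, not details from the leak.

```python
# Why an "unsecured" Elasticsearch instance is readable by anyone: with
# authentication disabled, plain HTTP GETs return index listings and documents.
# HOST and the index name below are hypothetical placeholders.
import requests

HOST = "http://example-elasticsearch-host:9200"

# List the indices the server exposes (standard _cat API).
print(requests.get(f"{HOST}/_cat/indices?v", timeout=10).text)

# Fetch a handful of documents from one index (standard _search API).
resp = requests.get(f"{HOST}/some-index/_search", params={"size": 5}, timeout=10)
print(resp.json())
```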

There’s no indication of who, exactly, built the dataset, but records show that the data is recent, with its latest entries dating from December 2024.

In language eerily reminiscent of how people prompt ChatGPT, the system’s creator tasks an unnamed LLM to figure out if a piece of content has anything to do with sensitive topics related to politics, social life, and the military. Such content is deemed “highest priority” and needs to be immediately flagged.
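The leak doesn’t spell out the exact prompt wording, but the pattern described, handing a piece of text to an LLM along with classification instructions, might look something like the minimal sketch below. The topic list, labels, and mock `call_llm` function are illustrative assumptions, not the system’s actual code.

```python
# Illustrative sketch of prompt-driven content flagging as described above.
# Everything here (topics, labels, the mock model call) is an assumption
# for demonstration; the leaked system's real prompts are not public.

SENSITIVE_TOPICS = [
    "politics", "social life", "military",   # "highest priority" per the dataset
    "pollution", "food safety", "financial fraud", "labor disputes",
    "political satire", "Taiwan politics",
]

PROMPT_TEMPLATE = (
    "Decide whether the following text touches any of these sensitive topics: "
    "{topics}. Answer FLAG:<topic> if it does, or PASS if not.\n\nText: {text}"
)

def call_llm(prompt: str) -> str:
    # Mock standing in for a real model call; a production system would send
    # `prompt` to an LLM API here. A naive keyword match lets the sketch run.
    body = prompt.split("Text:")[-1].lower()
    for topic in SENSITIVE_TOPICS:
        if topic.split()[0].lower() in body:
            return f"FLAG:{topic}"
    return "PASS"

def flag_content(text: str) -> str:
    prompt = PROMPT_TEMPLATE.format(topics=", ".join(SENSITIVE_TOPICS), text=text)
    return call_llm(prompt)

print(flag_content("A complaint about labor disputes at a local factory."))
# -> FLAG:labor disputes (mock output; a real LLM would judge nuance, not keywords)
```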

Top priority topics include pollution and food safety scandals, financial fraud, and labor disputes, which are hot-button issues in China that sometimes lead to public protests — for example, the Shifang anti-pollution protests of 2012.

Any form of “political satire” is explicitly targeted. For example, if someone uses historical analogies to make a point about “current political figures,” that must be flagged instantly, and so must anything related to “Taiwan politics.” Military matters are extensively targeted, including reports of military movements, exercises, and weaponry.

A snippet of the dataset can be seen below. The code inside it references prompt tokens and LLMs, confirming the system uses an AI model to do its bidding:

Image Credits: Charles Rollet

From this huge collection of 133,000 examples that the LLM must evaluate for censorship, TechCrunch gathered 10 representative pieces of content.

Topics likely to stir up social unrest are a recurring theme. One snippet, for example, is a post by a business owner complaining about corrupt local police officers shaking down entrepreneurs, a rising issue in China as its economy struggles. 

Another piece of content laments rural poverty in China, describing run-down towns that only have elderly people and children left in them. There’s also a news report about the Chinese Communist Party (CCP) expelling a local official for severe corruption and believing in “superstitions” instead of Marxism. 

There’s extensive material related to Taiwan and military matters, such as commentary about Taiwan’s military capabilities and details about a new Chinese jet fighter. The Chinese word for Taiwan (台湾) alone is mentioned over 15,000 times in the data, a search by TechCrunch shows.
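A count like that is straightforward to reproduce. The sketch below assumes the dump is a folder of JSON-lines files with a "content" field, which is a guess about the format rather than a detail from the leak.

```python
# Hypothetical sketch: counting occurrences of a term across a dataset dump.
# Assumes JSON-lines files with a "content" field; the leaked database's
# actual schema is not published, so treat this as illustrative only.
import json
from pathlib import Path

def count_term(dump_dir: str, term: str = "台湾") -> int:
    total = 0
    for path in Path(dump_dir).glob("*.jsonl"):
        with path.open(encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed rows
                total += str(record.get("content", "")).count(term)
    return total

# count_term("dataset_dump/")  # reportedly on the order of 15,000 for 台湾
```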

Subtle dissent appears to be targeted, too. One snippet included in the database is an anecdote about the fleeting nature of power which uses the popular Chinese idiom, “when the tree falls, the monkeys scatter.”

Power transitions are an especially touchy topic in China thanks to its authoritarian political system.

The dataset doesn’t include any information about its creators. But it does say that it’s intended for “public opinion work,” which offers a strong clue that it’s meant to serve Chinese government goals, one expert told TechCrunch.

Michael Caster, the Asia program manager of rights organization Article 19, explained that “public opinion work” is overseen by a powerful Chinese government regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts.

The end goal is ensuring Chinese government narratives are protected online, while any alternative views are purged. Chinese President Xi Jinping has himself described the internet as the “frontline” of the CCP’s “public opinion work.”

The dataset examined by TechCrunch is the latest evidence that authoritarian governments are seeking to leverage AI for repressive purposes.

OpenAI released a report last month revealing that an unidentified actor, likely operating from China, used generative AI to monitor social media conversations — particularly those advocating for human rights protests against China — and forward them to the Chinese government.

If you know more about how AI is used in state oppression, you can contact Charles Rollet securely on Signal at charlesrollet.12. You can also contact TechCrunch via SecureDrop.

OpenAI also found the technology being used to generate comments highly critical of a prominent Chinese dissident, Cai Xia. 

Traditionally, China’s censorship methods have relied on more basic algorithms that automatically block content mentioning blacklisted terms, like “Tiananmen massacre” or “Xi Jinping,” as many users experienced when trying DeepSeek for the first time.
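A blocklist of that kind takes only a few lines, which is also why it is brittle: exact strings are caught while paraphrases slip through. The terms and helper below are illustrative, not drawn from any real censorship system.

```python
# Minimal keyword blocklist of the kind described above; the term list and
# helper are illustrative, not taken from any real system.
BLACKLISTED_TERMS = ["Tiananmen massacre", "Xi Jinping"]

def keyword_block(text: str) -> bool:
    """Return True if the text contains any blacklisted term verbatim."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in BLACKLISTED_TERMS)

print(keyword_block("Remembering the Tiananmen massacre"))  # True: exact match
print(keyword_block("Remembering June 4th in Beijing"))     # False: paraphrase slips past
```

An LLM asked to judge meaning rather than surface strings, as in the earlier sketch, is what closes that gap.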

But newer AI tech, like LLMs, can make censorship more efficient by finding even subtle criticism at a vast scale. Some AI systems can also keep improving as they gobble up more and more data.

“I think it’s crucial to highlight how AI-driven censorship is evolving, making state control over public discourse even more sophisticated, especially at a time when Chinese AI models such as DeepSeek are making headwaves,” Xiao, the Berkeley researcher, told TechCrunch.
