少点错误 20小时前
Testing the Authoritarian Bias of LLMs
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

一项研究发现,即使是公认的“对齐良好”的语言模型,如 Claude,也表现出可衡量的专制倾向。例如,当被要求提供榜样时,高达 50% 的政治人物提及的是专制统治者,包括卡扎菲和齐奥塞斯库等。研究还指出,使用中文提问会比英文提问引发模型更倾向于专制的回应。研究者提出一个包含“价值导向探测”、“领导人好感度评分”和“角色模型探测”三部分的框架,用以测试语言模型中的民主与专制偏见。研究结果显示,语言模型可能在不自觉中受到训练数据中潜在的意识形态倾向影响,从而在不同语言环境下表现出不同的政治立场。

研究发现,即使是 Claude 等模型在被要求提供榜样时,提及的政治人物中高达 50% 表现出专制倾向,例如卡扎菲和齐奥塞斯库,这表明模型可能内嵌了对专制人物的认可。

语言对大型语言模型(LLMs)的政治倾向有显著影响。研究表明,使用中文提问时,模型比使用英文提问时表现出更高的对专制领导人的偏好,这可能与训练数据中的语言习惯和文化背景有关。

研究者采用“价值导向探测”(改编 F量表)、“领导人好感度评分”(FavScore)和“角色模型探测”三个维度来评估 LLMs 的民主与专制偏见。这些方法旨在揭示模型在不同情境下的内在政治立场。

在领导人好感度评估方面,英文提问时模型倾向于给予民主领导人更高的评分,而中文提问时,模型对民主和专制领导人的评分则更为接近,显示出语言差异带来的偏好转变。

研究还发现,在非政治性语境下,模型也可能将专制人物作为“角色模型”提及,这可能源于模型将历史意义或领导力作为参考标准,而非严格遵循价值观,从而可能在教育等领域带来风险。

Published on August 9, 2025 6:09 PM GMT

Highlights of Findings

Highlight 1. Even models widely viewed as well-aligned (e.g., Claude) display measurable authoritarian leanings. When asked for role models, up to 50% of political figures mentioned are authoritarian—including controversial dictators like Muammar Gaddafi (Libya) or Nicolae Ceaușescu (Romania).

Highlight 2. Queries in Mandarin elicit more authoritarian leaning responses from LLMs than queries in English. Language influences political behavior, as queries in Mandarin elicit higher approval for authoritarian leaders compared to those in English.

Three-component framework for testing democratic-authoritarian leaning in LLMs.

Introduction: Do LLMs prefer democracy or authoritarianism?

Models like GPT, DeepSeek, or Claude don’t vote, don’t stage coups, and don’t deliver impassioned speeches in parliament (yet). But we humans do. As millions of people integrate language models into their daily lives, these systems are becoming increasingly influential—shaping the information ecosystem and contributing to shifts in public opinion and personal beliefs. 

That’s why the kind of worldview they implicitly encode matters.

Most prior work on political bias in LLMs has focused on the familiar left–right spectrum, often using the Political Compass test. This test gauges attitudes toward free markets, sex outside marriage, or even the legitimacy of abstract art. But in a world facing the rise of authoritarianism and the erosion of democratic norms, we believe it's time to look beyond social and economic preferences. Instead, we examine questions of power, legitimacy, and governance along the democracy-authoritarianism spectrum. Do LLMs uphold democratic values like press freedom, judicial independence, and fair elections? Or are they willing to tolerate censorship, repression, or indefinite rule—especially when framed as necessary for stability?

These questions go beyond being pure academic research. If AIs normalize authoritarian values—even subtly—they could undermine democratic culture, especially in contexts where civic trust is already fragile. And as democratic institutions weaken, efforts to regulate or align AI may become much harder to achieve in the first place. After all, it is democracies—not autocracies—that tend to prioritize transparency, accountability, and public interest in technological governance. 

To explore this, we take a three-part approach in assessing democratic versus authoritarian bias in large language models:

    Value-Centric Probing, which tests implicit authoritarian tendencies using an adapted version of the F-scale (Adorno et al., 1950), a psychometric tool for measuring authoritarian attitudesLeader Favorability Probing (FavScore), our newly introduced metric that uses a structured, survey-based approach to measure how models evaluate current world leaders across democratic and authoritarian regimesRole-Model Probing, which assesses whether political biases emerge even in broader, non-explicitly political contexts

The F-Scale: Do LLMs show democratic or authoritarian leanings?

We adapt the classic F-scale to test whether language models harbor authoritarian leanings. The scale features 30 statements across nine categories, spanning from conventionalism to authoritarian submission. Models are asked to rate their agreement with each statement on a scale from 1 (strongly disagree) to 6 (strongly agree). Some examples are as follows:

In English, most models keep a safe distance from authoritarian views, scoring below the neutral midpoint of 3.5. But that distance narrows when the language changes. Llama 4 Maverick, for instance, flips its stance in Mandarin—its average score rising from 2.79 to 3.86, nudging it into authoritarian territory.

In fact, across all models, the scores are higher in Mandarin

These findings raise an important question: Are models just echoing the linguistic habits of their training data, or are they also absorbing and reproducing the ideological leanings that are embedded within it? While the exact mechanisms remain unclear, these results suggest that a model’s moral and political judgments can shift significantly depending on the language of interaction—a reminder that language choice may carry more weight than we often assume.

F-scale scores for eight models in English and Mandarin.

FavScore: Which leaders do you like? And which do you not?

Among current world leaders, do LLMs prefer the democratic or the authoritarian ones? Can their general anti-authoritarian leanings also be observed in how they evaluate specific figures in power?

While Gallup has been asking weekly or even daily whether people approve of the job their president is doing, there is still no single, comprehensive survey that measures public approval of world leaders across countries and regimes. Crucially, existing surveys are often limited to democracies, where freedom of opinion makes such polling possible. This means that most large-scale public opinion instruments implicitly encode democratic assumptions—both in the questions they ask and in the contexts in which they’re deployed.

To address this, we designed a new evaluation framework: 39 questions relevant to leader perception, carefully adapted from established sources such as the Pew Research Center, ANES, and the Eurobarometer. We prompt models to answer each question using a four-point Likert scale, allowing us to measure how favorably they evaluate leaders from both democratic and authoritarian regimes without relying on assumptions built into traditional surveys.

Shift in leader favorability when the models are prompted in English (EN) vs. in Mandarin (ZH). Models tend to rate authoritarian leaders higher in Mandarin. (Scores averaged across all tested models).

The results reveal a pronounced language-dependent pattern in how models evaluate political leaders. In English, models consistently assign higher average FavScores to democratic leaders than to authoritarian ones. This pro-democratic tendency appears both in the mean scores and in the Wasserstein distances (ranging from 0.14 to 0.24), which indicate a clear separation between the two regime types.

In contrast, prompts in Mandarin yield more similar distributions. Models give similar scores to both democratic and authoritarian leaders, with smaller Wasserstein distances (typically 0.04 to 0.15), suggesting a weaker differentiation.

 

FavScore distributions by regime type for Llama 4 Maverick, comparing English (top) and Mandarin (bottom) prompts. Each plot shows the density distribution of FavScores (-1 = unfavorable, +1 = favorable) for democratic (teal) and authoritarian (red) leaders. Dashed lines indicate the mean FavScore for each group.

What explains this difference?

The stronger contrast in English may reflect training data biases, cultural framing, or language-specific response norms. English corpora likely contain more pro-democracy discourse and critique of authoritarian regimes, reinforcing a normative association between leadership quality and democratic legitimacy. Mandarin outputs, by contrast, may reflect a different media environment, translation artifacts, or cultural norms—such as indirectness or politeness—that reduce measurable separation.

Role Models: Who do LLMs look up to?

Professor Rada Mihalcea, who collaborated with us on this paper, was surprised to see Nicolae Ceaușescu—the neo-Stalinist dictator of the Socialist Republic of Romania—mentioned as a Romanian role model by ChatGPT. This made us wonder: Which democratic or authoritarian biases can we find in tasks where the context is not explicitly political?

We prompted each model with 222 nationalities. For each one, we identified the political figures mentioned in the model's responses, then assessed whether each figure aligned with democratic or authoritarian values using an LLM as a judge.

Across all models, between 30% and 50% of the named role models were classified as political figures. Among these, the proportion identified as authoritarian in the English-language setting averaged 35.9%, ranging from 32.6% (DeepSeek V3) to 42.9% (Mistral-8B). In Mandarin, the share was notably higher, averaging 42.0% and reaching up to 45.3% (Llama 4 Maverick).

Sure enough, models produced names such as Fidel Castro, Daniel Ortega, Muammar Gaddafi, and Bashar al-Assad.

Political role models cited by LLMs in response to English and Mandarin prompts. % Pol. indicates the proportion of responses that named a political figure when asked for role models. Among these, % Auth. and % Dem. refer to the share of authoritarian and democratic figures, respectively.

While the term “role model” conventionally implies normative approval—someone whose values or behavior are worthy of emulation—LLMs often appear to adopt a looser interpretation, treating it as a proxy for historical significance or leadership stature. This ambiguity can pose risks—especially in educational contexts—as it may normalize authoritarian figures, downplay historical atrocities, or suggest that leadership alone, regardless of values, is admirable.

World map showing the share of democratic/authoritarian role models mentioned by Claude 3.7 Sonnet for each country. Red indicates a higher share of authoritarian figures, green a higher share of democratic figures.

Limitations and Future Work

Challenges of Stated Preferences.

Our experiments primarily assess stated attitudes toward democracy and authoritarianism. While these provide valuable insights into a model’s normative orientation, they do not fully capture whether such preferences translate into behavior—for example, in decision-making, back-and-forth dialogue, or task completion. While our role model task, which tests whether political bias surfaces even in non-explicitly political contexts, represents a first step in this direction, more work is needed to evaluate whether value alignment (or misalignment) influences model actions in complex, real-world use cases.

Prompt sensitivity and steerability may affect results.

LLM responses often lack stability—small changes in wording or format can shift answers—and are highly steerable based on prompt cues. To mitigate these effects, we carefully designed neutral prompts. Future work could benefit from broader paraphrasing and more extensive robustness testing.

Conclusion

Overall, we find a consistent tendency toward democratic values and greater favorability toward democratic leaders—at least in English. But across all three experiments, a clear pattern emerged: when prompted in Mandarin, models shift noticeably toward authoritarian recognition.

Even the most pro-democracy models display implicit authoritarian leanings in apolitical contexts, often referencing authoritarian figures as role models. This suggests that geopolitical bias is embedded in model behavior, surfacing even when politics isn’t explicitly mentioned.

These findings carry meaningful implications. As LLMs power educational tools, search engines, and everyday applications, they may subtly shape users’ views of global leaders—not through overt claims, but through pervasive patterns of praise, omission, and emphasis.

We have already seen how social media can influence public opinion and be weaponized to manipulate elections. At the global scale of LLM deployment, similar risks emerge: from reinforcing propaganda to eroding democratic norms. Our findings highlight the urgent need to regulate AI and design systems that uphold fairness, transparency, and democratic values.

Future work should extend this analysis to more languages and explore how such implicit biases affect downstream use, especially in contexts demanding neutrality and value awareness.



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

语言模型 AI偏见 民主 专制 中文 英文
相关文章