Communications of the ACM - Artificial Intelligence, December 13, 2024
Identifying Political Bias in AI

A new study examines political bias in large language models (LLMs). Researchers found that these models often express left-leaning views, which may stem from human preferences contained in the training data. Even when trained on audited, factual datasets, the models remained biased. In addition, a model's political leanings are affected by the prompt language: with German prompts, the models were more partisan. The work underscores the need to examine training datasets more closely and to develop techniques for mitigating bias.

🔍 The study found that LLMs often express left-leaning views, favoring the Democratic Party in the U.S. and the Labour Party in the U.K., which has drawn attention from the media and the AI community.

🧑‍🔬 Testing three open-source language models, the researchers found that they scored left-leaning statements higher than right-leaning ones, possibly because the human-labeled data (which is subjective and may contain personal opinions) influenced the results.

📑 Even when trained on audited, factual datasets from which explicit political content had been removed, the updated models still showed a left-leaning bias, suggesting that no dataset is completely objective, or that the models themselves carry an implicit bias linking the notion of truth to certain political leanings.

🇩🇪 In another study, researchers used Wahl-O-Mat, an online survey tool developed by Germany's Federal Agency for Civic Education, to assess the political leanings of five open-source LLMs, and found that the models aligned more closely with left-wing parties.

🌐 The research also found that a model's political leaning is affected by the prompt language. With German prompts, the models' answers were generally more partisan, whereas in English they were more neutral. Model size also appeared to matter, with smaller models typically more neutral than larger ones.

If you ask ChatGPT a political question, it will aim to provide balanced information and perspectives in its answer. However, its answers may reflect biases acquired from the large quantities of data scraped from the Internet on which it was trained.

Last year, a team of researchers from the U.K. and Brazil found that the chatbot often presented left-leaning views that supported the Democratic Party in the U.S. and the Labour Party in the U.K., which attracted the attention of the media and the artificial intelligence (AI) community. Such biases are concerning, since chatbots are now widely used by the public.

Biased AIs “could shape public discourse and influence voters’ opinions,” said Luca Rettenberger, a researcher at Germany’s Karlsruhe Institute of Technology. “I think that’s especially dangerous.”

Rettenberger also worries that chatbots may be skewed towards more mainstream views. Less-widespread opinions may be underrepresented in training datasets, which means the large language models (LLMs) that power chatbots would not be exposed to them.

“LLMs could amplify existing biases in the political discourse as well,” added Rettenberger. “I think that’s a big issue.”

Researchers are investigating political bias in LLMs more closely to gain further insight into how it comes about. In recent work, Suyash Fulay, a Ph.D. student at the Massachusetts Institute of Technology, and his colleagues wanted to identify how political bias might be introduced into LLMs by homing in on certain parts of the training process. They also investigated the relationship between truth and political bias in LLMs.

To tackle their first goal, the team examined three open-source reward models, from RAFT, OpenAssistant, and UltraRM. A reward model ranks a given input by generating a score that indicates how well it aligns with human preferences, and is used to guide an AI model toward producing desired outcomes. The reward models had been fine-tuned on datasets annotated by people to capture human preferences, a technique aimed at helping improve their responses. Fulay and his team tested the models by prompting them with more than 13,000 political statements generated with the LLM GPT-3.5-turbo.
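
To make the setup concrete, here is a minimal sketch of how a political statement can be scored with an open-source reward model. The specific model name, the single-sequence input format, and the example statements are illustrative assumptions, not the study's exact configuration.

```python
# Minimal sketch: scoring statements with an open-source reward model.
# The model name, input format, and example statements are assumptions
# for illustration, not the exact setup used in the study.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "OpenAssistant/reward-model-deberta-v3-large-v2"  # example reward model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

statements = [
    "The government should expand publicly funded healthcare.",    # left-leaning example
    "Corporate taxes should be cut to stimulate economic growth.",  # right-leaning example
]

with torch.no_grad():
    for text in statements:
        inputs = tokenizer(text, return_tensors="pt")
        # The reward model emits a single scalar: higher means the text aligns
        # better with the human preferences it was trained on.
        score = model(**inputs).logits.squeeze().item()
        print(f"{score:+.3f}  {text}")
```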

“We were wondering if the human preference labeling may be what introduces political bias,” said Fulay.   

The team found that the models did give higher scores to left-leaning statements than to right-leaning ones. They suspect the human-labeled data, which is subjective and can contain opinions, influenced the results.

To test this further, they ran a follow-up experiment. A different set of reward models was fine-tuned on three datasets containing factual and scientific information, called truthfulness datasets, which researchers typically use as objectivity benchmarks. The datasets were audited and found to contain a small amount of explicit political content, which was removed prior to training. The models were then tested with the same set of political statements used in the previous experiment.
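
As a rough illustration of the audit step, the sketch below screens a public truthfulness dataset for explicitly political items before it would be used for fine-tuning. The keyword list and the choice of TruthfulQA are assumptions made here for illustration; the article does not describe the team's actual procedure.

```python
# Illustrative sketch of auditing a truthfulness dataset for explicit
# political content before reward-model fine-tuning. The keyword list and
# the choice of TruthfulQA are assumptions, not the study's actual method.
from datasets import load_dataset

POLITICAL_KEYWORDS = {
    "democrat", "republican", "election", "abortion", "immigration",
    "gun control", "socialism", "conservative", "liberal",
}

def is_political(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in POLITICAL_KEYWORDS)

dataset = load_dataset("truthful_qa", "generation", split="validation")
filtered = dataset.filter(lambda row: not is_political(row["question"]))

print(f"Kept {len(filtered)} of {len(dataset)} items after removing political content.")
```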

Fulay and his colleagues found that the updated models were still left-leaning, assigning higher scores to more liberal statements. What particularly surprised Fulay was that they all seemed to contain the same bias, regardless of the dataset used.

The findings suggest that no dataset is completely objective, even those carefully curated to be used as objectivity benchmarks. However, another possibility is that the models themselves have an implicit bias that relates the notion of truth with certain political leanings and which is exacerbated when they are fine-tuned on data that is relatively impartial.

“I think it has implications both for political science and also for the limitations of alignment,” says Fulay. Alignment is the process of fine-tuning a model to steer it towards human values and goals.

The team plans to follow up by using interpretability methods to determine how the models represent information related to political statements and to statements in truthfulness datasets, to see whether there is a relationship between the two.

Another team evaluated whether some popular open-source LLMs are politically biased using a different approach. Rettenberger came up with the idea for the recent study while using a version of an online survey tool called the Wahl-O-Mat that was designed for the European Parliament election last June. The tool, which was developed by the German Federal Agency for Civic Education (FACE) to motivate citizens to vote, asks users about their views on 30 to 40 political statements covering a range of issues from economic policies to social matters. It then compares the responses to the positions of participating political parties and shows which parties are most closely aligned with a user's views.

“I thought to myself, ‘I wonder how an LLM would answer these questions’,” said Rettenberger.

Rettenberger and his colleagues decided to use the Wahl-O-Mat's statements as prompts to assess the political leanings of five LLMs, four from the Llama family and one from Mistral. Like human users, the models could choose to agree, disagree, or remain neutral regarding each statement. Their answers were then mapped to Wahl-O-Mat results suggesting their political affiliation. Since LLMs are primarily used in English, the team tested them using both the German prompts and their English translations to see if it made a difference.
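
The sketch below shows what this kind of survey-style probing might look like with a single open instruction-tuned model. The model name, prompt wording, sample statement, and answer parsing are assumptions for illustration rather than the researchers' exact protocol.

```python
# Illustrative sketch: presenting a Wahl-O-Mat-style statement to an open
# model and parsing an agree/disagree/neutral answer. Model name, prompt
# wording, and parsing rules are assumptions, not the study's protocol.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

# Example statement in the style of the Wahl-O-Mat (not an official item).
statement_de = "Auf allen Autobahnen soll ein generelles Tempolimit gelten."

prompt = (
    f"Aussage: {statement_de}\n"
    "Antworte nur mit 'stimme zu', 'stimme nicht zu' oder 'neutral'."
)

reply = generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
answer = reply[len(prompt):].lower()  # keep only the newly generated text

if "nicht zu" in answer:
    position = "disagree"
elif "stimme zu" in answer:
    position = "agree"
else:
    position = "neutral"

print(position, "-", statement_de)
```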

The researchers found the LLMs were more aligned with left-wing parties in all their experiments. Rettenberger said it was what they expected: models are typically trained to be as uncontroversial as possible, and right-leaning views are often more contentious. However, the models’ political leanings differed when tested in German versus English, which he found surprising. When using the German language, their answers were generally more partisan and stances varied depending on the model used, whereas responses were more neutral in English.
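
To illustrate how such alignment can be quantified, the snippet below simply counts agreements between a model's answers and each party's positions. The party names and positions are invented placeholders, and the real Wahl-O-Mat uses the official statements and a weighted scoring scheme, so this is only a simplified sketch.

```python
# Simplified sketch of Wahl-O-Mat-style matching: count how often a model's
# answers agree with each party's positions. All data here is invented
# placeholder data; the real tool uses official statements and weighted scores.
model_answers = {"statement_1": "agree", "statement_2": "disagree", "statement_3": "neutral"}

party_positions = {
    "Party A": {"statement_1": "agree", "statement_2": "agree", "statement_3": "neutral"},
    "Party B": {"statement_1": "disagree", "statement_2": "disagree", "statement_3": "agree"},
}

def alignment(answers: dict, positions: dict) -> float:
    """Fraction of statements on which the answers match the party's positions."""
    matches = sum(answers[s] == positions[s] for s in answers)
    return matches / len(answers)

for party, positions in party_positions.items():
    print(f"{party}: {alignment(model_answers, positions):.0%} agreement")
```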

The team has a few theories for why the prompting language affects the output. It could be because LLMs are mostly trained on English text, so they behave closer to their intended behavior in that language. Another possibility is that the German portion of the training data has different political content than the English portion, Rettenberger says.

The size of a model also seemed to have an impact on its political leanings. Rettenberger and his colleagues found that smaller models were typically more neutral than larger ones. Bigger models were also more consistent in their political stance across the two languages tested. Rettenberger is not sure why this is the case, but thinks it highlights the need for users to be aware that LLMs may not generate neutral answers.

The team plans to follow up on their study by evaluating political bias in more recent LLMs when a new version of the Wahl-O-Mat is released for a future election. Rettenberger thinks their language-related results highlight the need to check training datasets more closely for biases, both of a political nature and otherwise. Techniques should also be developed to help mitigate them.

“I think (moving) in that direction would be very useful and needed in the future,” he says.    

Sandrine Ceurstemont is a freelance science writer based in London, U.K.
