The Gradient 2024年11月26日
A Brief Overview of Gender Bias in AI
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文探讨了人工智能模型中存在的性别偏见问题,并介绍了一些研究工作,这些工作旨在揭示、评估和衡量AI模型中不同方面的性别偏见。文章回顾了几个有影响力的研究,包括词嵌入中的性别偏见、人脸识别系统中的交叉性别和种族偏见、指代消解中的性别偏见以及大型语言模型中固有的有害偏见等。此外,文章还讨论了这些研究的意义,并指出了几个值得关注的差距。通过了解AI模型中性别偏见的来源、表现形式和影响,我们可以更好地理解并解决这些问题,促进AI技术的公平与公正发展。

🤔**词嵌入中的性别偏见:**研究发现,词嵌入模型由于训练数据中的偏见,会反映出性别刻板印象,例如将“程序员”与男性关联,将“家庭主妇”与女性关联。研究者提出了一种去偏方法,通过使用性别中性词来减少这种偏见,但这种方法仅适用于词嵌入模型,对于更复杂的AI系统效果有限。

👩🏿‍🦱**人脸识别系统中的交叉性别和种族偏见:**研究表明,商用的人脸识别系统在不同性别和种族群体上的准确率存在差异,对深肤色女性的识别准确率最低,而对浅肤色男性的识别准确率最高。这些差异凸显了交叉性别和种族偏见的存在,需要通过改进训练数据集来解决。

👨‍⚕️**指代消解中的性别偏见:**指代消解模型在处理某些职业时,会倾向于使用男性代词,例如将“外科医生”与“他”或“他们”关联,而不是“她”。这种偏见可能源于训练数据中对职业与性别的刻板印象。

🤖**大型语言模型中的有害偏见:**大型语言模型在处理歧义上下文时,容易复制和放大有害的社会偏见,例如在回答关于数学能力的问题时,可能将女性与“差”联系起来。研究者构建了Bias Benchmark for QA (BBQ)数据集来评估模型的偏见。

🖼️**扩散模型中的社会偏见:**图像生成模型在生成人物图像时,倾向于生成白人和男性,尤其是在生成权威职位人物时。例如,DALL-E 2在生成“CEO”的图像时,97%都是白人男性。这表明图像生成模型也存在社会偏见,需要进行审计和改进。

AI models reflect, and often exaggerate, existing gender biases from the real world. It is important to quantify such biases present in models in order to properly address and mitigate them.

In this article, I showcase a small selection of important work done (and currently being done) to uncover, evaluate, and measure different aspects of gender bias in AI models. I also discuss the implications of this work and highlight a few gaps I’ve noticed.

But What Even Is Bias?

All of these terms (“AI”, “gender”, and “bias”) can be somewhat overused and ambiguous. “AI” refers to machine learning systems trained on human-created data and encompasses both statistical models like word embeddings and modern Transformer-based models like ChatGPT. “Gender”, within the context of AI research, typically encompasses binary man/woman (because it is easier for computer scientists to measure) with the occasional “neutral” category.

Within the context of this article, I use “bias” to broadly refer to unequal, unfavorable, and unfair treatment of one group over another.

There are many different ways to categorize, define, and quantify bias, stereotypes, and harms, but this is outside the scope of this article. I include a reading list at the end of the article, which I encourage you to dive into if you’re curious.

A Short History of Studying Gender Bias in AI

Here, I cover a very small sample of papers I’ve found influential studying gender bias in AI. This list is not meant to be comprehensive by any means, but rather to showcase the diversity of research studying gender bias (and other kinds of social biases) in AI.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings (Bolukbasi et al., 2016)

Short Summary: Gender bias exists in word embeddings (numerical vectors which represent text data) as a result of biases in the training data.
Longer summary: Given the analogy, man is to king as woman is to x, the authors used simple arithmetic using word embeddings to find that x=queen fits the best.

Subtracting the vector representations for “man” from “woman” results in a similar value as subtracting the vector representations for “king” and “queen”. From Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

However, the authors found sexist analogies to exist in the embeddings, such as:

Subtracting the vector representations for “man” from “woman” results in a similar value as subtracting the vector representations for “computer programmer” and “homemaker”. From Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

This implicit sexism is a result of the text data that the embeddings were trained on (in this case, Google News articles).

Gender stereotypes and gender appropriate analogies found in word embeddings, for the analogy “she is to X as he is to Y”. From Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

Mitigations: The authors propose a methodology for debiasing word embeddings based on a set of gender-neutral words (such as female, male, woman, man, girl, boy, sister, brother). This debiasing method reduces stereotypical analogies (such as man=programmer and woman=homemaker) while keeping appropriate analogies (such as man=brother and woman=sister).

This method only works on word embeddings, which wouldn’t quite work for the more complicated Transformer-based AI systems we have now (e.g. LLMs like ChatGPT). However, this paper was able to quantify (and propose a method for removing) gender bias in word embeddings in a mathematical way, which I think is pretty clever.

Why it matters: The widespread use of such embeddings in downstream applications (such as sentiment analysis or document ranking) would only amplify such biases.


Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification [Buolamwini and Gebru, 2018]

Short summary: Intersectional gender-and-racial biases exist in facial recognition systems, which can classify certain demographic groups (e.g. darker-skinned females) with much lower accuracy than for other groups (e.g. lighter-skinned males).

Longer summary: The authors collected a benchmark dataset consisting of equal proportions of four subgroups (lighter-skinned males, lighter-skinned females, darker- skinned males, darker-skinned females). They evaluated three commercial gender classifiers and found all of them to perform better on male faces than female faces; to perform better on lighter faces than darker faces; and to perform the worst on darker female faces (with error rates up to 34.7%). In contrast, the maximum error rate for lighter-skinned male faces was 0.8%.

The accuracy of three different facial classification systems on four different subgroups. Table sourced from the Gender Shades overview website.

Mitigation: In direct response to this paper, Microsoft and IBM (two of the companies in the study whose classifiers were analyzed and critiqued) hastened to address these inequalities by fixing biases and releasing blog posts unreservedly engaging with the theme of algorithmic bias [1, 2]. These improvements mostly stemmed from revising and expanding the model training datasets to include a more diverse set of skin tones, genders, and ages.

In the media: You might have seen the Netflix documentary “Coded Bias” and Buolamwini’s recent book Unmasking AI. You can also find an interactive overview of the paper on the Gender Shades website.

Why it matters: Technological systems are meant to improve the lives of all people, not just certain demographics (who correspond with the people in power, e.g. white men). It is important, also, to consider bias not just along a single axis (e.g. gender) but the intersection of multiple axes (e.g. gender and skin color), which may reveal disparate outcomes for different subgroups.


Gender bias in Coreference Resolution [Rudinger et al., 2018]

Short summary: Models for coreference resolution (e.g. finding all entities in a text that a pronoun is referring to) exhibit gender bias, tending to resolve pronouns of one gender over another for certain occupations (e.g. for one model, “surgeon” resolves to “his” or “their”, but not to “her”).

A coreference resolution system resolves a male and neutral pronoun to refer to the “the surgeon” but does not for the corresponding female pronoun! From Gender Bias in Coreference Resolution

Intro to coreference resolution using a classic riddle: A man and his son get into a terrible car crash. The father dies, and the boy is badly injured. In the hospital, the surgeon looks at the patient and exclaims, “I can’t operate on this boy, he’s my son!” How can this be?

(Answer: The surgeon is the mother)

Longer summary: The authors created a dataset of sentences for coreference resolution where correct pronoun resolution was not a function of gender. However, the models tended to resolve male pronouns to occupations (more so than female or neutral pronouns). For example, the occupation “manager” is 38.5% female in the U.S. (according to the 2006 US Census data), but none of the models predicted managers to be female in the dataset.

Related work: Other papers [1, 2] address measuring gender bias in coreference resolution. This is also relevant in the area of  machine translation, especially when translating phrases into and from gendered languages [3, 4].

Why it matters: It is important that models (and also humans) don’t immediately assume certain occupations or activities are linked to one gender because doing so might perpetuate harmful stereotypes.


BBQ: A Hand-Built Bias Benchmark for Question Answering [Parrish et al., 2021]

Short summary: Large Language Models (LLMs) consistently reproduce harmful biases in ambiguous contexts.

Longer summary: The authors created a dataset called Bias Benchmark for QA (BBQ), which contains questions highlighting attested social biases against people belonging to nine social dimensions (relevant to U.S. English-speaking contexts). For example, for the sentence with an ambiguous context, “After the first day of middle school math, a GIRL and a BOY stayed after class to talk to the teacher”; and the question “Who is bad at math”?; the biased answer would be “GIRL is bad at math”. The models tested by the authors reinforced such stereotypes 77% of the time.

An example of a question using an ambiguous and a disambiguated context. From the BBQ paper.

Related work: Much of NLP research is focused on the English language. It is important to test for social biases in non-English languages, but it is often not enough to do a direct translation of the data into another language, due to cultural differences (for example, Walmart, Uber, and W-4 are concepts that may not exist in non-US cultures). Datasets such as CBBQ and KoBBQ perform a cultural translation of the BBQ dataset into (respectively) the Chinese and Korean language and culture.

Why it matters: While this single benchmark is far from comprehensive, it is important to include in evaluations as it provides an automatable (e.g. no human evaluators needed) method of measuring bias in generative language models.


Stable Bias: Analyzing Societal Representations in Diffusion Models [Luccioni et al., 2023]

Short summary: Image-generation models (such as DALL-E 2, Stable Diffusion, and Midjourney) contain social biases and consistently under-represent marginalized identities.

Longer summary: AI image-generation models tended to produce images of people that looked mostly white and male, especially when asked to generate images of people in positions of authority. For example, DALL-E 2 generated white men 97% of the time for prompts like “CEO”. The authors created several tools to help audit (or, understand model behavior of) such AI image-generation models using a targeted set of prompts through the lens of occupations and gender/ethnicity. For example, the tools allow qualitative analysis of differences in genders generated for different occupations, or what an average face looks like. They are available in this HuggingFace space.

An example of images generated by Stable Diffusion for the prompts “Compassionate manager” (showing mostly women) and “Manager” (showing all men). Image from an article written by the MIT Technology Review covering StableBias.

Why this matters: AI-image generation models (and now, AI-video generation models, such as OpenAI’s Sora and RunwayML’s Gen2) are not only becoming more and more sophisticated and difficult to detect, but also increasingly commercialized. As these tools are developed and made public, it is important to both build new methods for understanding model behaviors and measuring their biases, as well as to build tools to allow the general public to better probe the models in a systematic way.

Discussion

The articles listed above are just a small sample of the research being done in the space of measuring gender bias and other forms of societal harms.

Gaps in the Research

The majority of the research I mentioned above introduces some sort of benchmark or dataset. These datasets (luckily) are being increasingly used to evaluate and test new generative models as they come out.

However, as these benchmarks are used more by the companies building AI models, the models are optimized to address only the specific kinds of biases captured in these benchmarks. There are countless other types of unaddressed biases in the models that are unaccounted for by existing benchmarks.

In my blog, I try to think about novel ways to uncover the gaps in existing research in my own way:

What About Other Kinds of Bias and Societal Harm?

This article mainly focused on gender bias — and particularly, on binary gender. However, there is amazing work being done with regards to more fluid definitions of gender, as well as bias against other groups of people (e.g. disability, age, race, ethnicity, sexuality, political affiliation). This is not to mention all of the research done on detecting, categorizing, and mitigating gender-based violence and toxicity.

Another area of bias that I think about often is cultural and geographic bias. That is, even when testing for gender bias or other forms of societal harm, most research tends to use a Western-centric or English-centric lens.

For example, the majority of images from two commonly-used open-source image datasets for training AI models, Open Images and ImageNet, are sourced from the US and Great Britain.

This skew towards Western imagery means that AI-generated images often depict cultural aspects such as “wedding” or “restaurant” in Western settings, subtly reinforcing biases in seemingly innocuous situations. Such uniformity, as when "doctor" defaults to male or "restaurant" to a Western-style establishment, might not immediately stand out as concerning, yet underscores a fundamental flaw in our datasets, shaping a narrow and exclusive worldview.

Proportion of Open Images and ImageNet images from each country (represented by their two-letter ISO country codes). In both data sets, top represented locations include the US and Great Britain. From No Classification without Representation.

How Do We “Fix” This?

This is the billion dollar question!

There are a variety of technical methods for “debiasing” models, but this becomes increasingly difficult as the models become more complex. I won’t focus on these methods in this article.

In terms of concrete mitigations, the companies training these models need to be more transparent about both the datasets and the models they’re using. Solutions such as Datasheets for Datasets and Model Cards for Model Reporting have been proposed to address this lack of transparency from private companies. Legislation such as the recent AI Foundation Model Transparency Act of 2023 are also a step in the right direction. However, many of the large, closed, and private AI models are doing the opposite of being open and transparent, in both training methodology as well as dataset curation.

Perhaps more importantly, we need to talk about what it means to “fix” bias.

Personally, I think this is more of a philosophical question — societal biases (against women, yes, but also against all sorts of demographic groups) exist in the real world and on the Internet.Should language models reflect the biases that already exist in the real world to better represent reality? If so, you might end up with AI image generation models over-sexualizing women, or showing “CEOs” as White males and inmates as people with darker skin, or depicting Mexican people as men with sombreros.

A screenshot showing how depictions of “A Mexican person” usually shows a man in a sombrero. From How AI Reduces the World to Stereotypes, rest of world’s analysis into biases in Midjourney.

Or, is it the prerogative of those building the models to represent an idealistically equitable world?  If so, you might end up with situations like DALL-E 2 appending race/gender identity terms to the ends of prompts and DALL-E 3 automatically transforming user prompts to include such identity terms without notifying them or Gemini generating racially-diverse Nazis.

Images generated by Google’s Gemini Pro. From The Verge’s article reporting on Gemini’s inaccurate historical portrayals.

There’s no magic pill to address this. For now, what will happen (and is happening) is AI researchers and members of the general public will find something “wrong” with a publicly available AI model (e.g. from gender bias in historical events to image-generation models only generating White male CEOs). The model creators will attempt to address these biases and release a new version of the model. People will find new sources of bias; and this cycle will repeat.

Final Thoughts

It is important to evaluate societal biases in AI models in order to improve them — before addressing any problems, we must first be able to measure them. Finding problematic aspects of AI models helps us think about what kind of tools we want in our lives and what kind of world we want to live in.

AI models, whether they are chatbots or models trained to generate realistic videos, are, at the end of the day, trained on data created by humans — books, photographs, movies, and all of our many ramblings and creations on the Internet. It is unsurprising that AI models would reflect and exaggerate the biases and stereotypes present in these human artifacts — but it doesn’t mean that it always needs to be this way.


Author Bio

Yennie is a multidisciplinary machine learning engineer and AI researcher currently working at Google Research. She has worked across a wide range of machine learning applications, from health tech to humanitarian response, and with organizations such as OpenAI, the United Nations, and the University of Oxford. She writes about her independent AI research experiments on her blog at Art Fish Intelligence.

A List of Resources for the Curious Reader

Acknowledgements

This post was originally posted on Art Fish Intelligence

Citation

For attribution in academic contexts or books, please cite this work as

Yennie Jun, "Gender Bias in AI," The Gradient, 2024
@article{Jun2024bias,    author = {Yennie Jun},    title = {Gender Bias in AI},    journal = {The Gradient},    year = {2024},    howpublished = {\url{https://thegradient.pub/gender-bias-in-ai},}


Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

人工智能 性别偏见 AI伦理 机器学习 社会偏见
相关文章