MIT News - Artificial intelligence 2024年08月05日
MIT researchers advance automated interpretability in AI models
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

麻省理工学院计算机科学与人工智能实验室 (CSAIL) 的研究人员开发了一个名为“MAIA”的系统,该系统可以自动解释人工智能模型,特别是用于评估图像不同属性的计算机视觉模型。MAIA 通过结合预训练的视觉-语言模型和一系列可解释性工具,能够自动执行各种神经网络可解释性任务,例如识别模型中的特定组件、清理图像分类器并发现隐藏的偏差。

🤖 MAIA 能够自动执行各种神经网络可解释性任务,例如识别模型中的特定组件、清理图像分类器并发现隐藏的偏差。

🧠 MAIA 可以像人类研究人员一样,通过生成假设、设计实验来测试这些假设以及通过迭代分析来完善其理解,来进行可解释性实验。

🔍 MAIA 通过分析神经元激活的图像并生成和编辑合成图像来测试其假设,从而确定神经元的活动原因,例如识别特定神经元负责检测哪些视觉概念。

⚖️ MAIA 可以帮助发现人工智能系统中的隐藏偏差,例如发现图像分类器对特定图像子类别存在偏差,例如识别黑拉布拉多犬的图像更容易被错误分类。

🚀 MAIA 的开发是朝着更具弹性的 AI 生态系统迈出的重要一步,在这个生态系统中,用于理解和监控 AI 系统的工具与系统扩展同步,使我们能够调查并希望理解新模型带来的意想不到的挑战。

As artificial intelligence models become increasingly prevalent and are integrated into diverse sectors like health care, finance, education, transportation, and entertainment, understanding how they work under the hood is critical. Interpreting the mechanisms underlying AI models enables us to audit them for safety and biases, with the potential to deepen our understanding of the science behind intelligence itself.

Imagine if we could directly investigate the human brain by manipulating each of its individual neurons to examine their roles in perceiving a particular object. While such an experiment would be prohibitively invasive in the human brain, it is more feasible in another type of neural network: one that is artificial. However, somewhat similar to the human brain, artificial models containing millions of neurons are too large and complex to study by hand, making interpretability at scale a very challenging task. 

To address this, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers decided to take an automated approach to interpreting artificial vision models that evaluate different properties of images. They developed “MAIA” (Multimodal Automated Interpretability Agent), a system that automates a variety of neural network interpretability tasks using a vision-language model backbone equipped with tools for experimenting on other AI systems.

“Our goal is to create an AI researcher that can conduct interpretability experiments autonomously. Existing automated interpretability methods merely label or visualize data in a one-shot process. On the other hand, MAIA can generate hypotheses, design experiments to test them, and refine its understanding through iterative analysis,” says Tamar Rott Shaham, an MIT electrical engineering and computer science (EECS) postdoc at CSAIL and co-author on a new paper about the research. “By combining a pre-trained vision-language model with a library of interpretability tools, our multimodal method can respond to user queries by composing and running targeted experiments on specific models, continuously refining its approach until it can provide a comprehensive answer.”

The automated agent is demonstrated to tackle three key tasks: It labels individual components inside vision models and describes the visual concepts that activate them, it cleans up image classifiers by removing irrelevant features to make them more robust to new situations, and it hunts for hidden biases in AI systems to help uncover potential fairness issues in their outputs. “But a key advantage of a system like MAIA is its flexibility,” says Sarah Schwettmann PhD ’21, a research scientist at CSAIL and co-lead of the research. “We demonstrated MAIA’s usefulness on a few specific tasks, but given that the system is built from a foundation model with broad reasoning capabilities, it can answer many different types of interpretability queries from users, and design experiments on the fly to investigate them.” 

Neuron by neuron

In one example task, a human user asks MAIA to describe the concepts that a particular neuron inside a vision model is responsible for detecting. To investigate this question, MAIA first uses a tool that retrieves “dataset exemplars” from the ImageNet dataset, which maximally activate the neuron. For this example neuron, those images show people in formal attire, and closeups of their chins and necks. MAIA makes various hypotheses for what drives the neuron’s activity: facial expressions, chins, or neckties. MAIA then uses its tools to design experiments to test each hypothesis individually by generating and editing synthetic images — in one experiment, adding a bow tie to an image of a human face increases the neuron’s response. “This approach allows us to determine the specific cause of the neuron’s activity, much like a real scientific experiment,” says Rott Shaham.

MAIA’s explanations of neuron behaviors are evaluated in two key ways. First, synthetic systems with known ground-truth behaviors are used to assess the accuracy of MAIA’s interpretations. Second, for “real” neurons inside trained AI systems with no ground-truth descriptions, the authors design a new automated evaluation protocol that measures how well MAIA’s descriptions predict neuron behavior on unseen data.

The CSAIL-led method outperformed baseline methods describing individual neurons in a variety of vision models such as ResNet, CLIP, and the vision transformer DINO. MAIA also performed well on the new dataset of synthetic neurons with known ground-truth descriptions. For both the real and synthetic systems, the descriptions were often on par with descriptions written by human experts.

How are descriptions of AI system components, like individual neurons, useful? “Understanding and localizing behaviors inside large AI systems is a key part of auditing these systems for safety before they’re deployed — in some of our experiments, we show how MAIA can be used to find neurons with unwanted behaviors and remove these behaviors from a model,” says Schwettmann. “We’re building toward a more resilient AI ecosystem where tools for understanding and monitoring AI systems keep pace with system scaling, enabling us to investigate and hopefully understand unforeseen challenges introduced by new models.”

Peeking inside neural networks

The nascent field of interpretability is maturing into a distinct research area alongside the rise of “black box” machine learning models. How can researchers crack open these models and understand how they work?

Current methods for peeking inside tend to be limited either in scale or in the precision of the explanations they can produce. Moreover, existing methods tend to fit a particular model and a specific task. This caused the researchers to ask: How can we build a generic system to help users answer interpretability questions about AI models while combining the flexibility of human experimentation with the scalability of automated techniques?

One critical area they wanted this system to address was bias. To determine whether image classifiers displayed bias against particular subcategories of images, the team looked at the final layer of the classification stream (in a system designed to sort or label items, much like a machine that identifies whether a photo is of a dog, cat, or bird) and the probability scores of input images (confidence levels that the machine assigns to its guesses). To understand potential biases in image classification, MAIA was asked to find a subset of images in specific classes (for example “labrador retriever”) that were likely to be incorrectly labeled by the system. In this example, MAIA found that images of black labradors were likely to be misclassified, suggesting a bias in the model toward yellow-furred retrievers.

Since MAIA relies on external tools to design experiments, its performance is limited by the quality of those tools. But, as the quality of tools like image synthesis models improve, so will MAIA. MAIA also shows confirmation bias at times, where it sometimes incorrectly confirms its initial hypothesis. To mitigate this, the researchers built an image-to-text tool, which uses a different instance of the language model to summarize experimental results. Another failure mode is overfitting to a particular experiment, where the model sometimes makes premature conclusions based on minimal evidence.

“I think a natural next step for our lab is to move beyond artificial systems and apply similar experiments to human perception,” says Rott Shaham. “Testing this has traditionally required manually designing and testing stimuli, which is labor-intensive. With our agent, we can scale up this process, designing and testing numerous stimuli simultaneously. This might also allow us to compare human visual perception with artificial systems.”

“Understanding neural networks is difficult for humans because they have hundreds of thousands of neurons, each with complex behavior patterns. MAIA helps to bridge this by developing AI agents that can automatically analyze these neurons and report distilled findings back to humans in a digestible way,” says Jacob Steinhardt, assistant professor at the University of California at Berkeley, who wasn’t involved in the research. “Scaling these methods up could be one of the most important routes to understanding and safely overseeing AI systems.”

Rott Shaham and Schwettmann are joined by five fellow CSAIL affiliates on the paper: undergraduate student Franklin Wang; incoming MIT student Achyuta Rajaram; EECS PhD student Evan Hernandez SM ’22; and EECS professors Jacob Andreas and Antonio Torralba. Their work was supported, in part, by the MIT-IBM Watson AI Lab, Open Philanthropy, Hyundai Motor Co., the Army Research Laboratory, Intel, the National Science Foundation, the Zuckerman STEM Leadership Program, and the Viterbi Fellowship. The researchers’ findings will be presented at the International Conference on Machine Learning this week.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

人工智能 可解释性 神经网络 计算机视觉 偏差
相关文章