MarkTechPost@AI, October 15, 2024
Quanda: A New Python Toolkit for Standardized Evaluation and Benchmarking of Training Data Attribution (TDA) in Explainable AI

Quanda is a Python toolkit for the standardized evaluation and benchmarking of training data attribution (TDA) in explainable AI. It aims to close the gaps in TDA evaluation by providing a comprehensive set of evaluation metrics and a unified interface. The toolkit is user-friendly, thoroughly tested, and available on PyPI. Its TDA evaluation is comprehensive, spanning multiple functional units that support tasks such as model explanation and debugging.

🧐 Quanda is a Python toolkit from the Fraunhofer Institute for Telecommunications that addresses the gaps in TDA evaluation, offering comprehensive evaluation metrics and a unified interface; it is user-friendly, thoroughly tested, and available on PyPI.

📈 Quanda incorporates multiple libraries for seamless integration with current TDA implementations. Its TDA evaluation is comprehensive, covering a standard interface to many methods, evaluation metrics for a variety of tasks, and precomputed evaluation benchmark suites that ensure reproducibility and reliability for users.

🔧 The Quanda library exposes several functional units through modular interfaces, including explainers, evaluation metrics, and benchmarks. Each element is implemented as a base class, allowing users to evaluate new TDA methods by wrapping their implementations to conform to the base explainer interface.

🌟 Quanda addresses the shortcomings of TDA evaluation metrics, to the benefit of TDA researchers. In the future, its functionality is expected to extend to more complex domains such as natural language processing.

XAI, or Explainable AI, marks a paradigm shift that emphasizes the need to explain the decision-making processes of neural networks, which are notorious black boxes. Within XAI, methods for feature selection, mechanistic interpretability, concept-based explainability, and training data attribution (TDA) have gained popularity. Here we focus on TDA, which aims to relate a model's inference on a specific sample back to its training data. Beyond model explainability, TDA supports other vital tasks such as model debugging, data summarization, machine unlearning, dataset selection, and fact tracing. Research on TDA is thriving, yet comparatively little work evaluates the attributions themselves. Several standalone metrics have been proposed to assess the quality of TDA across contexts, but they do not provide the systematic, unified comparison needed to earn the research community's trust. This calls for a unified framework for TDA evaluation (and beyond).

The Fraunhofer Institute for Telecommunications has put forth Quanda to bridge this gap. It is a Python toolkit that provides a comprehensive set of evaluation metrics and a uniform interface for seamless integration with current TDA implementations. The toolkit is user-friendly, thoroughly tested, and available as a library on PyPI. Quanda builds on the PyTorch Lightning, HuggingFace Datasets, Torchvision, Torcheval, and Torchmetrics libraries so that it slots into users' pipelines without reimplementing available features.

TDA evaluation in Quanda is comprehensive, beginning with a standard interface to many methods that are otherwise spread across independent repositories. It includes several metrics for various tasks, allowing thorough assessment and comparison, and its standard benchmarks are available as precomputed evaluation benchmark suites to ensure reproducibility and reliability for users. Quanda differs from contemporaries such as Captum, TransformerLens, and Alibi Explain in the breadth and comparability of its evaluation metrics. Other evaluation strategies, such as downstream-task evaluation and heuristics-based evaluation, fall short because they are fragmented, limited to one-off comparisons, and lack reliability.
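As a rough sketch of how such a precomputed suite might be used, the snippet below downloads a benchmark and scores a TDA method against it. The benchmark class, suite name, and method signatures (download, evaluate, explainer_cls, expl_kwargs) are illustrative assumptions, not a confirmed API; consult the Quanda documentation for the exact interface.

```python
# Minimal sketch, assuming these module paths and signatures exist in Quanda;
# the suite name and keyword arguments below are illustrative, not confirmed API.
from quanda.benchmarks.downstream_eval import MislabelingDetection
from quanda.explainers.wrappers import CaptumSimilarity

# Hypothetical: fetch a precomputed suite (trained model, dataset, metadata).
benchmark = MislabelingDetection.download(
    name="mnist_mislabeling_detection",  # illustrative suite identifier
    cache_dir="./quanda_cache",
)

# Hypothetical: run the suite against a TDA method and obtain a scalar score.
score = benchmark.evaluate(
    explainer_cls=CaptumSimilarity,
    expl_kwargs={"model_id": "mnist_cnn", "cache_dir": "./quanda_cache"},
)
print(score)
```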

The Quanda library is organized into several functional units exposed through modular interfaces, with three main components: explainers, evaluation metrics, and benchmarks. Each is implemented as a base class that defines the minimal functionality needed to create a new instance. This base-class design lets users evaluate even novel TDA methods by wrapping their implementations to conform to the base explainer interface.
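To make the wrapping idea concrete, here is a minimal sketch of a custom TDA method exposed through an assumed Explainer base class; the import path, constructor arguments, and the explain method's signature are assumptions based on the description above rather than a confirmed interface.

```python
import torch
from quanda.explainers import Explainer  # assumed base-class import path


class MyTDAExplainer(Explainer):
    """Hypothetical wrapper exposing a custom TDA method via Quanda's base interface."""

    def __init__(self, model, train_dataset, **kwargs):
        # Assumed base-class constructor; cache whatever the attributor needs here.
        super().__init__(model=model, train_dataset=train_dataset, **kwargs)

    def explain(self, test_tensor, targets=None):
        # Return one attribution score per training sample for each test input,
        # i.e. a tensor of shape (n_test, n_train). Zeros stand in for real logic.
        return torch.zeros(test_tensor.shape[0], len(self.train_dataset))
```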

Quanda is built on Explainers, Metrics, and Benchmarks. Each Explainer represents a specific TDA method together with its architecture, model weights, training dataset, and so on. Metrics summarize the performance and reliability of a TDA method in a compact form; Quanda's stateful Metric design includes an update method for accounting for new test batches. Additionally, a metric can be categorized into one of three types: ground_truth, downstream_evaluation, or heuristic. Finally, a Benchmark enables standardized comparisons across different TDA methods.

An example of using the Quanda library to evaluate explanations as they are generated is given below:
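A minimal sketch of that pattern follows: a toy model and datasets, an explainer wrapper producing attributions batch by batch, and a stateful metric that ingests them via update and is summarized with compute. The class names (CaptumSimilarity, ModelRandomizationMetric), constructor arguments, and the update/compute signatures are illustrative assumptions and may differ from the released API.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Assumed class names and module paths; consult the Quanda docs for the exact API.
from quanda.explainers.wrappers import CaptumSimilarity
from quanda.metrics.heuristics import ModelRandomizationMetric

# Toy stand-ins for the user's model and data.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
train_dataset = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
test_dataset = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))

explainer = CaptumSimilarity(
    model=model,
    train_dataset=train_dataset,
    model_id="toy_mlp",         # illustrative identifier
    cache_dir="./quanda_cache",
)

metric = ModelRandomizationMetric(
    model=model,
    train_dataset=train_dataset,
    explainer_cls=CaptumSimilarity,
    expl_kwargs={"model_id": "toy_mlp_rand", "cache_dir": "./quanda_cache"},
)

# Generate explanations on the fly and feed them to the stateful metric.
for test_batch, _ in DataLoader(test_dataset, batch_size=32):
    explanations = explainer.explain(test_batch)                     # assumed: (batch, n_train)
    metric.update(explanations=explanations, test_data=test_batch)   # assumed signature

print(metric.compute())  # compact score summarizing the TDA method's reliability
```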

Quanda addresses the gaps in TDA evaluation metrics that have caused hesitancy about adopting TDA within the explainability community. TDA researchers can benefit from the library's standard metrics, ready-to-use setups, and consistent wrappers for available implementations. In the future, it would be interesting to see Quanda's functionality extended to more complex areas, such as natural language processing.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
