cs.AI updates on arXiv.org · July 25, 12:28
Beyond Internal Data: Constructing Complete Datasets for Fairness Testing

This article discusses the importance of testing for potential AI harms and biases in high-risk domains and decision-making, and proposes a method for constructing synthetic data from separate overlapping datasets to overcome the challenge of insufficient data and enable fairness testing of AI systems.

arXiv:2507.18561v1 Announce Type: cross Abstract: As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete synthetic data that includes demographic information and accurately reflects the underlying relationships between protected attributes and model features. We validate the fidelity of the synthetic data by comparing it to real data, and empirically demonstrate that fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data. This work, therefore, offers a path to overcome real-world data scarcity for fairness testing, enabling independent, model-agnostic evaluation of fairness, and serving as a viable substitute where real data is limited.
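The abstract does not spell out the construction procedure, but the core idea, using a separate overlapping dataset that does contain the protected attribute to synthesize demographics for an internal dataset that lacks them, and then computing group fairness metrics on the resulting "complete" data, can be sketched as below. This is a minimal illustration under assumptions, not the authors' method: the helper names, the use of a logistic-regression conditional sampler keyed on the overlapping columns, the demographic-parity metric, and all column names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch (assumed setup, not the paper's implementation):
# - df_internal: pandas DataFrame with model features and predictions,
#   but no protected attribute
# - df_external: a separate DataFrame sharing some columns with
#   df_internal ("overlapping" features) and containing the protected
#   attribute
# Idea: learn P(A | overlapping features) from df_external, sample a
# synthetic protected attribute for each row of df_internal, then
# evaluate group fairness on the completed synthetic dataset.

def synthesize_demographics(df_internal, df_external, overlap_cols,
                            protected_col, seed=0):
    """Attach a sampled protected attribute to df_internal, using the
    relationship between overlapping features and demographics observed
    in df_external (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(df_external[overlap_cols], df_external[protected_col])
    # Sample A ~ P(A | overlap) instead of taking the argmax, so the
    # synthetic column reflects the full conditional distribution.
    proba = clf.predict_proba(df_internal[overlap_cols])
    sampled = np.array([rng.choice(clf.classes_, p=p) for p in proba])
    out = df_internal.copy()
    out[protected_col] = sampled
    return out

def demographic_parity_difference(df, pred_col, protected_col):
    """Largest gap in positive prediction rate across groups."""
    rates = df.groupby(protected_col)[pred_col].mean()
    return float(rates.max() - rates.min())

# Usage sketch (column names are placeholders):
# complete = synthesize_demographics(df_internal, df_external,
#                                    overlap_cols=["age", "region"],
#                                    protected_col="gender")
# gap = demographic_parity_difference(complete, pred_col="y_hat",
#                                     protected_col="gender")
```

In the paper, the synthetic data is additionally validated for fidelity against real data, and fairness metrics computed on it are shown to be consistent with those from real data; in a practical audit one would perform the same comparison on whatever held-out real data with demographics is available.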

Related tags

AI fairness, data synthesis, data scarcity, testing methods, AI regulation