cs.AI updates on arXiv.org 07月03日
Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文针对社交媒体上广泛存在的毒性内容问题,提出了一种两阶段训练框架,优化数据效率、语义保留和模型泛化,有效缓解了先前工作面临的权衡,并在实验中取得了优异性能。

arXiv:2507.01050v1 Announce Type: cross Abstract: The widespread dissemination of toxic content on social media poses a serious threat to both online environments and public discourse, highlighting the urgent need for detoxification methods that effectively remove toxicity while preserving the original semantics. However, existing approaches often struggle to simultaneously achieve strong detoxification performance, semantic preservation, and robustness to out-of-distribution data. Moreover, they typically rely on costly, manually annotated parallel corpora while showing poor data efficiency. To address these challenges, we propose a two-stage training framework that jointly optimizes for data efficiency, semantic preservation, and model generalization. We first perform supervised fine-tuning on a small set of high-quality, filtered parallel data to establish a strong initialization. Then, we leverage unlabeled toxic inputs and a custom-designed reward model to train the LLM using Group Relative Policy Optimization. Experimental results demonstrate that our method effectively mitigates the trade-offs faced by previous work, achieving state-of-the-art performance with improved generalization and significantly reduced dependence on annotated data. Our code is available at: https://anonymous.4open.science/r/Detoxification-of-Text-725F/

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

文本去毒性 数据效率 语义保留 模型泛化
相关文章