Trending
Articles related to "refusal rate reduction"
Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning
cs.AI updates on arXiv.org 2025-07-08T04:33:56.000000Z