Trending
Articles related to "language model jailbreaks"
Anthropic has a new way to protect large language models against jailbreaks
MIT Technology Review » Artificial Intelligence 2025-02-03T16:40:34.000000Z
Avoiding jailbreaks by discouraging their representation in activation space
LessWrong 2024-09-28T02:22:44.000000Z