Trending
Articles related to "safety training"
New Anthropic study shows AI really doesn’t want to be forced to change its views
TechCrunch News 2024-12-18T22:19:20.000000Z
Current safety training techniques do not fully transfer to the agent setting
LessWrong 2024-11-03T19:38:15.000000Z
Evaluating the Vulnerabilities of Unlearning Techniques in Large Language Models: A Comprehensive White-Box Analysis
MarkTechPost@AI 2024-10-03T07:21:38.000000Z
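OpenAI's most powerful model, o1, still can't tell which is larger, 9.11 or 9.8
Huxiu 2024-09-13T03:38:23.000000Z
OpenAI releases its most powerful model, o1, breaking through the AI bottleneck and opening a new era; GPT-5 may never arrive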
36kr 2024-09-13T02:04:08.000000Z
Iterative Refinement Stages of Lying in LLMs
LessWrong 2024-08-22T09:06:58.000000Z