Trending
Articles related to "Scalable Oversight"
Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)
LessWrong 2025-08-01T19:16:02.000000Z
Research Areas in Learning Theory (The Alignment Project by UK AISI)
LessWrong 2025-08-01T10:43:07.000000Z
Research Areas in Computational Complexity Theory (The Alignment Project by UK AISI)
LessWrong 2025-08-01T10:43:07.000000Z
Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)
LessWrong 2025-08-01T10:43:06.000000Z
Research Areas in Cognitive Science (The Alignment Project by UK AISI)
LessWrong 2025-08-01T10:43:06.000000Z
Rational Animations' video about scalable oversight and sandwiching
LessWrong 2025-07-06T14:02:34.000000Z
Prover-Estimator Debate: A New Scalable Oversight Protocol
LessWrong 2025-06-17T13:55:20.000000Z
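New MIT study quantifies the AI oversight challenge: could the success rate of controlling AI smarter than us be below 52%?
MIT Technology Review - This Week's Top Stories 2025-05-10T02:06:43.000000Z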
UK AISI’s Alignment Team: Research Agenda
LessWrong 2025-05-07T16:37:29.000000Z
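AGI loss-of-control rate >90%! MIT professor calculates the "Compton constant": is AI's "takeover rate" for Earth already locked in?
BAAI Community 2025-05-06T02:48:02.000000Z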
Is weak-to-strong generalization an alignment technique?
LessWrong 2025-01-31T07:17:53.000000Z
Balancing Label Quantity and Quality for Scalable Elicitation
LessWrong 2024-10-24T17:23:37.000000Z
How should we make trade-offs between the quantity and quality of labels used for eliciting knowledge from capable AI systems?
LessWrong 2024-10-24T16:53:07.000000Z
On scalable oversight with weak LLMs judging strong LLMs
LessWrong 2024-07-08T09:05:26.000000Z
Oversharing Details of NYU’s Work on Implementing Debate as an Alignment Technique
LessWrong 2024-07-06T20:50:08.000000Z
Scalable oversight as a quantitative rather than qualitative problem
LessWrong 2024-07-06T17:50:10.000000Z
Using AI to oversee AI: OpenAI manages to lift itself skyward by stepping on its own feet
36kr 2024-07-02T11:33:49.000000Z
GPT-4 critiques GPT-4 to achieve "self-improvement": another major work from OpenAI's former Superalignment team is made public
36kr 2024-06-28T10:03:45.000000Z