Published on July 31, 2024 7:30 PM GMT
Frontier AI labs can boost external safety researchers by
- Sharing better access to powerful models (early access, fine-tuning, helpful-only,[1] filters/moderation-off, logprobs, activations)[2]
- Releasing research artifacts besides models
- Publishing (transparent, reproducible) safety research
- Giving API credits
- Mentoring
Here's what the labs have done. Let me know if anything is missing/wrong.
Anthropic:
- Giving helpful-only access to some researchers developing evals
- Giving free API access to some OP grantees, and giving some researchers $1K in API credits
- (Giving deep model access to Ryan Greenblatt)
- (External mentoring, in particular via MATS)
- [No fine-tuning or deep access, except for Ryan]
Google DeepMind:
- Sharing some of their model evals for dangerous capabilities
- Releasing Gemma SAEs
- Releasing Gemma weights
- Embeddings API
- (External mentoring, in particular via MATS)
- (Grants programs; details private)
- [No fine-tuning or deep access to frontier models]
OpenAI:
- Maybe giving better API access to some OP grantees
- Fine-tuning GPT-3.5 (and "GPT-4 fine-tuning is in experimental access"; OpenAI shared GPT-4 fine-tuning access with academic researchers, including Jacob Steinhardt and Daniel Kang, in 2023)
- Early access: shared GPT-4 with a few safety researchers, including Rachel Freedman, before release
- API gives top 5 logprobs
- Embeddings API
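For readers unfamiliar with the logprobs access mentioned above: the API can return per-token log-probabilities for the most likely tokens at each position, which is useful for (e.g.) eliciting calibrated model beliefs. A minimal sketch of the request shapes, assuming OpenAI's publicly documented parameter names (this just builds the JSON payloads; no API call is made, and the model names are illustrative):

```python
# Legacy Completions endpoint: `logprobs` is an integer, historically capped at 5,
# giving the top-5 candidate tokens (with log-probabilities) at each position.
completions_payload = {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "The capital of France is",
    "max_tokens": 1,
    "logprobs": 5,  # request the 5 most likely tokens per position
}

# Chat Completions endpoint: `logprobs` is a boolean, paired with `top_logprobs`
# to set how many alternatives to return per position.
chat_payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "The capital of France is"}],
    "max_tokens": 1,
    "logprobs": True,
    "top_logprobs": 5,
}
```

Either payload would be POSTed to the corresponding endpoint with an API key; the response includes the sampled token plus the requested alternatives and their log-probabilities.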
Meta AI:
- Releasing Llama weights
Microsoft:
- [Nothing]
Somewhat-related papers:
- Structured access for third-party research on frontier AI models (Bucknall and Trager 2024)
- Black-Box Access is Insufficient for Rigorous AI Audits (Casper et al. 2024)
  - (That paper is about audits, like for risk assessment and oversight; this post is about research)
1. ^ "Helpful-only" refers to the version of a model RLHFed/RLAIFed/fine-tuned for helpfulness but not harmlessness.
2. ^ Releasing model weights will likely be dangerous once models are more powerful, but all past releases seem fine. Still, Meta's poor risk assessment, and its lack of a plan to make release decisions conditional on risk assessment, is concerning.