TechCrunch News 2024年12月25日
Google is using Anthropic’s Claude to improve its Gemini AI

 

While improving its Gemini AI, Google is having contractors compare Gemini's answers against Anthropic's Claude. The contractors are asked to rate responses for accuracy, truthfulness, and verbosity, and to judge which model's answer is better. The internal platform showed that Claude's responses sometimes emphasize safety, even refusing to answer unsafe prompts. Although Anthropic's terms of service prohibit using its models to build competing products, Google, a major investor in Anthropic, has not clearly said whether it obtained authorization to use Claude. Google DeepMind says it does compare model outputs but denies using Anthropic's models to train Gemini. The practice has raised questions about Google's AI evaluation methods and its compliance around data use.

🧐 Google contractors compare Gemini's and Claude's answers: contractors rate model outputs for accuracy, truthfulness, and verbosity, and judge which model's answer is better, with up to 30 minutes allotted per evaluation.

🛡️ Claude's safety settings are stricter: in some cases Claude refuses to answer on safety grounds, for example when asked to role-play a different AI assistant, while Gemini may produce responses containing unsafe content (such as nudity).

⚖️ Google's relationship with Anthropic and compliance: although Google is a major investor in Anthropic, Anthropic's terms of service prohibit using its models to build competing products. Google has not clearly said whether it obtained authorization to use Claude for testing, raising concerns about data-use compliance.

🔬 Google DeepMind's response: Google DeepMind acknowledges comparing model outputs for evaluation but denies using Anthropic's models to train Gemini, calling such comparisons standard industry practice.

Contractors working to improve Google’s Gemini AI are comparing its answers against outputs produced by Anthropic’s competitor model Claude, according to internal correspondence seen by TechCrunch. 

Google would not say, when reached by TechCrunch for comment, if it had obtained permission for its use of Claude in testing against Gemini.

As tech companies race to build better AI models, the performance of these models is often evaluated against competitors, typically by running their own models through industry benchmarks rather than having contractors painstakingly evaluate their competitors' AI responses.

The contractors working on Gemini tasked with rating the accuracy of the model’s outputs must score each response that they see according to multiple criteria, like truthfulness and verbosity. The contractors are given up to 30 minutes per prompt to determine whose answer is better, Gemini’s or Claude’s, according to the correspondence seen by TechCrunch. 

The contractors recently began noticing references to Anthropic’s Claude appearing in the internal Google platform they use to compare Gemini to other unnamed AI models, the correspondence showed. At least one of the outputs presented to Gemini contractors, seen by TechCrunch, explicitly stated: “I am Claude, created by Anthropic.”

One internal chat showed the contractors noticing Claude’s responses appearing to emphasize safety more than Gemini. “Claude’s safety settings are the strictest” among AI models, one contractor wrote. In certain cases, Claude wouldn’t respond to prompts that it considered unsafe, such as role-playing a different AI assistant. In another, Claude avoided answering a prompt, while Gemini’s response was flagged as a “huge safety violation” for including “nudity and bondage.” 

Anthropic’s commercial terms of service forbid customers from accessing Claude “to build a competing product or service” or “train competing AI models” without approval from Anthropic. Google is a major investor in Anthropic. 

Shira McNamara, a spokesperson for Google DeepMind, which runs Gemini, would not say — when asked by TechCrunch — whether Google has obtained Anthropic's approval to access Claude. An Anthropic spokesperson, reached prior to publication, did not comment by press time.

McNamara said that DeepMind does “compare model outputs” for evaluations but that it doesn’t train Gemini on Anthropic models.

“Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process,” McNamara said. “However, any suggestion that we have used Anthropic models to train Gemini is inaccurate.”

Last week, TechCrunch exclusively reported that Google contractors working on the company’s AI products are now being made to rate Gemini’s AI responses in areas outside of their expertise. Internal correspondence expressed concerns by contractors that Gemini could generate inaccurate information on highly sensitive topics like healthcare.

You can send tips securely to this reporter on Signal at +1 628-282-2811.
