TechCrunch News, December 19, 2024
Exclusive: Google’s Gemini is forcing contractors to rate AI responses outside their expertise

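The evaluation process for Google's AI model Gemini has recently drawn controversy. To improve Gemini, Google has contractors at the outsourcing firm GlobalLogic evaluate AI-generated responses on factors including “truthfulness.” Contractors used to be able to skip prompts outside their area of expertise. Google has now changed the rules and requires contractors to evaluate every prompt, even when they lack the relevant expertise. This has raised concerns about Gemini's accuracy in sensitive areas such as healthcare, because contractors may not have the background knowledge needed to assess highly technical AI responses. Under the new rules, raters may skip a prompt only when it is missing information or contains harmful content.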

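⚙️ Google has contractors at the outsourcing firm GlobalLogic evaluate responses generated by Gemini to improve its accuracy. The evaluation criteria include the “truthfulness” of each response.

✍️ Previously, contractors could skip prompts outside their area of expertise. For example, a contractor without a cardiology background could skip a prompt about cardiology.

⚠️ Google now requires contractors to evaluate all prompts, even those that call for expertise they lack, which has raised concerns about Gemini's accuracy in sensitive areas.

📝 Under the new rules, contractors may skip a prompt only if it is missing information or contains harmful content. In every other case they must evaluate it.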

Generative AI may look like magic, but behind the development of these systems are armies of employees at companies like Google, OpenAI and others, known as “prompt engineers” and analysts, who rate the accuracy of chatbots’ outputs to improve their AI.

But a new internal guideline passed down from Google to contractors working on Gemini, seen by TechCrunch, has led to concerns that Gemini could be more prone to spouting out inaccurate information on highly sensitive topics, like healthcare, to regular people.

To improve Gemini, contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, are routinely asked to evaluate AI-generated responses according to factors like “truthfulness.”

These contractors were until recently able to “skip” certain prompts, and thus opt out of evaluating various AI-written responses to those prompts, if the prompt was way outside their domain expertise. For example, a contractor could skip a prompt that was asking a niche question about cardiology because the contractor had no scientific background. 

But last week, GlobalLogic announced a change from Google: contractors are no longer allowed to skip such prompts, regardless of their own expertise.

Internal correspondence seen by TechCrunch shows that previously, the guidelines read: “If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task.”

But now the guidelines read: “You should not skip prompts that require specialized domain knowledge.” Instead, contractors are being told to “rate the parts of the prompt you understand” and include a note that they don’t have domain knowledge. 

This has led to direct concerns about Gemini’s accuracy on certain topics, as contractors are sometimes tasked with evaluating highly technical AI responses about issues like rare diseases that they have no background in.

“I thought the point of skipping was to increase accuracy by giving it to someone better?” one contractor noted in internal correspondence, seen by TechCrunch.

Contractors can now only skip prompts in two cases: if they’re “completely missing information” like the full prompt or response, or if they contain harmful content that requires special consent forms to evaluate, the new guidelines show.

Google did not respond to TechCrunch’s requests for comment by press time.
