Mashable, December 20, 2024
Supposed expert reviews of Google Gemini outputs are coming from non-experts

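The article examines accuracy problems in how Google's Gemini is tested. Gemini's responses are sometimes inaccurate, and the testers may lack the expertise to check them. Google says it works with trusted testers on evaluations, but it pays less attention to responses that are inaccurate without being dangerous, and it relies on a disclaimer to avoid responsibility. GlobalLogic once told its staff to skip Gemini responses they did not understand, then changed that instruction.

🥽Google Gemini's responses are sometimes inaccurate, and testers lack the expertise to fact-check them

📋Google says it works with trusted testers on evaluations, but pays little attention to some inaccurate responses

💡GlobalLogic once told staff to skip responses they did not understand, then changed the instruction

🚫Google relies on a disclaimer to avoid responsibility for inaccurate responses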

As with any genAI model, Google Gemini's responses can sometimes be inaccurate, but in this case that might be because the testers don't have the expertise to fact-check them.

According to TechCrunch, the firm hired to improve accuracy for Gemini is now making its testers evaluate responses even if they don't have the "domain knowledge."

The report raises questions about the rigor and standards Google says it applies to testing Gemini for accuracy. In the "Building responsibly" section of the Gemini 2.0 announcement, Google said it is "working with trusted testers and external experts and performing extensive risk assessments and safety and assurance evaluations." There's a reasonable focus on evaluating responses for sensitive and harmful content, but less attention is paid to responses that aren't necessarily dangerous but just inaccurate.

Google seems to sidestep the hallucination and error problem by simply adding a disclaimer that "Gemini can make mistakes, so double-check it," which effectively absolves it of any responsibility. But that doesn't account for the humans doing the work behind the scenes.

Previously, GlobalLogic, a subsidiary of Hitachi, instructed its prompt engineers and analysts to skip any Gemini response they didn't fully understand. "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task," said the guidelines viewed by the outlet.

But last week, GlobalLogic changed its instructions, saying, "You should not skip prompts that require specialized domain knowledge." Instead, contractors are told to "rate the parts of the prompt you understand" and to note in their analysis that they lack the required expertise. Expertise, in other words, is not being treated as a prerequisite for this work.

Contractors can now only skip prompts that are "completely missing information," according to TechCrunch, or those that contain sensitive content that requires a consent form.
