TechCrunch News · February 6
Why IQ is a poor test for AI
OpenAI CEO Sam Altman claims that AI's "IQ" has improved rapidly over the past few years. Experts, however, argue that using IQ as a benchmark for AI capability is misleading. IQ tests are relative rather than objective measures; they mainly assess logic and abstract reasoning while ignoring practical intelligence. AI also has an unfair advantage on such tests, since models carry massive amounts of memory and built-in knowledge and have effectively rehearsed test material during training. Experts call for better ways to evaluate AI and caution against directly comparing AI capabilities to human ones, because AI solves problems in fundamentally different ways than people do.

🤔 Altman equates AI "IQ" with human IQ, claiming AI gains one standard deviation of IQ per year. Experts counter that IQ tests are tools for measuring human intelligence and cannot be applied directly to AI; comparing the two is like comparing apples with oranges.

🧠 IQ tests mainly evaluate logic and abstract reasoning while overlooking practical intelligence. They are also shaped by Western cultural norms and thus carry bias. AI's strong performance on IQ tests says more about the flaws of the tests themselves than about AI's real capabilities.

💾 AI has vast memory and internalized knowledge at its disposal, and has effectively practiced IQ tests through repeated exposure during training, giving it an unfair advantage. Humans solving a problem contend with all kinds of distractions, while AI can process information with far less noise.

🧪 Experts call for better methods of testing AI rather than simply comparing AI capabilities to human ones. Because AI approaches problem-solving differently from humans, more appropriate evaluation standards are needed.

During a recent press appearance, OpenAI CEO Sam Altman said that he’s observed the “IQ” of AI rapidly improve over the past several years.

“Very roughly, it feels to me like — this is not scientifically accurate, this is just a vibe or spiritual answer — every year we move one standard deviation of IQ,” Altman said.

Altman isn’t the first to use IQ, an estimation of a person’s intelligence, as a benchmark for AI progress. AI influencers on social media have given models IQ tests and ranked the results.

But many experts say that IQ is a poor measure of a model’s capabilities — and a misleading one.

“It can be very tempting to use the same measures we use for humans to describe capabilities or progress, but this is like comparing apples with oranges,” Sandra Wachter, a researcher studying tech and regulation at Oxford, told TechCrunch.

In his comments at the presser, Altman equated IQ with intelligence. Yet IQ tests are relative — not objective — measures of certain kinds of intelligence. There’s some consensus that IQ is a reasonable test of logic and abstract reasoning. But it doesn’t measure practical intelligence — knowing how to make things work — and it’s at best a snapshot.

“IQ is a tool to measure human capabilities — a contested one no less — based on what scientists believe human intelligence looks like,” Wachter noted. “But you can’t use the same measure to describe AI capabilities. A car is faster than humans, and a submarine is better at diving. But this doesn’t mean cars or submarines surpass human intelligence. You’re equivocating one aspect of performance with human intelligence, which is much more complex.”

To excel at an IQ test, the origins of which some historians trace back to eugenics, the widely discredited scientific theory that people can be improved through selective breeding, a test taker must have a strong working memory and knowledge of Western cultural norms. This invites the opportunity for bias, of course, which is why one psychologist has called IQ tests “ideologically corruptible mechanical models” of intelligence.

That a model might do well on an IQ test indicates more about the test's flaws than the model's performance, according to Os Keyes, a doctoral candidate at the University of Washington studying ethical AI.

“[These] tests are pretty easy to game if you have a practically infinite amount of memory and patience,” Keyes said. “IQ tests are a highly limited way of measuring cognition, sentience, and intelligence, something we’ve known since before the invention of the digital computer itself.”

AI likely has an unfair advantage on IQ tests, as well, considering that models have massive amounts of memory and internalized knowledge at their disposal. Often, models are trained on public web data, and the web is full of example questions taken from IQ tests.

“Tests tend to repeat very similar patterns — a pretty foolproof way to raise your IQ is to practice taking IQ tests, which is essentially what every [model] has done,” said Mike Cook, a research fellow at King’s College London specializing in AI. “When I learn something, I don’t get it piped into my brain with perfect clarity 1 million times, unlike AI, and I can’t process it with no noise or signal loss, either.”

Ultimately, Cook added, IQ tests, biased as they are, were designed for humans as a way to evaluate general problem-solving abilities. They're inappropriate for a technology that approaches solving problems in a very different way than people do.

“A crow might be able to use a tool to recover a treat from a box, but that doesn’t mean it can enroll at Harvard,” Cook said. “When I solve a mathematics problem, my brain is also contending with its ability to read the words on the page correctly, to not think about the shopping I need to do on the way home, or if it’s too cold in the room right now. In other words, human brains contend with a lot more things when they solve a problem — any problem at all, IQ tests or otherwise — and they do it with a lot less help [than AI.]”

All this points to the need for better AI tests, Heidy Khlaaf, chief AI scientist at the AI Now Institute, told TechCrunch.

“In the history of computation, we haven’t compared computing abilities to that of humans’ precisely because the nature of computation means systems have always been able to complete tasks already beyond human ability,” Khlaaf said. “This idea that we directly compare systems’ performance against human abilities is a recent phenomenon that is highly contested, and what surrounds the controversy of the ever-expanding — and moving — benchmarks being created to evaluate AI systems.”

