Bimodal AI Beliefs

The article examines society's limited grasp of artificial intelligence (AI), pointing to a marked gap in understanding between the general public and AI experts. Ordinary users of large language models (LLMs) such as ChatGPT often cannot ask effective questions and end up filing the technology away as a "toy", an impression reinforced by LLM overconfidence. Expert users, by contrast, draw on well-honed prompting skills and a working sense of the models' strengths and weaknesses to get real value out of AI, and they treat its occasional mistakes more forgivingly. This polarization of perception could split society over AI's development, so the article suggests positioning LLMs as specialized tools that demand skill, rather than general-purpose tools for everyone, in order to set more accurate public expectations.

🤔 Lay users trying an LLM for the first time often arrive with expectations inflated by marketing; lacking prompting skill, they tend to write it off as a "toy" the moment it errs and lose interest in exploring further.

🎨 Expert LLM users bring stronger prompting skills, get more out of the models, and keep sharpening those skills through their work. They are also more forgiving of the models' mistakes because they see the overall benefit AI brings.

📊 This split in perception divides society's view of AI capabilities: some remain skeptical while others embrace the technology. As AI capabilities keep improving, the divide is likely to widen.

💡 The article suggests positioning LLMs as specialized tools that demand skill rather than general-purpose tools for everyone. This framing sets more accurate public expectations and helps bridge the cognitive divide.

Published on February 14, 2025 6:45 AM GMT

Much is said about society's general lack of AI situational awareness. One prevailing topic of conversation in my social orbit is our ongoing bafflement about how so many other people we know, otherwise smart and inquisitive, seem unaware of or unconcerned about AI progress, x-risk, etc. This hardly seems like a unique experience.

We can all see that there's a lot of motivated reasoning nowadays, now that some industries are starting to understand that sufficiently good AI would introduce massive structural changes or render them obsolete. But the usual suspects also include things like how AI risk (existential and otherwise) flips the usual intuition about the efficiencies gained from new technologies on its head, or how difficult it is in general to imagine the future being a very different kind of world. Of course, the world does change rapidly, and to reason well about it you have to be open to ideas that initially feel weird, but these are all ideas that are not commonly discussed outside communities like this one.

I offer a more innocent explanation for why so many people seem not to grasp both the current capabilities of AI and the trajectory we're on.

The Lay Experience

Consider the experience of the median layperson. It starts when someone (a friend, ad, etc.) makes big claims about what ChatGPT can do and says that you can access those capabilities in plain English. So they sign up, greet it, and play around with it a bit. At some point, prompted (heh) by those big claims about AI capabilities, they try to test it by asking it increasingly tricky questions about domains they're familiar with. It does fine at first, but eventually it gets some detail wrong and the illusion of general intelligence is broken. Then the person buckets it into the cognitive category of "toy" and it's over.

Are they wrong? Well, it depends on the questions they asked. If they asked good questions and the LLM got it wrong, they found the frontier of some capability (or a hallucination). If they asked poorly formed questions and the LLM didn't know what to do, then of course it will flounder, be nonspecific, or generally seem like a toy. In both cases, whether the "toy" category is correct or not in the user's chosen domain, the overconfidence of LLMs in the face of ambiguity is a genuine UX problem, particularly when reinforced by the aforementioned big claims about AI capabilities measured against the background conditions of the world not (yet) changing much. The intelligence of the product feels like marketing spin in that context.

Now let's focus specifically on AI marketing claims. Here I'm not talking about any specific company, person, or advertisement, but about the tone the big AI labs and their users create around their products in the aggregate. It claims to be a tool for everyone, to provide access to specialized knowledge, and to carry out complex conversations with users. It claims to be helpful for everyday tasks and to boost productivity. It does not, in any sense, suggest that you need to know how to prompt it effectively to access its strongest capabilities.

Unfortunately, you really do need to know how to do that.

This should be obvious to LLM power users who have seen the difference between the best and worst outputs. But let's keep this abstract for now.

Contrasts

Consider the contrast between the experiences of the above median layperson and of LLM power users who know what models are good & bad at and have a sense of how to craft good prompts. I'm referring to the type of prompt engineering skill that is an art and not a science. Such a user will ask better questions, and thus will get better results in a way that is at least loosely self-reinforcing as their skill grows. This is especially true if they use the outputs in the course of their job, because that probably triples the time during which using an LLM may come to mind.

There is also an effect where a power user—someone using an LLM for work, for example—can forgive the occasional hallucination, because they get better at noticing them and they get so much benefit overall. Humans make mistakes; an LLM does not need to make zero mistakes to act intelligently by human standards. It just needs to equal or improve upon the human error rate. In this way a power user is much less likely to see a hallucination and reflexively dismiss the technology than someone with a lay perspective, even if neither one knows how LLMs work under the hood.

So in sum, the first group either bounces off the technology or doesn't know how to get the best outputs, and is more inclined to be critical. The second group embraces the technology, learns how to prompt very well, and probably becomes more forgiving of errors. Opinions about the technology will trend downward in the first group and upward in the second, in a way that strengthens over time as capabilities improve and prompting skill remains important.

In this model—independent of anyone's relative intelligence, understanding of how LLMs work, or ideas about AI alignment, gradual disempowerment, x-risk, etc.—beliefs about AI capabilities should naturally trend toward a bimodal distribution. Which group you trend into is thus a function of how much attention you pay to AI research, yes, but also of how much time you spend learning to use the models and trying to get real work done with them.
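To make the bimodality claim concrete, here is a minimal toy simulation of that feedback loop. Every number and functional form in it is an assumption invented for illustration (the initial skill range, the 0.15 "give-up" threshold, the update sizes); the only point it makes is qualitative: one loop reinforces upward (use builds skill, skill improves sessions, good sessions encourage more use), the other reinforces downward, and together they pull an initially unimodal population apart.

```python
# Toy model of the feedback loop described above. All constants are
# illustrative assumptions, not claims about real adoption dynamics.
import random

random.seed(0)

N_AGENTS, N_ROUNDS = 5000, 300

def simulate_agent():
    opinion = 0.5                      # propensity to reach for the LLM at all
    skill = random.uniform(0.0, 0.4)   # hypothetical starting prompting skill
    for _ in range(N_ROUNDS):
        if opinion < 0.15:             # bucketed as a "toy": they stop trying
            break
        if random.random() > opinion:  # lukewarm users reach for it less often
            continue
        # Assume the chance of a satisfying session grows with prompting skill.
        good_session = random.random() < 0.3 + 0.6 * skill
        if good_session:
            opinion = min(1.0, opinion + 0.10)
            skill = min(1.0, skill + 0.05)
        else:
            # Skilled users shrug off a hallucination; novices bounce off hard.
            opinion = max(0.0, opinion - 0.10 * (1.0 - skill))
    return opinion

final = [simulate_agent() for _ in range(N_AGENTS)]

# Crude text histogram of final opinions: mass piles up at both ends.
for i in range(10):
    lo, hi = i / 10, (i + 1) / 10
    count = sum(lo <= x < hi or (i == 9 and x == 1.0) for x in final)
    print(f"{lo:.1f}-{hi:.1f} | {'#' * (count // 50)}")
```

Running it should print a rough histogram with most of the mass in the bottom bins and at 1.0 and little in between; which basin an agent lands in is driven by its starting skill and early luck, not by anything resembling intelligence.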

Implications

When I see a bimodal distribution like this, I become concerned about tribalism. I don't think that's likely here any more than we already see it, because at some point—probably pretty soon—capabilities will become so impressive that lots of people will get disrupted and nobody sensible will deny the situation. The bimodal distribution will eventually collapse into a general understanding of the situation. But before that, it does have implications for how to talk about AI and advocate for controls.

For example, outside of specific types of work where LLMs are most useful, like software development, we should expect that people on average will not be that skilled at prompting and thus will not personally experience the strongest capabilities of frontier models. We should expect this to remain true on average even if they try to explore those capabilities, at least until the next iteration of models is released, and probably even then, because prompting does not yet seem to be declining in importance.

Accordingly, in the short term, we should expect an increasing disconnect between the groups as capabilities improve but remain unevenly accessible.[1] As noted above, this will remain true until capabilities become undeniable (or until we get AGI, and then we have other problems), at which point mainstream society will start really paying attention to the slope of AI progress.

Overall I think this speaks to how we are probably not well-served talking about the current value propositions of LLMs as general-purpose tools for everyone. They are that, in the sense that they can be used productively across many disciplines, but they are also not that, in the sense that the benefits are unevenly distributed toward people whose interests or incentives prime them to spend a lot of time building the skill of prompting. It is more like learning how to paint than learning to ride a bike: fundamentally it is a matter of learning and familiarity that anyone can accomplish, but many people will not choose to do so.

In the meantime, I think LLMs are better imagined and discussed as specialized tools that require finesse to use most effectively. That framing, it seems to me, sets more accurate expectations for people approaching an LLM for the first time.

  1. ^

    Note the DeepSeek r1 phenomenon as a rare time when this disconnect collapsed a bit. Its release in January was the first time many people were exposed to a CoT model, given most people only use free models, and the jump between those and r1 is credibly large even with less effective prompting.



