Novel Idea Generation in LLMs: Judgment as Bottleneck

This article examines both the potential and the limits of large language models (LLMs) for tackling hard problems such as climate change. LLMs can generate large numbers of candidate solutions cheaply, but most of those candidates are low quality. The core claim is that picking the few good ideas out of the many bad ones is the bottleneck. Through the "Oscillating Creativity Machine" experiment, the author found that LLMs fall short at evaluating and selecting promising ideas. The article argues that LLMs lack the experience-grounded judgment of human experts, which limits their usefulness on real problems, and closes with a direction for future work: improving LLM judgment, for example via reinforcement learning, so that models can better assist humans with hard problems.

💡 LLMs can generate large numbers of candidate solutions at low cost, but the vast majority are low quality, so filtering out the effective ones becomes the key step.

🔬 Through the "Oscillating Creativity Machine" experiment, the author found that LLMs fall short at screening and evaluating candidates and did not end up selecting high-quality solutions.

🧠 The article argues that LLMs lack the judgment that human experts build up over long experience, which limits their usefulness on real-world problems.

🧐 The article suggests that the key to improving LLM judgment lies in approaches such as reinforcement learning on expert data, so that models can better evaluate and select promising ideas.

🌍 The article discusses the prospects for applying LLMs to complex problems such as climate change and stresses the importance of overcoming their current limitations.

Published on April 19, 2025 3:37 PM GMT

In the face of any hard problem—reversing climate change, curing cancer, or starting a great novel—modern LLMs can generate thousands of possible solutions relatively cheaply.

Most solutions from most prompts are bad: they’re not new relative to the state of the art, not feasible, or not significant enough.

But for every thousand ideas an LLM has about how to solve a problem, a few are likely to be good.

Now that LLMs are idea-generation machines, able to produce ideas this cheaply even if most are bad, the thing standing between us and waking up to promising climate-change solutions in our inbox (or solutions to whichever problem you care about) is an LLM's ability to pick those few good ideas out of a thousand crap ones. In other words, I'd guess the rate-limiting step isn't generating good ideas but choosing the promising few from among a thousand mostly random ones.

At least, that was the bottleneck I perceived in my recent Oscillating Creativity Machine experiment. You give the machine a problem, like solving climate change, and it runs through ten rounds: it generates three possible solutions, picks the most interesting one, then generates three variations of that, and picks one again—ten times in all.
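For concreteness, here is a minimal sketch of that loop in Python. The `llm(prompt) -> str` callable is a stand-in for whatever chat-completion API you use, and the prompts are illustrative rather than the ones the experiment actually ran:

```python
def oscillating_creativity_machine(llm, problem: str, rounds: int = 10) -> str:
    """Alternate between generating candidates and picking one, `rounds` times."""
    current = problem
    for _ in range(rounds):
        # Chaos: generate three candidate solutions (or variations on the last pick).
        candidates = [
            llm(f"Propose one novel, feasible solution building on:\n{current}")
            for _ in range(3)
        ]
        # Order: ask the model to pick the most interesting candidate.
        numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
        choice = llm(
            f"Problem: {problem}\n\nCandidates:\n{numbered}\n\n"
            "Reply with only the number of the most interesting candidate."
        )
        digits = "".join(ch for ch in choice if ch.isdigit())
        index = (int(digits) - 1) % 3 if digits else 0
        current = candidates[index]
    return current
```

The interesting question isn't the loop itself, which is trivial, but whether the "pick the most interesting candidate" step is any good.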

My hypothesis was that if it could pick well at each stage, then even if the generated ideas sucked, the loop would still end up somewhere good. That's more or less what people do in the shower, in their sleep, or on walks when they're stuck and then have eureka moments: we oscillate between chaos (generating possibilities) and order (pruning, picking one path). It's roughly what I do when I come up with ideas I like.

But the Oscillating Creativity Machine didn’t work well in my tests. It didn’t end up picking a great idea—at least, not by my standards. And it didn’t come up with great ideas, which may have been a limiting factor too.

PhDs build their judgment over decades of real‑life experience: being exposed to experiments at the edge of their knowledge, then getting a sense of which new ideas or data actually help them solve the problems they set out to solve. LLMs, at least as I’ve prompted ChatGPT/Claude/Grok3 so far, don’t seem to have that judgment in the face of unsolved problems.

If LLMs could judge ideas well against unsolved challenges, we could have them generate millions of ideas for every problem that matters, then rate each and surface only the best. You could imagine this automated solution‑discovery process as the key to human flourishing.
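A sketch of what that automated pipeline might look like, again with the same stand-in `llm` callable, a hypothetical scoring prompt, and a toy scale rather than millions of ideas:

```python
def surface_best_ideas(llm, problem: str, n_ideas: int = 1000, top_k: int = 5):
    """Generate many ideas, score each one, and return the top few."""
    ideas = [llm(f"Propose one novel solution to: {problem}") for _ in range(n_ideas)]
    scored = []
    for idea in ideas:
        reply = llm(
            f"Problem: {problem}\nIdea: {idea}\n"
            "Score this idea from 0 to 10 for novelty, feasibility, and "
            "significance combined. Reply with a single number."
        )
        try:
            score = float(reply.strip().split()[0])
        except (ValueError, IndexError):
            score = 0.0
        scored.append((score, idea))
    # The whole pipeline is only as good as this ranking step,
    # which is exactly the judgment bottleneck described above.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idea for _, idea in scored[:top_k]]
```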

But so far, judgment has proven hard to offload to LLMs. Maybe all that's required is RLHF with a ton of expert data. I'm curious who's made the most progress here.



