when will LLMs become human-level bloggers?

This article examines AI's ability to write online content, focusing on when AI will be able to produce blog posts that people consider worth reading. Although AI has made striking progress in many domains, even surpassing human experts in some, the quality of AI-generated blog content remains low and falls short of readers' expectations. The article poses a question: if AI already has considerable command of language, knowledge, and reasoning, why can't it write engaging blog posts? The author suggests this may come down to reasoning ability, agency, or training methods, and that further research and breakthroughs will be needed.

📅 Short AI timelines: it has become a mainstream view that AI systems will surpass humans in essentially every respect by 2026-2028. Anthropic predicts that powerful AI could arrive in late 2026 or 2027, with intelligence exceeding a Nobel Prize winner's and the ability to complete complex tasks such as writing novels and writing code.

🤔 Limitations of current LLMs: although large language models (LLMs) excel at language understanding and generation, the blog content they write today is of low quality and does not meet the standards of platforms like LessWrong. LLM content is generally seen as lacking depth and insight, and rarely produces valuable viewpoints and ideas.

💡 What LLMs need for quality blog writing: for LLMs to write high-quality blog posts, problems of reasoning ability, autonomy, and training may need to be solved. LLMs would need stronger reasoning to generate novel insights, a degree of autonomy to think and express views independently, and dedicated training for blog writing so they can better adapt to the form.

Published on March 9, 2025 9:10 PM GMT

"Short AI timelines" have recently become mainstream.  One now routinely hears the claim that somewhere in the 2026-2028 interval, we'll have AI systems that outperform humans in basically every respect.

For example, the official line from Anthropic holds that "powerful AI" will likely arrive in late 2026 or in 2027.  Anthropic's OSTP submission (3/6/2025) says (emphasis in original):[1]

Based on current research trajectories, we anticipate that powerful AI systems could emerge as soon as late 2026 or 2027 [...]

Powerful AI technology will be built during this Administration [i.e. roughly by EOY 2028 -nost]

where "powerful AI" means, among other things:

    In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.

    In addition to just being a “smart thing you talk to”, it has all the “interfaces” available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.

Anthropic's expectations are relatively aggressive even by short-timelines standards, but it seems safe to say that many well-informed people expect something like "powerful AI" by 2030 at the latest, and quite likely before that[2].


OK, so let's suppose that by some year 20XX, we will have AIs (probably scaffolded LLMs or similar) which are

smarter than a Nobel Prize winner across most relevant fields

and can

prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.

This would, obviously, be a system capable of writing things that we deem worth reading.

Amodei explicitly says it would be able to "write extremely good novels."  And presumably it would also be able to write extremely good scientific papers, given the mention of the Nobel Prize.

What about blog posts, or blog comments?  Surely it would be exceptionally good at those kinds of writing, too, right?

Indeed, "being good at blogging" is a vastly lower bar than the standards Amodei states or implies about the writing abilities of "powerful AI." Consider that:


But note that currently existing LLMs do not cross this quality bar.

None of the blog content we read is primarily LLM-authored, except in special cases where someone is trying to prove a point[3].

The same is true for blog comments as well.

On LessWrong – which could well be the internet's premier hub for short-timelines views – LLM-written content is typically removed by moderators on grounds such as:

LLM content is generally not good enough for LessWrong, and in particular we don't want it from new users who haven't demonstrated a more general track record of good content.

More generally, it seems like the vast majority of people who engage with LLMs – even people who are bearish on capabilities, even people with short timelines – hold an extremely low opinion of LLM-written content, as such.

In cases where LLMs are considered useful or valuable, the text itself is typically a means to a narrow and user-specified end: we care about a specific judgment the LLM has made, or a specific piece of information it has relayed to us.  If we actually read its outputs at all, it is usually for the sake of "extracting" a specific nugget of info that we expect to be there before we've even begun reading the words.

Very few people read this stuff in the expectation that they'll find "straightforwardly good," thought-provoking writing, of the sort that humans produce in large volumes every single day.  And that's because, for the most part, LLMs do not produce this type of thing, even when we explicitly request it.


On the face of it, isn't this really, really weird?

We have these amazing systems, these artificial (quasi-?)minds that are proficient in natural language, with seriously impressive math and coding chops and long-tail expert knowledge...

...and their writing is "generally not good enough for LessWrong"?!

We have these spookily impressive AIs that are supposedly going to become world-class intellectuals within a few years – that will supposedly write novels (and "extremely good" novels at that!)[4], that will be capable of substituting in for large fractions of the workforce and doing Nobel-quality scientific thinking...

...and we don't let them post in online discussion venues, because (we claim) your average mildly-interesting non-expert online "poster" has some crucial capability which they still lack?

We have honest-to-god artificial intelligences that could almost certainly pass the Turing Test if we wanted them to...

...and we're not interested in what they have to say?


Here's a simple question for people who think something like "powerful AI" is coming very soon:

When do we expect LLMs to become capable of writing online content that we actually think is worth reading?[5]

(And why are they not already doing so?)

Assuming short timelines, the answer cannot be later than the time we expect "powerful AI" or its equivalent, since "powerful AI" trivially implies this capability.

However, the capability is not here yet, and it's not obvious to me where we specifically expect it to come from.

It's not a data problem: pretraining already includes more than enough blog posts (one would think?), and LLMs already "know" all kinds of things that could be interesting to blog about.

In some sense it is perhaps a "reasoning" problem – maybe LLMs need to think for a long time to come up with insights worthy of blogging about? – but if so, it is not the kind of reasoning problem that will obviously "come for free" from RL on math and coding puzzles.

(Likewise, one could arguably frame this as a problem about insufficient "agency," but it is mysterious to me where the needed "agency" is supposed to come from given that we don't have it already.

Or, to take yet another angle, this could be a limitation of HHH assistant chatbots which might be overcome by training for a different kind of AI "character" – but again, this is something that requires more than just scaling + automated AI researchers, and a case would need to be made that it will happen swiftly and easily in the near term, despite ~no progress on such things since the introduction of the current paradigm in Anthropic's pre-ChatGPT HHH research.)
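To make the "won't come for free" point concrete, here is a toy sketch of the asymmetry (purely illustrative, my framing rather than anything from an actual training stack): RL on coding puzzles works because the reward is a cheap, exact verifier, and blogging has no analogous check.

```python
# Toy illustration (hypothetical, not any lab's actual reward code):
# why RL on coding puzzles doesn't obviously transfer to blogging.

def coding_reward(candidate_fn, tests):
    """Verifiable reward: fraction of unit tests the candidate passes.
    Exact, automatic, and cheap -- it can be run millions of times."""
    return sum(candidate_fn(x) == y for x, y in tests) / len(tests)

# Scoring a generated sorting function against known cases:
tests = [([3, 1, 2], [1, 2, 3]), ([5, 5], [5, 5]), ([], [])]
print(coding_reward(sorted, tests))  # 1.0

def blogging_reward(post: str) -> float:
    """No exact verifier exists for "this was worth reading." Any stand-in
    (a learned reward model, upvotes, engagement metrics) is noisy,
    expensive to collect, and easy to Goodhart."""
    raise NotImplementedError("no cheap check for 'worth reading'")
```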

What milestone(s) will near-future systems need to cross to grant them this capability?  When should I expect those milestones to be crossed?  And why hasn't this already happened?


P.S. This question feels closely related to Cole Wyeth's "Have LLMs Generated Novel Insights?"  But it strikes me as independently interesting, because it sets a very concrete and low bar for the type and depth of "insight" involved.

You don't need to do groundbreaking science to write a blog worth reading; you don't need to be groundbreaking at all; you just need to say something that's in some way novel or interesting, with fairly generous and broad definitions of those terms.  And yet...

  1. ^

    See also Jack Clark's more specific formulation of the same timeframe here: "late 2026, or early 2027"

  2. ^

    E.g. Miles Brundage (ex-OpenAI) writes:

    AI that exceeds human performance in nearly every cognitive domain is almost certain to be built and deployed in the next few years.

    and Daniel Kokotajlo (also ex-OpenAI) has held similar views for a long time now.

  3. ^

    Or where a human is leveraging the deficiencies/weirdness of LLMs for the sake of art and/or comedy, as opposed to trying to produce "straightforwardly good" content that meets the standards we apply to humans. This was the case in my own LLM-authored blog project, which ran from 2019-2023.

  4. ^

    As you may be able to tell, I am even more skeptical about Amodei's "extremely good" novel-writing thing than I am about most of the other components of the near-term "powerful AI" picture.

    LLMs are remarkably bad at fiction writing (long-form especially, but even short-form). This is partially due to HHH chat tuning (base models are better), but not entirely, and anyways I don't see Amodei or anyone else saying "hey, we need to break out of the HHH chat paradigm because it's holding back fiction writing capabilities," so in practice I expect we'll continue to get HHH chatbots with atrocious fiction-writing abilities for the indefinite future.

    As far as I can tell there's been very little progress on this front at least since GPT-4 (and possibly earlier), probably because of factors like

      - the (seemingly mistaken?) assumption that this is one of the capabilities that just comes for free with scaling
      - it's hard to programmatically measure quality
      - low/unclear economic value, compared to things like coding assistance
      - it's not a capability that people at LLM labs seem to care about very much

    Writing novels is much, much more intellectually challenging than blogging (I say, as someone who has done both). I focus on blogging in this post in part because it's such a low bar compared to stuff like this.

  5. ^

    By "we" I mean something like "me, the guy writing this post, and you, the person reading it, and others with broadly similar preferences about what we read online."

    And when I say "content that we think is worth reading," I'm just trying to say "content that would be straightforwardly good if a human wrote it."  If LLMs become capable of writing some weird type of adversarial insight-porn that seems good despite not resembling anything a human would write, that doesn't count (though it would be very interesting, and of course bad, if that were to happen).


