when will LLMs become human-level bloggers?

This article examines AI's ability to write online content, focusing on when AI will be able to produce blog posts that people consider worth reading. Although AI has made striking progress in many domains, even surpassing human experts in some, the quality of AI-generated blog content remains low and falls short of readers' expectations. The article poses a question: if AI already has considerable command of language, knowledge, and reasoning, why can't it write engaging blog posts? The author suggests this may come down to reasoning ability, agency, or training methods, and that further research and breakthroughs will be needed.

📅 Short AI timelines: it has become a mainstream view that AI systems will surpass humans in essentially every respect by 2026-2028. Anthropic predicts that powerful AI could arrive in late 2026 or 2027, with intelligence exceeding a Nobel Prize winner's and the ability to complete complex tasks such as writing novels and writing code.

🤔 Limitations of current LLMs: although large language models (LLMs) excel at language understanding and generation, the blog content they write today is of low quality and does not meet the standards of platforms like LessWrong. LLM content is generally seen as lacking depth and insight, and rarely produces valuable viewpoints and ideas.

💡 What LLMs need for quality blog writing: for LLMs to write high-quality blog posts, problems of reasoning ability, autonomy, and training may need to be solved. LLMs would need stronger reasoning to generate novel insights, a degree of autonomy to think and express views independently, and dedicated training for blog writing so they can better adapt to the form.

Published on March 9, 2025 9:10 PM GMT

"Short AI timelines" have recently become mainstream.  One now routinely hears the claim that somewhere in the 2026-2028 interval, we'll have AI systems that outperform humans in basically every respect.

For example, the official line from Anthropic holds that "powerful AI" will likely arrive in late 2026 or in 2027.  Anthropic's OSTP submission (3/6/2025) says (emphasis in original):[1]

Based on current research trajectories, we anticipate that powerful AI systems could emerge as soon as late 2026 or 2027 [...]

Powerful AI technology will be built during this Administration [i.e. roughly by EOY 2028 -nost]

where "powerful AI" means, among other things:

    In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.

    In addition to just being a “smart thing you talk to”, it has all the “interfaces” available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.

Anthropic's expectations are relatively aggressive even by short-timelines standards, but it seems safe to say that many well-informed people expect something like "powerful AI" by 2030 at the latest, and quite likely before that[2].


OK, so let's suppose that by some year 20XX, we will have AIs (probably scaffolded LLMs or similar) which are

smarter than a Nobel Prize winner across most relevant fields

and can

prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.

This would, obviously, be a system capable of writing things that we deem worth reading.

Amodei explicitly says it would be able to "write extremely good novels."  And presumably it would also be able to write extremely good scientific papers, given the mention of the Nobel Prize.

What about blog posts, or blog comments?  Surely it would be exceptionally good at those kinds of writing, too, right?

Indeed, "being good at blogging" is a vastly lower bar than the standards Amodei states or implies about the writing abilities of "powerful AI." Consider that:


But note that currently existing LLMs do not cross this quality bar.

None of the blog content we read is primarily LLM-authored, except in special cases where someone is trying to prove a point[3].

The same is true for blog comments as well.

On LessWrong – which could well be the internet's premier hub for short-timelines views – LLM-written content is typically removed by moderators on grounds such as:

LLM content is generally not good enough for LessWrong, and in particular we don't want it from new users who haven't demonstrated a more general track record of good content.

More generally, it seems like the vast majority of people who engage with LLMs – even people who are bearish on capabilities, even people with short timelines – hold an extremely low opinion of LLM-written content, as such.

In cases where LLMs are considered useful or valuable, the text itself is typically a means to a narrow and user-specified end: we care about a specific judgment the LLM has made, or a specific piece of information it has relayed to us.  If we actually read its outputs at all, it is usually for the sake of "extracting" a specific nugget of info that we expect to be there before we've even begun reading the words.

Very few people read this stuff in the expectation that they'll find "straightforwardly good," thought-provoking writing, of the sort that humans produce in large volumes every single day.  And that's because, for the most part, LLMs do not produce this type of thing, even when we explicitly request it.


On the face of it, isn't this really, really weird?

We have these amazing systems, these artificial (quasi-?)minds that are proficient in natural language, with seriously impressive math and coding chops and long-tail expert knowledge...

...and their writing is "generally not good enough for LessWrong"?!

We have these spookily impressive AIs that are supposedly going to become world-class intellectuals within a few years – that will supposedly write novels (and "extremely good" novels at that!)[4], that will be capable of substituting in for large fractions of the workforce and doing Nobel-quality scientific thinking...

...and we don't let them post in online discussion venues, because (we claim) your average mildly-interesting non-expert online "poster" has some crucial capability which they still lack?

We have honest-to-god artificial intelligences that could almost certainly pass the Turing Test if we wanted them to...

...and we're not interested in what they have to say?


Here's a simple question for people who think something like "powerful AI" is coming very soon:

When do we expect LLMs to become capable of writing online content that we actually think is worth reading?[5]

(And why are they not already doing so?)

Assuming short timelines, the answer cannot be later than the time we expect "powerful AI" or its equivalent, since "powerful AI" trivially implies this capability.

However, the capability is not here yet, and it's not obvious to me where we specifically expect it to come from.

It's not a data problem: pretraining already includes more than enough blog posts (one would think?), and LLMs already "know" all kinds of things that could be interesting to blog about.

In some sense it is perhaps a "reasoning" problem – maybe LLMs need to think for a long time to come up with insights worthy of blogging about? – but if so, it is not the kind of reasoning problem that will obviously "come for free" from RL on math and coding puzzles.

(Likewise, one could arguably frame this as a problem about insufficient "agency," but it is mysterious to me where the needed "agency" is supposed to come from given that we don't have it already.

Or, to take yet another angle, this could be a limitation of HHH assistant chatbots which might be overcome by training for a different kind of AI "character" – but again, this is something that requires more than just scaling + automated AI researchers, and a case would need to be made that it will happen swiftly and easily in the near term, despite ~no progress on such things since the introduction of the current paradigm in Anthropic's pre-ChatGPT HHH research.)
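To make the "won't come for free" point concrete, here is a toy sketch of the asymmetry (purely illustrative, my framing rather than anything from an actual training stack): RL on coding puzzles works because the reward is a cheap, exact verifier, and blogging has no analogous check.

```python
# Toy illustration (hypothetical, not any lab's actual reward code):
# why RL on coding puzzles doesn't obviously transfer to blogging.

def coding_reward(candidate_fn, tests):
    """Verifiable reward: fraction of unit tests the candidate passes.
    Exact, automatic, and cheap -- it can be run millions of times."""
    return sum(candidate_fn(x) == y for x, y in tests) / len(tests)

# Scoring a generated sorting function against known cases:
tests = [([3, 1, 2], [1, 2, 3]), ([5, 5], [5, 5]), ([], [])]
print(coding_reward(sorted, tests))  # 1.0

def blogging_reward(post: str) -> float:
    """No exact verifier exists for "this was worth reading." Any stand-in
    (a learned reward model, upvotes, engagement metrics) is noisy,
    expensive to collect, and easy to Goodhart."""
    raise NotImplementedError("no cheap check for 'worth reading'")
```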

What milestone(s) will near-future systems need to cross to grant them this capability?  When should I expect those milestones to be crossed?  And why hasn't this already happened?


P.S. This question feels closely related to Cole Wyeth's "Have LLMs Generated Novel Insights?"  But it strikes me as independently interesting, because it sets a very concrete and low bar for the type and depth of "insight" involved.

You don't need to do groundbreaking science to write a blog worth reading; you don't need to be groundbreaking at all; you just need to say something that's in some way novel or interesting, with fairly generous and broad definitions of those terms.  And yet...

  1. ^

    See also Jack Clark's more specific formulation of the same timeframe here: "late 2026, or early 2027"

  2. ^

    E.g. Miles Brundage (ex-OpenAI) writes:

    AI that exceeds human performance in nearly every cognitive domain is almost certain to be built and deployed in the next few years.

    and Daniel Kokotajlo (also ex-OpenAI) has held similar views for a long time now.

  3. ^

    Or where a human is leveraging the deficiencies/weirdness of LLMs for the sake of art and/or comedy, as opposed to trying to produce "straightforwardly good" content that meets the standards we apply to humans. This was the case in my own LLM-authored blog project, which ran from 2019-2023.

  4. ^

    As you may be able to tell, I am even more skeptical about Amodei's "extremely good" novel-writing thing than I am about most of the other components of the near-term "powerful AI" picture.

    LLMs are remarkably bad at fiction writing (long-form especially, but even short-form). This is partially due to HHH chat tuning (base models are better), but not entirely, and anyways I don't see Amodei or anyone else saying "hey, we need to break out of the HHH chat paradigm because it's holding back fiction writing capabilities," so in practice I expect we'll continue to get HHH chatbots with atrocious fiction-writing abilities for the indefinite future.

    As far as I can tell there's been very little progress on this front at least since GPT-4 (and possibly earlier), probably because of factors like

      - the (seemingly mistaken?) assumption that this is one of the capabilities that just comes for free with scaling
      - it's hard to programmatically measure quality
      - low/unclear economic value, compared to things like coding assistance
      - it's not a capability that people at LLM labs seem to care about very much

    Writing novels is much, much more intellectually challenging than blogging (I say, as someone who has done both). I focus on blogging in this post in part because it's such a low bar compared to stuff like this.

  5. ^

    By "we" I mean something like "me, the guy writing this post, and you, the person reading it, and others with broadly similar preferences about what we read online."

    And when I say "content that we think is worth reading," I'm just trying to say "content that would be straightforwardly good if a human wrote it."  If LLMs become capable of writing some weird type of adversarial insight-porn that seems good despite not resembling anything a human would write, that doesn't count (though it would be very interesting, and of course bad, if that were to happen).


