All Content from Business Insider, 10 hours ago
'The Trillion-Dollar Question': How did Anthropic make AI so good at coding?

Anthropic has taken a clear lead in AI coding with its Claude Sonnet 3.5 model, drawing attention and a wave of catch-up efforts across Silicon Valley. The model's outstanding code generation has made it the core engine behind many of the top AI coding services, including Cursor, Augment, and GitHub Copilot. Anthropic's success is no accident: it stems from innovative training methods such as Reinforcement Learning from AI Feedback (RLAIF) and "Constitutional AI," which guide the model to evaluate and improve its own outputs against explicit principles. The model's proficiency with tools, its ability to follow long-running instructions, and its improved memory management also underpin its strong performance on complex coding tasks. Anthropic's release of Claude Code has further consolidated its position in the industry and given it a more direct data feedback channel for continuously improving its AI models.

🚀 **A disruptor in AI coding**: Anthropic established a dominant position in AI coding with its Claude Sonnet 3.5 model, whose code generation significantly outperformed every other model on the market and prompted the industry to rethink, and chase, the direction of AI coding technology. The model's high-quality output can rival that of professional human programmers, and it directly drove progress in AI coding tools such as Cursor, Augment, and GitHub Copilot.

💡 **RLAIF and Constitutional AI innovations**: The key to Anthropic's success lies in its innovative training methods. Through Reinforcement Learning from AI Feedback (RLAIF) and "Constitutional AI," Anthropic uses AI models to evaluate and guide their own outputs, self-critiquing and revising against preset English-language principles (does the code serve the final requirement, does it include unrequested features, is it maintainable, are the comments useful), which markedly improved the quality and reliability of the model's code.

🛠️ **Tool use and long-horizon task execution**: Anthropic's models excel not only at code generation but also at using tools and executing complex, long-running programming tasks. The models can write code to call APIs and fetch external information (such as weather or stock prices), follow human instructions more faithfully, and stay coherent and effective on programming projects that take days or even weeks. This rests on optimized memory management: the models retain key details, ignore irrelevant information, and support multiple rounds of code changes.

🔄 **Data-driven continuous optimization**: By launching developer-facing tools such as Claude Code, Anthropic gains richer, finer-grained user data, a valuable resource for the continuous learning and optimization of its AI models. By directly observing and analyzing how human experts use the command line and write software, Anthropic can understand user needs more precisely, keep iterating on its models, and consolidate its lead in AI coding.

Anthropic CEO Dario Amodei

Anthropic has become the dominant provider of AI coding intelligence, and the startup's success has sparked a wave of soul-searching, theorizing, and "code red" scrambles across Silicon Valley.

The goal of this frantic activity is to find out how Anthropic got so good at coding.

"That's the trillion-dollar question," said Quinn Slack, CEO of startup Sourcegraph, which relies on Anthropic models. "It's like, why is Coca Cola is better than Pepsi?"

Elon Musk wants to know. His xAI startup has been trying to topple Anthropic lately. Mark Zuckerberg's mad dash for AI talent and infrastructure is partly driven by the same quest to understand Anthropic's coding lead and catch up.

There's a lot at stake here. Since Anthropic's AI coding breakthrough just over a year ago, revenue has surged. It's pulling in billions of dollars now, mostly from other companies paying for access to its models for coding tasks. The startup may soon be worth $100 billion.

Floored by a model

Quinn Slack (left) and Beyang Liu, cofounders of Sourcegraph.

Sourcegraph's Slack remembers the exact moment when he realized Anthropic had a major breakthrough on its hands.

This was June 2024, when Anthropic released its Claude Sonnet 3.5 model. Slack was floored.

"We immediately said, 'this model is better than anything else out there in terms of its ability to write code at length' — high-quality code that a human would be proud to write," he said.

Slack quickly arranged a meeting at Sourcegraph and announced that Sonnet 3.5 would be their default AI model, providing the underlying intelligence that powers the startup's coding service for developers. And he gave it away for free.

Some colleagues wanted more time to evaluate if such a drastic move made sense financially. But Slack insisted.

"Anthropic changed everything," he said. "And as a startup, if you're not moving at that speed, you're gonna die."

The go-to vibe coding platform

Just over a year later, Anthropic models power most of the top AI coding services, including Cursor, Augment, and Microsoft's GitHub Copilot.

Even Meta uses Anthropic models to support its Devmate internal coding assistant. AI coding startup Windsurf was going to be acquired by OpenAI, but Anthropic cut off access to its Claude models, and the deal crumbled. Now Windsurf is back using Anthropic.

All those videos on social media of teenagers vibe coding new apps and websites? Impossible without Anthropic's AI breakthrough in June 2024.

What's even more surprising is that Anthropic's AI coding lead has endured. Its latest models, including Claude Sonnet 4, are still the best at coding more than a year later. That's almost unheard of in AI, where new advancements seem to pop up every day.

Trying to answer the trillion-dollar question

Silicon Valley hasn't given up trying to crack open Anthropic's AI coding secrets.

A few years ago, Anthropic would have published a long research paper detailing the data, techniques, and architecture it used to get Sonnet 3.5 to be a coding expert. Nowadays, though, competition is so fierce that all the AI labs keep their AI sauce super secret.

However, in a recent interview with Business Insider, Anthropic executive Dianne Penn shared some clues about how the startup made this breakthrough. Cofounder Ben Mann also discussed some successful techniques recently on a podcast.

BI also interviewed several CEOs and founders of AI coding startups that rely on Anthropic AI models, along with a coding expert from MIT.

Let's start with Eric Simons, the ebullient CEO of StackBlitz, the startup behind blockbuster vibe coding service Bolt.new.

StackBlitz CEO Eric Simons talks during the startup's "hackathon" event in San Francisco

Simons thinks Anthropic had its existing models write code and deploy it. Then, the company evaluated all the deployed code through a combination of human expertise and automated AI analysis.

With software coding, it's relatively easy to evaluate good versus bad outputs. That's because the code either works, or it doesn't, when deployed. This creates clear YES and NO signals that are really valuable for training and fine-tuning new AI models, he explained.

Anthropic took these signals and funneled them into the training data and development process for the new Sonnet AI models. This reinforcement-learning strategy produced AI models that were much better at coding, according to Simons, who was equally blown away by Sonnet 3.5's abilities in the summer of 2024.
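Simons's description, deploying candidate code, checking whether it works, and feeding the pass/fail result back as a training signal, can be sketched as a minimal loop. All names and the candidate snippets below are hypothetical; Anthropic's actual pipeline is not public.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_candidate(code: str, check: str) -> int:
    """Execute candidate code plus a check; return 1 (pass) or 0 (fail).

    This binary signal is the clear YES/NO outcome Simons describes:
    deployed code either works or it doesn't.
    """
    with tempfile.TemporaryDirectory() as d:
        script = Path(d) / "candidate.py"
        script.write_text(code + "\n" + check)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            timeout=30,
        )
        return 1 if result.returncode == 0 else 0

# Two hypothetical model outputs for the prompt "write an add function".
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"   # buggy candidate
check = "assert add(2, 3) == 5"

# (prompt, code, reward) tuples like these would feed reinforcement learning.
rewards = [(code, run_candidate(code, check)) for code in (good, bad)]
print(rewards[0][1], rewards[1][1])  # → 1 0
```

The point of the sketch is only the shape of the signal: execution collapses a fuzzy quality judgment into a binary reward that is cheap to collect at scale.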

Human versus AI evaluations

Ben Mann, cofounder of Anthropic, talks on the No Priors podcast

Anthropic cofounder Ben Mann appeared on a podcast recently and seemed to revel in the idea that the rest of Silicon Valley still hadn't caught up with his startup's AI coding abilities.

"Other companies have had, like, code reds for trying to catch up in coding capabilities for quite a while and have not been able to do it," he said. "Honestly, I'm kind of surprised that they weren't able to catch up, but I'll take it."

Still, when pushed for answers, he explained some of the keys to Anthropic's success here.

Mann built Anthropic's human feedback data system in 2021. Back then, it was relatively easy for humans to evaluate signals, such as whether model output A was better than B, and feed that back into the AI development process via a popular technique known as Reinforcement Learning from Human Feedback, or RLHF.

"As we've trained the models more and scaled up a lot, it's become harder to find humans with enough expertise to meaningfully contribute to these feedback comparisons," Mann explained on the No Priors podcast. "For coding, somebody who isn't already an expert software engineer would probably have a lot of trouble judging whether one thing or another was better."

So, Anthropic pioneered a new approach called Reinforcement Learning from AI Feedback, or RLAIF. Instead of humans evaluating AI model outputs, other models would do the analysis. 

To make this more-automated technique work, Anthropic wrote a series of principles in English for its models to adhere to. The startup called it Constitutional AI.

"The process is very simple," Mann said. "You just take a random prompt like 'How should I think about my taxes?' and then you have the model write a response. Then you have the model criticize its own response with respect to one of the principles, and if it didn't comply with the principle, then you have the model correct its response."

For coding, you can give the AI models principles such as "Did it actually serve the final answer?" or "Did it do a bunch of stuff that the person didn't ask for?" or "Does this code look maintainable?" or "Are the comments useful and interesting?" Mann explained.
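The critique-and-revise loop Mann describes can be sketched as follows. The `model` function is a stand-in for a real LLM API call; only the four coding principles are quoted from Mann, everything else is illustrative.

```python
# Principles quoted from Mann's description of Constitutional AI for coding.
PRINCIPLES = [
    "Did it actually serve the final answer?",
    "Did it do a bunch of stuff that the person didn't ask for?",
    "Does this code look maintainable?",
    "Are the comments useful and interesting?",
]

def model(prompt: str) -> str:
    """Stand-in for an LLM call (hypothetical; returns a placeholder)."""
    return f"[model output for: {prompt[:40]}...]"

def constitutional_revision(prompt: str) -> str:
    """Generate, then repeatedly self-critique and revise against principles."""
    response = model(prompt)
    for principle in PRINCIPLES:
        critique = model(
            f"Critique this response against the principle '{principle}':\n"
            f"{response}"
        )
        response = model(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    # The revised output (not the original) becomes RLAIF training data.
    return response

revised = constitutional_revision("How should I think about my taxes?")
```

With a real model behind `model`, each pass replaces a response that violates a principle with a corrected one, which is exactly the "criticize, then correct" loop Mann outlines.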

Dr. Mann's empirical method

Elad Gil, a top AI investor and No Priors host, concurred, saying that the clear signals from deploying code and seeing if it works make this process fruitful.

"With coding, you actually have like a direct output that you can measure: You can run the code, you can test the code," he said. "There's sort of a baked-in utility function you can optimize against."

Mann cited an example from his father, who was a physician. One day, a patient came in with a skin condition on his face, and Dr. Mann couldn't figure out what the problem was. So, he divided the patient's face into sections and applied different treatments. One area cleared up, revealing the answer empirically.

"Sometimes you just won't know and you have to try stuff — and with code that's easy because we can just do it in a loop," Anthropic's Mann said. 

Constitutional AI and beyond

Dianne Penn, Head of Product Management, Research and Frontiers, at Anthropic

In an interview with BI, Anthropic's Penn described other ingredients that went into making the startup's models so good at coding.

She said the description from Simons, the StackBlitz CEO, was "generally true," while noting that Anthropic's coding breakthrough was the result of a multiyear effort involving many researchers and lots of ideas and techniques.

"We fundamentally made it good at writing code, or being able to figure out what good code looks like, through what you can consider as trial and iterations," she said. "You're giving the model different questions and allowing it to figure out what the right answer is on a coding problem."

When asked about the role of Constitutional AI, Penn said she couldn't share too much detail on the exact techniques, but said "it's definitely in the models."

Using tools with no hands

Anthropic also trained Sonnet 3.5 to be much better at using tools, a key focus that has begun to turn AI models from chatbots into more general-purpose agents — what the startup calls "virtual collaborators."

"They don't have hands," Penn said, so instead, Anthropic's models were trained to write code themselves to access digital tools.

For example, she said that if an Anthropic model is asked for weather information or stock prices, it can write software to tap into an application programming interface, or API, a common way for apps to access data.
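The kind of tool-use code Penn describes might look like the sketch below: the model writes a small program to build an API request and parse the response. The endpoint and field names here are entirely hypothetical, and a canned payload stands in for the live HTTP call.

```python
import json
from urllib.parse import urlencode

def build_weather_request(city: str) -> str:
    """Assemble a request URL for a hypothetical weather API."""
    base = "https://api.example-weather.com/v1/current"  # not a real endpoint
    return f"{base}?{urlencode({'q': city, 'units': 'metric'})}"

def parse_weather(payload: str) -> str:
    """Turn the API's JSON response into an answer for the user."""
    data = json.loads(payload)
    return f"{data['city']}: {data['temp_c']}°C, {data['conditions']}"

# A canned response stands in for actually fetching the URL.
canned = json.dumps(
    {"city": "San Francisco", "temp_c": 17, "conditions": "fog"}
)
answer = parse_weather(canned)
print(answer)  # → San Francisco: 17°C, fog
```

The "no hands" framing is the design insight: rather than clicking through interfaces, the model reaches data the way another program would, by generating the glue code itself.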

Following instructions

When software coding projects get really big, you can't knock out the work in a few minutes. The more complex tasks take days, weeks, or longer.

AI models have been incapable of sticking with long-term jobs like these. But Anthropic invested heavily in making Sonnet 3.5 and later models much better at following human instructions.

This way, if the model gets stumped on a long coding problem, it can take guidance from developers to keep going — essentially listening better to understand the intent of its human colleagues, Penn explained. (Hey, we can all get better at that).

Knowing what to remember

Anthropic CEO Dario Amodei talking onstage with Chief Product Officer Mike Krieger (left).

Even the best human software developers can't keep everything related to a coding project in their brains. GitHub repositories, holding code, images, documentation, and revision histories, can be massive.

So Anthropic trained its AI models to create a kind of scratch pad, jotting down notes in an external file system as they explore things like a code base.

"We train it to use that tool very well," Penn said (while I frantically scribbled notes on my own reporting pad).

The key here is that Anthropic's models were trained to remember more of the salient details of coding projects, and ignore the less important stuff.

"It's not useful to say, 'Dianne is wearing a colored shirt in this conversation, and Alistair is wearing a green shirt,'" Penn said, describing the BI interview taking place at that moment. "It's more important to note that we talked about coding and how Anthropic focused on coding quality."

This better use of memory means that Anthropic models can suggest multiple code changes over the course of an entire project, something that other AI models aren't as good at.

"If it's not trained well, it could scribble the wrong things," Penn told me. "It's gotten really good at those things. So it actually does not just mean in the short term that it can write good code, but it remembers to write data so that it might make a second or third change that another AI model might not know, because the quality of its notes, plus the quality of its core intelligence, are better."

Claude Code and terminal data

For a while, around 2022, it looked like AI progress was happening automatically, through more data, more GPUs, and bigger training runs.

"The reality is that there are very discrete breakthroughs, and very discrete ideas that lead to these breakthroughs," said Armando Solar-Lezama, a distinguished professor of computing at MIT. "It takes researchers, and investment in research, to produce the next idea that leads to the next breakthrough."

This is how Anthropic's hard-won coding lead happened. But access to detailed, granular data on how human developers write software is crucial to stay ahead in this part of the AI race, he added.

Andrew Filev has a theory related to this. He's CEO of Zencoder, another AI coding service that uses Anthropic's models.

Andrew Filev, CEO of Zencoder

Filev thinks that data from computer terminal use is key to training AI models to be good at coding. A terminal is a text-based interface that lets developers send instructions to a computer's operating system or software. They type in information via a "command line," and hopefully get outputs. 

"Large language models are great with text," he told me in a recent interview about Anthropic. "The computer terminal, where you keep commands, is basically text, too. So at some point, people realized that they should just give that data to their AI model, and it can do amazing things — things which previously had never worked."

In late May, Anthropic rolled out Claude Code, a command line tool for AI coding that works with developers' existing terminals.

Suddenly, Anthropic is competing against its main customers — all those other AI coding services.

The move also created a direct relationship between Anthropic and developers, giving the AI lab access to a richer source of data on how expert humans write software. 

"The amount and the speed that we learn is much less if we don't have a direct relationship with our coding users," Anthropic's Mann said. "So launching Claude Code was really essential for us to get a better sense of what do people need, how do we make the models better, and how do we advance the state-of-the-art?"

In theory, this granular information could be used to help train and fine-tune Anthropic's next models, potentially giving the startup a data edge that might preserve its AI coding lead even longer. 

"Could I do this without Anthropic's latest models? No," said Sourcegraph's Slack. "And would their models be as good without Claude Code? I don't think so."

Sign up for BI's Tech Memo newsletter here. Reach out to me via email at abarr@businessinsider.com.

Read the original article on Business Insider
