Newsroom · Anthropic · April 23, 04:05
Anthropic Economic Index: Insights from Claude 3.7 Sonnet

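This post presents the second research report from the Anthropic Economic Index, focusing on how Claude.ai was used after the launch of the Claude 3.7 Sonnet model. The report finds increased usage in areas such as coding, education, science, and healthcare, and shows that the “extended thinking” mode is used mainly for technical tasks. It also provides task- and occupation-level breakdowns of augmentation and automation, along with the first bottom-up taxonomy of Claude.ai usage, a valuable data resource for researchers.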

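💡Since the launch of Claude 3.7 Sonnet, usage has increased in occupational categories such as coding, education, science, and healthcare. This may reflect the continued diffusion of AI through the economy, or it may be related to new applications of the model in these domains or to improvements in its capabilities.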

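🤔The “extended thinking” mode is used mainly for technical and creative problem-solving: computer and information research scientists show the highest share of usage, followed by software developers. Digital creative roles (such as multimedia artists and video game designers) also show substantial usage.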

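✍️The report analyzes how augmentation and automation differ across tasks and occupations. For example, tasks associated with copywriters and editors show the highest amount of task iteration, while tasks associated with translators and interpreters show among the highest amounts of directive behavior.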

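📊To shed more light on user behavior, the report releases a new bottom-up dataset categorizing Claude.ai usage. It contains 630 granular categories, ranging from “Help resolve household plumbing, water, and maintenance issues” to “Provide guidance on battery technologies and charging systems.”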

Last month, we launched the Anthropic Economic Index—a new initiative where we’re regularly releasing data and research aimed at understanding AI's effects on labor markets and the economy over time.

Today, we’re releasing our second research report from the Index, covering usage data on Claude.ai following the launch of Claude 3.7 Sonnet—our newest and most capable model with strengths in agentic coding and a new “extended thinking” mode.

Briefly, our latest results are the following:

    Since the launch of Claude 3.7 Sonnet, we’ve observed a rise in the share of usage for coding, as well as educational, science, and healthcare applications;
    People use Claude 3.7 Sonnet’s new “extended thinking” mode predominantly for technical tasks, including those associated with occupations like computer science researchers, software developers, multimedia animators, and video game designers;
    We’re releasing data on augmentation/automation breakdowns at the task and occupation level. For example, tasks associated with copywriters and editors show the highest amount of task iteration, where the human and model co-write something together. By contrast, tasks associated with translators and interpreters show among the highest amounts of directive behavior, where the model completes the task with minimal human involvement.

In addition, we’re releasing a first-of-its-kind bottom-up taxonomy of usage on Claude.ai. This new dataset covers 630 granular categories ranging from “Help resolve household plumbing, water, and maintenance issues” to “Provide guidance on battery technologies and charging systems.” We hope this bottom-up taxonomy will be useful for researchers, and reveal use-cases that might be missed by top-down approaches which map usage onto a list of predefined tasks.

The datasets for these analyses are freely available to download.

Read on for more details on our findings.

What’s changed since the launch of Claude 3.7 Sonnet?

Last month, we introduced Claude 3.7 Sonnet, our most capable model yet with an “extended thinking mode”. We reran our previous analysis on data from the 11 days following the launch, covering 1 million anonymized Claude.ai Free and Pro conversations. The vast majority of the data we analyzed was from Claude 3.7 Sonnet, as it is set as the default on Claude.ai and our mobile app.

As a reminder, our privacy-preserving analysis tool, Clio, maps each conversation to one of 17,000 tasks in the U.S. Department of Labor’s O*NET database. We then look at the overall patterns in the occupations and high-level occupational categories associated with those tasks.
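To make the task-to-occupation aggregation concrete, here is a minimal Python sketch. It assumes each conversation has already been assigned one O*NET task (the step Clio performs) and that each task maps to a single occupation and each occupation to a single category. The task IDs, lookup tables, and the `usage_shares` helper are hypothetical illustrations, not Clio’s code or real O*NET identifiers, and the report’s actual handling of tasks shared across occupations may differ.

```python
from collections import Counter

# Hypothetical lookup tables; real O*NET IDs and the report's exact
# task-to-occupation handling are not reproduced here.
TASK_TO_OCCUPATION = {
    "task_a": "Software Developers",
    "task_b": "Software Developers",
    "task_c": "Editors",
}
OCCUPATION_TO_CATEGORY = {
    "Software Developers": "Computer and Mathematical",
    "Editors": "Arts, Design, Entertainment, Sports, and Media",
}


def usage_shares(items, mapping):
    """Map each item through `mapping` and return each group's share of all items."""
    counts = Counter(mapping[item] for item in items)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# One (hypothetical) task assignment per conversation.
assigned_tasks = ["task_a", "task_b", "task_a", "task_c"]
occupation_shares = usage_shares(assigned_tasks, TASK_TO_OCCUPATION)

# Roll the same conversations up one more level, to occupational categories.
assigned_occupations = [TASK_TO_OCCUPATION[t] for t in assigned_tasks]
category_shares = usage_shares(assigned_occupations, OCCUPATION_TO_CATEGORY)

print(occupation_shares)  # {'Software Developers': 0.75, 'Editors': 0.25}
print(category_shares)
```

Working with shares rather than raw counts is what lets samples of different sizes be compared directly.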

When looking at the breakdown of these 1 million conversations, we see that the proportion of usage in several occupational categories has increased modestly, including coding, education, and the sciences. While this increase in coding usage was expected due to Claude 3.7 Sonnet’s improved scores on coding benchmarks, the increase in these other categories could reflect ongoing diffusion of AI throughout the economy, novel applications of coding to those domains, or unexpected capability improvements in the model.

In the two months since our original data sample, we’ve seen an increase in the share of usage for coding, education, and the sciences. Graph shows share of Claude.ai Free and Pro traffic across top-level occupational categories in O*NET. Grey shows the distribution from our first report, covering data from Dec ’24 to Jan ’25. Colored bars show increases (green) and decreases (blue) in the share of usage for our new data from Feb ’25 to Mar ’25. Note that the graph shows the share of usage rather than absolute usage. See the Appendix for a chart showing change across the full list of occupational categories.
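Because the chart reports shares rather than absolute counts, a category can grow in volume and still lose share. The sketch below computes the percentage-point change in share between two samples; the category labels, the counts, and the function names are invented for illustration.

```python
def to_shares(counts):
    """Convert raw conversation counts per category into shares that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def share_change_pp(before_counts, after_counts):
    """Percentage-point change in each category's share between two samples."""
    before, after = to_shares(before_counts), to_shares(after_counts)
    categories = sorted(set(before) | set(after))
    return {c: 100 * (after.get(c, 0.0) - before.get(c, 0.0)) for c in categories}


# Invented counts: every category grows in absolute terms between samples.
v1 = {"Computer": 300, "Education": 100, "Office": 600}
v2 = {"Computer": 700, "Education": 250, "Office": 1050}

for category, delta in share_change_pp(v1, v2).items():
    print(f"{category}: {delta:+.1f} pp")
```

In this toy data, "Office" usage grows from 600 to 1,050 conversations, yet its share falls by about 7.5 percentage points because the other categories grow faster.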


How are people using extended thinking mode?

Claude 3.7 Sonnet features a new “extended thinking” mode which, when activated by the user, enables the model to think for longer when answering more complex questions.

Our analysis reveals that Claude 3.7 Sonnet's extended thinking mode is predominantly used in technical and creative problem-solving contexts. Tasks associated with computer and information research scientists lead with almost 10% using extended thinking, followed by software developers at around 8%. Tasks associated with digital creative roles like multimedia artists (~7%) and video game designers (~6%) also show substantial usage.

While these early usage patterns reveal insights about when people choose to use extended thinking mode, many important questions remain about this new model capability. To enable further research in this space, we’re releasing a new dataset that maps each O*NET task to its associated thinking mode fraction. This dataset is available on our Hugging Face page.
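The released dataset maps each O*NET task to a thinking-mode fraction. One plausible way to roll such task-level data up to occupations, and to apply the 0.5% representation cutoff used for the figure, is to pool conversation counts per occupation, as in this sketch. The input layout (occupation, conversation count, extended-thinking count) and the pooling rule are assumptions, not the published schema or method, and the occupation names and numbers are invented.

```python
from collections import defaultdict


def thinking_fraction_by_occupation(task_rows, min_share=0.005):
    """Pool task-level rows into occupation-level extended-thinking fractions.

    task_rows: iterable of (occupation, n_conversations, n_with_extended_thinking).
    Occupations with less than `min_share` of all conversations are dropped.
    """
    conversations = defaultdict(int)
    thinking = defaultdict(int)
    for occupation, n, n_thinking in task_rows:
        conversations[occupation] += n
        thinking[occupation] += n_thinking

    total = sum(conversations.values())
    return {
        occupation: thinking[occupation] / n
        for occupation, n in conversations.items()
        if n / total >= min_share
    }


# Invented rows; "occupation_c" holds 20 / 9,920 ≈ 0.2% of the data and is filtered out.
rows = [
    ("occupation_a", 900, 90),
    ("occupation_b", 8000, 640),
    ("occupation_b", 1000, 80),
    ("occupation_c", 20, 10),
]
print(thinking_fraction_by_occupation(rows))
```

Pooling by conversation count means tasks with more traffic weigh more heavily in their occupation’s fraction; averaging task fractions unweighted would be a different, equally hypothetical, choice.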

What tasks see the highest associated usage of extended thinking mode? Graph shows the O*NET occupations with highest usage of thinking mode in their associated tasks. Occupations shown are limited to those with at least 0.5% representation in the data.

How does augmentation vs. automation vary by task and occupation?

In our last report, we analyzed how AI usage varied between augmentative uses, like learning or iterating on an output, and automation-oriented uses, like asking the model to directly complete a task or debug errors. Our analysis shows the balance of augmentation and automation is essentially unchanged in our new data, with augmentation still comprising 57% of usage. However, we did see some change in the types of automation and augmentation uses; for example, learning interactions, where the user asks Claude for information or explanation about different topics, rose from ~23% to ~28%.
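A minimal sketch of how an overall augmentation share can be computed from counts of the five interaction modes. It assumes the grouping used in the original report (Learning, Task Iteration, and Validation as augmentation; Directive and Feedback Loop as automation) and ignores conversations that fall into none of the five modes; the counts are invented.

```python
AUGMENTATION_MODES = {"Learning", "Task Iteration", "Validation"}
AUTOMATION_MODES = {"Directive", "Feedback Loop"}


def augmentation_share(mode_counts):
    """Share of augmentation among conversations classified into one of the five modes.

    Counts for any other label (e.g. unclassified conversations) are ignored.
    """
    augmentation = sum(n for mode, n in mode_counts.items() if mode in AUGMENTATION_MODES)
    automation = sum(n for mode, n in mode_counts.items() if mode in AUTOMATION_MODES)
    return augmentation / (augmentation + automation)


# Invented counts, including a "None" bucket that the function skips.
counts = {
    "Learning": 28,
    "Task Iteration": 20,
    "Validation": 9,
    "Directive": 30,
    "Feedback Loop": 13,
    "None": 15,
}
print(f"augmentation share: {augmentation_share(counts):.0%}")  # augmentation share: 57%
```

Applying the same function to the counts within each occupational category would give the per-category balance discussed below.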

The balance of augmentation and automation has stayed relatively constant in the two months between our data samples (V1 and V2), though the share of Learning conversations has grown appreciably.


We received a number of requests via our researcher input form to release automation and augmentation data at the level of tasks and occupations. We do just that in this report, providing this data on our Hugging Face page.

When splitting the data by high-level occupational categories, we find some categories are highly augmentative; for example, Community and Social Service tasks, which include education and guidance counseling, approach 75% augmentation. At the other extreme, for tasks associated with production or computer and mathematical occupations, the balance is closer to 50-50. We don’t see any occupational categories where automation dominates.

 Proportion of different interaction modes across high-level occupational categories. Occupational categories shown are limited to those with at least 0.5% representation in the data.


Getting more granular, we can also look at specific occupations within these occupational categories, as well as tasks associated with that occupation. For example, tasks associated with copywriters and editors show the highest amount of task iteration, where the user iterates on various writing and editing tasks with the model. By contrast, tasks associated with translators and interpreters show among the highest amounts of directive behavior, where the model is used for translating documents with minimal human involvement. Note that the O*NET descriptions may not be optimally representative of what Claude is being used for: for example, while we see usage in the occupation “fine artists, including painters, sculptors, and illustrators,” Claude is probably used far more for creating digital art than for painting or sculpture.

Top occupations by interaction type. For each of the five interaction categories (Learning, Task Iteration, Validation, Directive, and Feedback Loop), we plot the occupations with the highest usage proportion within that category. For example, librarians show the highest proportion of learning interactions at ~56%, while copywriters lead in Task Iteration at ~58%. Each panel includes the O*NET task within the occupation that contributed most strongly to that interaction pattern; this is based on both how frequently the task occurs and how often that interaction mode is used within the task. Figures for the other interaction modes are shown in the Appendix. Note that the O*NET descriptions may not be optimally representative of what Claude is being used for: for example, while we see usage in the occupation “fine artists, including painters, sculptors, and illustrators,” usage on Claude.ai probably tilts more towards digital art than sculpture. Only occupations with at least 0.5% representation in the overall dataset are shown.
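The caption says the featured task combines how often a task occurs with how often the interaction mode is used within it. One simple way to combine the two, assumed here rather than taken from the report, is to multiply them, which gives the number of conversations in that mode the task accounts for. The task names and numbers below are hypothetical.

```python
def top_task_for_mode(tasks, mode):
    """Return the task that accounts for the most conversations in `mode`.

    tasks: {task_name: (n_conversations, {mode: proportion_within_task})}.
    Score = n_conversations * proportion of `mode` within the task (an assumed rule).
    """
    def score(item):
        n_conversations, proportions = item[1]
        return n_conversations * proportions.get(mode, 0.0)

    return max(tasks.items(), key=score)[0]


# Hypothetical tasks for one occupation.
tasks = {
    "Edit drafts for style": (500, {"Task Iteration": 0.6, "Directive": 0.4}),
    "Write marketing copy": (900, {"Task Iteration": 0.5, "Directive": 0.5}),
    "Proofread short pages": (100, {"Task Iteration": 0.9, "Directive": 0.1}),
}
print(top_task_for_mode(tasks, "Task Iteration"))
```

Here "Proofread short pages" has the highest Task Iteration proportion (0.9) but so few conversations that "Write marketing copy" (900 × 0.5 = 450) wins, which is the kind of frequency/intensity trade-off the caption describes.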

A bottom-up taxonomy of usage on Claude.ai

Our research so far has relied on the O*NET dataset of tasks and occupations, which was created and is maintained by the U.S. Department of Labor. While O*NET covers a very large number of tasks, it may not be the best taxonomy for describing the capabilities of general-purpose models, which can be used for tasks that are not present in O*NET and thus might be missed by our analysis.

To address this gap, we’re releasing a new bottom-up dataset of user activity patterns on Claude.ai. This dataset was also created with Clio, and uses the same dataset of anonymized conversations used for the above analysis, meaning that it enables comparisons between top-down and bottom-up approaches. It consists of 630 granular clusters, with associated descriptions, prevalence metrics, and automation/augmentation breakdowns, organized into three levels of hierarchy.
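Because each of the 630 clusters sits in a three-level hierarchy with a prevalence metric, higher-level prevalence can be obtained by summing over children. The sketch assumes prevalence is a conversation count and that each leaf records its mid- and top-level parents; the leaf names come from the examples in this post, but the parent labels and counts are invented, and this is not the released file format.

```python
from collections import defaultdict


def roll_up(leaf_clusters):
    """Sum leaf-level counts into mid-level and top-level totals.

    leaf_clusters: iterable of (top_level, mid_level, leaf, conversation_count).
    Returns (top_totals, mid_totals); mid-level keys are (top_level, mid_level) pairs.
    """
    top_totals = defaultdict(int)
    mid_totals = defaultdict(int)
    for top, mid, _leaf, count in leaf_clusters:
        mid_totals[(top, mid)] += count
        top_totals[top] += count
    return dict(top_totals), dict(mid_totals)


# Hypothetical parents and counts; only the leaf names appear in the post.
leaves = [
    ("Technical help", "Software", "Help with time zone handling in code and databases", 40),
    ("Technical help", "Software", "Help me with font selection, implementation, and troubleshooting", 25),
    ("Technical help", "Hardware", "Provide guidance on battery technologies and charging systems", 35),
    ("Personal and career", "Careers", "Help me create or improve job application materials", 60),
]
top_totals, mid_totals = roll_up(leaves)
print(top_totals)  # {'Technical help': 100, 'Personal and career': 60}
```

Keying mid-level totals by (top, mid) pairs keeps identically named subcategories under different parents from being merged.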

While we leave detailed analysis of this dataset to future work, we highlight a few particularly interesting clusters:

    Help with water management systems and infrastructure projects
    Create physics-based simulations with interactive visualization capabilities
    Help me with font selection, implementation, and troubleshooting
    Help me create or improve job application materials
    Provide guidance on battery technologies and charging systems
    Help with time zone handling in code and databases

Conclusion

As models continue to advance, so too must our measurement of their economic impacts. In our second report, covering data since the launch of Claude 3.7 Sonnet, we find relatively modest increases in coding, education, and scientific use cases, and essentially no change in the balance of augmentation and automation. We find that Claude’s new extended thinking mode is used most frequently in technical domains and tasks, and we identify patterns of automation and augmentation across tasks and occupations. We release datasets for both of these analyses.

In the coming months, we aim to continue tracking these metrics and developing new ones as capabilities improve and models continue to be applied across the economy.

Work with us

If you’re interested in working at Anthropic to research the effects of AI on the labor market, we encourage you to apply for our Societal Impacts Research Scientist and Research Engineer roles, as well as our Economist role.

Appendix

We share a few additional results and technical details in this appendix.

Task Curve

We also recompute the “depth of task usage” plot from our original paper. We find a curve very similar to the one from our first analysis. If anything, we see slightly less area under the curve for the newer model, perhaps owing to a greater concentration of coding in our sample of conversations. That said, while we haven’t seen a dramatic change in this curve over the last two months, we will continue to monitor it as model capabilities and product surfaces advance.
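The depth-of-task-usage curve can be computed from per-task conversation counts grouped by occupation: for each threshold x, take the fraction of occupations in which at least a fraction x of their tasks see usage. The sketch treats a task as "used" when it has at least `min_conversations` conversations; that cutoff, its default of 1, and the toy data are assumptions rather than the report’s settings, and every occupation is assumed to list at least one task.

```python
def task_depth_curve(occupation_tasks, thresholds, min_conversations=1):
    """For each threshold x, the fraction of occupations with usage in >= x of their tasks.

    occupation_tasks: {occupation: [conversation count for each of its tasks]}.
    A task counts as used if its count is >= min_conversations (assumed cutoff).
    """
    coverage = []
    for counts in occupation_tasks.values():
        used = sum(1 for c in counts if c >= min_conversations)
        coverage.append(used / len(counts))
    return [sum(1 for cov in coverage if cov >= x) / len(coverage) for x in thresholds]


# Toy data: fraction of tasks with usage is 0.2, 0.5, 0.0, 0.6, 0.0 respectively.
toy_occupations = {
    "occ_a": [5, 0, 0, 0, 0],
    "occ_b": [3, 2, 0, 0],
    "occ_c": [0, 0, 0],
    "occ_d": [1, 1, 1, 0, 0],
    "occ_e": [0, 0, 0, 0, 0],
}
print(task_depth_curve(toy_occupations, [0.0, 0.2, 0.5, 1.0]))  # [1.0, 0.6, 0.4, 0.0]
```

The curve is non-increasing in x by construction, so "area under the curve" summarizes how deeply usage reaches into each occupation’s task list.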

The depth of task usage across occupations. For example, the graph shows that about 40% of occupations see AI usage in at least 20% of their tasks (where x=0.2, y≈0.4). There is little change in the curves between our first and second reports.

Full change across occupational categories

Percentage share of usage across occupational categories, showing values from our original report (gray bars) with corresponding increases (yellow) and decreases (blue) in the second report. Computer and mathematical occupations represent the category with the largest absolute increase (+3 percentage points), while several categories like education and the sciences show notable percentage increases.


Results for other interaction modes


Top occupations by interaction type. For each of the five interaction categories (Learning, Task Iteration, Validation, Directive, and Feedback Loop), we plot the occupations with the highest usage proportion within that category. For example, librarians show the highest proportion of learning interactions at ~56%, while copywriters lead in Task Iteration at ~58%. Each panel includes the O*NET task within the occupation that contributed most strongly to that interaction pattern; this is based on both how frequently the task occurs and how often that interaction mode is used within the task. Figures for the other interaction modes are shown in the main body of the post. Note that the O*NET descriptions may not be optimally representative of what Claude is being used for: for example, while we see usage in the occupation “fine artists, including painters, sculptors, and illustrators,” usage on Claude.ai probably tilts more towards digital art than sculpture. Only occupations with at least 0.5% representation in the overall dataset are shown.

Additional methodological details

While we mainly follow the methodology of our original report, we make a few changes which we document here for transparency:

    In contrast to our last report, we do not filter based on whether conversations are relevant to an occupational category. Instead, we simply filter out conversations that were flagged by our safety classifiers. We find that this approach leads to results similar to those of our original analysis, while preserving more data that we can release via our bottom-up taxonomy of usage.
    We use Claude 3.7 Sonnet in all cases where we previously used Claude 3.5 Sonnet. We found that using our newer model increased the accuracy of classifications according to the internal benchmarks we use to assess Clio’s accuracy.
