The Agent Reasoning Interface: o1/o3, Claude 3, ChatGPT Canvas, Tasks, and Operator

Sponsorships and tickets for the AI Engineer Summit are selling fast! See the new website with speakers and schedules live!

If you are building AI agents or leading teams of AI Engineers, this will be the single highest-signal conference of the year for you: Feb 20-22 in NYC.

We’re pleased to share that Karina will be presenting OpenAI’s closing keynote at the AI Engineer Summit. We were fortunate to get some time with her today to introduce some of her work, and hope this serves as nice background for her talk!

There are very few early AI careers that have been as impactful as Karina Nguyen’s. After stints at Notion, Square, Dropbox, Primer, the New York Times, and UC Berkeley, she joined Anthropic as employee ~60 and worked in a wide range of research and product roles on Claude 1, 2, and 3. We’ll just let her LinkedIn speak for itself.

Now, as Research manager and Post-training lead in Model Behavior at OpenAI, she creates new interaction paradigms for reasoning interfaces and capabilities, like ChatGPT Canvas, Tasks, SimpleQA, streaming chain-of-thought for o1 models, and more via novel synthetic model training.

Ideal AI Research+Product Process

In the podcast we got a sense of what Karina has found works for her and her team to be as productive as they have been:

1. Write PRD (Define what you want)

2. Funding (Get resources)

3. Prototype Prompted Baseline (See what’s possible)

4. Write and Run Evals (Get failures to hillclimb)

5. Model training (Exceed baseline without overfitting)

6. Bugbash (Find bugs and solve them)

7. Ship (Get users!)

We could turn this into a snazzy viral graphic but really this is all it is. Simple to say, difficult to do well. Hopefully it helps you define your process if you do similar product-research work.
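To make the "Prototype Prompted Baseline" and "Write and Run Evals" steps a bit more concrete, here is a minimal sketch of what an eval loop could look like. Everything in it is hypothetical and for illustration only: `EvalCase`, `run_evals`, `pass_rate`, and the toy `prompted_baseline` function are names we made up, not anything Karina or OpenAI described. The idea is simply that each eval case pairs a prompt with a programmatic check, and the failing cases become the list you hillclimb on.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One eval: a prompt plus a check that returns True for an acceptable output."""
    prompt: str
    check: Callable[[str], bool]


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> List[EvalCase]:
    """Run every case through the model and return the cases that failed."""
    failures = []
    for case in cases:
        output = model(case.prompt)
        if not case.check(output):
            failures.append(case)
    return failures


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose check passed."""
    return 1 - len(run_evals(model, cases)) / len(cases)


# Hypothetical stand-in for a prompted baseline; a real one would call a model.
def prompted_baseline(prompt: str) -> str:
    return prompt.upper()


cases = [
    EvalCase("hello", lambda out: out == "HELLO"),
    # The toy baseline never emits three lines, so this case always fails.
    EvalCase("write a haiku", lambda out: len(out.splitlines()) == 3),
]

failures = run_evals(prompted_baseline, cases)
for case in failures:
    print("FAILED:", case.prompt)
print("pass rate:", pass_rate(prompted_baseline, cases))
```

In this framing, the same `cases` list would be rerun after each training iteration, and the pass rate of the prompted baseline is the number the trained model has to beat.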

Show Notes

Our Reasoning Price War post

Karina LinkedIn, Website, Twitter

OSINT visualization work

Ukraine 3D storytelling

Karina on Claude Artifacts

Karina on Claude 3 Benchmarks

Inspiration for Artifacts / Canvas from early UX work she did on GPT-3

“i really believe that things like canvas and tasks should and could have happened like 2 yrs ago, idk why we are lagging in the form factors” (tweet)

Our article on prompting o1 vs Karina’s Claude prompting principles

Canvas: https://openai.com/index/introducing-canvas/

We trained GPT-4o to collaborate as a creative partner. The model knows when to open a canvas, make targeted edits, and fully rewrite. It also understands broader context to provide precise feedback and suggestions.

To support this, our research team developed the following core behaviors:

Triggering the canvas for writing and coding

Generating diverse content types

Making targeted edits

Rewriting documents

Providing inline critique

We measured progress with over 20 automated internal evaluations. We used novel synthetic data generation techniques, such as distilling outputs from OpenAI o1-preview, to post-train the model for its core behaviors. This approach allowed us to rapidly address writing quality and new user interactions, all without relying on human-generated data.

Tasks: https://www.theverge.com/2025/1/14/24343528/openai-chatgpt-repeating-tasks-agent-ai

Agents and Operator

What are agents? “Agents are a gradual progression of tasks: starting with one-off actions, moving to collaboration, and ultimately fully trustworthy long-horizon delegation in complex envs like multi-player/multiagents.” (tweet)

tasks and canvas fall within the first two, and we are def. marching towards the third—though the form factor for 3 will take time to develop

Operator/Computer Use Agents

https://openai.com/index/introducing-operator/

Misc:

Andrew Ng

Prediction: Personal AI Consumer playbook

ChatGPT as generative OS

Timestamps

00:00 Welcome to the Latent Space Podcast

00:11 Introducing Karina Nguyen

02:21 Karina's Journey to OpenAI

04:45 Early Prototypes and Projects

05:25 Joining Anthropic and Early Work

07:16 Challenges and Innovations at Anthropic

11:30 Launching Claude 3

21:57 Behavioral Design and Model Personality

27:37 The Making of ChatGPT Canvas

34:34 Canvas Update and Initial Impressions

34:46 Differences Between Canvas and API Outputs

35:50 Core Use Cases of Canvas

36:35 Canvas as a Writing Partner

36:55 Canvas vs. Google Docs and Future Improvements

37:35 Canvas for Coding and Executing Code

38:50 Challenges in Developing Canvas

41:45 Introduction to Tasks

41:53 Developing and Iterating on Tasks

46:27 Future Vision for Tasks and Proactive Models

52:23 Computer Use Agents and Their Potential

01:00:21 Cultural Differences Between OpenAI and Anthropic

01:03:46 Call to Action and Final Thoughts
