Society's Backend | December 13, 2024
Discussions around OpenAI's o1, Superhuman AI, When AI Should Be An App, and More

This article covers several important discussions from this week in AI, including OpenAI's new o1 models, the Reflection 70B follow-up, AI's communication problem, an AI that claims superhuman forecasting, and more, along with several product releases and updates.

OpenAI releases the o1 family of models: how they work and the discussion they've sparked

A follow-up on the Reflection 70B release and a deeper look into what happened

AI has a communication problem, and companies should help people understand how to use it

An AI that claims to forecast at a superhuman level, and the controversy around it

Open Interpreter's right call: pivoting the product to an app

A cool, truthful tweet I think is worth sharing

Here's this week's notable discussions that took place in the world of AI. I'm splitting the reading list into a separate article that will be sent out tomorrow. Top highlights come first and are covered in more depth; other important discussions and happenings come afterward. I'm trying out a new format, so let me know what you think! Link to last week's updates/discussion points:

If you’re new to Society’s Backend, welcome! I write each week about important discussions in the world of AI, share my reading list, and explain important AI topics from an engineering perspective. Subscribe to get these articles directly in your inbox.

Subscribe now

Enjoy! Don’t forget to leave a comment to let me know what you think.

Top Highlights

OpenAI o1

OpenAI released a new family of models called o1, including both o1 and o1-mini. You can check out the system card here. They claim these models think before they respond and are the first models that can actually reason. I was a bit disappointed that they didn't release enough information about the models to really dig into this, but luckily the AI community has its ways, and there's been a ton of discussion about what "reasoning" means in this context and whether or not it's an accurate claim.

Essentially, o1 works by taking a prompt and going through a chain of thought that breaks the prompt down into smaller chunks so the model can work through each one. This produces longer responses that are more "thought out" and safer. The general consensus is that these models work differently from the LLMs we're used to, with some claiming a new paradigm of AI. Prompting also appears to work differently: because the model breaks down the prompt itself, traditional prompting methods won't affect results the way we're used to.
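To make that concrete, here's a minimal sketch (my own illustration, not OpenAI's code) of calling an o1 model through the OpenAI Python SDK. The model name and the note about unsupported parameters reflect what was reported at launch and may have changed since.

```python
# Minimal sketch of calling an o1-series model with the OpenAI Python SDK.
# At launch, the o1 models reportedly did not support system messages,
# temperature, or streaming, so the request is intentionally bare: you hand
# over the task and let the model run its own hidden chain of thought.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="o1-mini",  # or "o1-preview"; availability depends on your account
    messages=[
        # No "think step by step" scaffolding -- the model decomposes the
        # prompt itself, so traditional prompting tricks matter less here.
        {"role": "user", "content": "Plan a 3-stop road trip from Boston to "
                                    "Chicago that avoids toll roads."}
    ],
)

print(response.choices[0].message.content)
```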

I think the best way to understand o1 and what it means is to take in a bunch of different perspectives from AI experts and those testing the model. Some of my favorites are about what o1 means for inference-time scaling of models, testing o1 on ARC-AGI where it showed disappointing results, and a comment about what o1’s reasoning means. It’s also worth reading Nathan Lambert’s article on reverse engineering OpenAI’s o1.

A post from the announcer of Reflection 70B

Reflection 70B Follow-Up

A quick follow-up on last week's Reflection 70B release possibly being fraudulent. I wanted to share some threads that provide some much-needed context:

There's also a good post showing that the only winners from fraudulent or mistaken claims are AI influencers who blow them up for clout. It's an unfortunate reality the AI community has to deal with. If you want an article that details the saga, check out this piece from Artificial Ignorance.

AI’s Biggest Problem is Communication

I wrote a post about how the biggest problem with AI right now is the inability to communicate how it's beneficial to the target audience. For example, Apple's demo showed someone taking a picture of someone else's dog and using visual intelligence to figure out the breed. This was the example they decided to use to show consumers how they can use visual intelligence.

There's a very clear problem here: no one would ever do this. No one would ask a stranger "Can I take a picture of your dog?" Instead, they would just ask "What breed is your dog?" Sure, visual intelligence can be used here, but it's not particularly useful. This isn't just an Apple issue—most of the demos I see (especially around consumer use cases) are pretty bad.

I don’t think anyone doubts the usefulness of AI, but I think AI companies are doing a poor job of helping people understand how they can use it. Let me know what you think about this. I’m very curious if you think the communication here is poor or if we truly haven’t found great use cases yet.

AI as Super-human Predictor

“Our bot performs better than experienced human forecasters and performs roughly the same as (and sometimes even better than) crowds of experienced forecasters; since crowds are for the most part superhuman, so is FiveThirtyNine.”

The Center for AI Safety released a demo of an AI that they claim can predict the future at a superhuman level. This is essentially an AI that makes predictions with an accuracy on par with a group of experienced human forecasters. You use the bot by asking it a question about the future (e.g., "Will Trump win the 2024 presidential election?") and it gives a prediction as an answer.

This immediately made me pause. LLMs are by definition prediction machines, but I couldn't wrap my head around how we can make the definitive conclusion that the AI predicts at a superhuman level when it's trained on predictions at the human level. The post very quickly got community noted for not justifying the claim, and another X user ran the model on a separate dataset where it showed much worse performance. Their thread, including a list of issues they found with the model, can be found here.
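To make the "superhuman" comparison concrete: forecasting accuracy is typically scored question by question against realized outcomes, commonly with something like a Brier score (lower is better), and the bot's average is compared to a human or crowd baseline on the same questions. The toy sketch below is my own illustration of that kind of comparison, not the paper's evaluation code or FiveThirtyNine's actual methodology.

```python
# Toy illustration of comparing a forecasting bot against a crowd baseline
# using Brier scores. All numbers here are made up for the example.
def brier_score(prob: float, outcome: int) -> float:
    """prob: predicted probability the event happens; outcome: 1 if it did, else 0."""
    return (prob - outcome) ** 2

# Hypothetical predictions on the same set of resolved questions.
bot_probs   = [0.70, 0.20, 0.90, 0.55]
crowd_probs = [0.65, 0.30, 0.85, 0.50]
outcomes    = [1,    0,    1,    0]

bot_avg   = sum(brier_score(p, o) for p, o in zip(bot_probs, outcomes)) / len(outcomes)
crowd_avg = sum(brier_score(p, o) for p, o in zip(crowd_probs, outcomes)) / len(outcomes)

print(f"bot:   {bot_avg:.3f}")
print(f"crowd: {crowd_avg:.3f}")
# A "superhuman" claim rests on the bot's average beating the crowd's across
# many questions -- which is exactly what the community critique disputed
# when the model was run on a different question set.
```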

Links to the X announcement (including the system prompt) and technical paper.

From Open Interpreter’s blog post.

It should have been an app

“Compared to the engineering maturity of modern smartphones, we realized that the 01 Light’s hardware didn't offer enough value to justify manufacturing it.”

The 01 Light is being discontinued, everyone is being refunded, and Open Interpreter is being released as an app. A little background: the 01 Light was going to be a small gadget that let people interface with their computer via voice using LLMs. It was cool, but the company realized everything could be done via an app—so they canceled it.

I'm including this because this is the way things like this should be handled. As someone who builds technology centered around AI, I think there should have been a point in the development process of all of these products where the builders realized custom hardware wasn't needed. Instead of using AI hype to sell products to the detriment of consumers (like some companies have), Open Interpreter made the right decision to discontinue building a separate hardware device. Not only that, but they refunded everyone, launched a free app, open-sourced all their manufacturing materials, and still offer a cheap standalone device as an option for those who want it.

The world of AI is rife with opportunities to do the wrong thing, and Open Interpreter did the right things. Read the X thread or blog post here to get all the details and the reasoning behind the decision.

Other Highlights

Job Skill Updates

I’ve added a section on job skills to the ML road map. I’ll keep this section updated with the rest of the map and I’ll continue to add resources to help you learn these skills.

My first impression is that I'm surprised at how many AI-related jobs focus heavily on standard software engineering skills first and AI skills later. Languages like Java, C++, and Rust are emphasized in a lot of listings. It makes me wonder if employers see ML skills as easier to teach workers on the job than software engineering skills. It's also possible there are a lot more SWE-focused ML jobs than pure ML researcher positions, which may be skewing these results.


I'll be sending out my AI reading list tomorrow. Paid subscribers will get access to the entire list of articles, papers, and videos worth looking into. Thanks for your support! If you're interested in getting the full reading list or supporting Society's Backend, you can subscribe for just $1/mo.

Get 80% off for 1 year
