ThursdAI - Recaps of the most high signal AI weekly spaces · October 22, 2024
ThursdAI - Sep 12 - OpenAI's 🍓 is called o1 and is HERE, reflecting on Reflection 70B, Google's new auto podcasts & more AI news from last week

A lot of exciting things happened in AI this week: OpenAI released its o1 series of reasoning models, Google launched audio summaries in NotebookLM, Mistral AI released the multimodal model Pixtral, and DeepSeek 2.5 shipped. OpenAI's o1 models are designed to imitate human "System 2" thinking; using reinforcement learning and special thinking tokens, they make notable progress on logical reasoning, particularly competition math and code problems. Google's NotebookLM can turn PDFs, web links, and other sources into audio podcasts for easier studying. Mistral's Pixtral handles multiple images alongside text and ships under an open license for community research and use. DeepSeek 2.5 folds DeepSeek Coder into the main model with a significant performance boost.

🤩 OpenAI released the o1 series of reasoning models, o1-preview and o1-mini, designed to imitate human "System 2" thinking via reinforcement learning and special thinking tokens, achieving notable progress in logical reasoning. These models excel at reasoning-heavy tasks like competition math and code problems. OpenAI calls this new reasoning ability a "new scaling paradigm": shifting to "inference-time compute" changes how AI models are trained, letting them think longer at inference time to reason better.

🤖 Google launched audio summaries in NotebookLM, turning PDFs, web links, and other content into audio podcasts for easier learning. NotebookLM can convert up to 50 sources (PDFs, web links, documents, etc.) into an audio podcast; users can chat with their sources, create study guides, dive deeper, and add notes. For people who don't enjoy reading, this makes information much easier to absorb.

🖼️ Mistral AI released Pixtral, a multimodal model that handles multiple images and text, under an open license for community research and use. Pixtral's architecture processes multiple images interleaved with text within a 128k context window, a significant step up from most open-source multimodal models, which can handle only a single image. The release drew an enthusiastic community response, and the open license makes it easy for researchers and developers to use and improve the model.

🚀 DeepSeek 2.5 released, folding DeepSeek Coder into the main model with a significant performance boost. DeepSeek 2.5 scores strongly on benchmarks like HumanEval and MT-Bench, showing off its code-generation capability and marking further progress for code-generation models, giving developers a more powerful tool.

🍎 Apple announced a new point-and-describe feature, using AI to give iPhone users a smarter assistant experience. It generates more detailed descriptions from images or text the user selects and integrates with Siri for more convenient interaction.

📊 Google released the DataGemma 27B models to improve retrieval-interleaved generation (RIG/RAG) results. DataGemma connects LLMs to Google's Data Commons, giving them a richer data source to ground against, marking new progress in improving LLM performance.

🤔 OpenAI's o1 release has the community focused on AI reasoning ability and thinking about where future models are headed, while Google's NotebookLM and DataGemma 27B show the application potential of large language models across domains, and how AI may change how we live and work.

😎 This week in AI was full of surprises and challenges; let's see where the technology goes from here and how it changes our world.


March 14th, 2023 was the day ThursdAI was born, it was also the day OpenAI released GPT-4, and I jumped into a Twitter space and started chaotically reacting together with other folks about what a new release of a paradigm shifting model from OpenAI means, what are the details, the new capabilities. Today, it happened again!

Hey, it's Alex, I'm back from my mini vacation (pic after the signature) and boy am I glad I decided to not miss September 12th! The long rumored 🍓 thinking model from OpenAI dropped as breaking news in the middle of the ThursdAI live show, giving us plenty of time to react live!

But before this, we already had an amazing show with some great guests! Devendra Chaplot from Mistral came on and talked about their newly torrented (yeah, they did that again) Pixtral VLM, their first multimodal model, and then I had the honor to host Steven Johnson and Raiza Martin from the NotebookLM team at Google Labs, which shipped something so uncannily good that I legit said "holy fu*k" on X in reaction!

So let's get into it (TL;DR and links will be at the end of this newsletter)

OpenAI o1, o1-preview and o1-mini, a series of new "reasoning" models

This is it folks, the strawberries have bloomed, and we finally get to taste them. OpenAI has released (without a waitlist, 100% rollout!) the o1-preview and o1-mini models to ChatGPT and the API (tho only for tier-5 customers) and is working on releasing o1 as well.

These are models that think before they speak. They have been trained to imitate "system 2" thinking and integrate chain-of-thought reasoning internally, using Reinforcement Learning and special thinking tokens, which allows them to actually review what they are about to say before saying it, achieving remarkable results on logic-based questions.

Specifically, you can see the jumps in the very, very hard things like competition math and competition code, because those usually require a lot of reasoning, which is exactly what these models were trained to do well.
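OpenAI hides o1's actual reasoning tokens from users, so as a purely hypothetical illustration of the idea, here is a minimal Python sketch of how a transcript with special "thinking" delimiters could be split into hidden reasoning and a visible final answer. The `<think>`/`</think>` markers are my invention for the sketch, not OpenAI's real tokens:

```python
import re

def split_thinking(transcript: str) -> tuple[str, str]:
    """Separate hidden chain-of-thought from the visible answer.

    Assumes the model wraps internal reasoning in <think>...</think>
    delimiters (hypothetical markers, not OpenAI's actual tokens).
    """
    thoughts = re.findall(r"<think>(.*?)</think>", transcript, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", transcript, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in thoughts), answer

raw = "<think>9.11 vs 9.9: compare tenths, 1 < 9.</think>9.9 is larger than 9.11."
reasoning, answer = split_thinking(raw)
print(answer)  # the user only ever sees this part
```

The point of the split is that the model can spend as many tokens as it wants inside the thinking span without cluttering (or leaking) the final response.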

New scaling paradigm

Noam Brown from OpenAI calls this a "new scaling paradigm," and Dr. Jim Fan explains why: with this new way of "reasoning," the longer the model thinks, the better it does on reasoning tasks. They call this "test-time compute" or "inference-time compute," as opposed to the compute used to train the model. This shifting of computation down to inference time is the essence of the paradigm shift: pre-training becomes computationally limiting as models scale in parameter count, since they can only get so big before you have to build out a huge new supercluster of GPUs to host the next training run (remember Elon's Colossus from last week?).

The interesting thing to consider here is that while current "thinking" times range from a few seconds to a minute, imagine giving this model hours, days, or weeks to think about new drug discovery or physics problems.
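OpenAI hasn't published how o1 allocates its thinking budget, but one well-known way to trade inference compute for accuracy, which gives a feel for the paradigm, is self-consistency: sample many reasoning paths and majority-vote the answers. A toy sketch with a stubbed-out noisy "model" (the 60% accuracy and the answers are made up for illustration):

```python
import random
from collections import Counter

def noisy_model(rng: random.Random) -> int:
    """Stand-in for one sampled reasoning path: right 60% of the time."""
    return 42 if rng.random() < 0.6 else rng.choice([41, 43, 44])

def self_consistency(n_samples: int, seed: int = 0) -> int:
    """Spend more inference-time compute (more samples) for a better answer."""
    rng = random.Random(seed)
    votes = Counter(noisy_model(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# One sample is wrong 40% of the time; a majority over 101 almost never is.
print(self_consistency(1), self_consistency(101))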

Prompting o1

Interestingly, a new prompting paradigm has also been introduced. These models now have CoT (think "step by step") built-in, so you no longer have to include it in your prompts. By simply switching to o1-mini, most users will see better results right off the bat. OpenAI has worked with the Devin team to test drive these models, and these folks found that asking the new models to just give the final answer often works better and avoids redundancy in instructions.
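As a sketch of that difference (the exact wording is mine, not the Devin team's), here is how the same request might be prompted before and after o1:

```python
# Pre-o1 style: spell out chain-of-thought instructions yourself.
legacy_messages = [
    {"role": "system", "content": "You are a careful assistant."},
    {"role": "user", "content": (
        "Solve this step by step, showing your reasoning, "
        "then give the answer: what is 17 * 24?"
    )},
]

# o1 style: reasoning is built in, so just ask for the final answer.
# (At launch, o1-preview also rejected system messages entirely.)
o1_messages = [
    {"role": "user", "content": "What is 17 * 24? Give only the final answer."},
]

assert "step by step" not in o1_messages[0]["content"]
```

The shorter prompt isn't laziness; redundant "think step by step" scaffolding duplicates what the model already does internally.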

The community will of course learn what works and what doesn't over the next few hours, days, and weeks, which is why we got o1-preview and not the actual (much better) o1.

Safety implications and future plans

According to Greg Brockman, this inference-time compute also greatly helps with aligning the model to policies, giving it time to think about them at length and improving safety and jailbreak prevention, not only logic.

The folks at OpenAI are so proud of all of the above that they have decided to restart the count and call this series o1, but they did mention that they are going to release GPT series models as well, adding to the confusing marketing around their models.

Open Source LLMs

Reflecting on Reflection 70B

Last week, Reflection 70B was supposed to launch live on the ThursdAI show. While that didn't happen live, I added it in post-editing, sent the newsletter, packed my bags, and flew off on vacation. I've gotten many DMs since then, and at some point I couldn't resist checking; what I saw was complete chaos, and despite that I tried to stay disconnected until last night.

So here's what I could gather since last night. The claim that a Llama 3.1 70B finetune from Matt Shumer and Sahil Chaudhary of Glaive beats Sonnet 3.5 has been proven false; nobody was able to reproduce the evals they posted and boasted about, which is a damn shame.

Not only that, multiple trusted folks from our community, like Kyle Corbitt and Alex Atallah, have reached out to Matt to try and get to the bottom of how such a thing could happen, and how claims like these could have been made in good faith (or whether there was foul play).

The core idea behind something like Reflection is actually very interesting. But alas, the inability to replicate the results, the refusal to engage with the community openly (I reached out to Matt and gave him the opportunity to come on the show and address the topic; he did not reply), and keeping the model up on Hugging Face, where it's still trending and still claims to be the world's number 1 open source model: all of this smells really bad, despite multiple efforts on our part to give the benefit of the doubt here.

As for my part in building the hype (last week's issue's title claims that this model is the top open source model), I addressed it at the beginning of the show, but then the Twitter space crashed. Unfortunately, as much as I'd like to personally verify everything I cover, I often have to rely on the reputation of my sources, which is easier with established big companies, and this time that approach failed me.

This week's Buzzzzzz - one last week till our hackathon!

Look, at this point, if you read this newsletter and don't know about our hackathon, then I really didn't do my job promoting it. But it's coming up, September 21-22! Join us, it's going to be a LOT of fun!

🖼️ Pixtral 12B from Mistral

Mistral AI burst onto the scene with Pixtral, their first multimodal model! Devendra Chaplot, research scientist at Mistral, joined ThursdAI to explain their unique approach, ditching fixed image resolutions and training a vision encoder from scratch.

"We designed this from the ground up to...get the most value per flop," Devendra explained. Pixtral handles multiple images interleaved with text within a 128k context window - a far cry from the single-image capabilities of most open-source multimodal models. And to make the community erupt in thunderous applause (cue the clap emojis!) they released the 12 billion parameter model under the ultra-permissive Apache 2.0 license. You can give Pixtral a whirl on Hyperbolic, HuggingFace, or directly through Mistral.

DeepSeek 2.5: When Intelligence Per Watt is King

DeepSeek 2.5 launched amid the Reflection news and did NOT get the attention it deserves. It folds (and deprecates) DeepSeek Coder into 2.5, and shows incredible metrics and a truly next-gen architecture. "It's like a higher order MoE," Nisten revealed, "which has this whole pile of brain and it just picks every time, from that." DeepSeek 2.5 achieves maximum "intelligence per active parameter".
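DeepSeek's real architecture is far more involved (and "higher order MoE" is Nisten's informal description, not an official term), but the basic top-k expert-routing idea behind "intelligence per active parameter" can be sketched in a few lines of pure Python: a gate picks a few experts from the pile, and only those run for a given input:

```python
import math

def moe_layer(x: float, experts: list, gate_weights: list[float], k: int = 2) -> float:
    """Toy mixture-of-experts: run only the top-k gated experts.

    x            -- a scalar input (stand-in for a token's hidden state)
    experts      -- list of callables, the "pile of brain"
    gate_weights -- one gating score per expert for this input
    k            -- number of active experts per token
    """
    # Pick the k experts with the highest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_weights[i], reverse=True)[:k]
    # Softmax-normalize the winners' scores so their mixture weights sum to 1.
    exps = [math.exp(gate_weights[i]) for i in top]
    total = sum(exps)
    # Only the chosen experts actually compute anything.
    return sum((e / total) * experts[i](x) for e, i in zip(exps, top))

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * 10]
out = moe_layer(5.0, experts, gate_weights=[0.1, 2.0, 0.2, 1.0], k=2)
# Only experts 1 and 3 ran; the other two cost nothing at inference time.
```

This is how a model can hold an enormous total parameter count while keeping the per-token compute (the active parameters) small.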

Google's turning text into AI podcasts for auditory learners with Audio Overviews

Today I had the awesome pleasure of chatting with Steven Johnson and Raiza Martin from the NotebookLM team at Google Labs. NotebookLM is a research tool that, if you haven't used it, you should definitely give a spin, and this week they launched something I had seen in preview and was looking forward to checking out; honestly, I was jaw-droppingly impressed today.

NotebookLM allows you to upload up to 50 "sources", which can be PDFs, web links that it will scrape for you, documents, etc. (no multimodality so far), and lets you chat with them, create study guides, dive deeper, and add notes as you study.

This week's update lets someone who doesn't like reading turn all those sources into a legit 5-10 minute podcast, and it sounds so realistic that I was honestly blown away. I uploaded the fastHTML documentation in there... well, hear for yourself.

The conversation with Steven and Raiza was really fun; it's in the podcast, definitely give it a listen!

Not to mention that Google released (under a waitlist) another podcast-creating tool called Illuminate, which converts arXiv papers into similar-sounding, very realistic 6-10 minute podcasts!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.


There are many more updates from this week. There was a whole Apple keynote I missed, which had a new point-and-describe feature with AI on the new iPhones and Apple Intelligence; Google also released the new DataGemma 27B; and more things are in the TL;DR, posted here in raw format.

See you next week. Thank you for being a subscriber; weeks like this are the reason we keep doing this! Hope you enjoy these models, and leave a comment with what you think about them.


TL;DR in raw format
