📆 ThursdAI – Jul 31, 2025 – Qwen’s Small Models Go Big, StepFun’s Multimodal Leap, GLM-4.5’s Chart Crimes, and Runway’s Mind‑Bending Video Edits + GPT-5 soon?

 

This week, open source AI saw explosive growth, with several companies shipping frontier models. Alibaba's Qwen team released three new models: the powerful reasoner Qwen3-235B-Thinking-2507, plus Qwen3-30B-Thinking-2507 and Qwen3-Coder-Flash-2507, both suited to local deployment. Zhipu (now rebranded as Z.ai) released GLM 4.5, which rivals the Qwen series in reasoning ability. StepFun launched Step3, a multimodal SOTA model. In video generation, Wan 2.2 arrived as the first open-source MoE video model, delivering high-quality text-to-video. Runway's Gen-3 Aleph model brings conversational, interactive editing to AI video. On the AI art side, Krea partnered with Black Forest Labs on Flux.1, Ideogram's Characters feature enables efficient character-consistent generation, and Tencent's Hunyuan3D World Model 1.0 pioneers open-source explorable 3D worlds. Beyond that, Riffusion's chattable studio producer and CharmBracelet's Crush tool open up even more possibilities for AI applications.

🚀 **Alibaba's Qwen series keeps delivering**: Alibaba released three new Qwen models this week: a 235B version that excels at reasoning tasks (Qwen3-235B-Thinking-2507), a 30B version suited to local deployment (Qwen3-30B-Thinking-2507), and the coding-focused Qwen3-Coder-Flash-2507. These models lead across evaluations; Qwen3-235B-Thinking-2507 in particular stands out on reasoning and long-context tasks, giving the open source community a powerful new option.

💡 **Chinese companies lead the open source AI wave**: This week's releases again confirmed how far ahead Chinese companies are in open source AI. Beyond Alibaba's Qwen series, GLM 4.5 from Zhipu (now Z.ai) competes strongly with Qwen on reasoning. StepFun's Step3, a 321B MoE with excellent multimodal processing, scored 74% on the MMMU benchmark, demonstrating strong multimodal understanding and generation.

🎬 **Video generation enters a new era**: Open source AI also made notable progress in video generation. Wan 2.2, the first open-source MoE video generation model, produces high-quality 5-second 720p videos and can even run on a single 4090, dramatically lowering the barrier to entry. Runway's Gen-3 Aleph model adds a conversational interface for deep editing and transformation of video content: users can easily add or remove people, change camera angles, and more, hinting at a future of personalized entertainment.

🎨 **AI art and 3D generation explore new frontiers**: The AI art space had plenty of highlights too. Flux.1, released by Krea in partnership with Black Forest Labs, brings a distinctive aesthetic and high-quality image generation to text-to-image. Ideogram's Characters feature enables efficient character-consistent generation, letting users easily create characters with a unified look. Tencent's Hunyuan3D World Model 1.0 pioneers the first explorable 3D world generator, a powerful new tool for game development and VR content creation.

🛠️ **Developer tools and platforms bring AI to production**: To help developers use these frontier models, Weights & Biases (W&B) offers efficient inference serving; running the Qwen3-Thinking model there costs a small fraction of what Claude Opus does, making it extremely cost-effective. Meanwhile, CharmBracelet's Crush, an open-source agentic coding tool for the terminal, gives developers a more flexible workflow, supports sub-agents, and further accelerates adoption of open source AI models.

Woohoo, we're almost done with July (my favorite month), and the open source AI world decided to go out with some fireworks 🎉

Hey everyone, Alex here, writing this without my own personal superintelligence (more on that later), and this week has been VERY BUSY with many new open source releases.

Just one hour before the show we already had four breaking news releases: a tiny Qwen3-Coder, multimodal SOTAs from both Cohere and StepFun, and our friends from Krea dropping a combined model with BFL called Flux[Krea] 👏

This is on top of a very, very busy week, with Runway adding conversation to their video model Aleph, Zuck's superintelligence vision, and a new SOTA open video model, Wan 2.2. So let's dive straight in (as always, all show notes and links are at the end).

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Open Source LLMs & VLMs

Tons of new stuff here, I'll try to be brief but each one of these releases deserves a deeper dive for sure.

Alibaba is on 🔥 with 3 new Qwen models this week

Yes, this is very similar to last week, when they also dropped three new SOTA models, but these are additional ones.

It seems that someone at Alibaba figured out that after splitting away from the hybrid models, they can release each model separately and get a lot of attention per model!

Here's the rundown:

Let's start with the SOTA reasoner: Qwen3-235B-A22B-Thinking-2507 is absolutely the best reasoner among the open source models.

We've put the model on our inference service (at a crazy $0.10/$0.10 per million input/output tokens) and it's performing absolutely incredibly on reasoning tasks.

It also jumped to the top of the OSS models on Artificial Analysis scores, EQ-Bench, long-context evals, and more. It's a really, really good reasoning model!

Smaller Qwens for local use

Just a week ago, we asked Junyang on our show about smaller models that folks can run on their devices, and he demurred, saying "we're focusing on the larger models." This week, they delivered not one but two smaller versions of the bigger models (perfect as speculative decoding drafts if you can host the larger ones, that is).
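If you want to try that pairing yourself, here's a minimal sketch of draft-model speculative decoding in vLLM. The model ids are the real Hugging Face repos, but the `speculative_config` keys have shifted between vLLM versions and the hardware sizing is my assumption, so treat this as a starting point rather than a recipe:

```python
# pip install vllm
# A sketch: speculative decoding with the small Qwen3 drafting for the big one.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",   # target model
    tensor_parallel_size=8,                        # assumption: a multi-GPU node
    speculative_config={
        # Same tokenizer family, which draft-model speculation requires.
        "model": "Qwen/Qwen3-30B-A3B-Thinking-2507",
        "num_speculative_tokens": 4,  # draft 4 tokens per step; target verifies them
    },
)

params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["Prove that the square root of 2 is irrational."], params)
print(outputs[0].outputs[0].text)
```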

The most interesting one is Qwen3-Coder-Flash, which came out today with very, very impressive stats, and the ability to run locally at almost 80 tok/s on a MacBook!
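On Apple silicon, the easiest way I know to chase that kind of speed is MLX. A minimal sketch with mlx-lm follows; the quantized repo name is my assumption, so check mlx-community on the Hub for the actual Qwen3-Coder conversions:

```python
# pip install mlx-lm   (Apple silicon only; a sketch, not an official recipe)
from mlx_lm import load, generate

# Assumption: a 4-bit community conversion with this name exists on the Hub.
model, tokenizer = load("mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a function that parses RFC 3339 timestamps."}],
    add_generation_prompt=True,
    tokenize=False,
)

# verbose=True prints tokens/sec, so you can check the ~80 tok/s claim yourself.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```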

So, over the last two weeks, we got 3 Qwens (Instruct, Thinking, Coder) in 2 sizes each (all three now have a 30B-A3B version for local use) 👏

Z.ai GLM and StepFun Step3

As we've said previously, Chinese companies completely dominate the open source AI field right now, and this week we saw yet another crazy testament to how stark the difference is!

We saw the rebranded Zhipu (Z.ai, previously THUDM) release their new GLM 4.5, which gives Qwen3-Thinking a run for its money. Not quite at that level, but definitely very close. I personally didn't love the release aesthetics; showing a blended eval score that nobody can replicate feels a bit off.

We also talked about how StepFun has stepped in (sorry for the pun) with a new SOTA in multimodality, called Step3. It's a 321B MoE (with a hefty 38B active parameter count) that achieves very significant multimodal scores (the benchmarks look incredible: 74% on MMMU, 64% on MathVision).

Big Companies APIs & LLMs

Well, we were definitely thinking we'd get GPT-5 or the open source model from OpenAI this week, but alas, the tea-leaf readers were misled (or were being misleading). We 100% know that GPT-5 is coming, as multiple screenshots showing companies already testing it were blurred and then deleted.

But it looks like August is going to be even hotter than July, with multiple sightings of anonymous test models on WebDev Arena, like Zenith, Summit, and Lobster, plus a new mystery model on OpenRouter, which some claim are the different thinking modes of GPT-5 and the open source model.

Zuck shares vision for personalized superintelligence (Meta)

In a very Nat Friedman-like post, Mark Zuckerberg finally shared the vision behind his latest push to assemble the most cracked AI engineers.

In his vision, Meta is the right place to provide everyone with personalized superintelligence, enhancing individual abilities with user agency, according to each person's own values (as opposed to a centralized model, which feels like his shot across the bow at the other frontier labs).

A few highlights: Zuck leans heavily into the rise of personal devices, including AR glasses, on top of which humans will interact with this superintelligence. There's also a departure from the complete "let's open source everything" dogma of the past; now there will be more deliberate consideration of what to open source.


This Week's Buzz: Putting Open Source to Work with W&B

With all these incredible new models, the biggest question is: how can you actually use them? I'm incredibly proud to say that the team at Weights & Biases had all three of the big new Qwen models—Thinking, Instruct, and Coder—live on W&B Inference on day one (link)

And our pricing is just unbeatable. Wolfram did a benchmark run that would have cost him $150 using Claude Opus. On W&B Inference with the Qwen3-Thinking model, it cost him 22 cents. That's not a typo; it's roughly 680x cheaper, and a game-changer for developers and researchers.
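For the curious, W&B Inference speaks the standard OpenAI-compatible chat completions protocol, so the stock client works. A minimal sketch; the base URL and model id below are my best understanding, so double-check them against the docs:

```python
# pip install openai   (a sketch against W&B Inference's OpenAI-compatible endpoint)
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumption: current W&B Inference endpoint
    api_key=os.environ["WANDB_API_KEY"],           # your W&B API key
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",  # assumption: model id as listed on W&B
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'? Think carefully."}],
)
print(resp.choices[0].message.content)
```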

To make it even easier, a listener of the show, Olaf Geibig, posted a fantastic tutorial on how you can use our free credits and W&B Inference to power tools like Claude Code and VS Code using LiteLLM. It takes less than five minutes to set up and gives you access to state-of-the-art models for pennies. All you need to do is add the config to LiteLLM and run Claude Code (or VS Code) through it!
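I won't reproduce Olaf's exact config here, but the shape of the LiteLLM side looks roughly like this; the model id and endpoint are assumptions, as above:

```python
# pip install litellm   (a rough sketch of the routing idea, not Olaf's exact setup)
import os
import litellm

# The "openai/" prefix tells LiteLLM to treat W&B Inference as a generic
# OpenAI-compatible provider and pass the rest through as the model id.
response = litellm.completion(
    model="openai/Qwen/Qwen3-Coder-480B-A35B-Instruct",  # assumption: model id on W&B
    api_base="https://api.inference.wandb.ai/v1",
    api_key=os.environ["WANDB_API_KEY"],
    messages=[{"role": "user", "content": "Refactor this loop into a list comprehension: ..."}],
)
print(response.choices[0].message.content)
```

As I understand Olaf's tutorial, the proxy-server flavor of this same idea is what lets coding tools like Claude Code point at it.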

Give our inference service a try here, and give our main account @weights_biases a follow, as we often drop ways to get additional free credits when new models release.


Vision & Video models

Wan2.2: Open-Source MoE Video Generation Model Launches (X, HF)

This is likely the best open source video model, and it's definitely the first MoE video model! It came out in text-to-video, image-to-video, and combined variants.

With 5-second 720p videos that can even be generated at home on a single 4090, this is definitely a step up in the quality of fully open source video models.
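If you want to poke at it yourself, Wan ships Diffusers-format checkpoints. Here's a minimal text-to-video sketch; the repo id, resolution, and frame count are my assumptions based on the smaller TI2V-5B variant (the one that fits a single 4090), so check the model card:

```python
# pip install diffusers transformers accelerate   (a sketch; verify against the Wan 2.2 model card)
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumption: the Diffusers-format repo id for the 5B text+image-to-video variant.
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps squeeze into a single consumer GPU

frames = pipe(
    prompt="a corgi surfing a wave at sunset, cinematic",
    num_frames=121,   # assumption: ~5 seconds at 24 fps
    height=704,
    width=1280,
).frames[0]

export_to_video(frames, "corgi.mp4", fps=24)
```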

Runway changes the game again - Gen-3 Aleph model for AI video editing / transformation (X, X)

Look, there's simply no denying it: AI video has had an incredible year, from open source models like Wan to proprietary models with sound like Veo 3. It's not surprising that we're seeing this trend, but it's definitely very exciting when we see an approach to editing like Runway's.

This adds a chat interface to the model, and with it the ability to edit... anything in the scene: remove or add people and environmental effects, see the same scene from a different angle, and a lot more!

Expect personalized entertainment very soon!

AI Art & Diffusion & 3D

FLUX.1 Krea [dev] launches as a state-of-the-art open-weights text-to-image model (X, HuggingFace)

Black Forest Labs teamed up with Krea AI for FLUX.1 Krea [dev], an open-weights text-to-image model that ditches the "AI gloss" for natural, distinctive vibes (think DALL-E 2's quirky grain without the saturation). It outperforms open peers and rivals proprietary models in human preference evals, and it's fully Flux-compatible for LoRAs and tooling. Yam and I geeked out over the aesthetics frontier; it's a flexible base for fine-tunes, available on Hugging Face with commercial options via FAL and Replicate. If you're tired of cookie-cutter outputs, this breathes fresh life into generations.
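Since it's Flux-compatible, the standard Diffusers Flux pipeline should load it. A minimal sketch, assuming the repo id below follows BFL's usual naming and you have a GPU with offload headroom:

```python
# pip install diffusers transformers accelerate sentencepiece   (a sketch; verify the repo id)
import torch
from diffusers import FluxPipeline

# Assumption: the official repo id follows BFL's usual naming convention.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Krea-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for VRAM on consumer cards

image = pipe(
    prompt="a 35mm film photo of a red fox in tall grass, overcast light",
    guidance_scale=4.5,
    num_inference_steps=28,
).images[0]
image.save("fox.png")
```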

Ideogram Character launches: one-shot character consistency for everyone (X)

Ideogram's Characters feature lets you upload one pic for instant, consistent variants, free for all, with inpainting to swap yourself into memes and art. My tests nailed expressions and scenes (me in cyberpunk? Spot on), though not always photoreal. Wolfram praised the accuracy; it's a meme-maker's dream! They give you around 10 free generations, so give it a go.

Tencent Hunyuan3D World Model 1.0 launches as the first open-source, explorable 3D world generator (X, HF)

Tencent's Hunyuan3D World Model 1.0 is the first open-source generator of explorable 3D worlds from text or images: 360° immersive, with exportable meshes for games and modeling. It takes ~33GB of VRAM on complex scenes, but Wolfram called it a step toward the metaverse; I wandered a demo scene and loved the potential despite the rough edges. Integrate it into CG pipelines? Game-changer for VR and creators.

Voice & Audio

Look, I didn't even mention this on the show, but it came across my feed just as I was about to wrap up ThursdAI, and it's really something. Riffusion, using FUZZ-2, now has a fully chattable studio producer: you can ask for... anything you would ask for in a studio!

Here's my first reaction, and it's really fun. I think they're still open with the invite code 'STUDIO'... I'm not affiliated with them at all!

Tools

OK, I promised some folks we'd add this in: Nisten went super viral last week using a new open source tool called Crush from CharmBracelet, an open source agentic coding tool for the terminal, and it looks awesome!

He gave a demo live on the show, including how to set it up, with subagents etc. If you're into vibe coding with open source models, definitely give Crush a try; it really flies and looks cool!


Phew, OK, we somehow managed to cover ALL these releases this week, and we didn't even have an interview!

Here's the TL;DR and links for the folks who subscribed (I'm trying a new thing to promote subs on this newsletter), and see you in two weeks (next week is Wolfram's turn again, as I'm somewhere in Europe!)

ThursdAI - July 31st, 2025 - TL;DR

