Society's Backend
Everything developers should know about from Google I/O 2025

At Google I/O 2025, AI was the clear centerpiece. Google shipped a wave of updates to Gemini, AI Studio, and the Gemini API aimed at making AI easier to apply and developers more productive. Among them: Veo 3 brings native audio generation, Flow is an AI-powered filmmaking tool, and Gemma 3n is an open model that can run locally. Gemini 2.5 Flash gained more transparency, new Gemma variants were announced, and Gemini Diffusion and AI-assisted code editing tools debuted. Together, these updates point to AI spreading across video generation, model deployment, coding, and more.

🎬 Veo 3 is a breakthrough in video generation: it is the first to generate video and audio together. One prompt now yields a complete video with a soundtrack, greatly simplifying AI video creation.

💻 Google introduced Flow, an AI-powered filmmaking tool. Though still early, Flow aims to help users produce video faster and at higher quality. Alongside it, the Google AI Ultra plan costs $250/month and is aimed mainly at businesses.

📱 Gemma 3n is an open model designed for mobile devices: it runs on-device with just 2GB of RAM and performs comparably to Gemini Nano. Gemma also gained two variants, MedGemma for medical image and text comprehension and SignGemma for sign-language translation, broadening the models' practicality and accessibility.

🚀 Gemini Diffusion writes code 5x faster while maintaining coding performance. Google Colab is also getting a fully agentic experience with integrated AI-assisted code editing, further boosting developer productivity.

🌐 The Gemini API gained a computer use API and a URL context tool, letting developers build apps that can browse the web and extract page content. Gemini also supports MCP (Model Context Protocol), which improves the developer experience and enables integration with more open source tools.

Google I/O 2025 was basically an AI-fest. There were a ton of updates to Gemini, AI Studio, and the Gemini API that make building with AI much easier for you.

Here’s the list of announcements you should be aware of.

If you’re new to Society’s Backend, subscribe to get machine learning engineering articles in your inbox each week.

Veo 3 can generate videos with audio

There were a lot of interesting things announced at Google I/O. First, Veo 3 is the first video generation model with native audio generation. This means one prompt to create a video with accompanying audio.

This has been a barrier for applications of AI video generation. Veo 2 was really only good for B-roll or social media posts due to its duration limitation and lack of native audio. With Veo 3, I wouldn’t be surprised if we see high-quality feature films entirely generated with AI within the next year.
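For developers, generation like this is exposed through the Gemini API as a long-running request. As a sketch of the request shape (the model identifier and parameter names below are my assumptions based on the published Veo 2 REST conventions, not a confirmed Veo 3 schema):

```python
# Sketch of a text-to-video request body in the Veo REST style.
# Model name and parameter names are assumptions; check the current docs.

def build_video_request(prompt: str,
                        model: str = "veo-3.0-generate-preview") -> dict:
    """Build a predictLongRunning-style body: one prompt, video plus audio."""
    return {
        "model": model,                     # assumed Veo 3 model identifier
        "instances": [{"prompt": prompt}],  # a single prompt drives the scene
        "parameters": {"aspectRatio": "16:9"},
    }

req = build_video_request(
    "A street musician playing saxophone in the rain, ambient city sound")
```

The response to a call like this is an operation you poll until the video, audio track included, is ready to download.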

Flow: An AI-powered filmmaking tool

The big blocker now is consistency between generated videos, but it seems Google is solving that because they’ve released Flow, an AI-powered filmmaking product. It’s gimmicky in its current form but will definitely get better with time.

Google AI Ultra is $250/mo

You might have to pay quite a bit to use Flow’s full feature set, though, because Flow is part of Google’s Google AI Ultra plan, priced at $250/mo. The plan is clearly targeted at businesses, since few consumers can justify that price, and many Gemini fans are upset at the high price tag.

Gemma 3n is an open model that can run locally with just 2GB of RAM

Google I/O also saw the announcement of Gemma 3n, Google’s state-of-the-art mobile-first open model. It requires only 2GB of RAM to run on-device with just 4B active parameters and is similar in performance to Gemini Nano. It’s already available for developers to start building with via Google’s Gen AI SDK.

This is huge: on-device LLMs are the future once performance for smaller models can be improved. The vast majority of users don’t need anything more powerful than an on-device model, so they shouldn’t have to pay for anything better or send their data off-device to have access to an LLM.
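The 2GB figure also roughly matches the usual back-of-envelope math for quantized weights. A quick sanity check (the 4-bit quantization and weights-only assumptions are mine, not Google's published numbers):

```python
def model_memory_gb(n_params: float, bits_per_weight: float,
                    overhead: float = 1.0) -> float:
    """Rough RAM estimate: parameters * bytes per weight, times an
    overhead factor for activations/KV cache (1.0 = weights only)."""
    return n_params * bits_per_weight / 8 * overhead / 1e9

# 4B active parameters at 4-bit quantization, weights only:
print(model_memory_gb(4e9, 4))  # 2.0 (GB), in line with the 2GB claim
```

Real on-device usage adds activation and cache overhead on top of the weights, which is exactly why keeping the active parameter count small matters so much for phones.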

Gemini 2.5 Flash is more transparent

Gemini 2.5 Flash is now the default model in the Gemini app and thought summaries have been added for greater transparency.

One of the chief complaints about the Gemini app has been how much thinking happens without being shared with the user. Users have noticed (with all LLMs, not just Gemini) that thinking models tend to go in circles, or that the thoughts leading to a conclusion don’t make sense. Thought summaries fix the complaint of users not knowing when this happens.

Gemma gets two new variants

Gemma has two more variants: MedGemma for medical image and text comprehension and SignGemma (I couldn’t find a link for this, so if you have one let me know!) for translating sign language into spoken language.

Open models for both of these purposes are very important. This increases both transparency and accessibility.

Gemini Diffusion makes coding with AI 5x faster

Gemini Diffusion is a text diffusion model that is 5x as fast as Flash while matching coding performance. You have to sign up to get access. It seems this is the direction Google is going for coding models. It makes a ton of sense.

If you’ve used Cursor, you know how slow the agentic mode with Gemini 2.5 Pro can get with large projects or even a single large text file. Making this much faster will make AI-assisted coding a much better experience.

Google Colab is going agentic

Google Colab will soon have a fully agentic experience.

I’ve wondered when Google will get into the AI code editor competition and Colab seems to be the logical entry point. I use an AI-assisted editor to code each day during my job at Google but Google currently doesn’t offer it externally. I’m guessing this will be extended to a fully outward-facing AI editor in the future especially given Google’s focus on creating better coding models (see section above).

30 things you can build with Gemini

Google Code Assist and Code Review now powered by Gemini 2.5 Pro

Google Code Assist is now powered by Gemini 2.5 Pro with a 2 million token context window. Both Code Assist and Gemini Code Assist for GitHub (a code review assistant) are now generally available to everyone.

As far as I know, both of these integrate with GitHub repos and are free.

Jules is an AI programmer that will code alongside you

Jules is a code assistant that codes alongside you by tackling your backlog while you work on the code you want to. You can try it now!

Jules integrates directly with GitHub, clones your code to a cloud VM, and gives you a PR to review and submit based on its changes. This reminds me of Devin, but it seems like a more fleshed-out experience. If you try it out, let me know how it goes!

Generative media models have been added to AI Studio

Generative media models have been added to Google’s AI Studio, making it easier to prototype with those models. Gemini 2.5 Pro has also been integrated into AI Studio’s code editor to enable faster prototyping.

Gemini is introducing a computer use API

A computer use API has been added to the Gemini API (frustrating that I can’t find a link to this; let me know if you do and I’ll add it here). This tool enables developers to build apps that can browse the web. It’s currently only available to trusted testers but will roll out further later this year.

Gemini gets access to URL context

A URL context tool has been added to the API allowing Gemini to extract the content of a webpage at a URL. This is something I’ve desperately been waiting for because it seems like such a no-brainer for an LLM offering from the world’s greatest Search company.
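In the REST API, tools are passed alongside the prompt in the request body. A minimal sketch of what a URL-context request might look like (the field names follow the Gemini API's general REST conventions, but treat the exact tool schema as my assumption and verify it against the current docs):

```python
# Sketch of a generateContent request body using the URL context tool.
# The "url_context" tool schema is an assumption; verify against the docs.

def build_url_context_request(question: str, url: str) -> dict:
    """Build a request body asking Gemini to read a page and answer."""
    return {
        "contents": [{
            "role": "user",
            "parts": [{"text": f"{question}\nSource: {url}"}],
        }],
        "tools": [{"url_context": {}}],  # lets the model fetch the URL itself
    }

body = build_url_context_request(
    "Summarize the key points of this page.",
    "https://example.com/article",
)
```

The upside over scraping the page yourself is that the model handles fetching and parsing, so your app only deals with the question and the answer.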

Gemini supports MCP

The Gemini API and SDK will support MCP (Model Context Protocol; check out this article if you want to know what it is). Standards like MCP (when implemented properly) make development easier for engineers. This means Gemini will support and integrate with many more open source tools.


For all updates for developers, check out Google’s blog post here.

Thanks for reading!

Always be (machine) learning,

Logan
