Society's Backend

ML for SWEs 6: RAG Is Not Dead

 

This week's major AI developments include Meta's release of the Llama 4 family of models, which performed well on benchmarks but disappointed in real-world use. The "is RAG dead?" debate has also resurfaced; this issue argues that RAG (retrieval-augmented generation) remains as important as ever. In addition, Google released Ironwood, a TPU designed for inference, and shipped updates to its AI IDE and the Gemini API. The issue closes with picks on prompt engineering, learning AI, evaluating AI products, and managing AI risk.

🦙 Meta released the Llama 4 family of models: Scout, Maverick, and Behemoth. Scout and Maverick each have 17 billion active parameters; Scout supports a 10-million-token context window, and Maverick excels at image understanding and coding tasks. But the Llama 4 models have disappointed in practice, carry a restrictive license, and are difficult to run on consumer hardware.

🔍 RAG (retrieval-augmented generation) still matters. Even with large context windows, RAG remains essential for retrieving information and providing it to an LLM as context. Large context windows are not infallible: attention mechanisms have limits, and LLMs struggle to locate information buried in long contexts. RAG also remains widely used in scenarios like personalization.

🚀 Google released Ironwood, a TPU designed specifically for inference, improving performance and energy efficiency and supporting massive parallel processing. Google's decision to start developing TPUs in the 2010s was remarkably forward-looking, as Gemini 2.5 Pro's strong results show.

👨‍💻 Google released Firebase Studio, an AI IDE that strengthens web-based development. Veo 2 is now available in the Gemini API, enabling text-to-video and image-to-video generation. Anthropic is still offering developers $50,000 in free API credits to experiment with Claude.

The hardest part about working in AI and machine learning is keeping up with it all. I send a weekly email to summarize what's most important for software engineers to understand about the developments in AI. Subscribe to join 6500+ engineers getting these in their inbox each week.

If you want to learn machine learning fundamentals, check out my roadmap!

Always be (machine) learning,

Logan

Subscribe now


Llama 4: Meta has disappointed the AI community

The biggest news this week was the release of the Llama 4 family of models from Meta. Interestingly, they were released on a Saturday (almost a week ago now), which is unusual: weekend releases tend to make less of a buzz in the AI community.

The Llama 4 “herd” includes three models: Scout, Maverick, and Behemoth. Scout and Maverick both have 17 billion active parameters, with Scout using 16 experts and Maverick using 128. Scout supports a 10-million-token context window (more on that in the next section), which is 10x Gemini's already incredibly large context window. Maverick excels at image understanding and coding tasks. Behemoth, which will be the largest model of the Llama 4 herd, is still training, so it has not yet been released.
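To make "active parameters" concrete: in a mixture-of-experts model, a router picks a few experts per token, so only a fraction of the total parameters does work on any given forward pass. Here's a minimal NumPy sketch of top-k routing; the shapes, expert count, and routing scheme are illustrative only, not Llama 4's actual architecture.

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=1):
    """Route one token through a top-k mixture-of-experts layer.

    Only the selected experts run, so only their parameters
    are "active" for this token.
    """
    logits = router_w @ x                                    # score every expert
    top = np.argsort(logits)[-top_k:]                        # indices of the top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the winners
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

d, n_experts, top_k = 64, 16, 1
rng = np.random.default_rng(0)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # one weight matrix per expert
router_w = rng.normal(size=(n_experts, d))
y = moe_forward(rng.normal(size=d), experts, router_w, top_k)

# Total expert parameters vs. parameters actually used for this token:
print(f"total: {n_experts * d * d:,}, active per token: {top_k * d * d:,}")
```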

Benchmarks show the Llama 4 models beating all competitors except Gemini 2.5 Pro (which is absolutely cracked), but real-world testing is telling a different story. The biggest takeaway from the Llama 4 models so far is that they're a disappointment. There are many theories that the benchmarks were gamed to make the models look better than they actually are; Meta says these theories are simply not true.

Two more disappointments: 1) the license for the Llama 4 models is highly restrictive, and 2) the models have been criticized for not being runnable on consumer hardware. Sure, they can be run on multiple 4090s, but most consumers don't have those sitting around. Part of what made previous Llama models beloved by the AI community was the ability to easily run them on consumer hardware and the freedom to do what you want with them.

The takeaway from this release is a lack of clear narrative from Meta. The release seems to have hurt their competitiveness in AI. It's almost as if they rushed the models out in response to recent releases from their competitors.

RAG is not dead

Because Scout has a 10-million-token context window, the “RAG is dead” narrative is alive again on social media. I want to repeat that this isn't the case. If you work as a machine learning engineer, you'll likely need to use RAG at some point, and it's important to understand that RAG is not dead. Anyone saying this misunderstands its purpose.

Fundamentally, RAG is a way to retrieve information needed to provide context to an LLM: retrieve the information (wherever it may be stored) and append it to the LLM's context window. With small context windows, a large part of this process is efficiently fitting that information (which could be massive) into the size of the window.
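Here's a minimal sketch of that loop. `embed` and `ask_llm` are hypothetical stand-ins for whatever embedding model and LLM you're using; a real system would add chunking, a vector index, and reranking.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, docs, doc_vecs, embed, k=3):
    """Return the k documents most similar to the query."""
    q = embed(query)
    scored = sorted(zip(docs, doc_vecs), key=lambda p: cosine(q, p[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]

def rag_answer(query, docs, doc_vecs, embed, ask_llm, k=3):
    """Retrieve relevant context, append it to the request, ask the LLM."""
    context = "\n\n".join(retrieve(query, docs, doc_vecs, embed, k))
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return ask_llm(prompt)
```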

However, large context windows don’t make RAG any less important. First, the retrieval part of RAG will always be relevant. Retrieving information and appending it to a request will always (barring crazy advancements) be a method of providing LLMs information.

Second, RAG will still be used with long context windows. Long context windows are not infallible: attention mechanisms have their limitations. Studies show that LLMs have a hard time locating information within a long context, and they particularly struggle to retrieve information buried in the middle of it. In these cases, efficiently providing context to the LLM, as RAG does, is still important.
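A common way to measure this is a "needle in a haystack" probe: bury a fact at different depths of a long context and check whether the model can still recall it. A rough sketch of such a harness, with `ask_llm` again a hypothetical stand-in for a real model call:

```python
def needle_test(ask_llm, filler_paragraph, needle, question, n_paragraphs=200):
    """Insert `needle` at several depths of a long context and query each one."""
    results = {}
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):      # fraction of the way into the context
        pos = int(depth * n_paragraphs)
        paragraphs = [filler_paragraph] * n_paragraphs
        paragraphs.insert(pos, needle)              # bury the fact at this depth
        prompt = "\n\n".join(paragraphs) + f"\n\nQuestion: {question}"
        results[depth] = ask_llm(prompt)            # compare each answer to the needle
    return results
```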

Also, there are many use cases (such as personalization) where we have more information to give an LLM than can be packed into even a one- or ten-million-token context window.

Google releases an inference-only TPU

Ironwood is Google's latest TPU, designed specifically for inference. It delivers significant advances in performance and energy efficiency, enabling the massive parallel processing essential for serving large AI models.

I’m including this for two reasons: 1) It’s really cool and 2) Not relying on someone else to provide the hardware for their AI has paid dividends for Google. In fact, the decision to start developing in-house TPUs in the 2010s might just be the most forward-looking business decision of that decade.

We’ve seen the dividends of Google’s TPU development recently with Gemini 2.5 Pro smashing benchmarks and leaving the competition behind. If you follow me on other social platforms, you’d know I’ve been posting about it a lot. I’m really hoping OpenAI comes out with something comparable soon to continue pushing the development at both companies.

More for developers

Here are a few more things of interest to software developers:

- Google released Firebase Studio, an AI IDE that strengthens web-based development.

- Veo 2 is now available in the Gemini API, enabling text-to-video and image-to-video generation (see the sketch below).

- Anthropic is still offering developers $50,000 in free API credits to experiment with Claude.
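As a rough sketch of what the Veo 2 text-to-video flow looks like through the Gemini API's Python SDK (google-genai): treat the model id and polling details below as assumptions to verify against the current docs.

```python
# Sketch of text-to-video with Veo 2 via the Gemini API (google-genai SDK).
# Model id and polling details are assumptions; check the current docs.
import time
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",          # assumed model id
    prompt="A timelapse of a glacier calving at sunrise",
)

while not operation.done:                  # video generation runs asynchronously
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)    # fetch the generated video
video.video.save("glacier.mp4")
```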

That's it for discussion topics this week! Thanks for reading. If you missed last week's ML for SWEs update, check it out:

Below are my picks for articles and videos you don't want to miss from this past week. My entire reading list is included for paid subs. Thank you for your support! 😊

If you want to support Society's Backend, you can do so for just $3/mo.

Get 40% off forever

My picks

A Prompt Engineering Guide from Google: Prompt engineering is the process of crafting effective input prompts for large language models to ensure accurate and meaningful output, considering factors like model type, training data, and word choice.

How to Stay Sharp on AI Without Burning Out: To stay updated on AI without burning out, it's essential to adopt an efficient learning system that allows for quick comprehension and application of complex topics while balancing professional and personal commitments.

Beyond vibe checks: A PM’s complete guide to evals: Mastering evaluations (evals) is crucial for AI product managers, as they provide the insights needed to ensure AI systems perform effectively and meet user expectations.

How I Became a Machine Learning Engineer Without an Advanced Degree by Logan Thorneloe: Logan shares his six-year journey to becoming a machine learning engineer at Google without an advanced degree, emphasizing the importance of strategic planning, networking, and seizing opportunities in education and internships.

Nonproliferation is the wrong approach to AI misuse: Focusing on leveraging adaptation buffers to implement defensive measures is a more effective strategy for managing the risks of catastrophic AI misuse than relying on nonproliferation.

There Are No New Ideas in AI… Only New Datasets: AI progress primarily stems from leveraging new datasets rather than groundbreaking ideas, suggesting that future advancements will likely depend on unlocking and utilizing previously untapped sources of data.
