ThursdAI - Recaps of the most high signal AI weekly spaces 2024年10月22日
? ThursdAI - July 11 - Mixture of Agents & Open Router interviews (no news this week)
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本期播客主要介绍了AI领域的一些最新动态,包括混合代理模型在AlpacaEval基准测试中的应用,OpenRouter平台的最新进展,以及Weave团队推出的评估系统。混合代理模型利用多个较小模型的协作来超越更大的模型,OpenRouter平台提供了一个统一的接口,方便用户使用各种基础模型和开源模型,而Weave评估系统则提供了一个功能强大的工具,方便用户对各种AI模型进行评估和比较。此外,播客还推荐了Tanishq Abraham和Aran Komatsuzaki共同发起的AI论文周报,该周报旨在帮助用户了解最新的AI论文。

😊 混合代理模型:本期播客首先介绍了Together AI和OpenPipe分别提出的混合代理模型,该模型通过将多个较小的模型进行协作,在AlpacaEval基准测试中取得了显著的成绩。Together AI的方法是将多个模型的输出进行融合,而OpenPipe则采用了不同的方法,将多个模型的输出进行组合。

😎 OpenRouter平台:OpenRouter平台提供了一个统一的接口,方便用户使用各种基础模型和开源模型,包括GPT、Claude、Gemini等。该平台还支持OpenAI SDK格式,方便用户进行模型评估和比较。此外,OpenRouter还提供了一些免费的模型,例如Phi,方便用户进行尝试。

🤩 Weave评估系统:Weave团队推出的评估系统是一个功能强大的工具,方便用户对各种AI模型进行评估和比较。该系统提供了丰富的功能,包括模型性能指标、模型比较、模型可视化等。

🥳 AI论文周报:Tanishq Abraham和Aran Komatsuzaki共同发起的AI论文周报,旨在帮助用户了解最新的AI论文。该周报每周都会发布一篇最新的AI论文,并进行详细的解读。

🏖️ 播客作者将在下周休假,并将在两周后恢复更新。

Hey all, Alex here… well, not actually here, I’m scheduling this post in advance, which I haven’t yet done, because I'm going on vacation!

That’s right, next week is my birthday ? and a much needed break, somewhere with a beach is awaiting, but I didn’t want to leave you hanging for too long, so posting this episode with some amazing un-released before material.

Mixture of Agents x2

Back in the far away days of June 20th (not that long ago but feels like ages!), Together AI announced a new paper, released code and posted a long post about a new method to collaboration between smaller models to beat larger models. They called it Mixture of Agents, and James Zou joined us to chat about that effort.

Shortly after that - in fact, during the live ThursdAI show, Kyle Corbitt announced that OpenPipe also researched an approached similar to the above, using different models and a bit of a different reasoning, and also went after the coveted AlpacaEval benchmark, and achieved SOTA score of 68.8 using this method.

And I was delighted to invite both James and Kyle to chat about their respective approach the same week that both broke AlpacaEval SOTA and hear how utilizing collaboration between LLMs can significantly improve their outputs!

This weeks buzz - what I learned at W&B this week

So much buzz this week from the Weave team, it’s hard to know what to put in here. I can start with the incredible integrations my team landed, Mistral AI, LLamaIndex, DSPy, OpenRouter and even Local Models served by Ollama, LmStudio, LLamaFile can be now auto tracked with Weave, which means you literally have to only instantiate Weave and it’ll auto track everything for you!

But I think the biggest, hugest news from this week is this great eval comparison system that the Weave Tim just pushed, it’s honestly so feature rich that I’ll have to do a deeper dive on it later, but I wanted to make sure I include at least a few screencaps because I think it looks fantastic!

Open Router - A unified interface for LLMs

I’ve been a long time fan of OpenRouter.ai and I was very happy to have Alex Atallah on the show to talk about Open Router (even if this did happen back in April!) and I’m finally satisfied with the sound quality to released this conversation.

Open Router is serving both foundational models like GPT, Claude, Gemini and also Open Source ones, and supports the OpenAI SDK format, making it super simple to play around and evaluate all of them on the same code. They even provide a few models for free! Right now you can use Phi for example completely free via their API.

Alex goes deep into the areas of Open Router that I honestly didn’t really know about, like being a marketplace, knowing what trendy LLMs are being used by people in near real time (check out WebSim!) and more very interesting things!

Give that conversation a listen, I’m sure you’ll enjoy it!


That’s it folks, no news this week, I would instead like to recommend a new newsletter by friends of the pod Tanishq Abraham and Aran Komatsuzaki both of whom are doing a weekly paper X space and recently start posting it on Substack as well!

It’s called AI papers of the week, and if you’re into papers which we don’t usually cover, there’s no better duo! In fact, Tanishq often used to come to ThursdAI to explain papers so you may recognize his voice :)

See you all in two weeks after I get some seriously needed R&R ? ??️

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

混合代理模型 OpenRouter Weave AI论文周报 AI技术
相关文章