The 16 papers in this episode are:
[00:24] 🤔 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
[00:59] 🎥 MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
[01:35] ⚖ Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
[02:17] 🤖 UI-TARS: Pioneering Automated GUI Interaction with Native Agents
[02:55] 🤖 Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
[03:31] 🎨 TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
[04:14] 🏆 InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
[04:57] 🎥 Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
[05:39] 🤖 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
[06:18] 🧠 Reasoning Language Models: A Blueprint
[06:58] 🎨 Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
[07:40] 🧠 Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
[08:21] 🎥 EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
[08:55] 🎥 Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
[09:32] 🌍 GPS as a Control Signal for Image Generation
[10:11] ⚠ MSTS: A Multimodal Safety Test Suite for Vision-Language Models

[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu (小红书): AI速递