The 15 papers in this episode:
00:22 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
01:05 🧠 REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
01:46 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
02:31 🚀 Taming LLMs by Scaling Learning Rates with Gradient Grouping
03:19 🧩 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles
04:06 🎬 Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models
04:43 🤖 ARIA: Training Language Agents with Intention-Driven Reward Aggregation
05:27 🤖 LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks
06:02 🤖 ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
06:41 🤖 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
07:15 🚀 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
07:56 🌍 EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models
08:35 🤔 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
09:14 🤖 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
09:48 🤖 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递