2025.06.03 | High-entropy tokens improve LLM reasoning; Reasoning Gym optimizes reinforcement learning environments.

The 15 papers in this episode are:

00:22 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

01:05 🧠 REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

01:46 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

02:31 🚀 Taming LLMs by Scaling Learning Rates with Gradient Grouping

03:19 🧩 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

04:06 🎬 Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models

04:43 🤖 ARIA: Training Language Agents with Intention-Driven Reward Aggregation

05:27 🤖 LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks

06:02 🤖 ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

06:41 🤖 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

07:15 🚀 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

07:56 🌍 EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models

08:35 🤔 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

09:14 🤖 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

09:48 🤖 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递

Fish AI Reader

FishAI

Contact email: 441953276@qq.com
