2025.06.25 | AnimaX improves animation of inanimate 3D objects; Matrix-Game optimizes game world models.

The 15 papers in this episode are as follows:

00:25 🤖 AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

01:11 🎮 Matrix-Game: Interactive World Foundation Model

01:50 🧠 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

02:33 💡 Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

03:18 🖼 ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

03:58 🤔 Can Large Language Models Capture Human Annotator Disagreements?

04:49 🛠 SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

05:37 🎨 JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

06:21 🧠 SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

07:04 🎬 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution

07:41 🖼 Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales

08:22 🤖 Unified Vision-Language-Action Model

08:59 🤔 Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

09:33 🗣 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text

10:08 🔊 USAD: Universal Speech and Audio Representation via Distillation

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu (小红书): AI速递
