The 15 papers in this episode are:
00:27 🧮 A Survey of Context Engineering for Large Language Models
01:16 🧠 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
02:08 📸 $\pi^3$: Scalable Permutation-Equivariant Visual Geometry Learning
02:52 🤖 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
03:47 🖼 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
04:47 🧑 Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
05:34 🎭 FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
06:23 🧠 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
07:17 🔬 AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
08:08 🗣 Voxtral (a multimodal audio chat model)
08:55 💡 Teach Old SAEs New Domain Tricks with Boosting
09:46 💡 FLEXITOKENS: Flexible Tokenization for Evolving Language Models
10:49 🎬 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
11:45 🛡 Automating Steering for Safe Multimodal Large Language Models
12:25 ⚙ RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

[Follow us]
You can also find us on the following platforms for more beyond the podcast:
Xiaohongshu (小红书): AI速递