The 15 papers in this episode are:
00:24 🤖 ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
00:59 🎬 SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
01:39 🤖 RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
02:26 🚄 Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts
03:08 🧠 Video World Models with Long-term Spatial Memory
03:46 🌐 Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights
04:32 ⚛ VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
05:17 📚 Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
05:55 🔢 AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
06:38 🌌 Aligning Latent Spaces with Flow Priors
07:22 📚 The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
08:15 🧠 Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
09:06 🧠 StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs
09:48 🚀 Inference-Time Hyper-Scaling with KV Cache Compression
10:30 👁 SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递