本期的 12 篇论文如下:
00:23 💡 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning(GLM-4.1V-Thinking:基于可扩展强化学习的通用多模态推理)
01:00 🖼 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings(MoCa:模态感知持续预训练提升双向多模态嵌入效果)
01:35 🔬 SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks(SciArena:科学文献任务中基础模型的开放评估平台)
02:19 🤔 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning(数学推理能力是否能提升通用大语言模型的能力?理解大语言模型推理的迁移性)
02:59 🎬 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation(径向注意力:用于长视频生成的具有能量衰减的O(n log n)稀疏注意力机制)
03:37 🤖 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation(DiffuCoder:理解并改进用于代码生成的掩码扩散模型)
04:19 🧠 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context(HumanOmniV2:基于上下文理解到全模态推理)
04:53 🧠 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact(超越Token:从脑启发智能到通用人工智能的认知基础及其社会影响)
05:30 💡 Data Efficacy for Language Model Training(语言模型训练中的数据效能)
06:05 🎬 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion(FreeLong++:通过多频段频谱融合实现免训练长视频生成)
06:40 🖼 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering(IR3D-Bench:评估视觉-语言模型作为智能体进行逆向渲染的场景理解能力)
07:28 🛡 Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images(Peccavi:一种针对AI生成图像的视觉释义攻击安全且无失真的图像水印技术)

【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递