The 15 papers in this episode:
00:22 🌍 Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
01:09 🤖 RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
01:48 🖼 Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
02:30 🎬 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
03:08 🧮 Solving Inequality Proofs with Large Language Models
03:49 🤖 Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
04:25 🖼 Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
05:05 🤖 Aligning Text, Images, and 3D Structure Token-by-Token
05:51 🔍 ECoRAG: Evidentiality-guided Compression for Long Context RAG
06:28 🎬 DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
07:14 🖼 Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs
08:06 🗜 Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor
08:46 🤖 Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
09:21 🧩 MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
09:58 📚 Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递