The 18 papers in this episode:
[00:23] Kanana: Compute-efficient Bilingual Language Models
[00:54] GHOST 2.0: generative high-fidelity one shot transfer of heads
[01:43] TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
[02:21] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
[03:02] Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
[03:47] Language Models' Factuality Depends on the Language of Inquiry
[04:27] Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
[05:11] Towards an AI co-scientist
[05:52] Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
[06:38] VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
[07:12] Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
[07:52] Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs
[08:35] AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
[09:23] BIG-Bench Extra Hard
[10:07] CritiQ: Mining Data Quality Criteria from Human Preferences
[10:44] MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
[11:28] PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
[12:08] DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

【Follow Us】
You can also find us on the following platforms for more content beyond the podcast.
Xiaohongshu: AI速递