The 29 papers in this episode:
[00:23] ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
[01:10] Learning Getting-Up Policies for Real-World Humanoid Robots
[01:55] ReLearn: Unlearning via Learning for Large Language Models
[02:35] SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
[03:21] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
[03:58] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
[04:33] SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
[05:12] Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
[05:55] I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
[06:38] SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
[07:25] CRANE: Reasoning with constrained LLM generation
[08:07] Intuitive physics understanding emerges from self-supervised pretraining on natural videos
[08:46] Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
[09:22] Dyve: Thinking Fast and Slow for Dynamic Process Verification
[10:06] PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
[10:53] System Message Generation for User Preferences using Open-Source Models
[11:38] video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
[12:33] Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
[13:11] Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
[13:52] MagicArticulate: Make Your 3D Models Articulation-Ready
[14:37] Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
[15:21] One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
[16:03] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
[16:40] Better Embeddings with Coupled Adam
[17:18] Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
[17:56] Towards Data-Efficient Pretraining for Atomic Property Prediction
[18:46] The Mirage of Model Editing: Revisiting Evaluation in the Wild
[19:31] Large Language Models and Mathematical Reasoning Failures
[20:11] Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (RED): AI速递