The 11 papers in this episode:
[00:22] 🎥 Towards Understanding Camera Motions in Any Video
[01:04] 🧠 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
[01:49] 💡 BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
[02:28] 🌍 VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
[03:13] 🗣 Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
[03:48] 🤔 The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
[04:23] 🎬 Subject-driven Video Generation via Disentangled Identity and Motion
[05:00] 🧠 DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
[05:34] 🔲 DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
[06:12] 🔊 Kimi-Audio Technical Report
[06:43] 🇮🇹 Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递