Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries

cs.AI updates on arXiv.org, 6 hours ago

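This article introduces EMSYNC, a model that generates music synchronized with the emotions of a video. Using an emotion classifier and a conditional music generator, it produces music aligned with the video's emotional content and temporal boundaries, and it outperforms existing models in subjective listening tests.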

arXiv:2502.10154v2 Announce Type: replace-cross Abstract: We introduce EMSYNC, a video-based symbolic music generation model that aligns music with a video's emotional content and temporal boundaries. It follows a two-stage framework, where a pretrained video emotion classifier extracts emotional features, and a conditional music generator produces MIDI sequences guided by both emotional and temporal cues. We introduce boundary offsets, a novel temporal conditioning mechanism that enables the model to anticipate and align musical chords with scene cuts. Unlike existing models, our approach retains event-based encoding, ensuring fine-grained timing control and expressive musical nuances. We also propose a mapping scheme to bridge the video emotion classifier, which produces discrete emotion categories, with the emotion-conditioned MIDI generator, which operates on continuous-valued valence-arousal inputs. In subjective listening tests, EMSYNC outperforms state-of-the-art models across all subjective metrics, among both music theory-aware participants and general listeners.
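The abstract names two conditioning ideas without giving their details: a mapping from discrete emotion categories to continuous valence-arousal values, and boundary offsets that tell the generator how far away the next scene cut is. The sketch below is one plausible reading of each, not the authors' method. The emotion labels, the coordinates in `EMOTION_TO_VA`, the probability-weighted average, and the function names `probs_to_valence_arousal` and `boundary_offsets` are all assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical valence-arousal coordinates in [-1, 1].
# Illustrative values only; they are not taken from the paper.
EMOTION_TO_VA = {
    "joy": (0.8, 0.6),
    "anger": (-0.7, 0.7),
    "sadness": (-0.7, -0.5),
    "calm": (0.5, -0.6),
}


def probs_to_valence_arousal(probs):
    """Map classifier probabilities over discrete emotions to one (valence, arousal) pair.

    Assumed scheme: a probability-weighted average of per-category points.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    valence = sum(p * EMOTION_TO_VA[name][0] for name, p in probs.items()) / total
    arousal = sum(p * EMOTION_TO_VA[name][1] for name, p in probs.items()) / total
    return valence, arousal


def boundary_offsets(event_times, scene_cuts):
    """For each event time (seconds), return the time remaining until the next scene cut.

    An event that falls exactly on a cut gets offset 0.0.
    Events after the last cut get None.
    """
    cuts = sorted(scene_cuts)
    offsets = []
    for t in event_times:
        i = bisect_left(cuts, t)  # index of the first cut at or after t
        offsets.append(cuts[i] - t if i < len(cuts) else None)
    return offsets
```

For example, `boundary_offsets([0.0, 1.5, 4.0, 6.0], [2.0, 5.0])` yields one offset per event, counting down toward each cut. A generator conditioned on such a value could, in principle, place a chord when the offset reaches zero.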

Related tags

Music generation, video emotion, music theory, EMSYNC model, music-time synchronization
