Exploring State-Space-Model based Language Model in Music Generation

cs.AI updates on arXiv.org, July 10, 12:05


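This paper explores the potential of Mamba-based architectures for text-to-music generation. It models a single codebook of discrete Residual Vector Quantization (RVQ) tokens and adapts SiMBA, originally a Mamba-based encoder, into a decoder. Under limited-resource settings, the SiMBA decoder converges faster than a standard Transformer decoder and produces outputs closer to the ground truth.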
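To make "a single codebook of RVQ tokens" concrete, here is a minimal sketch of residual vector quantization. It assumes tiny hand-picked 2-D codebooks and toy "latent frames" (all hypothetical; the abstract does not name the codec, and real neural audio codecs learn their codebooks over high-dimensional latents). Each level quantizes whatever the previous levels left over, so level 0 carries the coarsest information. Keeping only the level-0 index of every frame yields a single-codebook token sequence of the kind the paper models.

```python
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x level by level; each level encodes what earlier levels missed."""
    residual = list(x)
    codes: List[int] = []
    for book in codebooks:
        # Pick the codeword nearest to the current residual.
        idx = min(range(len(book)), key=lambda i: _sq_dist(residual, book[i]))
        codes.append(idx)
        # Subtract the chosen codeword; the next level sees only the remainder.
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Sum the selected codewords; passing fewer codes gives a coarser reconstruction."""
    out = [0.0] * len(codebooks[0][0])
    for idx, book in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


# Hypothetical toy codebooks: a coarse level followed by a fine level.
BOOKS: List[List[Vector]] = [
    [[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]],                          # level 0 (coarse)
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],  # level 1 (fine)
]

# Hypothetical latent frames standing in for encoder output over time.
FRAMES: List[Vector] = [[5.0, 4.0], [-3.0, 4.5], [0.9, -0.1]]

# Single-codebook view: keep only the level-0 index of every frame.
first_level_tokens = [rvq_encode(f, BOOKS)[0] for f in FRAMES]
```

Here `first_level_tokens` evaluates to `[1, 2, 0]`, one coarse token ID per frame, which a sequence decoder could be trained to predict from text conditioning. That the first level alone captures musical semantics is the paper's empirical finding; this toy only illustrates the mechanics.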

arXiv:2507.06674v1 (Announce Type: cross)

Abstract: The recent surge in State Space Models (SSMs), particularly the emergence of Mamba, has established them as strong alternatives or complementary modules to Transformers across diverse domains. In this work, we aim to explore the potential of Mamba-based architectures for text-to-music generation. We adopt discrete tokens of Residual Vector Quantization (RVQ) as the modeling representation and empirically find that a single-layer codebook can capture semantic information in music. Motivated by this observation, we focus on modeling a single-codebook representation and adapt SiMBA, originally designed as a Mamba-based encoder, to function as a decoder for sequence modeling. We compare its performance against a standard Transformer-based decoder. Our results suggest that, under limited-resource settings, SiMBA achieves much faster convergence and generates outputs closer to the ground truth. This demonstrates the promise of SSMs for efficient and expressive text-to-music generation. Audio examples are available on GitHub.
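The abstract's central move is running a Mamba-style state space layer as a causal decoder. The sketch below shows a generic selective scan for one channel, assuming a diagonal state matrix A, an input-dependent step size Δ = softplus(w·x + b), input-dependent B and C, and a simplified Euler-style input term Δ·B_t·x_t instead of the exact zero-order-hold formula. The class, method, and parameter names (`SelectiveSSM`, `step`, `scan`, `wb`, `wc`) are hypothetical. This is not SiMBA: it omits gating, convolution, learned projections, and the channel-mixing blocks of a full model.

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable softplus, used to keep the step size delta positive."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


class SelectiveSSM:
    """Toy single-channel selective SSM with a diagonal state matrix A.

    Delta, B and C all depend on the current input, which is what makes the
    scan "selective". All parameter values used below are hypothetical.
    """

    def __init__(self, a: Sequence[float], wb: Sequence[float], wc: Sequence[float],
                 w_delta: float = 1.0, b_delta: float = 0.0) -> None:
        self.a = list(a)    # diagonal of A; negative entries make the state decay
        self.wb = list(wb)  # B_t[n] = wb[n] * x_t
        self.wc = list(wc)  # C_t[n] = wc[n] * x_t
        self.w_delta, self.b_delta = w_delta, b_delta

    def init_state(self) -> List[float]:
        return [0.0] * len(self.a)

    def step(self, h: Sequence[float], x: float) -> Tuple[List[float], float]:
        """One causal update: h_t = exp(delta*A) h_{t-1} + delta*B_t*x_t, y_t = C_t . h_t."""
        delta = softplus(self.w_delta * x + self.b_delta)
        new_h = [
            math.exp(delta * a_n) * h_n + delta * (wb_n * x) * x
            for a_n, h_n, wb_n in zip(self.a, h, self.wb)
        ]
        y = sum((wc_n * x) * h_n for wc_n, h_n in zip(self.wc, new_h))
        return new_h, y

    def scan(self, xs: Sequence[float]) -> List[float]:
        """Run the recurrence over a whole sequence, left to right."""
        h = self.init_state()
        ys: List[float] = []
        for x in xs:
            h, y = self.step(h, x)
            ys.append(y)
        return ys


model = SelectiveSSM(a=[-1.0, -0.5], wb=[1.0, 0.5], wc=[0.3, -0.2])
outputs = model.scan([1.0, 2.0, 3.0])
```

Because each `step` carries only the fixed-size state `h`, per-token generation cost stays constant as the sequence grows, whereas a Transformer decoder attends over a cache that grows with every token. Causality (earlier outputs do not change when a later input changes) is what allows token-by-token decoding. Note that the advantages the abstract reports are faster convergence and outputs closer to the ground truth, not inference speed.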

