Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine

cs.AI updates on arXiv.org, July 18, 12:13

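This paper presents an efficient audio coding for machines (ACoM) method. It combines task-specific loss guidance with residual vector quantization to compress at ultra-low bitrates (under 200 bps) with minimal impact on downstream model performance, and it can be deployed flexibly.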

arXiv:2507.12701v1 Announce Type: cross Abstract: Neural audio codecs, leveraging quantization algorithms, have significantly impacted various speech/audio tasks. While high-fidelity reconstruction is paramount for human perception, audio coding for machines (ACoM) prioritizes efficient compression and downstream task performance, disregarding perceptual nuances. This work introduces an efficient ACoM method that can compress and quantize any chosen intermediate feature representation of an already trained speech/audio downstream model. Our approach employs task-specific loss guidance alongside residual vector quantization (RVQ) losses, providing ultra-low bitrates (i.e., less than 200 bps) with a minimal loss of the downstream model performance. The resulting tokenizer is adaptable to various bitrates and model sizes for flexible deployment. Evaluated on automatic speech recognition and audio classification, our method demonstrates its efficacy and potential for broader task and architectural applicability through appropriate regularization.
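To make the residual vector quantization (RVQ) step and the bitrate figure concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`rvq_encode`, `rvq_decode`, `bitrate_bps`), the toy codebooks, the 12.5 Hz frame rate, and the codebook sizes are all assumptions chosen for illustration. The codebooks are fixed here, so the sketch omits training entirely, including the task-specific loss guidance the abstract describes.

```python
import math
import numpy as np


def rvq_encode(features, codebooks):
    """Greedy RVQ: each stage quantizes the residual left by the previous one.

    features:  array of shape (n_frames, dim)
    codebooks: list of arrays, each of shape (codebook_size, dim)
    returns:   integer codes of shape (n_frames, n_stages)
    """
    residual = np.asarray(features, dtype=float).copy()
    codes = []
    for codebook in codebooks:
        # Squared Euclidean distance from every residual to every code vector.
        diff = residual[:, None, :] - codebook[None, :, :]
        nearest = (diff ** 2).sum(axis=-1).argmin(axis=1)
        codes.append(nearest)
        # Pass what this stage could not represent on to the next stage.
        residual = residual - codebook[nearest]
    return np.stack(codes, axis=1)


def rvq_decode(codes, codebooks):
    """Reconstruct features by summing the selected code vector of each stage."""
    recon = np.zeros((codes.shape[0], codebooks[0].shape[1]))
    for stage, codebook in enumerate(codebooks):
        recon += codebook[codes[:, stage]]
    return recon


def bitrate_bps(frame_rate_hz, codebook_sizes):
    """Bits per second = frames per second * bits per frame over all stages."""
    bits_per_frame = sum(math.log2(size) for size in codebook_sizes)
    return frame_rate_hz * bits_per_frame


# Toy two-stage example with hand-picked 2-D codebooks (hypothetical values).
toy_codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.5, 0.0]]),
]
toy_features = np.array([[1.5, 1.0]])
toy_codes = rvq_encode(toy_features, toy_codebooks)
toy_recon = rvq_decode(toy_codes, toy_codebooks)
print(toy_codes, toy_recon)

# Hypothetical configuration: 12.5 frames/s, codebooks of 256 and 128 entries.
print(bitrate_bps(12.5, [256, 128]))
```

Under these assumed settings each frame costs 8 + 7 = 15 bits, so 12.5 frames per second gives 187.5 bps, which falls in the sub-200 bps range the abstract mentions. Dropping a stage or shrinking a codebook lowers the bitrate further, which is one way a tokenizer of this kind could be adapted to different bitrates.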
