Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM

cs.AI updates on arXiv.org, 6 hours ago

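The article explores enriching transcribed dialogues by adding metadata tags (such as age, gender, and emotion), using pretrained models to improve performance while preserving modularity and speed.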

arXiv:2508.04795v1 Announce Type: cross

Abstract: In dialogue transcription pipelines, Large Language Models (LLMs) are frequently employed in post-processing to improve grammar, punctuation, and readability. We explore a complementary post-processing step: enriching transcribed dialogues by adding metadata tags for speaker characteristics such as age, gender, and emotion. Some of the tags are global to the entire dialogue, while some are time-variant. Our approach couples frozen audio foundation models, such as Whisper or WavLM, with a frozen LLAMA language model to infer these speaker attributes, without requiring task-specific fine-tuning of either model. Using lightweight, efficient connectors to bridge audio and language representations, we achieve competitive performance on speaker profiling tasks while preserving modularity and speed. Additionally, we demonstrate that a frozen LLAMA model can compare x-vectors directly, achieving an Equal Error Rate of 8.8% in some scenarios.
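The abstract reports speaker comparison quality as an Equal Error Rate (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. As a rough illustration of how such a figure is obtained from trial scores, here is a minimal sketch in plain Python. It is not the paper's evaluation code: the function name `equal_error_rate`, the toy scores, and the convention that higher scores mean "same speaker" are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Approximate the EER from same-speaker and different-speaker scores.

    Hypothetical helper: assumes a higher score means "more likely the
    same speaker". Returns (eer, threshold), where the threshold is the
    candidate at which the two error rates are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = 0.0
    best_threshold = 0.0
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance: an impostor trial scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection: a genuine trial scoring below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold
    return best_eer, best_threshold


# Toy scores, invented for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one of four impostor trials is accepted and one of four genuine trials is rejected at threshold 0.6, so the sketch yields an EER of 25%. On real data the two error curves rarely cross exactly at an observed score, which is why the sketch averages the two rates at the closest point; evaluation toolkits typically interpolate instead.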

Related tags

Dialogue transcription, metadata tags, pretrained models, performance improvement, model applications
