StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation

cs.AI updates on arXiv.org, the day before yesterday at 19:10


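The article introduces StutterCut, a semi-supervised framework that treats dysfluency segmentation as a graph partitioning problem in order to detect and segment dysfluencies in speech. It reports superior performance on both real and synthetic datasets.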

arXiv:2508.02255v1 Announce Type: cross

Abstract: Detecting and segmenting dysfluencies is crucial for effective speech therapy and real-time feedback. However, most methods only classify dysfluencies at the utterance level. We introduce StutterCut, a semi-supervised framework that formulates dysfluency segmentation as a graph partitioning problem, where speech embeddings from overlapping windows are represented as graph nodes. We refine the connections between nodes using a pseudo-oracle classifier trained on weak (utterance-level) labels, with its influence controlled by an uncertainty measure from Monte Carlo dropout. Additionally, we extend the weakly labelled FluencyBank dataset by incorporating frame-level dysfluency boundaries for four dysfluency types. This provides a more realistic benchmark compared to synthetic datasets. Experiments on real and synthetic datasets show that StutterCut outperforms existing methods, achieving higher F1 scores and more precise stuttering onset detection.
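The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Each overlapping window is a node, edges start from cosine similarity between window embeddings, and a window-level classifier's Monte Carlo dropout output (several stochastic forward passes with dropout left on) is used to pull edges toward "same label" or "different label". The per-edge blending rule, the mapping from dropout standard deviation to confidence, the parameter `alpha`, and all function names are hypothetical choices made for illustration. The cut itself is the standard two-way normalised cut of Shi and Malik, solved through the second eigenvector of the symmetric normalised Laplacian, followed by a search over split points for the lowest Ncut value.

```python
import numpy as np


def mc_dropout_stats(prob_samples):
    """Mean and spread over T stochastic passes (dropout kept on at inference).

    prob_samples: array of shape (T, n), dysfluency probability per window.
    """
    samples = np.asarray(prob_samples, dtype=float)
    return samples.mean(axis=0), samples.std(axis=0)


def refined_affinity(emb, p_mean, p_std, alpha=0.5):
    """Hypothetical edge weights: cosine similarity blended with classifier agreement."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = np.clip(x @ x.T, 0.0, None)  # non-negative cosine similarity
    # Probability that windows i and j receive the same label from the classifier.
    agree = np.outer(p_mean, p_mean) + np.outer(1.0 - p_mean, 1.0 - p_mean)
    # The std of values in [0, 1] is at most 0.5, so 2*std maps uncertainty to [0, 1].
    conf = 1.0 - np.clip(2.0 * p_std, 0.0, 1.0)
    lam = alpha * np.outer(conf, conf)  # classifier influence on each edge
    w = (1.0 - lam) * sim + lam * agree
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w


def ncut_value(w, mask):
    """Shi-Malik Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)."""
    deg = w.sum(axis=1)
    cut = w[mask][:, ~mask].sum()
    return cut / deg[mask].sum() + cut / deg[~mask].sum()


def normalized_cut(w):
    """Two-way normalised cut via the spectral relaxation of Shi and Malik."""
    deg = np.maximum(w.sum(axis=1), 1e-12)  # guard against isolated nodes
    d_is = 1.0 / np.sqrt(deg)
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}.
    lap = np.eye(len(w)) - d_is[:, None] * w * d_is[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    y = d_is * vecs[:, 1]  # relaxed indicator from the second eigenvector
    best_mask, best_val = None, np.inf
    for t in np.unique(y)[:-1]:  # each split point leaves both sides non-empty
        mask = y > t
        val = ncut_value(w, mask)
        if val < best_val:
            best_mask, best_val = mask, val
    return best_mask


def segment_windows(emb, prob_samples, alpha=0.5):
    """Boolean mask over windows; True marks the partition labelled dysfluent."""
    p_mean, p_std = mc_dropout_stats(prob_samples)
    mask = normalized_cut(refined_affinity(emb, p_mean, p_std, alpha))
    # Call the side the classifier scores higher on average "dysfluent".
    if p_mean[mask].mean() < p_mean[~mask].mean():
        mask = ~mask
    return mask
```

Blending per edge means a confident classifier pulls windows it labels alike together, while an uncertain one leaves the embedding similarity in charge. Turning the window mask into frame-level onsets and offsets would additionally require the window length and hop, which the abstract does not state.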
