cs.AI updates on arXiv.org 15 hours ago
StylOch at PAN: Gradient-Boosted Trees with Frequency-Based Stylometric Features

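This paper proposes an AI-generated text detection method built on a modular stylometric pipeline. spaCy models handle text preprocessing and feature extraction, and a light gradient-boosting machine serves as the classifier. The classifier is trained on a large corpus of machine-generated text, which yields an AI text detector that is non-neural, computationally inexpensive and explainable.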

arXiv:2507.12064v1 Announce Type: cross Abstract: This submission to the binary AI detection task is based on a modular stylometric pipeline. Public spaCy models are used for text preprocessing (including tokenisation, named entity recognition, dependency parsing, part-of-speech tagging, and morphology annotation) and for extracting several thousand features (frequencies of n-grams of the above linguistic annotations). Light gradient-boosting machines are used as the classifier. We collect a large corpus of more than 500 000 machine-generated texts for training the classifier, and we explore several parameter options to increase the classifier's capacity and take advantage of this training set. Our approach follows the non-neural, computationally inexpensive, yet explainable approach that was found effective in previous work.
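
The sketch below is a minimal, dependency-free illustration of the two stages the abstract describes: turning annotation sequences into n-gram frequency features, then training a gradient-boosted tree classifier on them. It is not the authors' code, and it rests on several assumptions.

- The tag sequences are toy stand-ins for spaCy output. In spaCy, `token.pos_` holds the coarse POS tag.
- Only POS unigrams and bigrams are used. The submission extracts several thousand features across several annotation layers.
- A hand-rolled booster built from decision stumps replaces LightGBM, purely so that the example runs without third-party packages.
- All function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def ngram_frequencies(tags, n_max=2):
    """Relative frequency of every 1..n_max-gram in one annotation sequence.

    Each n-gram order is normalised separately by its own n-gram count.
    """
    freqs = {}
    for n in range(1, n_max + 1):
        grams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
        for gram, count in Counter(grams).items():
            freqs[gram] = count / len(grams)
    return freqs

def vectorise(docs, vocab=None):
    """Map tag sequences to dense vectors over a fixed (training) n-gram vocabulary."""
    feats = [ngram_frequencies(d) for d in docs]
    if vocab is None:
        vocab = sorted(set().union(*feats))
    return [[f.get(g, 0.0) for g in vocab] for f in feats], vocab

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, r):
    """Regression stump (one split) minimising squared error against residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((x - lv) ** 2 for x in left) + sum((x - rv) ** 2 for x in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:] if best else None  # (feature, threshold, left value, right value)

def train_boosted_stumps(X, y, rounds=20, lr=0.5):
    """Gradient boosting on log-loss: each stump fits the negative gradient y - p."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p / (1 - p))  # initial log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can split the data any further
            break
        j, t, lv, rv = stump
        stumps.append(stump)
        scores = [s + lr * (lv if row[j] <= t else rv) for s, row in zip(scores, X)]
    return base, lr, stumps

def predict_proba(model, x):
    """Probability that x is machine-generated (label 1)."""
    base, lr, stumps = model
    return sigmoid(base + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps))

# Toy POS-tag sequences standing in for spaCy output, e.g. [t.pos_ for t in doc].
# Labels are illustrative only: 1 = machine-generated, 0 = human-written.
train_docs = [
    ["PRON", "VERB", "ADV", "PUNCT"],
    ["INTJ", "PUNCT", "PRON", "VERB"],
    ["PRON", "VERB", "INTJ"],
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "NOUN"],
    ["DET", "NOUN", "AUX", "ADJ"],
]
labels = [0, 0, 0, 1, 1, 1]

X, vocab = vectorise(train_docs)
model = train_boosted_stumps(X, labels)
new_x, _ = vectorise([["DET", "NOUN", "VERB"]], vocab)
print(round(predict_proba(model, new_x[0]), 3))
```

In practice, the same feature matrix could be passed to a library implementation such as LightGBM's `LGBMClassifier`. That implementation grows full trees leaf-wise rather than single-split stumps and adds regularisation. The stump version above is only meant to make the boosting loop visible: at each round, a new weak learner is fitted to the negative log-loss gradient `y - sigmoid(score)`, and the training set's n-gram vocabulary is reused to vectorise unseen texts.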
