cs.AI updates on arXiv.org, April 15
Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities

This article examines the importance of identifying the provenance of text, particularly now that text generated by large language models (LLMs) is nearly indistinguishable from human-written text. The work develops zero-shot statistical tests to distinguish text generated by different LLMs, and to distinguish LLM-generated text from human-generated text. The results show that the tests' type I and type II error rates decrease exponentially as text length increases. The theoretical results are validated experimentally, including adversarial-attack experiments in a black-box setting. The work provides guarantees for tracing the origin of harmful or false LLM-generated text, helping to combat misinformation and to comply with emerging AI regulations.

🕵️‍♀️ With the advent of large language models (LLMs), machine-generated text has become increasingly hard to distinguish from human-written text, posing problems for organizations such as educational institutions and social media platforms.

🔬 The study proposes zero-shot statistical tests to distinguish text generated by different LLMs, and to distinguish LLM-generated text from human-generated text. The method requires no pre-training and no labeled data (see the schematic sketch after this list).

📉 Experiments show that the tests' type I and type II error rates decay exponentially as text length grows: the longer the text, the more reliable the attribution.

💡 The study proves that if a text is generated by the evaluator model A, its log-perplexity under A converges to the text's average entropy under A; if the text is generated by a model B, its log-perplexity under A converges to the average cross-entropy of B and A, except with probability exponentially small in the string length.

🛡️ The work provides guarantees for tracing the origin of harmful or false LLM-generated text, helping to combat misinformation and supporting compliance with emerging AI regulations, with practical value for preserving information integrity.
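
To make the highlights above concrete, here is a schematic restatement of the test in the abstract's notation; constants and exact thresholds are simplified for illustration. Given a string $s = (s_1, \dots, s_n)$, the test statistic is the log-perplexity under the evaluator model $A$:

$$\mathrm{LP}_A(s) \;=\; -\frac{1}{n} \sum_{t=1}^{n} \log p_A(s_t \mid s_{<t}).$$

If $A$ generated $s$, then except with probability exponentially small in $n$, $\mathrm{LP}_A(s)$ concentrates around the average entropy of $s$ under $A$; if $B$ generated $s$, it concentrates around the average cross-entropy of $B$ and $A$. Thresholding $\mathrm{LP}_A(s)$ between these two limits therefore yields type I and type II errors bounded by $e^{-cn}$ for some constant $c > 0$.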

arXiv:2501.02406v3 Announce Type: replace-cross Abstract: Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions utilize in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within the institution. We answer the following question: Given a piece of text, can we identify whether it was produced by LLM $A$ or $B$ (where $B$ can be a human)? We model LLM-generated text as a sequential stochastic process with complete dependence on history and design zero-shot statistical tests to distinguish between (i) the text generated by two different sets of LLMs $A$ (in-house) and $B$ (non-sanctioned) and also (ii) LLM-generated and human-generated texts. We prove that our tests' type I and type II errors decrease exponentially as text length increases. For designing our tests for a given string, we demonstrate that if the string is generated by the evaluator model $A$, the log-perplexity of the string under $A$ converges to the average entropy of the string under $A$, except with an exponentially small probability in the string length. We also show that if $B$ generates the text, except with an exponentially small probability in string length, the log-perplexity of the string under $A$ converges to the average cross-entropy of $B$ and $A$. For our experiments: First, we present experiments using open-source LLMs to support our theoretical results, and then we provide experiments in a black-box setting with adversarial attacks. Practically, our work enables guaranteed finding of the origin of harmful or false LLM-generated text, which can be useful for combating misinformation and compliance with emerging AI regulations.
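
As a concrete illustration of the statistic above, the following is a minimal sketch in Python using the Hugging Face transformers library. The model name gpt2 and the threshold tau are illustrative assumptions, not the authors' experimental setup; the paper derives its thresholds from finite-sample concentration bounds.

    # A minimal sketch of the log-perplexity test statistic. Model name and
    # threshold are illustrative assumptions, not the authors' exact setup.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def log_perplexity(text: str, model, tokenizer) -> float:
        """Average negative log-likelihood per token of `text` under `model`."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # With labels=input_ids, the model returns the mean cross-entropy
            # loss over next-token predictions, i.e. the log-perplexity.
            out = model(ids, labels=ids)
        return out.loss.item()

    if __name__ == "__main__":
        name = "gpt2"  # evaluator model A; any open-source causal LM works
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name).eval()

        text = "The quick brown fox jumps over the lazy dog."
        stat = log_perplexity(text, model, tok)

        # Zero-shot decision rule (schematic): a statistic near the average
        # entropy of A is consistent with A; one near the cross-entropy of B
        # and A is attributed to B. tau is a hypothetical placeholder here.
        tau = 3.5
        print(f"log-perplexity under A: {stat:.3f}")
        print("attributed to A" if stat < tau else "attributed to B")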

Related tags

LLM · Text provenance · Machine learning · AI security