Communications of the ACM - Artificial Intelligence
How Generative Models are Ruining Themselves

With the widespread adoption of generative AI, the quality of generated content may decline. The article argues that AI-generated content will increasingly rely on artificial and generic data, degrading details such as contrast and edges. In addition, AI-generated text may be repetitive and uncreative. Since the volume of data on the Internet roughly doubles every three years, the proliferation of AI-generated content may, in turn, undermine its own quality. The article examines the causes and potential consequences of this phenomenon and calls for greater awareness and responsible use of generative AI, in order to counter data poisoning and safeguard the technology's long-term development.

🤖 The quality of AI-generated content may decline because it relies on artificial and generic data, degrading details such as contrast and edge sharpness.

🔄 AI-generated text can be repetitive and uncreative, homogenizing content in contrast to the distinctiveness of human creation.

⚠️ AI models are trained on Internet data that may contain inauthentic or AI-generated content; this "data poisoning" will affect the models' future output.

🧐 AI models are limited in rendering fine image details, such as hands or facial features, reflecting their difficulty in understanding and reproducing complex detail.

💡 The remedy lies in awareness and responsible use of AI, for example rigorously reviewing and refining AI-generated content before publication, to slow the decline in quality.

I argue that, as generative AI is used more widely, the quality of generated content will decline, because that content will increasingly be based on artificial and generic data.

For instance, a newly generated picture will be based on original images authentically created by people (e.g., photographers) plus machine-generated images; the latter, however, are not as good as the former in details such as contrast and edges. Likewise, AI-generated text will be based on original creative content by real people plus machine-generated text, where the latter can be repetitive and formulaic. Since the data generated globally nearly doubles every three years1, in the years to come humanity will produce more data than it has ever created; therefore, if the Internet becomes overloaded with AI-generated content, that content will negatively affect the AI's own output.

Generative AI models are trained on Internet data (e.g., websites, curated content, forums, social media). People's interactions with that data, by reacting to it, reposting it, or endorsing it, will amplify a profusion of unreliable content whose origin is unoriginal and AI-generated. Moreover, those interactions will themselves be included in future training sets. These facts will unfavorably influence the results of generative models in the future. Why and how could this happen? And what can we do about it?

Consider, for example, asking a generative AI model to create an image of the Last Supper. It will do so successfully, based on previously encountered paintings of the Last Supper by classical painters. Nonetheless, if we look into the details of any such generated image, we can easily detect discrepancies, specifically in the drawing of hands, fingers, ears, teeth, pupils, and other small prominent details in the foreground, and sometimes in the background. Such details are difficult to render even for proficient artists2. Now imagine AI systems confronted with more and more images (photos or paintings) containing unrealistic small details, whether because such details are hard to create or because the images were filtered or generated using AI; they will then produce outputs with obviously unrealistic details. This is because generative models are based on artificial neural networks (ANNs), which are essentially function approximators3. In other words, they always try to produce an output based on generalizations learned from historical inputs. But this history is continually contaminated with discrepancies. Put differently, generative models try to depict reality, yet embed glitches inherited from their own generated content. Because, while doing so, they cannot discriminate between sound and flawed content, I argue that they will inadvertently ruin themselves in the long run.

As previously argued4, generative models are statistical models lacking creative reasoning capabilities or emergent behaviors. Moreover, experiments have been conducted in which the output of an AI system was fed back as its input; after many runs, the system's output became gibberish5. In addition, generative models are known to produce emotionless6, neutral7, low-perplexity8, and tedious content9. Also, per the adage 'Garbage In, Garbage Out' (GIGO)10, the quality of any computing system's output depends on its inputs; hence, if the system evolves and learns from less-elegant data, it will produce less-elegant data. Consequently, the proliferation of trivial generative content by AI models will soon yield more boring, emotionless, unobjective results, flawed with discrepancies and unrealistic details. As I have already highlighted, ANNs are sensitive to their inputs and 'perfect' at generalization; thus, through their own generative capabilities, they will negatively mutate the outcomes they offer, propagating impurities from generation to generation (i.e., across version updates and retraining).
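The feedback loop described above can be illustrated with a toy simulation (a hypothetical sketch with illustrative parameters, not a reproduction of the cited experiment): fit a simple statistical model to data, sample from the fitted model, refit on those samples, and repeat. Finite sampling error compounds across generations, so the "learned" distribution drifts away from the original one, just as a model retrained on its own output drifts away from reality.

```python
import random
import statistics

def degrade(generations=50, n_samples=200, seed=42):
    """Simulate recursive training: each generation fits a Gaussian to
    samples drawn from the PREVIOUS generation's fitted Gaussian.
    Finite-sample estimation error accumulates, so the fitted parameters
    drift away from the true data-generating process (mu=0, sigma=1)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "real" data distribution
    history = [(mu, sigma)]
    for _ in range(generations):
        # Generation t+1 only ever sees generation t's output.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(samples)   # refit on the model's own output
        sigma = statistics.stdev(samples)
        history.append((mu, sigma))
    return history

history = degrade()
print(f"generation  0: mu={history[0][0]:+.3f}, sigma={history[0][1]:.3f}")
print(f"generation 50: mu={history[-1][0]:+.3f}, sigma={history[-1][1]:.3f}")
```

The fitted parameters perform a random walk away from the originals; no single generation looks wrong, yet nothing pulls the model back toward the true distribution once original data stops entering the loop.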

One could argue that generative models are well suited to producing outstanding results in narrow domains such as law exams, but such a domain's effect is far smaller than the models' applicability to the broad spread of knowledge they will generate, or assist in generating, in the public and private spheres. Narrow applications of generative models in specific domains might indeed be useful, but here I am addressing the global impact of such models and their own deterioration over the general, long-term future. In this regard, the ultimate way to contain such data poisoning (i.e., flooding the Internet with degenerate content) is awareness and responsible use of generative models. For instance, AI-generated content should not be rushed online; it should be carefully refined and, better still, checked or enhanced by experts.

Penrose11, while criticizing AI based on classical computation, was also optimistic that future technological advances would enhance AI's capabilities11. Similarly, I am criticizing AI based on currently available technologies (e.g., generative models). If, in the future, a different technology takes the stage, my critique might change accordingly.

I conclude with the following challenge for generative models, or any future technology: learning the image of the Mandelbrot set12. An ANN that learns from all Mandelbrot set images available on the Internet will never grasp the complex dynamics behind the countless affinities and similarities within the set12. It will produce very similar images of the set at a wide scale but fall short on the details (i.e., the periphery will appear blurred and pixelated when zoomed in, whereas on a true Mandelbrot set the periphery is always refined). So, will a machine one day be able to create, understand, and look at something similar to the Mandelbrot set, or the set itself, the way Benoit Mandelbrot did and intuited, or the way any of us feels its mathematical beauty?
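The contrast the challenge draws, between memorized appearance and generative rule, can be made concrete. The standard escape-time algorithm computes the set directly from its defining iteration z → z² + c, so detail is available at any zoom level, which is exactly what a network trained on finite images cannot provide. A minimal sketch (window coordinates, resolution, and iteration budget are illustrative):

```python
def mandelbrot_iterations(c: complex, max_iter: int = 100) -> int:
    """Iterate z -> z^2 + c from z = 0; return the step at which |z|
    exceeds 2 (escape), or max_iter if c appears to be in the set."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def render(x0, x1, y0, y1, width=60, height=24, max_iter=100):
    """ASCII render of any rectangular window of the complex plane.
    Because each pixel is recomputed from the rule, zooming in never
    produces the blur of a merely memorized picture."""
    chars = " .:-=+*#%@"  # darker character = slower escape
    rows = []
    for j in range(height):
        y = y0 + (y1 - y0) * j / (height - 1)
        row = ""
        for i in range(width):
            x = x0 + (x1 - x0) * i / (width - 1)
            n = mandelbrot_iterations(complex(x, y), max_iter)
            row += chars[min(n * len(chars) // max_iter, len(chars) - 1)]
        rows.append(row)
    return "\n".join(rows)

print(render(-2.0, 0.6, -1.2, 1.2))              # the full set
print(render(-0.7463, -0.7440, 0.1102, 0.1125))  # a deep-zoom window
```

Any window yields fresh structure at full resolution; an image-trained approximator, by contrast, can only interpolate between the finitely many zoom levels it has seen.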

References

Mario Antoine Aoun is an ACM Professional member who has been a Reviewer for ACM Computing Reviews since 2006. He has more than 25 years of computer programming experience and holds a Ph.D. in Cognitive Informatics from the Université du Québec à Montréal. His main research interest is memory modeling based on chaos theory and spiking neurons.
