Communications of the ACM - Artificial Intelligence, December 5, 2024
On Program Synthesis and Large Language Models

This article examines views that large language models (LLMs) such as ChatGPT have sparked in software development, in particular the debate over whether programming will become obsolete. Some argue that LLM-based AI systems will replace conventional programming and that future software will mainly be trained rather than programmed. The article rebuts this view, noting that generating correct program code is, in theory, a complex and hard problem constrained by computational complexity theory. Even existing program-synthesis methods have limitations: some are approximative, some consume excessive resources, and some apply only to narrow domains. The author argues that the idea of using natural language (such as English) as a programming language is not new, and that the semantic ambiguity of natural language makes it unsuitable for program analysis and verification. Although LLMs can sometimes generate reasonable code, they also risk producing meaningless code, because they are trained on existing code bases. The end of programming, therefore, has not arrived, and computer science retains its research value.

🤔 **The rise of large language models (LLMs) has sparked debate about the future of programming, with some arguing that programming will be replaced by AI systems and that software will be built by training rather than programming.** On this view, most software other than simple programs will consist of AI systems, and even those simple programs will be generated automatically by AI, making programming unnecessary.

💻 **From the standpoint of computational complexity theory, generating correct program code is a hard problem.** For example, deciding whether there exists a term of the simply typed λ-calculus with a given type is PSPACE-complete, which means any algorithm for the problem is likely to need exponential running time. Likewise, generating programs from specifications written as fully quantified Boolean formulae (QBF) faces the same PSPACE-complete obstacle.

⚠️ **Existing program-synthesis methods all have limitations and cannot fully solve the program-generation problem.** Some are approximative and cannot guarantee a program satisfying the full specification; others consume excessive resources; still others handle only small specification languages or narrow domains. None of these methods was ever hailed as "the end of programming"; they are simply tools in the program-development toolbox.

🗣️ **The idea of using natural language (such as English) as a programming language is not new, and natural language is semantically ambiguous.** Its semantic complexity makes it unsuitable for precise program specification and verification. Research on program analysis and verification in computer science centers on formal specification languages, not natural language.

💡 **Although large language models (LLMs) can sometimes generate reasonable code, they also risk producing meaningless code.** Because LLMs are trained on existing code bases, their output is limited by the training data, and neither the correctness nor the reliability of the generated code can be guaranteed.

Much has been made of the abilities of the new developments in machine intelligence and in particular of what chatbots such as ChatGPT that are based on large language models (LLMs) are capable of. While these new pieces of software are impressive when it comes to generating text, some people in the computing community take this observation much further and, in my opinion, much too far. They claim programming will be a thing of the past. In a January 2023 Communications column, Matt Welsh put forward this opinion: “Programming will be obsolete. I believe the conventional idea of ‘writing a program’ is headed for extinction, and indeed, for all but very specialized applications, most software, as we know it, will be replaced by AI systems that are trained rather than programmed. In situations where one needs a ‘simple’ program (after all, not everything should require a model of hundreds of billions of parameters running on a cluster of GPUs), those programs will, themselves, be generated by an AI rather than coded by hand.”14 This quote contains two claims. First, that most software in the future that is not “simple” will take the form of AI systems. Secondly, that any software not of this form will be automatically generated. A consequence of this rather sweeping claim is that since there is no need to program, there is no need to study programming or properties of programs. Computer science can finally go away!

The first claim made by Welsh can never be refuted. For what does it mean that a piece of software is “simple”? The phrasing of the claim largely implies any piece of software that does not take the form of an AI system is simple.

I will therefore not concentrate on this but remark that unless one is of the opinion that systems software such as operating systems, game engines, and SaaS platforms constitute “simple software,” the first statement implies that these fundamental examples of software can be replaced by AI systems. This is in itself dubious.

The second claim is of a different nature and concerns the generation of program code. In an earlier opinion piece in Communications, Daniel Yellin addressed the shortcomings of the use of LLMs from the point of view of software development practice.15 In this Opinion column, I will instead focus on the fundamental limitations imposed by theorems from the theory of computation and what they tell us about generating program code from specification. The conclusion is the same: The end of programming is not nigh.

Programming in English?

Around the same time as Welsh, Andrej Karpathy wrote on Twitter: “The hottest new programming language is English.”7 This idea did not arise with any of the current software that uses LLMs. The idea that AI software using LLMs can be used for “programming in a natural language” dates back to at least July 2020, when GPT-3 came into existence and Osama Qarem wrote: “Using GPT-3 will still be programming. It will be coding in a language that uses English as its syntax. Let’s consider that for a moment: Human spoken languages are less concise, less clear and more prone to misunderstanding. People will need to practice the right nouns, adjectives, verbs, and so forth to use when talking to GPT-3 to get the outcome they want. You would need to learn to “debug” your English sentences according to how GPT-3 understands it, and not according to what you really meant.”12

But the idea is much older. In the FORTRAN report from 1954,1 the authors claimed that “FORTRAN should virtually eliminate coding and debugging.” As we know, this did not happen. John Backus, the primary author of FORTRAN, instead went on to design a succession of influential programming languages and this work earned him the ACM A.M. Turing Award. A decade later, in 1966, Jean Sammet wrote in Communications that English should be the programming language of the future.11 This also did not happen. Edsger W. Dijkstra considered the idea “foolish.”4

On the other hand, we now see widely published examples of seemingly correct program code generated by English-language queries to ChatGPT and many are impressed by this. One may be tempted to think that in this day and age we are finally witnessing the end of programming and the advent of English (or natural language in general) as the main language for creating programs.

However, generating program code from specifications is not a trivial task. Even simple versions of the approach are intractable in the very precise sense of computational complexity theory if we are concerned about generating code that is correct.

Generating Correct Program Code Is Hard

The problem of generating correct program code from a specification is well known in computer science and in fact it is a central one. It is the problem of program synthesis. Gulwani et al. write6 “Program Synthesis is the task of automatically finding programs from the underlying programming language that satisfy user intent expressed in some form of constraints. Unlike typical compilers that translate a fully specified high-level code to low-level machine representation using a syntax-directed translation, program synthesizers typically perform some form of search over the space of programs to generate a program that is consistent with a variety of constraints (for example, input-output examples, demonstrations, natural language, partial programs, and assertions).”

A lot is already known about the synthesis problem in its various guises, and in particular we know it is hard, where “hard” means computationally hard in the sense of computational complexity theory.

In 1979, Richard Statman proved13 that the problem of deciding whether there exists some term of the simply typed λ-calculus that has a given type τ is PSPACE-complete.
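Statman's problem, type inhabitation, asks whether a term of a given type exists. The depth-limited search below is a minimal sketch of such a synthesis procedure; the tuple encoding of types and all names are my own illustration, not taken from the literature, and the depth bound makes the search incomplete.

```python
# Depth-limited inhabitation search for the simply typed λ-calculus.
# Types are base-type names (strings) or arrows ("->", source, target).

def inhabit(ctx, goal, depth=8, fresh=[0]):
    """Return a λ-term (as a string) of type `goal` in context `ctx`,
    or None if none is found within the depth bound."""
    if depth == 0:
        return None
    # Arrow goal: introduce a fresh variable (λ-abstraction).
    if isinstance(goal, tuple):
        _, src, tgt = goal
        fresh[0] += 1  # mutable default used as a name counter
        x = f"x{fresh[0]}"
        body = inhabit(ctx + [(x, src)], tgt, depth - 1, fresh)
        return None if body is None else f"(\\{x}. {body})"
    # Base-type goal: try each hypothesis whose result type matches it.
    for name, ty in ctx:
        args = []
        while isinstance(ty, tuple):  # peel σ1 -> ... -> σk -> b
            args.append(ty[1])
            ty = ty[2]
        if ty != goal:
            continue
        term, ok = name, True
        for a in args:  # recursively inhabit each argument type σi
            sub = inhabit(ctx, a, depth - 1, fresh)
            if sub is None:
                ok = False
                break
            term = f"({term} {sub})"
        if ok:
            return term
    return None

arrow = lambda s, t: ("->", s, t)
# (a -> b) -> a -> b is inhabited (by application itself):
print(inhabit([], arrow(arrow("a", "b"), arrow("a", "b"))))
# a -> b is not inhabited in the empty context:
print(inhabit([], arrow("a", "b")))
```

Even this toy version exhibits the search-over-programs character that Gulwani et al. describe, and the worst-case cost of the search is where the PSPACE-hardness bites.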

The exact resource requirements for decision algorithms for such problems remain an open problem in computational complexity theory. However, the general consensus is that a decision algorithm for a PSPACE-complete problem will have a running time exponential in the size of the input. So for even moderately large inputs, any decision algorithm for a PSPACE-complete problem is likely to be very slow. For a specification S of size n, an algorithm will need on the order of 2^n steps to produce a program satisfying S, if such a program exists.
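A quick back-of-the-envelope computation shows what 2^n means in practice; the step rate of 10^9 per second is an optimistic assumption for illustration only:

```python
# Exponential running time in the size n of the specification:
# each extra symbol roughly doubles the work.
for n in (10, 30, 50, 70):
    steps = 2 ** n
    seconds = steps / 1e9  # assuming an optimistic 10^9 steps/second
    print(f"n = {n:2d}: ~{steps:.1e} steps, ~{seconds:.1e} s")
```

Already at n = 70 the running time is on the order of 10^12 seconds, tens of thousands of years.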

One could instead use a form of quantified propositional logic, fully quantified Boolean formulae (QBF), as a specification language and ask if a given formula ϕ has a model. If the logical formula speaks about programs, this model will be a program. However, this problem is also PSPACE-complete, as shown by Meyer.9 A consequence of these theorems from the mathematical theory of computation, and many others of the same kind, is that it is extremely likely that any algorithm for program synthesis of non-trivial program properties will require an unreasonable amount of resources.
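A fully quantified Boolean formula can be decided by a simple recursion over its quantifiers, which makes the character of the problem visible: polynomial space, but worst-case time exponential in the number of quantifiers. This is a minimal sketch with an illustrative tuple encoding of my own:

```python
# Recursive evaluator for fully quantified Boolean formulae (QBF).
# Formula nodes: ("var", name), ("not", f), ("and", f, g), ("or", f, g),
# ("forall", name, f), ("exists", name, f).

def qbf_true(f, env=None):
    """Decide the truth of a closed QBF."""
    env = env or {}
    tag = f[0]
    if tag == "var":
        return env[f[1]]
    if tag == "not":
        return not qbf_true(f[1], env)
    if tag == "and":
        return qbf_true(f[1], env) and qbf_true(f[2], env)
    if tag == "or":
        return qbf_true(f[1], env) or qbf_true(f[2], env)
    if tag == "forall":  # both assignments of the variable must succeed
        return all(qbf_true(f[2], {**env, f[1]: b}) for b in (False, True))
    if tag == "exists":  # some assignment of the variable must succeed
        return any(qbf_true(f[2], {**env, f[1]: b}) for b in (False, True))
    raise ValueError(f"unknown node {tag!r}")

# ∀x. ∃y. (x ∨ y) ∧ (¬x ∨ ¬y): for every x, a y different from x exists.
phi = ("forall", "x", ("exists", "y",
       ("and", ("or", ("var", "x"), ("var", "y")),
               ("or", ("not", ("var", "x")), ("not", ("var", "y"))))))
print(qbf_true(phi))  # → True
```

Each quantifier doubles the number of recursive branches, so a prefix of n quantifiers yields up to 2^n evaluations of the matrix, exactly the blow-up the complexity results predict.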

The End Is Not Nigh

It should not come as a surprise that computer science has not given up on program synthesis. Far from it. Gulwani et al.6 give a comprehensive survey of existing approaches to program synthesis. All of these approaches have limitations.

Some approaches to program synthesis are approximative. This means that they will sometimes not be able to construct a program that satisfies the full specification given. This is the case for tools for static program analysis—and moreover, such tools will always consider particular program properties (such as non-interference or liveness) only.
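A toy example of this kind of approximation is a sign analysis for arithmetic: it is sound, but it must sometimes answer "unknown" even when the concrete result is determined. The encoding below is my own illustration, not a real analysis tool:

```python
# Abstract sign analysis: values are "+", "-", "0", or "T" (unknown).

def sign(n):
    """Abstract a concrete integer to its sign."""
    return "+" if n > 0 else "-" if n < 0 else "0"

def abstract_add(a, b):
    """Add two abstract signs."""
    if a == "0": return b
    if b == "0": return a
    if a == b:   return a
    return "T"  # "+" plus "-": the analysis must give up

print(abstract_add(sign(3), sign(5)))   # → '+'
print(abstract_add(sign(3), sign(-5)))  # → 'T', although 3 + (-5) < 0
```

The second answer is correct but imprecise: the analysis trades completeness for a guaranteed, cheap answer, which is the bargain all approximative methods strike.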

Other approaches will sometimes require an unreasonable amount of resources. In some cases the synthesis will require a lot of memory or a lot of time in order to build a program. This is the case for approaches that use SMT solvers and for forms of type-based development such as those that are based on the Hindley-Milner type system.10
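The engine at the core of Hindley-Milner-style inference is first-order unification over type terms. The sketch below is a minimal illustration (my own encoding, and it omits the occurs check that a real implementation needs); the resource problems mentioned above arise when let-polymorphism is layered on top of this:

```python
# First-order unification. Type variables are strings starting with "?";
# other types are names ("int") or constructors like ("->", s, t).

def resolve(t, subst):
    """Chase a variable through the substitution to its binding."""
    while isinstance(t, str) and t.startswith("?") and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst=None):
    """Return a substitution making a and b equal, or None on clash."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        s, t = (resolve(x, subst) for x in stack.pop())
        if s == t:
            continue
        if isinstance(s, str) and s.startswith("?"):
            subst[s] = t           # bind the variable (no occurs check)
        elif isinstance(t, str) and t.startswith("?"):
            subst[t] = s
        elif (isinstance(s, tuple) and isinstance(t, tuple)
              and s[0] == t[0] and len(s) == len(t)):
            stack.extend(zip(s[1:], t[1:]))  # unify argument-wise
        else:
            return None            # clash, e.g. "int" vs ("->", ...)
    return subst

# ?a -> int against bool -> ?b: maps ?a to "bool" and ?b to "int".
print(unify(("->", "?a", "int"), ("->", "bool", "?b")))
```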

Yet other approaches can only handle specifications written in a small language for which synthesis is possible within reasonable resource constraints or within narrow problem domains. This is the case for tools for generating compilers and interpreters, where the use of regular expressions and attribute grammars allows those who design, define and implement programming languages to specify the expected behavior of an interpreter or a compiler.
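The scanner half of that narrow-domain story can be shown in a few lines: the entire lexer is derived from a declarative table of regular expressions, in the style of generated scanners. Token names and the grammar are illustrative; a real scanner would also report unmatched input:

```python
# A lexer for arithmetic expressions specified purely by regexes.
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP",     r"[+\-*/()]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(src):
    """Yield (kind, text) pairs; the scanner is generated from the
    declarative table above, not hand-coded per token."""
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("12 + 3*4")))
# → [('NUMBER', '12'), ('OP', '+'), ('NUMBER', '3'),
#    ('OP', '*'), ('NUMBER', '4')]
```

Because regular expressions are a small, decidable specification language, this kind of synthesis stays within reasonable resource bounds, which is precisely why the approach works only in such narrow domains.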

Not only are all of these approaches specialized, they also share one important characteristic: They were never hailed as a sign of “the end of programming” (let alone the end of computer science!) but were rightfully viewed as another important tool in the toolbox of program development. It is unreasonable to expect the generative powers of ChatGPT and similar pieces of software to surpass these limitations, let alone to provide a general tool for program synthesis.

But why is English not the way out? Because we know that existing formal specification languages are (at least up to a choice of syntax) sublanguages of English and probably every other natural language used today (including my native language, Danish), so the problem of program synthesis can be no easier for English (or Danish) than for these quite simple specification languages.

English has the disadvantage of being semantically ambiguous so one has to be extremely careful, as many software developers will know. There is a reason why English-language specifications are not central to the research areas of program analysis and program verification in computer science.

Some may now argue that LLMs can still become a useful tool under certain circumstances. And indeed anyone who has used AI software of this kind in the setting of programming will have seen that sometimes it can produce reasonable code that appears to be correct. At other times, the AI software will spout program code that is meaningless as well as useless.

It is no coincidence that we see this behavior, as the LLMs are trained on existing code. GitHub Copilot is trained on code that appears in public repositories on GitHub.5 This means that whatever code is being generated will reflect the style of coding that the chatbot was exposed to during the training phase—which in itself is worth noting as a limitation of this particular approach.

Even if chatbots should become a useful tool for programmers, their existence cannot mean the end of programming. It may well be the case that this combination of a built-in repository of code examples and the ability to easily retrieve code examples based on it can be used for generating ideas and for forms of “pair programming” as recently studied by Bird et al.2

But this is not program synthesis, rather it gives software using LLMs the role of more modest and possibly useful tools. We are now seeing a flurry of interest in combining traditional approaches to program analysis with LLMs, and if anything, this points to the limitations inherent in both approaches.3,8

In my opinion, the real use of natural language in software development lies in the exploratory dialogue that occurs between software developers and the end users of the software being developed. Here, the goal is to eventually understand the requirements so they can be transformed into running program code that meets the requirements. Here, too, software based on LLMs may have a role as a shared tool for facilitating a dialogue, but whatever this is, programming it is not.

The important problem of program correctness is still outstanding and remains one of the most important problems in computer science.
