Communications of the ACM - Artificial Intelligence, July 22, 01:32
Protect Your Code Against Licensing Risks  

 

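As generative artificial intelligence (GenAI) coding tools spread, developers chasing speed may overlook the risks hidden in them. Code produced by GenAI tools may contain open source code under restrictive licenses, which can become an obstacle during due diligence in a merger or acquisition. Copyleft licenses in particular require derivative works to be released under the same terms, which may force a startup to open source its proprietary product at no charge and hurt its ability to make money. Research shows that most M&A transactions include open source components with license conflicts. To avoid such risks, companies should establish strict code management processes, identify and track the origin of code, use specialized tools to scan for potential licensing problems, and use GenAI tools mainly for prototyping rather than production code.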

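💡 GenAI coding tools may introduce open source licensing risk: when developers chase speed, code generated with GenAI tools may contain open source code under restrictive licenses, and those license terms can cause problems in transactions such as mergers and acquisitions, for example by requiring a proprietary product to be open sourced at no charge.

📈 Open source code is pervasive in M&A transactions and carries high risk: one study found open source code in 99% of M&A transactions, with 85% of transactions containing open source components with license conflicts, which can delay or even derail a deal.

⚖️ The special requirements of copyleft licenses: copyleft is a restrictive type of open source license that requires any derivative work using the covered code to be released under the same terms, which means a product containing copyleft code may not be able to charge fees.

🛡️ Strategies for avoiding GenAI code licensing risk: companies can craft and enforce processes and policies for writing code, establish the origin of all code, use GenAI tools mainly for prototyping rather than production code, and use specialized code scanning tools (such as Black Duck, FOSSA, and Snyk) to identify and manage licensing risk.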

In the race to cut the time and expense involved in software development, developers may be endangering future business deals in favor of speed. That’s because programs using code created with generative artificial intelligence (GenAI) tools may include open source code that can create stumbling blocks during a business deal’s due-diligence process.

Many developers form startups for the express purpose of an eventual buy-out by a larger organization. They develop proprietary products based on programs that their developers write in a specific programming language such as Python or C++ and, increasingly, with GenAI coding tools.

Challenges can arise when snippets or lines of code within proprietary products contain open source code, which developers rely on because it is easy to use and free. In fact, a 2025 study by software security vendor Black Duck identified open source code in 99% of mergers and acquisitions transactions audited.

Open source code is, typically, publicly available source code for software programs that developers are free to use, modify, and redistribute. However, within the universe of open source code are subsets of code that carry restrictive licenses.

Such restrictive licenses can potentially obligate startups to release their products under open source licenses, which may hinder their ability to charge fees for those products. The Black Duck study also revealed that 85% of merger and acquisition transactions included open source code components with license conflicts. You don’t want your startup to have open source licensing conflicts that could delay, or even derail, a potential deal.

“Issues mostly come up in the course of an acquisition, when an acquirer might want to run an open source scan to determine the source of a code within a proprietary product,” said Steve Argentieri, a partner in the Business Law Department of New York-based law firm Goodwin Procter LLP. “At this point, open source code is probably presenting more practical deal risk than legal risk. Tools such as Copilot have added additional protections to mitigate legal risks such as copyright risk.”

Fortunately, a startup can take steps to mitigate such licensing risks within GenAI code as its developers create products that may eventually be part of a sale. By crafting and enforcing processes and policies that govern the coding process, a startup can establish a clear line of sight into the origin of all its code and insulate itself from risk.

Identify Licensing Risk within GenAI-Created Code

Potential acquirers must understand what types of open source code appear within a startup’s product base so they can appropriately manage business risk. While there are other types of business risk involved in open source code, one critical risk is licensing risk, which includes potential legal liabilities, compliance issues, financial cost, and intellectual property risk.

GenAI coding tools, which are built on large language models (LLMs), create code in response to prompts from developers. GenAI code can be particularly susceptible to licensing risk because these tools may strip licensing information from lines and snippets of code when they are generated. That means developers may not realize the code they are using carries a restrictive, rather than permissive, open source license.

As GenAI tools become more popular with developers, this risk is rising. More than 76% of developers surveyed by Stack Overflow are either already using GenAI coding tools, including ChatGPT, GitHub Copilot, and Cursor, or planning to use them. In the course of using these tools, developers may unknowingly insert snippets or whole pieces of open source code that are under restrictive licenses into their programs.

Copyleft

Copyleft is one type of restrictive open source license. It requires that any derivative use of the covered code in other programs or products be made available under the same terms. Those terms require that future users can further copy and change the code without charge, which means that proprietary products that charge licensing fees should not include copyleft code.

“Generative AI tools may have been trained on copyleft code or the copyleft code could have been incorrectly copied from somewhere else,” said Argentieri. “That creates a risk that you’re accidentally incorporating copyleft code into your software. When you distribute your product, technically you would have an obligation to make your source code available under that copyleft license.”

The creators of open source code, including copyleft code, believe that code should be freely available to anyone who wants to use it. That freedom is not only about cost, in that open source is available without charge, but also about users being free to use the code without restrictions. In other words, the philosophy of open source and copyleft is that middlemen should not be able to strip away the freedom to use open source code through restrictive licenses that charge fees.

Because of the licensing restrictions that copyleft code carries, organizations need to understand their vulnerabilities in terms of whether copyleft code is somehow included in their proprietary products.
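One way to get a first look at that exposure is to check source files for SPDX license tags. The following is a minimal sketch, assuming files carry `SPDX-License-Identifier:` header comments. The license groupings and the names `classify_license` and `find_copyleft` are illustrative assumptions, not a complete or legally authoritative list.

```python
import re

# Illustrative, incomplete groupings of SPDX license identifiers (assumption).
STRONG_COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
}
WEAK_COPYLEFT = {
    "LGPL-2.1-only", "LGPL-2.1-or-later", "LGPL-3.0-only", "LGPL-3.0-or-later",
    "MPL-2.0",
}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}

# Captures only the first identifier; compound expressions such as
# "MIT OR Apache-2.0" would need a real SPDX expression parser.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def classify_license(spdx_id):
    """Map an SPDX identifier to a coarse risk category."""
    if spdx_id in STRONG_COPYLEFT:
        return "strong-copyleft"
    if spdx_id in WEAK_COPYLEFT:
        return "weak-copyleft"
    if spdx_id in PERMISSIVE:
        return "permissive"
    return "unknown"


def find_copyleft(sources):
    """Return {file name: (spdx_id, category)} for files tagged as copyleft.

    `sources` maps a file name to that file's text.
    """
    flagged = {}
    for name, text in sources.items():
        match = SPDX_TAG.search(text)
        if match is None:
            continue  # untagged files need a dedicated scanner
        spdx_id = match.group(1)
        category = classify_license(spdx_id)
        if category.endswith("copyleft"):
            flagged[name] = (spdx_id, category)
    return flagged
```

A check like this only finds code that still carries its license tag. It cannot detect a snippet whose license header was stripped before it reached the codebase, which is the case the article describes for GenAI output.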

Mitigate Code Licensing Risks

To avoid breaching copyleft licenses and other types of open source licenses, organizations can implement policies and procedures governing the use of both GenAI-developed code and open source code in general. Oleh Komenchuk, ML lead and AI engineer at software development company Uptech, recommends using GenAI coding tools for prototyping, but not for the actual production code in products.

“AI speeds up our thinking, not our shipping,” Komenchuk said. His organization uses a four-step process to mitigate licensing risk that includes reviewing all AI code, tracking where all code comes from, scanning code with tools designed to detect suspicious code snippets, and limiting use of AI to prototyping. Tools designed to scan code for licensing violations include Black Duck, FOSSA, and Snyk.
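The four steps in that process could be encoded as a simple release check. This is a hypothetical sketch, not Uptech's actual tooling: the `Origin` enum, the `CodeRecord` fields, and the `release_blockers` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    HUMAN = "human"
    GENAI = "genai"
    THIRD_PARTY = "third_party"


@dataclass(frozen=True)
class CodeRecord:
    """Hypothetical per-file provenance record."""
    path: str
    origin: Origin
    reviewed: bool = False   # a human reviewed the code
    scanned: bool = False    # a license scanner was run on the code
    source_note: str = ""    # where the code came from


def release_blockers(record, target):
    """List the reasons a file may not ship to `target`.

    `target` is "prototype" or "production"; an empty list means no blockers.
    """
    reasons = []
    if not record.source_note:
        reasons.append("origin not documented")
    if record.origin is Origin.GENAI:
        if not record.reviewed:
            reasons.append("AI-generated code not reviewed")
        if target == "production":
            reasons.append("AI-generated code limited to prototyping")
    if record.origin is not Origin.HUMAN and not record.scanned:
        reasons.append("license scan missing")
    return reasons
```

Under these assumptions, a reviewed and scanned AI-generated file with a documented origin passes for a prototype but is still blocked from production, mirroring the "prototyping only" rule.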

Argentieri suggested that startups—and all organizations using AI tools to generate code—develop policies to minimize AI risk. “If you are using third-party AI tools, understand what data is going into those, the rights around that data that is either being input or is being trained on to mitigate AI risk,” he said. “Licensing is just one risk—there are many other risks involved in using AI coding tools and other AI tools including privacy, data security, and potentially leaking personal information into AI models.”

Amy Buttell is a Silver Spring, MD-based technology, legal, and business journalist, content creator, writer, and ghostwriter.

