Author(s): Louis-François Bouchard
Originally published on Towards AI.

If you are coding with ChatGPT or Copilot, you may be creating some terrible security leaks! A recent Stanford study found that in four out of five tasks (80% of them), participants assisted by AI wrote less secure code than coders with the same experience working without generative AI assistance. Worse, they were significantly more likely to overestimate the security of their code, with a 3.5-fold increase in false confidence about code security. These security leaks were mostly authentication mistakes, SQL injections, buffer overflows, and symlink vulnerabilities, which can be used to crash a program, execute arbitrary code, or trick a program into reading or writing to an unintended location.

Let's see how to avoid those threats and still profit from the gains of using generative AI when coding. But first, let me address one point some of you might still be wondering about: where does generative AI in coding come from, and do the benefits really outweigh the risks?

Why are AIs good at coding, and what exists? (ChatGPT, GitHub Copilot, others…)

AI in coding has evolved from simple autocompleters to sophisticated code-generation tools. Initially, AI provided basic syntax suggestions, but advances in machine learning led to tools like GitHub Copilot, powered by OpenAI Codex. Released in 2021, Codex can generate entire functions and translate natural language into code across various languages, significantly enhancing developer productivity. We can now automate repetitive tasks, perform real-time code analysis, and get suggested improvements, helping developers focus on complex problem-solving while maintaining high code quality (The GitHub Blog) (GitHub Resources) (OpenAI). Much more is coming thanks to agents and complex systems with full access to your code, the internet, and debugging features in IDEs.

Why use AI to code?

So, as you can see, using AI to code is primarily about boosting productivity by automating repetitive tasks: generating the simple functions or lines that we know how to write and can describe clearly, but don't want to rewrite manually for a new project. It codes for us. That is powerful, and it democratizes coding by merging the searching and the adapting of code examples. This is similar to what developers already do on Stack Overflow or GitHub with open-source code, looking for similar problems or bugs and copying the fix, but more efficient, as AI tools adapt the code to your problem and your current variables directly. It even helps you learn new programming languages by translating your ideas into code in languages you are not familiar with, helping you become proficient in them quite fast, though far from secure.

Thanks to that, AI-assisted coding is transforming career paths in software development and becoming an integral part of educational curricula and professional training programs. Just ask any current software engineering undergrad about Copilot or ChatGPT, and you'll see. The issue is: will it create overconfidence, potentially making you create security leaks? The answer is yes, and for a simple reason: just like open-source code, you did not write it; someone, or in this case something, else did.

The risks of coding with AI

Such security leaks can happen to anyone, or anything. Understanding the risks associated with coding with AI is the first step to fixing them.
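To make the most common of those leak types concrete, here is a minimal, runnable sketch of an SQL injection using Python's built-in sqlite3 module. The table, rows, and payload are purely illustrative: the first query shows the string-interpolation pattern code assistants often produce, and the second shows the parameterized fix.

```python
import sqlite3

# Minimal in-memory database so the sketch runs standalone.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, secret TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)",
                [("alice", "a-token"), ("bob", "b-token")])

attacker_input = "nobody' OR '1'='1"  # classic injection payload

# Vulnerable pattern often produced by code assistants: the input is
# interpolated straight into the SQL string, so the payload rewrites
# the WHERE clause and every row leaks.
cur.execute(f"SELECT * FROM users WHERE name = '{attacker_input}'")
print("interpolated query:", cur.fetchall())   # returns all users

# Safe pattern: a parameterized query. The value is passed separately
# from the SQL text, so it can never change the query's structure.
cur.execute("SELECT * FROM users WHERE name = ?", (attacker_input,))
print("parameterized query:", cur.fetchall())  # returns no rows
```

The parameterized form is what a reviewer should look for: the user value travels separately from the SQL text, so it is never parsed as part of the query.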
You won't be looking for leaks or bugs if you trust the system completely, like the overconfident participants in the Stanford study. So before diving into solutions, let's have a quick look at the biggest risks of using generative AI for coding. I just want to quickly thank the sponsor of this article, Sema AI, which we will come back to later on; they have been a great help in identifying the various risks of using generative AI in coding that we'll see now, and the solutions to prevent them. I'll share more about them in a few minutes.

The most obvious risk is that current AI systems often generate code from outdated information, as the models may have been trained on now-obsolete data. This is nothing new; the same issue existed when copying code from Stack Overflow.

Even though it's often quite good, unchecked AI-generated code may lack proper documentation, use confusing variable names, and employ suboptimal algorithms and design patterns, all of which can harm the overall quality of the codebase.

The use of generative AI tools may also expose companies to intellectual property (IP) risks, particularly regarding trade secrets and copyright; we see OpenAI and other companies being sued all the time. It can also hurt a company's technical credibility during mergers, acquisitions, or investments, which cannot be overlooked. As part of technical due diligence, the presence of AI-generated code is scrutinized similarly to the use of open-source components, with potential impacts on IP security and commercial viability. If this isn't handled properly, it can hurt the whole company.

A last risk, one that will surely apply to you if it does not already, is developing an overreliance on AI for coding tasks. This leads to a decrease in a developer's ability to code independently and innovate creatively, impacting overall programming proficiency. Of course, we will always have access to generative AI help from now on, but we still must ensure we understand and follow what's going on. Otherwise, who's going to help us debug it?

How to mitigate those risks and still use AI for efficiency when coding?

The first steps to mitigate all those risks are (1) to keep learning and understanding what's happening by reviewing and understanding each line of code, written or generated, and (2) to be fully transparent about what is and is not generated when sharing with your colleagues and managers. Transparency will be key to preventing many issues, especially those related to IP or security. Beyond these initial steps, it's crucial to blend AI-generated code with human oversight; one way to automate part of that oversight is sketched below. Ask the AI to explain its modifications; take the time to thoroughly understand its […]
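On that oversight point, here is a minimal sketch of one complementary safeguard: running an open-source static security scanner, such as Bandit for Python, over generated code before merging it. Bandit and its `-r` (recurse) and `-f json` (report format) options are real; the `src/` path is a placeholder, and the report fields shown reflect Bandit's JSON output, though the exact schema is worth checking against your installed version.

```python
import json
import subprocess

# Run Bandit (pip install bandit), an open-source Python security linter,
# over the project tree. It flags patterns like string-built SQL queries
# and hardcoded credentials: the same families of leaks the Stanford
# study observed in AI-assisted code. "src/" is a placeholder path.
scan = subprocess.run(
    ["bandit", "-r", "src/", "-f", "json"],
    capture_output=True,
    text=True,
)

# Bandit exits non-zero when it finds issues, so parse stdout either way.
report = json.loads(scan.stdout)
for issue in report.get("results", []):
    print(
        f'{issue["filename"]}:{issue["line_number"]} '
        f'[{issue["issue_severity"]}] {issue["issue_text"]}'
    )
```

A scan like this does not replace reading the generated code line by line; it just helps ensure the most common leak patterns never reach a reviewer unnoticed.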