The GitHub Blog · April 25, 00:08
A guide to deciding what AI model to use in GitHub Copilot

 

GitHub Copilot is continually adding support for new models, but choosing among them can be difficult. This article offers a framework to help developers evaluate whether an AI model fits their needs. Developers can switch models easily within Copilot and use different models for different use cases. Typically, they favor fast, responsive models for autocomplete and tolerate more latency in chat. Reasoning models are more effective on complex tasks but respond more slowly. When evaluating a new model, consider the recency of its training data, its speed and responsiveness, and its accuracy. The best way to evaluate a model is to actually use it, for example by starting with a simple app or adopting it directly in your daily workflow.

💡 Developers can choose different models for different scenarios. For autocomplete, they generally prefer fast, responsive models; for chat, they can tolerate higher latency.

🤔 Reasoning models perform better on complex programming tasks. They break a problem into parts and consider multiple approaches, which makes them more effective at complex work, though their multi-step process can slow their responses.

✅ When evaluating a new AI model, focus on a few key factors: the recency of its training data, its response speed, and the accuracy of its code. Checking the version numbers a model suggests is one way to gauge how current its training data is. Code structure, patterns, comments, and adherence to best practices are all important signals of code quality.

🛠️ The best way to evaluate an AI model is to use it in practice. Start with a simple app, or adopt the model in your daily workflow for a while. Judge whether it genuinely improves your productivity by observing how it performs on real work, such as debugging speed or the quality of refactors.

To ensure that you have access to the best technology available, we’re continuously adding support for new models to GitHub Copilot. That being said, we know it can be hard to keep up with so many new models being released all the time.

All of this raises an obvious question: Which model should you use?

You can read our recent blog post for an overview of the models currently available in Copilot and their strengths, or check out our documentation for a deep dive comparing different models and tasks. But the AI landscape moves quickly. In this article we’ll explore a framework—including a few strategies—for evaluating whether any given AI model is a good fit for your use, even as new models continue to appear at a rapid pace.

It’s hard to go wrong with our base model, which has been fine-tuned specifically for programming-related tasks. But depending on what you’re working on, you likely have varying needs and preferences. There’s no single “best” model. Some may favor a more verbose model for chat, while others prefer a terse one, for example.

We spoke with several developers about their model selection process. Keep reading to discover how to apply their strategies to your own needs.

💡 Video: tips on prompt engineering to get the best results.

Why use multiple models?

There’s no reason you have to pick one model and stick with it. Since you can easily switch between models for both chat and code completion with GitHub Copilot, you can use different models for different use cases.

It’s kind of like dogfooding your own stack: You won’t know if it really fits your workflow until you’ve shipped some real code with it.

- Anand Chowdhary, FirstQuadrant CTO and co-founder

Chat vs. code completion

Using one model for chat and another for autocomplete is one of the most common patterns we see among developers. Generally, developers prefer autocompletion models because they’re fast and responsive, which they need if they’re looking for suggestions as they think and type. Developers are more tolerant of latency in chat, when they’re in more of an exploratory state of mind (like considering a complex refactoring job, for instance).

Reasoning models for certain programming tasks

Reasoning models like OpenAI o1 often respond more slowly than traditional LLMs such as GPT-4o or Claude 3.5 Sonnet. That’s in large part because these models break a prompt down into parts and consider multiple approaches to a problem. That introduces latency in their response times, but makes them more effective at completing complex tasks. Many developers prefer these more deliberative models for particular tasks.

For instance, Fatih Kadir Akın, a developer relations manager, uses o1 when starting new projects from scratch. “Reasoning models better ‘understand’ my vision and create more structured projects than non-reasoning models,” he explains.

FirstQuadrant CTO and co-founder Anand Chowdhary favors reasoning models for large-scale code refactoring jobs. “A model that rewrites complex backend code without careful reasoning is rarely accurate the first time,” he says. “Seeing the thought process also helps me understand the changes.”

When creating technical interview questions for her newsletter, GitHub Senior Director of Developer Advocacy Cassidy Williams mixes models for certain tasks. When she writes a question, she uses GPT-4o to refine the prose, and then Claude 3.7 Sonnet Thinking to verify code accuracy. “Reasoning models help ensure technical correctness because of their multi-step process,” she says. “If they initially get something wrong, they often correct themselves in later steps so the final answer is more accurate.”

There’s some subjectivity, but I compare model output based on the code structure, patterns, comments, and adherence to best practices.

- Portilla Edo, cloud infrastructure engineering lead

What to look for in a new AI model

Let’s say a new model just dropped and you’re ready to try it out. Here are a few things to consider before making it your new go-to.

Recency

Different models use different training data. That means one model might have more recent data than another, and therefore might be trained on new versions of the programming languages, frameworks, and libraries you use.

“When I’m trying out a new model, one of the first things I do is check how up to date it is,” says Xavier Portilla Edo, a cloud infrastructure engineering lead. He typically does this by creating a manifest file for the project and seeing which version numbers Copilot autocomplete suggests. “If the versions are quite old, I’ll move on,” he says.
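To make that check concrete, here is a minimal sketch in Python (my illustration, not Portilla Edo’s exact workflow): take the dependency versions a model’s autocomplete suggested for a manifest and compare them against the latest releases on PyPI. The suggested pins below are hypothetical.

```python
import json
import urllib.request

# Versions a model's autocomplete suggested for a requirements.txt
# (hypothetical values, for illustration only).
suggested = {"requests": "2.28.0", "numpy": "1.23.0"}

def latest_pypi_version(package: str) -> str:
    """Fetch the current release version of a package from PyPI."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["info"]["version"]

for pkg, pinned in suggested.items():
    latest = latest_pypi_version(pkg)
    status = "current" if pinned == latest else f"stale (latest is {latest})"
    print(f"{pkg}: model suggested {pinned} -> {status}")
```

If the suggested pins lag several releases behind, that is a hint the model’s training data predates the versions you work with.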

Speed and responsiveness

As mentioned, developers tend to tolerate more latency in a chat than in autocomplete. But responsiveness is still important in chat. “I enjoy bouncing ideas off a model and getting feedback,” says Rishab Kumar, a staff developer evangelist at Twilio. “For that type of interaction, I need fast responses so I can stay in the flow.”
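If you want to put a rough number on responsiveness, any model you can call programmatically can be timed with a small harness like the one below. This is a minimal sketch; ask_model is a hypothetical stand-in for whatever client you use, not a Copilot API.

```python
import statistics
import time

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in; replace with a real call to the model under test."""
    time.sleep(0.4)  # simulate network and inference time
    return "response"

def median_latency(prompt: str, runs: int = 5) -> float:
    """Median end-to-end seconds per response over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        ask_model(prompt)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

print(f"Median latency: {median_latency('Suggest a refactor for this loop'):.2f}s")
```

The median over a few runs smooths out one-off network spikes, which matters when you are comparing two models rather than benchmarking one.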

Accuracy

Naturally, you need to evaluate which models produce the best code. “There’s some subjectivity, but I compare model output based on the code structure, patterns, comments, and adherence to best practices,” Portilla Edo says. “I also look at how readable and maintainable the code is—does it follow naming conventions? Is it modular? Are the comments helpful or just restating what the code does? These are all signals of quality that go beyond whether the code simply runs.”
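As a small illustration of those signals (my example, not one from the interviews): the two Python functions below behave identically, but only the second has a descriptive name, type hints, and a docstring that adds information rather than restating the code.

```python
# Weak on every signal: opaque names, no types, no useful documentation.
def f(x):
    r = []
    for i in x:
        if i % 2 == 0:
            r.append(i * i)
    return r

# Same behavior, but readable, modular, and self-documenting.
def squares_of_evens(numbers: list[int]) -> list[int]:
    """Square each even number, preserving input order."""
    return [n * n for n in numbers if n % 2 == 0]
```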

How to test an AI model in your workflow

OK, so now you know what to look for in a model. But how do you actually evaluate it for responsiveness and correctness? You use it, of course.

Start with a simple app

Akın will generally start with a simple to-do app written in vanilla JavaScript. “I just check the code, and how well it’s structured,” he says. Similarly, Kumar will start with a WebSocket server in Python. The idea is to start with something you understand well enough to evaluate, and then layer on more complexity. “Eventually I’ll see if it can build something in 3D using three.js,” Akın says.
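For reference, a baseline like Kumar’s is small enough to review at a glance. Here is a minimal sketch of a WebSocket echo server, assuming the third-party websockets package (pip install websockets); the details are mine, not Kumar’s exact test.

```python
import asyncio
import websockets  # third-party: pip install websockets

async def echo(websocket):
    # Echo every incoming message back to the client.
    async for message in websocket:
        await websocket.send(message)

async def main():
    # Serve on localhost:8765 until the process is stopped.
    async with websockets.serve(echo, "localhost", 8765):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())
```

Because you already know exactly what a correct echo server looks like, any odd structure or API misuse in a model’s version stands out immediately.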

Portilla Edo starts by prompting a new model he wants to evaluate in Copilot Chat. “I usually ask it for simple things, like a function in Go, or a simple HTML file,” he says. Then he moves on to autocompletion to see how the model performs there.

Use it as a “daily driver” for a while

Chowdhary prefers to just jump in and start using a model. “When a new model drops, I swap it into my workflow as my daily driver and just live with it for a bit,” he says. “Available benchmarks and tests only tell you part of the story. I think the real test is seeing if it actually improves your day to day.”

For example, he checks to see if it actually speeds up his debugging jobs or produces cleaner refactors. “It’s kind of like dogfooding your own stack: You won’t know if it really fits your workflow until you’ve shipped some real code with it,” he says. “After evaluating it for a bit, I decide whether to stick with the new model or revert to my previous choice.”

Take this with you

What just about everyone agrees on is that the best way to evaluate a model is to use it.

The important thing is to keep learning. “You don’t need to be switching models all the time, but it’s important to know what’s going on,” Chowdhary says. “The state of the art is moving quickly. It’s easy to get left behind.”

Additional resources

Learn more about AI models.

