The GitHub Blog, February 13
How GitHub uses CodeQL to secure GitHub

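GitHub's Product Security Engineering team uses CodeQL to keep its code secure. CodeQL is GitHub's static analysis engine; it lets you query code much as you would query a database, in order to find vulnerabilities and enforce secure coding standards. The article details how GitHub uses CodeQL through custom query packs, custom queries, and variant analysis to detect GitHub-specific code patterns and insecure programming practices. Publishing the query pack to GitHub Container Registry (GCR) simplified the process and resolved the problems caused by publishing query files directly to the code repository, such as a cumbersome deployment process and slow analysis because the queries were not precompiled. It also shares best practices for managing query pack dependencies, writing unit tests, and configuring a repository to use a custom query pack.

💡 GitHub uses the CodeQL static analysis engine to keep its code secure, detecting vulnerabilities and enforcing secure coding standards through custom query packs, custom queries, and variant analysis.

📦 To manage and publish CodeQL queries more efficiently, GitHub publishes its query pack to GitHub Container Registry (GCR), which resolved the problems caused by publishing query files directly to the code repository, such as a cumbersome deployment process and slow analysis because the queries were not precompiled.

🔒 GitHub pins dependency versions in the `codeql-pack.lock.yml` file, in particular the version of the `ruby-all` package, to keep its queries stable and avoid CI failures caused by CodeQL library API changes.

🧪 GitHub strongly recommends writing unit tests for custom CodeQL queries and running them in CI, so that problems are caught early and the queries stay stable and reliable.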

GitHub’s Product Security Engineering team writes code and implements tools that help secure the code that powers GitHub. We use GitHub Advanced Security (GHAS) to discover, track, and remediate vulnerabilities and enforce secure coding standards at scale. One tool we rely heavily on to analyze our code at scale is CodeQL.

CodeQL is GitHub’s static analysis engine that powers automated security analyses. You can use it to query code in much the same way you would query a database. It provides a much more robust way to analyze code and uncover problems than an old-fashioned text search through a codebase.

This post details how we use CodeQL to keep GitHub secure and how you can apply these lessons to your own organization. You will learn why and how we use custom query packs, custom queries, and variant analysis.

Enabling CodeQL at scale

We employ CodeQL in a variety of ways at GitHub.

    Default setup with the default and security-extended query suites
    Default setup with the default and security-extended query suites meets the needs of the vast majority of our over 10,000 repositories. With these settings, pull requests automatically get a security review from CodeQL.
    Advanced setup with a custom query pack
    A few repositories, like our large Ruby monolith, need extra special attention, so we use advanced setup with a query pack containing custom queries to really tailor to our needs.
    Multi-repository variant analysis (MRVA)
    To conduct variant analysis and quick auditing, we use MRVA. We also write custom CodeQL queries to detect code patterns that are either specific to GitHub’s codebases or patterns we want a security engineer to manually review.

The specific custom Actions workflow step we use on our monolith is pretty simple. It looks like this:

- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: ${{ matrix.language }}
    config-file: ./.github/codeql/${{ matrix.language }}/codeql-config.yml

Our Ruby configuration is pretty standard, but advanced setup offers a variety of configuration options using custom configuration files. The interesting part is the packs option, which is how we enable our custom query pack as part of the CodeQL analysis. This pack contains a collection of CodeQL queries we have written for Ruby, specifically for the GitHub codebase.
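As a rough illustration of the packs option, a per-language configuration file could look like the sketch below. The pack name `my-org/ruby-queries` and the ignored path are placeholders invented for this example; they are not the values GitHub uses.

```yaml
# .github/codeql/ruby/codeql-config.yml (illustrative sketch, not GitHub's real file)
name: "Ruby CodeQL config"

# Use the built-in security-extended query suite.
queries:
  - uses: security-extended

# Pull in a custom query pack published to the container registry.
# "my-org/ruby-queries" is a hypothetical pack name.
packs:
  - my-org/ruby-queries

# Skip directories that are not worth scanning (hypothetical path).
paths-ignore:
  - vendor
```

Because the workflow step selects the configuration file by `matrix.language`, each language can carry its own list of packs.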

So, let’s dive deeper into why we did that—and how!

Publishing our CodeQL query pack

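Initially, we published CodeQL query files directly to the GitHub monolith repository, but we moved away from this approach for several reasons, including a cumbersome deployment process and slow analysis caused by queries that were not precompiled.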

By switching to publishing a query pack to GitHub Container Registry (GCR), we’ve simplified our process and eliminated many of these pain points, making it easier to ship and maintain our CodeQL queries. So while it’s possible to deploy custom CodeQL query files directly to a repository, we recommend publishing CodeQL queries as a query pack to the GCR for easier deployment and faster iteration.

Creating our query pack

When setting up our custom query pack, we faced several considerations, particularly around managing dependencies like the ruby-all package.

To keep our custom queries concise and maintainable, we extend classes from CodeQL's standard libraries, such as the ruby-all library pack, rather than reinventing the wheel. However, changes to the CodeQL library API can introduce breaking changes that deprecate parts of our queries or cause errors. Since CodeQL runs as part of our CI, we wanted to minimize the chance of this happening, as it can lead to frustration and loss of trust from developers.

We develop our queries against the latest version of the ruby-all package, ensuring we’re always working with the most up-to-date functionality. To mitigate the risk of breaking changes affecting CI, we pin the ruby-all version when we’re ready to release, locking it in the codeql-pack.lock.yml file. This guarantees that when our queries are deployed, they will run with the specific version of ruby-all we’ve tested, avoiding potential issues from unintentional updates.

Here’s how we manage this setup:

This approach allows us to balance developing against the latest features of the ruby-all package while ensuring stability when we release.
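The two files involved might look roughly like the following sketch. The pack name `my-org/ruby-queries` and every version number are placeholders chosen for illustration; the post does not show GitHub's real files.

```yaml
# qlpack.yml (sketch; "my-org/ruby-queries" is a hypothetical pack name)
name: my-org/ruby-queries
version: 1.2.0
dependencies:
  # "*" accepts any released version, so development tracks the latest ruby-all.
  codeql/ruby-all: "*"
```

```yaml
# codeql-pack.lock.yml (sketch; the version number is a placeholder)
lockVersion: 1.0.0
dependencies:
  codeql/ruby-all:
    # The exact ruby-all version the queries were tested against.
    version: 1.0.0
compiled: false
```

The lock file is normally generated by `codeql pack install` rather than written by hand. Committing it at release time is what fixes the dependency version for the published pack.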

We also have a set of CodeQL unit tests that exercise our queries against sample code snippets, which helps us quickly determine if any query will cause errors before we publish our pack. These tests are run as part of the CI process in our query pack repository, providing an early check for issues. We strongly recommend writing unit tests for your custom CodeQL queries to ensure stability and reliability.

Altogether, the basic flow for releasing new CodeQL queries via our pack is as follows:

We have found this flow balances our team’s development experience while ensuring stability in our published query pack.
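One way to automate a test-then-publish flow is a manually triggered Actions workflow like the sketch below. The directory names `./queries` and `./test`, the trigger, and the assumption that the CodeQL CLI is already on the runner's PATH are all hypothetical; this is not GitHub's actual release workflow.

```yaml
# .github/workflows/publish-pack.yml (illustrative sketch only)
name: Test and publish query pack

on:
  workflow_dispatch: # run manually when a release is ready

permissions:
  contents: read
  packages: write # needed to publish to the container registry

jobs:
  test-and-publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumes an earlier setup step has put the CodeQL CLI on PATH.
      - name: Resolve pack dependencies
        run: codeql pack install ./queries

      # Fails the job if any query's results differ from its .expected file.
      - name: Run query unit tests
        run: codeql test run ./test

      - name: Publish the pack
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: codeql pack publish ./queries
```

Running the tests in the same job as the publish step means a failing query blocks the release instead of reaching the monolith's CI.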

Configuring our repository to use our custom query pack

We won’t provide a general recommendation on configuration here, given that it ultimately depends on how your organization deploys code. We opted against locking our pack to a particular version in our CodeQL configuration file (see above). Instead, we chose to manage our versioning by publishing the CodeQL package in GCR. This results in the GitHub monolith retrieving the latest published version of the query pack. To roll back changes, we simply have to republish the package. In one instance, we released a query that had a high number of false positives and we were able to publish a new version of the pack that removed that query in less than 15 minutes. This is faster than the time it would have taken us to merge a pull request on the monolith repository to roll back the version in the CodeQL configuration file.
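The trade-off between the two versioning strategies can be seen in the packs entry itself. In this sketch the pack name and version are hypothetical.

```yaml
# codeql-config.yml fragment (sketch; "my-org/ruby-queries" is hypothetical)

# Option A: no version given, so each analysis fetches the latest published pack.
# Rolling back means publishing a new pack version.
packs:
  - my-org/ruby-queries

# Option B (the approach the post decided against): pin an exact version.
# Rolling back then requires a pull request that edits this file.
# packs:
#   - my-org/ruby-queries@1.2.0
```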

One of the problems we encountered with publishing the query pack in GCR was how to easily make the package available to multiple repositories within our enterprise. There are several approaches we explored.

CodeQL query pack queries

We write a variety of custom queries to be used in our custom query packs. These cover GitHub-specific patterns that aren’t included in the default CodeQL query pack. This allows us to tailor the analysis to patterns and preferences that are specific to our company and codebase. Some of the types of things we alert on using our custom query pack include:

Custom queries can also be educational rather than blockers to shipping code. For example, we want to alert engineers when they use the ActiveRecord::decrypt method. This method should generally not be used in production code, as it will cause an encrypted column to become decrypted. We use the recommendation severity in the query metadata so that these alerts are treated as informational. The query may raise an alert in a pull request, but it won’t cause the CodeQL CI job to fail. This lower severity level lets engineers assess the impact of a new query without being blocked immediately. Additionally, this alert level isn’t tracked through our Fundamentals program, so it doesn’t require immediate action, which reflects the query’s maturity as we continue to refine its relevance and risk assessment.

/**
 * @id rb/github/use-of-activerecord-decrypt
 * @description Do not use the .decrypt method on AR models, this will decrypt all encrypted attributes and save
 * them unencrypted, effectively undoing encryption and possibly making the attributes inaccessible.
 * If you need to access the unencrypted value of any attribute, you can do so by calling my_model.attribute_name.
 * @kind problem
 * @problem.severity recommendation
 * @name Use of ActiveRecord decrypt method
 * @tags security
 *       github-internal
 */

import ruby
import DataFlow
import codeql.ruby.DataFlow
import codeql.ruby.frameworks.ActiveRecord

/** Match against .decrypt method calls where the receiver may be an ActiveRecord object */
class ActiveRecordDecryptMethodCall extends ActiveRecordInstanceMethodCall {
  ActiveRecordDecryptMethodCall() { this.getMethodName() = "decrypt" }
}

from ActiveRecordDecryptMethodCall call
select call,
  "Do not use the .decrypt method on AR models, this will decrypt all encrypted attributes and save them unencrypted."

Another educational query is the one mentioned above in which we detect the absence of the `control_access` method in a class that defines a REST API endpoint. If a pull request introduces a new endpoint without `control_access`, a comment will appear on the pull request saying that the `control_access` method wasn’t found and it’s a requirement for REST API endpoints. This will notify the reviewer of a potential issue and prompt the developer to fix it.

/**
 * @id rb/github/api-control-access
 * @name Rest API Without 'control_access'
 * @description All REST API endpoints must call the 'control_access' method, to ensure that only specified actor types are able to access the given endpoint.
 * @kind problem
 * @tags security
 *       github-internal
 * @precision high
 * @problem.severity recommendation
 */

import codeql.ruby.AST
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.ApiGraphs

// Api::App REST API endpoints should generally call the control_access method
private DataFlow::ModuleNode appModule() {
  result = API::getTopLevelMember("Api").getMember("App").getADescendentModule() and
  not result = protectedApiModule() and
  not result = staffAppApiModule()
}

// Api::Admin, Api::Staff, Api::Internal, and Api::ThirdParty REST API endpoints do not need to call the control_access method
private DataFlow::ModuleNode protectedApiModule() {
  result =
    API::getTopLevelMember(["Api"])
        .getMember(["Admin", "Staff", "Internal", "ThirdParty"])
        .getADescendentModule()
}

// Api::Staff::App REST API endpoints do not need to call the control_access method
private DataFlow::ModuleNode staffAppApiModule() {
  result =
    API::getTopLevelMember(["Api"]).getMember("Staff").getMember("App").getADescendentModule()
}

private class ApiRouteWithoutControlAccess extends DataFlow::CallNode {
  ApiRouteWithoutControlAccess() {
    this = appModule().getAModuleLevelCall(["get", "post", "delete", "patch", "put"]) and
    not performsAccessControl(this.getBlock())
  }
}

predicate performsAccessControl(DataFlow::BlockNode blocknode) {
  accessControlCalled(blocknode.asExpr().getExpr())
}

predicate accessControlCalled(Block block) {
  // the method `control_access` is called somewhere inside `block`
  block.getAStmt().getAChild*().(MethodCall).getMethodName() = "control_access"
}

from ApiRouteWithoutControlAccess api
select api.getLocation(),
  "The control_access method was not detected in this REST API endpoint. All REST API endpoints must call this method to ensure that the endpoint is only accessible to the specified actor types."

Variant analysis

Variant analysis (VA) refers to the process of searching for variants of security vulnerabilities. This is particularly useful when we’re responding to a bug bounty submission or a security incident. We use a combination of tools to do this, including GitHub’s code search functionality, custom scripts, and CodeQL. We will often start by using code search to find patterns similar to the one that caused a particular vulnerability across numerous repositories. This is sometimes not good enough, as code search is not semantically aware, meaning that it cannot determine whether a given variable is an Active Record object or whether it is being used in an `if` expression. To answer those types of questions we turn to CodeQL.

When we write CodeQL queries for variant analysis we are much less concerned about false positives, since the goal is to provide results for security engineers to analyze. The quality of the code is also not quite as important, as these queries will only be used for the duration of the VA effort. Some of the types of things we use CodeQL for during VAs are:

One recent example involved a subtle vulnerability in Rails. We wanted to detect when the following condition was present in our code:

The concern with this condition is that it could lead to an insecure direct object reference (IDOR) vulnerability because Active Record finder methods can accept an array. If the code looks up an Active Record object in one call to determine if a given entity has access to a resource, but later uses a different element from that array to find an object reference, that can lead to an IDOR vulnerability. It would be difficult to write a query to detect all vulnerable instances of this pattern, but we were able to write a query that found potential vulnerabilities that gave us a list of code paths to manually analyze. We ran the query against a large number of our Ruby codebases using CodeQL’s MRVA.

The query, which is a bit hacky and not quite production grade, is below:

/**
 * @name wip array query
 * @description an array is passed to an AR finder object
 */

import ruby
import codeql.ruby.AST
import codeql.ruby.ApiGraphs
import codeql.ruby.frameworks.Rails
import codeql.ruby.frameworks.ActiveRecord
import codeql.ruby.frameworks.ActionController
import codeql.ruby.DataFlow
import codeql.ruby.Frameworks
import codeql.ruby.TaintTracking

// Gets the "final" receiver in a chain of method calls.
// For example, in `Foo.bar`, this would give the `Foo` access, and in
// `foo.bar.baz("arg")` it would give the `foo` variable access
private Expr getUltimateReceiver(MethodCall call) {
  exists(Expr recv |
    recv = call.getReceiver() and
    (
      result = getUltimateReceiver(recv)
      or
      not recv instanceof MethodCall and result = recv
    )
  )
}

// Names of class methods on ActiveRecord models that may return one or more
// instances of that model. This also includes the `initialize` method.
// See https://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html
private string staticFinderMethodName() {
  exists(string baseName |
    baseName = ["find_by", "find_or_create_by", "find_or_initialize_by", "where"] and
    result = baseName + ["", "!"]
  )
  // or
  // result = ["new", "create"]
}

private class ActiveRecordModelFinderCall extends ActiveRecordModelInstantiation, DataFlow::CallNode
{
  private ActiveRecordModelClass cls;

  ActiveRecordModelFinderCall() {
    exists(MethodCall call, Expr recv |
      call = this.asExpr().getExpr() and
      recv = getUltimateReceiver(call) and
      (
        // The receiver refers to an `ActiveRecordModelClass` by name
        recv.(ConstantReadAccess).getAQualifiedName() = cls.getAQualifiedName()
        or
        // The receiver is self, and the call is within a singleton method of
        // the `ActiveRecordModelClass`
        recv instanceof SelfVariableAccess and
        exists(SingletonMethod callScope |
          callScope = call.getCfgScope() and
          callScope = cls.getAMethod()
        )
      ) and
      (
        call.getMethodName() = staticFinderMethodName()
        or
        // dynamically generated finder methods
        call.getMethodName().indexOf("find_by_") = 0
      )
    )
  }

  final override ActiveRecordModelClass getClass() { result = cls }
}

class FinderCallArgument extends DataFlow::Node {
  private ActiveRecordModelFinderCall finderCallNode;

  FinderCallArgument() { this = finderCallNode.getArgument(_) }
}

class ParamsHashReference extends DataFlow::CallNode {
  private Rails::ParamsCall params;

  // TODO: only direct element references against `params` calls are considered
  ParamsHashReference() { this.getReceiver().asExpr().getExpr() = params }

  string getArgString() {
    result = this.getArgument(0).asExpr().getConstantValue().getStringlikeValue()
  }
}

class ArrayPassedToActiveRecordFinder extends TaintTracking::Configuration {
  ArrayPassedToActiveRecordFinder() { this = "ArrayPassedToActiveRecordFinder" }

  override predicate isSource(DataFlow::Node source) { source instanceof ParamsHashReference }

  override predicate isSink(DataFlow::Node sink) {
    sink instanceof FinderCallArgument
  }

  string getParamsArg(DataFlow::CallNode paramsCall) {
    result = paramsCall.getArgument(0).asExpr().getConstantValue().getStringlikeValue()
  }

  // this doesn't check for anything fancy like whether it's reuse in a if/else
  // only intended for quick manual audit filtering of interesting candidates
  // so remains fairly broad to not induce false negatives
  predicate paramsUsedAfterLookups(DataFlow::Node source) {
    exists(DataFlow::CallNode y | y instanceof ParamsHashReference
    and source.getEnclosingMethod() = y.getEnclosingMethod()
    and source != y
    and getParamsArg(source) = getParamsArg(y)
    // we only care if it's used again AFTER an object lookup
    and y.getLocation().getStartLine() > source.getLocation().getStartLine())
  }
}

from ArrayPassedToActiveRecordFinder config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink) and config.paramsUsedAfterLookups(source)
select source, sink.getLocation()

Conclusion

CodeQL can be very useful for product security engineering teams to detect and prevent vulnerabilities at scale. We use a combination of queries that run in CI using our query pack and one-off queries run through MRVA to find potential vulnerabilities and communicate them to engineers. CodeQL isn’t only useful for finding security vulnerabilities, though; it is also useful for detecting the presence or absence of security controls that are defined in code. This saves our security team time by surfacing certain security problems automatically, and saves our engineers time by detecting them earlier in the development process.

Writing custom CodeQL queries

Tips for getting started

We have a large number of articles and resources for writing custom CodeQL queries. If you haven’t written custom CodeQL queries before, here are some resources to help get you started:

Improve the security of your applications today by enabling CodeQL for free on your public repositories, or try GitHub Advanced Security for your organization.

Michael Recachinas, GitHub Staff Security Engineer, also contributed to this blog post.

The post How GitHub uses CodeQL to secure GitHub appeared first on The GitHub Blog.
