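CodeQL Static Code Scanning: Correlating Endpoints, Input Parameters, and Dangerous Methods to Automate Payload Construction, Plus a Look at Abstract Classes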

Author: Ping An Bank Application Security Team @xsser

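Background

Before CodeQL, the static code scanning tools we commonly had were Fortify, Checkmarx, Sonar, FindBugs, PMD, and the like. Some of these tools, such as PMD, work purely on the static AST. Pure AST parsing produces many false positives and cannot trace a vulnerability through the code, which is a fatal flaw. Tools like FindBugs can only follow the call stack to a limited depth, and Checkmarx's data flow tracking is incomplete and does not let you define data flows for specific situations. On top of that, it costs money!

Against this backdrop, CodeQL was released at the end of 2019, just as the pandemic was breaking out, by GitHub (a Microsoft subsidiary). After several months of research we found the tool to be exceptionally powerful. Its underlying code database is very well designed, and you can write your own rules to describe and find the data flows you care about, which lets you locate vulnerabilities and even supports some data security practices. We are sharing our findings here in the hope that they are useful to you!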

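Correlating Endpoints, Input Parameters, and Dangerous Methods and Automating Payload Construction

Introduction to CodeQL

CodeQL is a tool that represents an epic qualitative leap. Before it, the generally acknowledged peak was probably Fortify.

QL is a query language. CodeQL supports C++, C#, Java, JavaScript, Python, Go, and other languages, and can be used to analyze code and look up information such as its control flow.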

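Getting to Know the Basic Modules

We start by importing one module, which is also one of the cores of CodeQL: the DataFlow module. It implements data flow tracking through the code, in effect analyzing the whole call stack.

The predicate you will use most here is hasFlowPath. It takes two arguments, a source and a sink, and holds when data flows from the source to the sink. So we can write a query like this: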

from QueryInjectionSink query, DataFlow::PathNode source, DataFlow::PathNode sink
where queryTaintedBy(query, source, sink)
select query, source, sink

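Here queryTaintedBy is a "predicate", which you can loosely think of as a method. Inside it we work with the data flow configuration (config).

Written this way, the query finds data flows from source to sink. Querying without any constraints is of course not practical, so we also need to restrict the scope and conditions. That is the job of the query's configuration (config), which must extend TaintTracking::Configuration, while QueryInjectionSink in the query above is the type of the sinks. So we define a class to represent the configuration: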

class QueryInjectionFlowConfig extends TaintTracking::Configuration {
  // Name of the configuration; the string assigned to this can be anything
  QueryInjectionFlowConfig() { this = "SqlInjectionLib::QueryInjectionFlowConfig" }

  // Sources come from RemoteFlowSource, an official class that fairly
  // comprehensively defines remotely controllable input. It still has some
  // gaps; I have made some additions that I will cover later.
  override predicate isSource(DataFlow::Node src) { src instanceof RemoteFlowSource }

  // QueryInjectionSink defines the constraints on sinks
  override predicate isSink(DataFlow::Node sink) { sink instanceof QueryInjectionSink }

  // Filter: checks the type of the node. Values of types such as int or long
  // are dropped and never enter the data flow analysis. This is one of the
  // reasons CodeQL identifies SQL injection fairly accurately.
  override predicate isSanitizer(DataFlow::Node node) {
    node.getType() instanceof PrimitiveType or
    node.getType() instanceof BoxedType or
    node.getType() instanceof NumberType
  }
}

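Of the three predicates here, isSource, isSink, and isSanitizer, the first two are required and the last one sets the filter conditions. There are also optional predicates such as isAdditionalTaintStep.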
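As an illustration of the optional predicates, the following sketch adds one extra taint step to a configuration. It assumes the legacy TaintTracking::Configuration API used throughout this article (newer CodeQL releases use a module-based API and rename MethodAccess to MethodCall). The class name WrapperAwareConfig and the method names wrap and executeQuery are hypothetical placeholders, not part of any real library.

```ql
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.FlowSources
import semmle.code.java.dataflow.TaintTracking

class WrapperAwareConfig extends TaintTracking::Configuration {
  WrapperAwareConfig() { this = "WrapperAwareConfig" }

  override predicate isSource(DataFlow::Node src) { src instanceof RemoteFlowSource }

  // Hypothetical sink: the first argument of any call to a method named executeQuery.
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess ma |
      ma.getMethod().hasName("executeQuery") and
      sink.asExpr() = ma.getArgument(0)
    )
  }

  // Extra step: treat the result of wrap(x) as tainted whenever x is tainted.
  override predicate isAdditionalTaintStep(DataFlow::Node pred, DataFlow::Node succ) {
    exists(MethodAccess ma |
      ma.getMethod().hasName("wrap") and
      pred.asExpr() = ma.getArgument(0) and
      succ.asExpr() = ma
    )
  }
}

from WrapperAwareConfig conf, DataFlow::Node source, DataFlow::Node sink
where conf.hasFlow(source, sink)
select source, sink
```

An extra step like this is typically only needed when the default taint rules do not see through a helper method, for example a wrapper whose implementation is not part of the analyzed code.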

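isSource is based on RemoteFlowSource, which is one of the places where CodeQL is strong at SQL injection. It is an official class that covers most user-controllable sources, such as getParameter() and getHost(), and even covers input arriving over a Socket. This also makes it easier to find deserialization in remote method calls, so you no longer have to audit the code line by line relying on experience.

Next we define a predicate, the queryTaintedBy mentioned above, like this: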

predicate queryTaintedBy(
  QueryInjectionSink query, DataFlow::PathNode source, DataFlow::PathNode sink
) {
  exists(QueryInjectionFlowConfig conf | conf.hasFlowPath(source, sink) and sink.getNode() = query)
}

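With this we have a working query. The conf variable holds the query configuration, and the query is executed through that configuration by calling hasFlowPath.

This example has one more condition, sink.getNode() = query, where query is of type QueryInjectionSink. To understand QueryInjectionSink in depth you need to read the source code. A large part of SQL injection detection hinges on this type: it defines the sinks and gives CodeQL its coverage of database queries, for example common methods from JDBC and Hibernate. The sink coverage is not very broad, though, and can only be extended by hand. The official library defines only three kinds of sink and leaves room for more (meaning this is a place you need to extend yourself). These cover most cases, but the gaps leave room for false negatives.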
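Extending the sink coverage by hand can be sketched as follows. This assumes QueryInjectionSink is an abstract class over data flow nodes that can be imported from semmle.code.java.security.QueryInjection, which is where recent versions of the standard library keep it; in the version discussed in this article it may live in a different file (the configuration name suggests SqlInjectionLib), so adjust the import accordingly. The package com.example.dao, the class LegacyDao, and the method rawQuery are hypothetical.

```ql
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.security.QueryInjection

// Hypothetical in-house DAO whose rawQuery(String sql) method runs SQL directly.
class LegacyDaoRawQuerySink extends QueryInjectionSink {
  LegacyDaoRawQuerySink() {
    exists(MethodAccess ma |
      ma.getMethod().hasName("rawQuery") and
      ma.getMethod().getDeclaringType().hasQualifiedName("com.example.dao", "LegacyDao") and
      // The SQL string is assumed to be the first argument.
      this.asExpr() = ma.getArgument(0)
    )
  }
}

// Every QueryInjectionSink, including the new subclass, is now selected.
from QueryInjectionSink s
select s
```

Because isSink in the configuration only asks whether a node is a QueryInjectionSink, the new subclass is picked up without any change to the configuration itself.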

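Analyzing the Requirement

Next we want to implement a requirement: identify an application's endpoints, their parameters, and the corresponding sinks (considering only the Spring framework for now).

Let us break this requirement down. First we need to work with source and sink, which is a given: the endpoint's parameters and path are obtained from the source, and the sink needs little handling. We will then use the Annotation class, which gives us the relevant annotations. Once we have the source, we also need to determine whether the endpoint receives the parameter via POST or GET. That is easy too: we just look at the parameter's annotation. If it is RequestParam we treat the request as GET, and otherwise as POST. This is a crude and rough check; the point is to show the idea.

Implementation

Now we define a simple predicate: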

string methodIsPostOrGet(DataFlow::PathNode source) {
  if source.getNode().asParameter().getAnAnnotation().toString().regexpMatch("RequestParam")
  then result = "get"
  else result = "post"
}

This is a predicate with a result, so we must declare its result type. result is a built-in CodeQL variable that carries the value the predicate returns.
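Another way to write a predicate with a result, shown here only as a sketch, is to read the HTTP method from the mapping annotation on the handler method rather than from the parameter annotation. The predicate name httpMethodOfHandler is made up for this example, and it only recognizes Spring's GetMapping and PostMapping by simple name; RequestMapping with a method attribute is not handled.

```ql
import java

// A predicate with a result: it has one value for each matching annotation
// on m, and no value at all when m carries neither annotation.
string httpMethodOfHandler(Method m) {
  exists(Annotation a | a = m.getAnAnnotation() |
    a.getType().hasName("GetMapping") and result = "get"
    or
    a.getType().hasName("PostMapping") and result = "post"
  )
}

from Method m
select m, httpMethodOfHandler(m)
```

Unlike a method in an imperative language, a predicate with a result may yield zero, one, or several values for the same argument, so the query above simply produces no row for methods that are not handlers.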

We still need DataFlow.

from QueryInjectionSink query, DataFlow::PathNode source, DataFlow::PathNode sink
where queryTaintedBy(query, source, sink)
select query, "Endpoint path:",
  // Value of the annotation on the method that declares the source parameter;
  // in Spring this is the endpoint path
  source.getNode().asParameter().getCallable().getAnAnnotation().getValue("value"),
  "Endpoint parameter:",
  // Name of the endpoint parameter
  source.getNode().asParameter().getAnAnnotation().getValue("value"),
  // File location of the sink
  "File location:", sink.getNode().asExpr().getLocation(),
  // HTTP method, derived from the source node
  "Request type:", methodIsPostOrGet(source),
  "Annotation:",
  // The annotation itself
  source.getNode().asParameter().getAnAnnotation().toString()

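Results

The final result is as follows (screenshot of the query results).

At this point we have a simple extraction of data flow metadata, and we have obtained several important pieces of data: the endpoint path, the endpoint parameter, the HTTP method, whether there is a sink, and the file path.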

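Reflections

Based on this data, we can rework the QL rules for other vulnerability types into the same unified shape. Every static scan then yields this data, so once a scan finishes we can parse the resulting JSON and automatically build and package the payload for each vulnerability.

For example: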

POST /url/from/Spring HTTP/1.1
header: xxx
xx

username=xsser' and 1=1

or

GET /url/from/Spring?username=xsser%27 and 1=1 HTTP/1.1
header
xxxx
xxx
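The article builds these requests by post-processing the scan's JSON output. As a different approach, shown only as a rough sketch, the request line can also be assembled inside a query with string concatenation. The predicate name requestSkeleton and the PAYLOAD marker are hypothetical, only GET is handled, and only annotations whose value attribute is a single string constant are considered.

```ql
import java

// Builds a GET request line for handler method `handler` and its parameter `p`.
string requestSkeleton(Method handler, Parameter p) {
  p = handler.getAParameter() and
  exists(string path, string name |
    // Endpoint path from the annotation on the handler method
    path =
      handler.getAnAnnotation().getValue("value").(CompileTimeConstantExpr).getStringValue() and
    // Parameter name from the annotation on the parameter
    name = p.getAnAnnotation().getValue("value").(CompileTimeConstantExpr).getStringValue() and
    // PAYLOAD marks where a scanner would substitute the attack string
    result = "GET " + path + "?" + name + "=PAYLOAD HTTP/1.1"
  )
}

from Method m, Parameter p
select m, p, requestSkeleton(m, p)
```

In practice this would be combined with the data flow query so that a skeleton is only produced for parameters that actually reach a sink.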

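Of course, once you have the base data, how you use it is another open-ended question of combinations. For example, you can compensate for IAST's difficulty in locating the source parameter, or for its gaps in sinks, or for its inability to establish how data flows relate to one another, such as the data lifecycle emphasized in data security. For static code this is hard to trace or to link up, and most of the collected data is non-relational. But by using CodeQL this way we not only find code vulnerabilities, we also get asset metadata extraction done along the way, and that metadata can be handed to data security work and used to correlate the data.

With a large existing codebase, we can map the fields in SQL statements to controllers and parameters. Once you understand CodeQL in depth, you will find that what I describe here is only the tip of the iceberg.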

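Exploring Abstract Classes

While reading a large amount of CodeQL rule code, I found an interesting construct: when an abstract class is used, the predicates of its subclasses take effect automatically, without the subclasses being referenced explicitly. This construct appears all over the CodeQL rules. Here I use TaintedPath.ql under CWE-022 as the example.

First, the body of this rule is (screenshot of TaintedPath.ql):

In the previous part we learned that a data flow query must use DataFlow::xxx and TaintTracking::Configuration, where conf configures the data flow. Here, the part inside the any() expression that I circled in the screenshot is a constraint on the sink, and it rests mainly on the PathCreation class and the guarded predicate. The focus here is the PathCreation class.

First, it is an abstract class. The authors factored it out into the PathsCommon.qll library, where we can see its definition (including the characteristic predicates, that is, the constructors). Abstracted, its structure is as follows: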

abstract class PathCreation extends Expr {
  abstract Expr getInput();
}


class PathsGet extends PathCreation, MethodAccess {
  PathsGet() {
    …
  }


  override Expr getInput() { result = this.getAnArgument() }
}


class FileSystemGetPath extends PathCreation, MethodAccess {
  FileSystemGetPath() {
    …
  }


  override Expr getInput() { result = this.getAnArgument() }
}


class FileCreation extends PathCreation, ClassInstanceExpr {
  FileCreation() { …}


  override Expr getInput() {
    result = this.getAnArgument() and
    // Relevant arguments include those that are not a `File`.
    not result.getType() instanceof TypeFile
  }
}


class FileWriterCreation extends PathCreation, ClassInstanceExpr {
  FileWriterCreation() { this.getConstructedType().getQualifiedName() = "java.io.FileWriter" }


  override Expr getInput() {
    result = this.getAnArgument() and
    // Relevant arguments are those of type `String`.
    result.getType() instanceof TypeString
  }
}


predicate inWeakCheck(Expr e) {
  none()
}


// Ignore cases where the variable has been checked somehow,
// but allow some particularly obviously bad cases.
predicate guarded(VarAccess e) {
  none()
}

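You will notice that the subclasses all extend PathCreation and each override getInput. Every override uses the result keyword, and getInput is a predicate with a declared result type. According to the official documentation, a predicate with a result yields the set of values that satisfy its body. Most of the overrides use this.getAnArgument(), which yields the arguments of the matched expression. What they have in common is that they all extend PathCreation. That made me curious, because nothing in the file ever calls or instantiates these classes.

A search of the whole documentation did not turn up any instantiation either. So I did some simple debugging of the rule: I commented out some of the subclasses and ran queries over PathCreation, PathsGet, FileCreation, and FileWriterCreation. The results for PathCreation turned out to be exactly the union of the results for these subclasses.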

This is the full set (screenshot).

This is PathsGet (screenshot).

This is FileCreation (screenshot).

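From this we can draw a conclusion: a query over an abstract class in CodeQL selects the union of all results that satisfy its subclasses' conditions. Whether anything is returned for each match, and what, is up to you: you define a predicate yourself to return it.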
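The same behavior can be reproduced with a much smaller example. This is a minimal sketch with made-up class names (InterestingCall, ToStringCall, HashCodeCall). It uses MethodAccess as in the rest of this article; newer CodeQL releases call it MethodCall.

```ql
import java

// The abstract class has no characteristic predicate of its own:
// its values are exactly the union of the values of its subclasses.
abstract class InterestingCall extends MethodAccess {
  abstract string describe();
}

class ToStringCall extends InterestingCall {
  ToStringCall() { this.getMethod().hasName("toString") }

  override string describe() { result = "call to toString" }
}

class HashCodeCall extends InterestingCall {
  HashCodeCall() { this.getMethod().hasName("hashCode") }

  override string describe() { result = "call to hashCode" }
}

// Neither subclass is named here, yet both contribute rows.
from InterestingCall c
select c, c.describe()
```

Commenting out one of the subclasses removes its rows from the result, which matches the experiment described above.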

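With this I believe you can now make sense of some classes that you must understand, such as RemoteFlowSource. It also helps a lot with how to lay out your code and structure the rule logic when writing your own rules.

Below is the main body of PathsCommon with comments, which I hope helps your understanding: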

abstract class PathCreation extends Expr {
    abstract Expr getInput();
  }
  
  class PathsGet extends PathCreation, MethodAccess {
    // Find calls to the get method of the `java.nio.file.Paths` class
    PathsGet() {
      exists(Method m | m = this.getMethod() |
        m.getDeclaringType() instanceof TypePaths and
        m.getName() = "get"
      )
    }
    // Return the set of arguments passed to this call
    override Expr getInput() { result = this.getAnArgument() }
  }
  
  class FileSystemGetPath extends PathCreation, MethodAccess {
    // Find calls to the getPath method of the `java.nio.file.FileSystem` class; getInput returns their arguments
    FileSystemGetPath() {
      exists(Method m | m = this.getMethod() |
        m.getDeclaringType() instanceof TypeFileSystem and
        m.getName() = "getPath"
      )
    }
  
    override Expr getInput() { result = this.getAnArgument() }
  }
  
  class FileCreation extends PathCreation, ClassInstanceExpr {
    // Restrict to instantiations whose constructed type is `java.io.File`,
    // for example in new xxx(), xxx must be `java.io.File`
    FileCreation() { this.getConstructedType() instanceof TypeFile }
  
    override Expr getInput() {
        // Take the arguments of the instantiation above whose type is not File, and return the set of arguments that satisfy both conditions
      result = this.getAnArgument() and
      // Relevant arguments include those that are not a `File`.
      not result.getType() instanceof TypeFile
    }
  }
  
  class FileWriterCreation extends PathCreation, ClassInstanceExpr {
    // Restrict to instantiations of `java.io.FileWriter`
    FileWriterCreation() { this.getConstructedType().getQualifiedName() = "java.io.FileWriter" }
    // Return the arguments whose type is String
    override Expr getInput() {
      result = this.getAnArgument() and
      // Relevant arguments are those of type `String`.
      result.getType() instanceof TypeString
    }
  }
  
  predicate inWeakCheck(Expr e) {
    // None of these are sufficient to guarantee that a string is safe.
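    // Match calls on e to methods such as startsWith. Note that these are exact names of built-in methods; consider widening the coverage by using matches to match similar method names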
    exists(MethodAccess m, Method def | m.getQualifier() = e and m.getMethod() = def |
      def.getName() = "startsWith" or
      def.getName() = "endsWith" or
      def.getName() = "isEmpty" or
      def.getName() = "equals"
    )
    or
    // Checking against `null` has no bearing on path traversal.
    exists(EqualityTest b | b.getAnOperand() = e | b.getAnOperand() instanceof NullLiteral)
  }
  
  // Ignore cases where the variable has been checked somehow,
  // but allow some particularly obviously bad cases.
  predicate guarded(VarAccess e) {
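    // The variable access must be in the set of inputs returned by the abstract class above, must be controlled by the true branch of a condition on the same variable, and that condition must not be a weak check such as startsWith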
    exists(PathCreation p | e = p.getInput()) and
    exists(ConditionBlock cb, Expr c |
      cb.getCondition().getAChildExpr*() = c and
      c = e.getVariable().getAnAccess() and
      cb.controls(e.getBasicBlock(), true) and
      // Disallow a few obviously bad checks.
      not inWeakCheck(c)
    )
  }
