CERT Recently Published Vulnerability Notes (July 6, 2024)
VU#253266: Keras 2 Lambda Layers Allow Arbitrary Code Injection in TensorFlow Models


Overview

Lambda layers in third-party TensorFlow-based Keras models allow attackers to inject arbitrary code into models built with versions of Keras prior to 2.13; the injected code may then run, unsafely, with the same permissions as the running application. For example, an attacker could use this feature to trojanize a popular model, save it, and redistribute it, tainting the supply chain of dependent AI/ML applications.

Description

TensorFlow is a widely-used open-source software library for building machine learning and artificial intelligence applications. The Keras framework, implemented in Python, is a high-level interface to TensorFlow that provides a wide variety of features for the design, training, validation and packaging of ML models. Keras provides an API for building neural networks from building blocks called Layers. One such Layer type is a Lambda layer that allows a developer to add arbitrary Python code to a model in the form of a lambda function (an anonymous, unnamed function). Using the Model.save() or save_model() method, a developer can then save a model that includes this code.
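To make the mechanism concrete, here is a minimal sketch (the file name is hypothetical) of how a Lambda layer embeds executable Python in a saved model:

```python
from tensorflow import keras

# A Lambda layer wraps an arbitrary Python function into the model graph.
# This one is benign, but nothing prevents it from having side effects.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Lambda(lambda x: x * 2.0),
])

# Saving serializes the lambda's bytecode alongside the weights, so the
# code travels with the model file.
model.save("model_with_lambda.keras")
```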

The Keras 2 documentation for the Model.load_model() method describes a mechanism for disallowing the loading of a native version 3 Keras model (.keras file) that includes a Lambda layer, controlled by the safe_mode argument:

safe_mode: Boolean, whether to disallow unsafe lambda deserialization. When safe_mode=False, loading an object has the potential to trigger arbitrary code execution. This argument is only applicable to the TF-Keras v3 model format. Defaults to True.

This is the behavior of version 2.13 and later of the Keras API: an exception will be raised in a program that attempts to load a model with Lambda layers stored in version 3 of the format. This check, however, does not exist in the prior versions of the API. Nor is the check performed on models that have been stored using earlier versions of the Keras serialization format (i.e., v2 SavedModel, legacy H5).
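As a sketch of that behavior (continuing the hypothetical file name from above, and assuming the exception is a ValueError as in current releases), loading under the default safe_mode=True refuses the embedded lambda, while explicitly opting out restores the unsafe pre-2.13 behavior:

```python
from tensorflow import keras

try:
    # Keras 2.13+ default (safe_mode=True): deserializing the lambda
    # is refused and an exception is raised.
    model = keras.models.load_model("model_with_lambda.keras")
except ValueError as err:
    print(f"Refused to load model: {err}")

# Opting out re-enables arbitrary code execution at load time; do this
# only for models you built yourself or otherwise fully trust.
model = keras.models.load_model("model_with_lambda.keras", safe_mode=False)
```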

This means that systems incorporating versions of the Keras code base prior to 2.13 may be susceptible to running arbitrary code when loading TensorFlow-based models saved in older formats.

Similarity to other frameworks with code injection vulnerabilities

The code injection vulnerability in the Keras 2 API is an example of a common security weakness in systems that provide a mechanism for packaging data together with code. For example, the security issues associated with the Pickle mechanism in the standard Python library are well documented, and arise because the Pickle format includes a mechanism for serializing code inline with its data.
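The standard demonstration of the pickle weakness (illustrative only; the payload here is a harmless shell command) uses the __reduce__ hook, through which an object tells the unpickler to call an arbitrary function during deserialization:

```python
import os
import pickle

class Malicious:
    # __reduce__ normally describes how to reconstruct an object; here it
    # instead instructs the unpickler to call os.system with an
    # attacker-chosen argument.
    def __reduce__(self):
        return (os.system, ("echo payload executed",))

payload = pickle.dumps(Malicious())

# Anyone who unpickles this byte string runs the embedded command:
pickle.loads(payload)  # prints "payload executed"
```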

Explicit versus implicit security policy

The TensorFlow security documentation at https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md includes a specific warning about the fact that models are not just data, and makes a statement about the expectations of developers in the TensorFlow development community:

Since models are practically programs that TensorFlow executes, using untrusted models or graphs is equivalent to running untrusted code. (emphasis in earlier version)

The implications of that statement are not necessarily widely understood by all developers of TensorFlow-based systems. The last few years have seen rapid growth in the community of developers building AI/ML-based systems and publishing pretrained models through community hubs like Hugging Face (https://huggingface.co/) and Kaggle (https://www.kaggle.com). It is not clear that all members of this new community understand the potential risk posed by a third-party model; some may (incorrectly) trust that a model loaded using a trusted library will only execute code that is included in that library. Moreover, a user may also assume that a pretrained model, once loaded, will only execute code whose purpose is to compute a prediction, without side effects beyond those required for that calculation (e.g., that a model will not include code to communicate over a network).

To the degree possible, AI/ML framework developers and model distributors should strive to align the explicit security policy and the corresponding implementation to be consistent with the implicit security policy implied by these assumptions.

Impact

Loading third-party models built using Keras could result in arbitrary untrusted code running at the privilege level of the ML application environment.

Solution

Upgrade to Keras 2.13 or later. When loading models, ensure the safe_mode parameter is not set to False (per https://keras.io/api/models/model_saving_apis/model_saving_and_loading, it is True by default). Note: an upgrade of Keras may also require upgrading its dependencies; see https://keras.io/getting_started/ for details.
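A minimal defensive sketch (the model path is hypothetical, and the version check assumes the third-party packaging library is available): verify the installed Keras version and pass safe_mode explicitly rather than relying on the default:

```python
import keras
from packaging import version  # third-party; pip install packaging

# Refuse to proceed on a Keras version that predates the safe_mode check.
if version.parse(keras.__version__) < version.parse("2.13"):
    raise RuntimeError("Keras >= 2.13 required before loading third-party models")

# Pass safe_mode explicitly so a future change of default cannot silently
# re-enable unsafe deserialization.
model = keras.models.load_model("third_party_model.keras", safe_mode=True)
```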

If running pre-2.13 applications in a sandbox, ensure no assets of value are in scope of the running application to minimize potential for data exfiltration.

Advice for Model Users

Model users should only use models developed and distributed by trusted sources, and should always verify the behavior of models before deployment. They should apply the same development and deployment best practices to applications that integrate ML models as they would to any application incorporating a third-party component. Developers should upgrade to the latest practical version of the Keras package (v2.13+ or v3.0+), and use version 3 of the Keras serialization format both to load third-party models and to save any subsequent modifications.

Advice for Model Aggregators

Model aggregators should distribute models based on the latest, safe model formats when possible, and should incorporate scanning and introspection features to identify models that include unsafe-to-deserialize features, then either prevent them from being uploaded or flag them so that model users can perform additional due diligence.
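As a rough sketch of such an introspection step (assuming the Keras v3 .keras archive layout, in which the model configuration is stored as a config.json member of a zip file), an aggregator could flag Lambda layers without ever deserializing the model:

```python
import json
import zipfile

def find_lambda_layers(path):
    """Return the names of Lambda layers found in a .keras (v3) archive,
    inspecting only the JSON configuration, never executing model code."""
    with zipfile.ZipFile(path) as archive:
        config = json.loads(archive.read("config.json"))

    hits = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("class_name") == "Lambda":
                hits.append(node.get("config", {}).get("name", "<unnamed>"))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return hits

print(find_lambda_layers("model_with_lambda.keras"))  # e.g., ['lambda']
```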

Advice for Model Creators

Model creators should upgrade to the latest version of the Keras package (v2.13+ or v3.0+). They should avoid unsafe-to-deserialize features, both to avoid inadvertently introducing security vulnerabilities and to encourage the adoption of standards that are less susceptible to exploitation by malicious actors. Model creators should save models using the latest format version (Keras v3 in the case of the Keras package) and, when possible, prefer formats that disallow the serialization of arbitrary code (i.e., code that the user has not explicitly imported into the environment). Model developers should reuse third-party base models with care, building only on models from trusted sources.

General Advice for Framework Developers

AI/ML-framework developers should avoid the use of naïve language-native serialization facilities (e.g., the Python pickle package has well-established security weaknesses, and should not be used in sensitive applications).

In cases where it's desirable to include a mechanism for embedding code, restrict the code that can be executed by, for example:

* disallow certain language features (e.g., exec)
* explicitly allow only a "safe" language subset
* provide a sandboxing mechanism (e.g., to prevent network access) to minimize potential threats
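For instance, a minimal sketch of the "safe language subset" approach (all names hypothetical): parse an embedded expression and reject any syntax outside an explicit allowlist before evaluating it:

```python
import ast

# Only simple arithmetic on named variables is permitted.
ALLOWED = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
    ast.Name, ast.Load,
)

def safe_eval(expr, variables):
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    # No builtins are exposed, so names resolve only to the
    # caller-supplied variables.
    return eval(compile(tree, "<safe>", "eval"), {"__builtins__": {}}, variables)

print(safe_eval("a * 2 + b", {"a": 3, "b": 4}))   # 10
safe_eval("__import__('os').system('ls')", {})    # raises ValueError
```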

Acknowledgements

This document was written by Jeffrey Havrilla, Allen Householder, Andrew Kompanek, and Ben Koo.

Vendor Information

One or more vendors are listed for this advisory. Please reference the full report for more information.

Other Information

CVE IDs: CVE-2024-3660
Date Public: 2024-02-23
Date First Published: 2024-04-16
Date Last Updated: 2024-04-18 18:47 UTC
Document Revision: 4
