Four Key Features for Successfully Deploying Proprietary and Open-Source Vision Language Models

 

This article examines the challenges of deploying vision language models (VLMs) in enterprise AI pipelines and how to address them. VLMs combine large language models (LLMs) with computer vision and have broad applications in areas such as visual question answering. The C3 AI Platform provides seamless, enterprise-grade deployment and scaling of VLMs, supports multiple fault-tolerant VLM deployments, and simplifies tasks such as dependency management, hardware acceleration, and image data handling. The article highlights the platform's strengths in security, highly optimized inference, versioning, and scalability, unlocking new possibilities for enterprise AI applications.

🖼️ **Defining and deploying the VLM:** The first step in deploying a VLM is defining it: making sure the libraries and hardware needed to run the model are available and determining the profile of model inputs and outputs. The C3 AI Platform accepts images in a variety of formats, giving users flexibility in how they handle image data.

🛡️ **Security and private deployment:** The C3 AI Platform supports deploying VLMs within the user's own environment, keeping proprietary models and data secure. For example, a fine-tuned Llava-1.5-13b model can be deployed to the user's own node pool, ensuring that model weights and data never leave the environment.

🚀 **Highly optimized inference and acceleration:** The platform lets users run models efficiently across multiple GPUs using accelerator libraries, which can significantly improve VLM inference speed.

🔄 **Versioning and model management:** The C3 AI Model Registry enforces versioning for VLM deployments, ensuring consistency and control over models. A VLM can be registered to the Model Registry in a single line of code, making it easy to maintain version information and model descriptions.

📈 **Scalability and resource configuration:** VLMs can run on a single GPU and scale to hundreds of GPUs. Users can configure node pools for deployment, for example deploying a Llava model across multiple Nvidia H200 nodes for efficient parallel processing.

Part two of the series: Using LLMs in Enterprise AI Pipelines

By Adam Gurary, Senior Associate Product Manager, C3 AI


Vision language models (VLMs) are a major advancement in AI, combining the capabilities of large language models (LLMs) with computer vision to perform tasks such as visual question answering (VQA). VQA models, such as Blip-2 and Llava, answer questions about images, a capability with vast applications, including automated diagram annotation, surveillance, and robotics.

 

VLMs are powerful new tools for innovation, but using them in production presents unique challenges. Many of the standard open-source LLMs, such as Llama-3, Mixtral, and Falcon, use a similar decoder-only architecture and deal only with strings. VLMs, on the other hand, are not as standardized. This creates more overhead for data science and model operations teams in tasks such as dependency management, hardware acceleration, and image data handling.

The C3 AI Platform enables the seamless, enterprise-grade deployment and scaling of VLMs. Our platform supports multiple fault-tolerant VLM deployments, allowing you to deploy, manage, monitor, and scale your VLMs easily and securely. From “one model to rule them all” to dozens of models fine-tuned for specific use cases, the C3 AI Platform provides the tools and flexibility needed to productionize VLMs in enterprise AI pipelines.

 

The C3 AI VLM Deployment Process

In part one of this series, we discussed the standard C3 AI LLM deployment process. The process for deploying VLMs is almost the same, with one additional step: Define the VLM.

Defining the VLM ensures that all libraries needed to run the model on the required hardware, such as Hugging Face Accelerate in the case of a PyTorch model, are available. This step also determines the profile of model inputs and outputs. The C3 AI Platform offers the flexibility to accept the image in a variety of formats, from passing the image directly to passing only a reference to the image.
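
For illustration, the sketch below shows what the underlying model involves outside the platform, assuming a PyTorch Llava checkpoint served through Hugging Face Transformers with Accelerate handling device placement. The model ID, file name, and prompt format are assumptions for this example, not C3 AI Platform APIs.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    MODEL_ID = "llava-hf/llava-1.5-13b-hf"  # assumed checkpoint for this sketch

    # Library requirements: transformers for the architecture, accelerate for
    # device placement (device_map), torch for the tensors themselves.
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",  # requires Hugging Face Accelerate
    )

    # Input profile: an image plus a text prompt; output profile: generated text.
    image = Image.open("diagram.png")
    prompt = "USER: <image>\nWhat does this diagram show? ASSISTANT:"
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, torch.float16
    )
    output_ids = model.generate(**inputs, max_new_tokens=100)
    print(processor.decode(output_ids[0], skip_special_tokens=True))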

Upon deployment, any authorized application can now make requests to the VLM.
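
As an illustration of both styles of request, the hypothetical payloads below contrast passing the image directly with passing only a reference to it. The endpoint URL, field names, and URI scheme are assumptions for this sketch, not the C3 AI Platform's actual request format.

    import base64
    import requests

    # Hypothetical endpoint; an authorized application would call whatever URL
    # the deployed VLM exposes within its own environment.
    ENDPOINT = "https://c3-cluster.example.com/vlm/llava15_13b_finetuned/predict"

    QUESTION = "What does this diagram show?"

    # Style 1: pass the image directly, base64-encoded in the payload.
    with open("diagram.png", "rb") as f:
        inline_payload = {
            "question": QUESTION,
            "image": base64.b64encode(f.read()).decode("utf-8"),
        }

    # Style 2: pass only a reference to the image (an assumed object-store URI).
    reference_payload = {
        "question": QUESTION,
        "image_uri": "s3://my-bucket/diagrams/diagram.png",
    }

    for payload in (inline_payload, reference_payload):
        response = requests.post(ENDPOINT, json=payload, timeout=60)
        print(response.json())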

The four key features discussed in the previous installment of this series hold for VLMs:

    Security: As with LLM deployment, you can keep your proprietary VLM and data secure by deploying it within your own environment.
    c3.ModelInference.deploy(
    llava15_13b_finetuned,
    nodepools = ["llava_nodepool"])

    The user deploys a fine-tuned Llava-1.5-13b model to one of their own node pools, ensuring no proprietary model weights or data leaves the environment.

    Highly Optimized Inference: Run models fast, with the flexibility to use accelerator libraries to manage VLM execution across multiple GPUs. A minimal sketch of this approach appears after this list.

    Versioning Enforcement: The C3 AI Model Registry enforces versioning for your VLM deployments, ensuring consistency and control over your models.
    c3.ModelRegistry.registerMlPipe(
    llava15_13b_finetuned,
    "my_llava_models",
    "finetuned_version")

    The user registers the VLM to the C3 AI Model Registry in one line of code, specifying a URI to maintain versioning and a description of the VLM.

    Scalability: VLMs can be served on a single GPU and scaled independently up to hundreds of GPUs.
    c3.app().configureNodePool( 
    "llava_nodepool",
    targetNodeCount = 8,
    hardwareProfile = "Nvidia_8xH200")

    The user configures a node pool to deploy a Llava model across eight Nvidia H200 nodes, effectively deploying the model to 64 Nvidia H200 GPUs.
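
As referenced in the Highly Optimized Inference item above, the sketch below shows one accelerator-library approach to running a Llava model across multiple GPUs: vLLM with tensor parallelism. This illustrates the general technique rather than the C3 AI Platform's implementation, and the exact multimodal input format varies across vLLM versions.

    from PIL import Image
    from vllm import LLM, SamplingParams

    # Shard the model's weights across four GPUs on one node via tensor parallelism.
    llm = LLM(
        model="llava-hf/llava-1.5-13b-hf",
        tensor_parallel_size=4,
    )

    image = Image.open("diagram.png")
    outputs = llm.generate(
        {
            "prompt": "USER: <image>\nWhat does this diagram show? ASSISTANT:",
            "multi_modal_data": {"image": image},
        },
        SamplingParams(max_tokens=128),
    )
    print(outputs[0].outputs[0].text)

In practice, the tensor_parallel_size value would be matched to the number of GPUs available on each node.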

 

Unlocking New Possibilities with VLMs

Deploying VLMs on the C3 AI Platform opens a world of possibilities for enterprise AI applications. Whether it’s assisting doctors in diagnosing medical images, or enhancing surveillance systems, VLMs are set to transform industries with their dual understanding of text and images.

In the next installment of our series, we discuss the basics of managing LLM, VLM, and other large model deployments.

 


About the Author

Adam Gurary is a Senior Associate Product Manager at C3 AI, where he manages the roadmap and execution for the platform’s model inference service and machine learning pipelines. Adam and his team focus on building state-of-the-art tooling for hosting and serving open-source large language models and for creating, training, and executing machine learning pipelines. Adam holds a B.S. in Mathematical and Computational Science from Stanford University.

 
