Running AI Without GPUs Using VMware and Intel's Latest Technologies

Eric Sloof - NTPRO.NL, June 11, 22:50


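VMware and Intel are working together to extend AI capabilities to CPU-driven infrastructure, giving enterprises a way to run AI workloads efficiently without depending on GPUs. By combining Intel 4th Gen Xeon Scalable Processors with VMware's Private AI infrastructure, enterprises can deploy and manage AI applications across edge, hybrid cloud, and other environments, particularly inference for large language models (LLMs), and gain cost efficiency, flexibility, and scalability.

💡 VMware Private AI with Intel is built on VMware Cloud Foundation and integrates AI infrastructure with a focus on privacy, compliance, and security. It uses Intel's Advanced Matrix Extensions (AMX) to accelerate AI training and inference on the CPU, so AI workloads run efficiently without specialized hardware such as GPUs.

💻 The key components are Intel 4th Gen Xeon Scalable Processors and VMware Cloud Foundation. The former uses AMX to optimize AI performance, especially for workloads such as large language models (LLMs). The latter virtualizes compute, storage, and networking resources and provides a unified management interface for containerized AI workloads running on VMware Tanzu Kubernetes Grid (TKG).

🚀 In a real-world test, VMware and Intel ran the Llama 2-7B model on Intel hardware. The results show that AMX-enabled processors handle inference workloads efficiently: with INT8 precision, small-batch inference latency stays under 50 ms, and multiple instances can be scaled per socket with latency under 100 ms, even at high token counts.

📈 Intel's AMX technology delivers up to 1.8x faster inference compared with BF16 models. VMware's integration with Kubernetes also makes it possible to deploy AI models quickly and seamlessly on existing infrastructure.

💰 The solution suits enterprises that want to expand their AI capabilities without investing heavily in GPU infrastructure. It offers cost efficiency, flexibility, and scalability: enterprises can use existing CPU resources for AI tasks, run AI workloads across cloud, edge, and on-premises environments, and use VMware's orchestration tools to manage resource-intensive tasks such as LLM inference.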

The demand for AI is expanding beyond traditional GPU-based data centers to diverse environments, including edge and hybrid clouds. VMware and Intel have joined forces to bring AI capabilities to CPU-driven infrastructure, demonstrating how AI workloads can thrive without GPUs by leveraging Intel's 4th Gen Xeon Scalable Processors and VMware's Private AI infrastructure.

What is VMware Private AI with Intel?

VMware Private AI integrates AI infrastructure with privacy, compliance, and security, built on VMware Cloud Foundation (VCF). Paired with Intel’s Advanced Matrix Extensions (AMX), it enables scalable and efficient AI operations without requiring specialized hardware like GPUs.
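Before counting on AMX inside a virtual machine, it helps to confirm that the guest operating system actually sees the AMX instructions. The following is a minimal sketch, assuming a Linux guest where CPU features are listed on the `flags` line of `/proc/cpuinfo`; the helper names `amx_flags_in` and `host_amx_flags` are hypothetical and not part of any VMware or Intel tool.

```python
from pathlib import Path

# AMX feature names as they appear on the "flags" line of /proc/cpuinfo on Linux.
AMX_FLAGS = ("amx_tile", "amx_int8", "amx_bf16")


def amx_flags_in(cpuinfo_text: str) -> dict:
    """Report which AMX flags appear on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            present = set(line.split(":", 1)[1].split())
            return {flag: flag in present for flag in AMX_FLAGS}
    # No flags line found: report every AMX flag as absent.
    return {flag: False for flag in AMX_FLAGS}


def host_amx_flags(path: str = "/proc/cpuinfo") -> dict:
    """Read cpuinfo from the running system; all False if the file is unreadable."""
    try:
        return amx_flags_in(Path(path).read_text())
    except OSError:
        return {flag: False for flag in AMX_FLAGS}


if __name__ == "__main__":
    for flag, present in host_amx_flags().items():
        print(f"{flag}: {'yes' if present else 'no'}")
```

If all three flags are reported as absent inside a VM running on AMX-capable hardware, the likely cause is that the hypervisor or guest kernel is not exposing the feature, which is worth checking before benchmarking.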

Key Components

Intel 4th Gen Xeon Scalable Processors with AMX:
AMX accelerates both AI training and inference directly within the CPU, optimizing performance for workloads like large language models (LLMs).

VMware Cloud Foundation:
This software-defined platform virtualizes compute, storage, and networking, providing a unified management interface for containerized AI workloads using VMware Tanzu Kubernetes Grid (TKG).
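To make "containerized AI workloads" concrete, here is a rough sketch of a CPU-only Kubernetes Deployment manifest built as a Python dictionary. The manifest uses only standard `apps/v1` Deployment fields; the function name, the workload name, the image reference, and the CPU and memory sizes are all hypothetical placeholders, not values from the VMware and Intel tests.

```python
import json


def cpu_inference_deployment(name: str, image: str, replicas: int,
                             cpus: int, memory_gib: int) -> dict:
    """Build an apps/v1 Deployment that requests only CPU and memory."""
    resources = {"cpu": str(cpus), "memory": f"{memory_gib}Gi"}
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Requests equal limits; no GPU resource is requested.
                            "resources": {"requests": resources, "limits": resources},
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = cpu_inference_deployment(
        name="llm-inference",
        image="registry.example.com/llm-inference:latest",  # hypothetical image
        replicas=2,
        cpus=16,
        memory_gib=32,
    )
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`. The replica count is the knob that corresponds to running several model instances side by side.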

Use Case: Deploying Llama 2

In a real-world application, VMware and Intel tested the Llama 2-7B model on Intel's hardware, showing how AMX-enabled processors efficiently handle inference workloads. Key results include:

Inference latency under 50 ms for small batches with INT8 precision.

Scalability for multiple instances per socket with sub-100 ms latency, even at high token counts.
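For readers unfamiliar with what "INT8 precision" means, the sketch below shows symmetric per-tensor quantization in plain Python: floating-point values are mapped to 8-bit integers plus one scale factor, which is the kind of data AMX's INT8 tile operations work on. This is an illustration of the general idea only; it is an assumption that something similar was applied in the tests, and it is not the quantization pipeline VMware or Intel used. The function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero or empty input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print("quantized:", q)
    print("scale:", scale)
    print("worst-case round-trip error:", worst)
```

Each value is stored in one byte instead of two (BF16) or four (FP32), at the cost of a rounding error bounded by half the scale. That trade of precision for smaller, faster integer arithmetic is what makes low-latency CPU inference plausible.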

Performance Highlights

Intel's AMX technology speeds up inference by up to 1.8x compared to BF16 models.

VMware's integration with Kubernetes makes deploying AI models on existing infrastructure fast and seamless.

Benefits for Organizations

This solution is ideal for businesses looking to expand AI capabilities without investing heavily in GPU infrastructure. It offers:

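Cost Efficiency: Use existing CPU resources for AI tasks.

Flexibility: Run AI workloads across cloud, edge, and on-premises environments.

Scalability: Manage resource-intensive tasks such as LLM inference with VMware's orchestration tools.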

By combining VMware and Intel technologies, enterprises can unlock the full potential of AI with optimized infrastructure, reducing costs and simplifying deployment.
