Unite.AI · 04:35, two days ago
Enfabrica Unveils Ethernet-Based Memory Fabric That Could Redefine AI Inference at Scale

Enfabrica has unveiled its Elastic Memory Fabric System (EMFASYS), an Ethernet-based system designed to attack the memory bottleneck in large-scale AI inference. As AI models grow more complex and memory demands surge, EMFASYS decouples memory from compute, allowing AI data centers to dramatically improve performance, lower costs, and raise GPU utilization. The system combines RDMA over Ethernet with CXL, using the company's ACF-S chip to integrate networking and memory control so that servers can access vast amounts of DDR5 DRAM distributed across the rack. The approach not only changes how AI infrastructure is built and scaled, it also paves the way for memory-as-a-service models and more resilient AI clouds, with the potential to reshape the economics and scalability of AI inference.

💡 EMFASYS treats memory as a shared, distributed resource accessed over Ethernet, addressing the AI inference bottleneck created by the tight coupling of memory and compute in traditional data centers. This lets AI data centers scale more efficiently, free of the physical memory limits of any single node.

🚀 EMFASYS combines RDMA over Ethernet with CXL. Its core ACF-S chip, a SuperNIC that integrates networking and memory control, gives servers access to up to 18 TB of DDR5 DRAM at microsecond-level latency. Crucially, it runs over existing Ethernet infrastructure, with no need for expensive proprietary interconnects.

💰 The system cuts the cost of AI inference by dynamically offloading memory-intensive workloads from expensive GPU HBM to far cheaper DRAM. Intelligent caching and load balancing hide the added latency, and memory management remains transparent to the LLMs running on the system, improving GPU utilization.

📈 EMFASYS represents a philosophical shift in how AI infrastructure is built and scaled. By providing scalable memory capacity, it lets data center operators avoid continually buying expensive GPUs or HBM and instead add memory modularly, reducing overall cost and footprint.

🌐 The disaggregated architecture opens the door to memory-as-a-service models, in which AI context, history, and agent state can persist beyond a single session or server, supporting more intelligent, more personalized AI systems and more resilient AI clouds.

Enfabrica, a Silicon Valley-based startup backed by Nvidia, has unveiled a breakthrough product that may significantly reshape how large-scale AI workloads are deployed and scaled. The company’s new Elastic Memory Fabric System (EMFASYS) is the first commercially available Ethernet-based memory fabric specifically designed to address the core bottleneck of generative AI inference: memory access.

At a time when AI models are growing more complex, context-aware, and persistent—requiring vast amounts of memory per user session—EMFASYS delivers a novel approach to decoupling memory from compute, allowing AI data centers to dramatically improve performance, lower costs, and increase utilization of their most expensive resources: GPUs.

What is a Memory Fabric—and Why Does It Matter?

Traditionally, memory inside data centers has been tightly bound to the server or node it resides in. Each GPU or CPU has access only to the high-bandwidth memory directly attached to it—usually HBM for GPUs or DRAM for CPUs. This architecture works well when workloads are small and predictable. But generative AI has changed the game. LLMs require access to large context windows, user history, and multi-agent memory—all of which must be processed quickly and without delay. These memory demands often outstrip the available capacity of local memory, creating bottlenecks that strand GPU cores and inflate infrastructure costs.
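To put rough numbers on that pressure, consider the key-value (KV) cache a transformer must hold for each active session. The sketch below assumes an illustrative 70B-class model shape (80 layers, 8 grouped-query KV heads of dimension 128, fp16); these figures are not from Enfabrica or Unite.AI, only an order-of-magnitude illustration.

```python
# Rough KV-cache arithmetic: per-token bytes for a transformer are
#   2 (K and V) * layers * kv_heads * head_dim * bytes_per_element.
# The model shape below is an illustrative 70B-class configuration with
# grouped-query attention, not a figure from the article.
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2   # fp16 values
context_len = 128_000                                      # tokens kept in the window

per_token_bytes = 2 * layers * kv_heads * head_dim * dtype_bytes
per_session_gb = per_token_bytes * context_len / 1e9
print(f"{per_token_bytes / 1024:.0f} KiB per token, "
      f"~{per_session_gb:.0f} GB of KV cache for one 128K-token session")
```

At roughly 40 GB per long-context session, two or three concurrent users are enough to exhaust the ~80 GB of HBM on a current flagship GPU, which is precisely the stranding effect described above.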

A memory fabric solves this by transforming memory into a shared, distributed resource—a kind of network-attached memory pool accessible by any GPU or CPU in the cluster. Think of it as creating a “memory cloud” within the data center rack. Instead of replicating memory across servers or overloading expensive HBM, a fabric allows memory to be aggregated, disaggregated, and accessed on demand over a high-speed network. This allows AI inference workloads to scale more efficiently without being shackled by the physical memory limits of any single node.
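As a mental model only (Enfabrica has not published a programming interface, and every name below is hypothetical), the pool behaves like a rack-wide allocator: a request for memory is placed wherever free DRAM exists in the rack, not necessarily on the server that issued it.

```python
# Illustrative toy model of rack-scale pooled memory. Enfabrica has not
# published a public API; all names and sizes here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """One DRAM shelf in the rack, reachable over the Ethernet fabric."""
    node_id: str
    capacity_gb: int
    used_gb: int = 0

    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb

@dataclass
class MemoryFabric:
    """Aggregates every shelf's DRAM into a single rack-wide pool."""
    nodes: list[MemoryNode] = field(default_factory=list)

    def allocate(self, size_gb: int) -> MemoryNode:
        # Place the allocation on the shelf with the most free DRAM,
        # regardless of which server the requesting GPU lives in.
        node = max(self.nodes, key=lambda n: n.free_gb())
        if node.free_gb() < size_gb:
            raise MemoryError("rack-wide pool exhausted")
        node.used_gb += size_gb
        return node

fabric = MemoryFabric([MemoryNode("shelf-0", 18_000), MemoryNode("shelf-1", 18_000)])
home = fabric.allocate(size_gb=512)   # e.g. a large spilled KV cache
print(f"placed on {home.node_id}; {home.free_gb()} GB still free on that shelf")
```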

Enfabrica’s Approach: Ethernet and CXL, Together at Last

EMFASYS achieves this rack-scale memory architecture by combining two powerful technologies: RDMA over Ethernet and Compute Express Link (CXL). The former enables ultra-low-latency, high-throughput data transfer across standard Ethernet networks. The latter allows memory to be detached from CPUs and GPUs and pooled into shared resources, accessible via high-speed CXL links.

At the core of EMFASYS is Enfabrica’s ACF-S chip, a 3.2 terabits-per-second (Tbps) “SuperNIC” that fuses networking and memory control into a single device. This chip allows servers to interface with massive pools of commodity DDR5 DRAM—up to 18 terabytes per node—distributed across the rack. Crucially, it does so using standard Ethernet ports, allowing operators to leverage their existing data center infrastructure without investing in proprietary interconnects.
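Those two figures, 3.2 Tbps of fabric bandwidth and up to 18 TB of DDR5 per node, support some useful back-of-envelope arithmetic. The page and KV-cache sizes in the sketch below are assumptions chosen for illustration, and serialization time is only one component of the microsecond-level latency the company cites.

```python
# Back-of-envelope numbers using the figures above (3.2 Tbps per ACF-S,
# up to 18 TB of DDR5 per node). The 4 KiB page and 100 GB KV-cache sizes
# are illustrative assumptions.
LINK_TBPS = 3.2
link_bytes_per_s = LINK_TBPS * 1e12 / 8            # ~400 GB/s of raw line rate

page_bytes = 4 * 1024                               # one small remote read
kv_cache_bytes = 100e9                              # a large spilled KV cache

print(f"serialize a 4 KiB page:     {page_bytes / link_bytes_per_s * 1e9:.0f} ns")
print(f"stream 100 GB of KV cache:  {kv_cache_bytes / link_bytes_per_s:.2f} s")
print(f"share of one 18 TB shelf:   {kv_cache_bytes / 18e12:.1%}")
```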

What makes EMFASYS particularly compelling is its ability to dynamically offload memory-bound workloads from expensive GPU-attached HBM to far more affordable DRAM, all while maintaining microsecond-level access latency. The software stack behind EMFASYS includes intelligent caching and load-balancing mechanisms that hide latency and orchestrate memory movement in ways that are transparent to the LLMs running on the system.
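The software stack itself is proprietary, but the tiering idea can be sketched as a simple two-level cache: hot KV blocks stay in GPU-attached HBM, cold blocks spill over the fabric into pooled DDR5 and are promoted back on access. The toy class below illustrates that assumption; it is not Enfabrica's implementation, and the eviction policy is a guess.

```python
# Toy illustration of HBM-to-DRAM tiering; the real EMFASYS software stack
# is proprietary, so the LRU policy and all names here are assumptions.
from collections import OrderedDict

class TieredKVCache:
    """Keeps hot KV-cache blocks in GPU HBM and spills cold ones to fabric DRAM."""

    def __init__(self, hbm_budget_blocks: int):
        self.hbm = OrderedDict()      # block_id -> payload (hot tier, GPU HBM)
        self.fabric_dram = {}         # block_id -> payload (cold tier, over Ethernet)
        self.hbm_budget = hbm_budget_blocks

    def put(self, block_id: str, payload) -> None:
        self.hbm[block_id] = payload
        self.hbm.move_to_end(block_id)
        while len(self.hbm) > self.hbm_budget:
            victim, data = self.hbm.popitem(last=False)   # evict least-recently used
            self.fabric_dram[victim] = data               # spill to pooled DDR5

    def get(self, block_id: str):
        if block_id in self.hbm:                          # HBM hit: local, fast path
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        payload = self.fabric_dram.pop(block_id)          # fabric hit: microsecond path
        self.put(block_id, payload)                       # promote back into HBM
        return payload

cache = TieredKVCache(hbm_budget_blocks=2)
for block in ("session1/block0", "session1/block1", "session2/block0"):
    cache.put(block, payload=b"...")
assert "session1/block0" in cache.fabric_dram   # oldest block spilled to the fabric tier
```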

Implications for the AI Industry

This is more than just a clever hardware solution—it represents a philosophical shift in how AI infrastructure is built and scaled. As generative AI moves from novelty to necessity, with billions of user queries being processed daily, the cost of serving these models has become unsustainable for many companies. GPUs are often underutilized not because of lack of compute, but because they sit idle waiting for memory. EMFASYS addresses that imbalance directly.

By enabling pooled, fabric-attached memory accessible via Ethernet, Enfabrica offers data center operators a scalable alternative to continually buying more GPUs or HBM. Instead, they can increase memory capacity modularly, using off-the-shelf DRAM and intelligent networking, reducing the overall footprint and improving the economics of AI inference.

The implications go beyond immediate cost savings. This kind of disaggregated architecture paves the way for memory-as-a-service models, where context, history, and agent state can persist beyond a single session or server, opening the door to more intelligent and personalized AI systems. It also sets the stage for more resilient AI clouds, where workloads can be distributed elastically across a rack or an entire data center without rigid memory limitations.

Looking Ahead

Enfabrica's EMFASYS is currently sampling with select customers, and while the company has not disclosed who those partners are, Reuters reports that major AI cloud providers are already piloting the system. This positions Enfabrica not just as a component supplier, but as a key enabler in the next generation of AI infrastructure.

By decoupling memory from compute and making it available across high-speed, commodity Ethernet networks, Enfabrica is laying the groundwork for a new era of AI architecture—one where inference can scale without compromise, where resources are no longer stranded, and where the economics of deploying large language models finally begin to make sense.

In a world increasingly defined by context-rich, multi-agent AI systems, memory is no longer a supporting actor—it is the stage. And Enfabrica is betting that whoever builds the best stage will define the performance of AI for years to come.

