MarkTechPost@AI, March 31
PilotANN: A Hybrid CPU-GPU System For Graph-based ANNS

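PilotANN is a hybrid CPU-GPU system designed to address the performance bottlenecks that approximate nearest neighbor search (ANNS) faces with high-dimensional vectors and large-scale datasets. It combines the memory capacity of the CPU with the parallel compute of the GPU to search vectors efficiently. PilotANN uses a multi-stage processing pipeline that reduces data movement, and it delivers significant speedups and cost-effectiveness with a single GPU, giving research institutions with limited resources a more economical ANNS option.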

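🚀 **Background:** Approximate nearest neighbor search (ANNS) is a key technique for efficiently finding similar items in high-dimensional vector spaces, but traditional methods hit performance bottlenecks with high-dimensional vectors and large datasets, especially in CPU implementations.

💡 **Design idea:** PilotANN is a hybrid CPU-GPU system that tackles these performance problems by decomposing the search into a multi-stage CPU-GPU pipeline. It draws on both the memory capacity of the CPU and the parallel processing power of the GPU.

⚙️ **Workflow:** PilotANN uses a three-stage graph traversal: first, a subgraph traversal on the GPU using dimensionally-reduced vectors; then, residual refinement on the CPU using full vectors; finally, a precise search using the full graph and full vectors.

📈 **Performance:** Experiments show significant speedups on several large-scale datasets. On the 96-dimensional DEEP dataset, for example, PilotANN's throughput is 3.9 times that of the HNSW-CPU baseline. It also performs well on the challenging T2I dataset.

💰 **Cost-effectiveness:** Although PilotANN relies on comparatively expensive GPU hardware, it remains cost-effective. Across the DEEP, T2I, WIKI, and LAION datasets, its cost-effectiveness is 2.3 to 3.2 times that of the CPU-only solution.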

Approximate Nearest Neighbor Search (ANNS) is a fundamental vector search technique that efficiently identifies similar items in high-dimensional vector spaces. ANNS has traditionally served as the backbone of retrieval engines and recommendation systems, but it struggles to keep pace with modern Transformer architectures, which employ higher-dimensional embeddings and larger datasets. Unlike deep learning systems, which can be scaled horizontally because they are stateless, ANNS remains centralized, creating a severe single-machine throughput bottleneck. Empirical testing on 100-million-scale datasets shows that even state-of-the-art CPU implementations of the Hierarchical Navigable Small World (HNSW) algorithm cannot maintain adequate performance as vector dimensions increase.

Previous research on large-scale ANNS has explored two optimization paths: index structure improvements and hardware acceleration. The Inverted MultiIndex (IMI) enhanced space partitioning through multi-codebook quantization, while PQFastScan improved performance with SIMD and cache-aware optimizations. DiskANN and SPANN introduced disk-based indexing for billion-scale datasets, addressing memory hierarchy challenges through different approaches. SONG and CAGRA achieved impressive speedups through GPU parallelization but remain constrained by GPU memory capacity. BANG handled billion-scale datasets via hybrid CPU-GPU processing but lacked critical CPU baseline comparisons. These methods frequently sacrifice compatibility or accuracy, or they require specialized hardware.

Researchers from the Chinese University of Hong Kong, the Centre for Perceptual and Interactive Intelligence, and the Theory Lab of Huawei Technologies have proposed PilotANN, a hybrid CPU-GPU system designed to overcome the limitations of existing ANNS implementations. PilotANN addresses a two-sided challenge: CPU-only implementations struggle with the computational demands, while GPU-only solutions are constrained by limited memory capacity. It resolves this by using both the abundant RAM available to the CPU and the parallel processing capabilities of the GPU. It employs a three-stage graph traversal process: GPU-accelerated subgraph traversal using dimensionally-reduced vectors, CPU refinement, and a precise search with complete vectors.

PilotANN reorganizes the vector search process around a “staged data ready processing” paradigm: rather than adhering to the traditional “move data for computation” model, it minimizes data movement across processing stages. The search runs in three stages: GPU piloting with a subgraph and dimensionally-reduced vectors, residual refinement using the subgraph with full vectors, and a final traversal employing the full graph and complete vectors. The design is cost-effective, requiring only a single commodity GPU, while scaling effectively across vector dimensions and graph complexity. Data transfer overhead is limited to moving the initial query vector to the GPU and returning a small candidate set to the CPU after GPU piloting.
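
To make the three stages concrete, here is a minimal CPU-only sketch in Python. It is not PilotANN's implementation: the brute-force k-NN graph stands in for an HNSW index, truncating vectors to their leading coordinates stands in for whatever dimensionality reduction the system actually uses, the "GPU" stage runs as ordinary Python, and every name (`pilot_search`, `best_first_search`, `reduced_dims`, `ef`) is hypothetical. It only illustrates how each stage hands a small candidate set to the next.

```python
import heapq
import random


def sq_dist(a, b, dims=None):
    """Squared Euclidean distance over the first `dims` coordinates (all if None)."""
    n = len(a) if dims is None else dims
    return sum((a[i] - b[i]) ** 2 for i in range(n))


def build_knn_graph(vectors, ids, k):
    """Brute-force k-NN graph over `ids`; a toy stand-in for a real HNSW index."""
    graph = {}
    for i in ids:
        nearest = heapq.nsmallest(k + 1, ids, key=lambda j: sq_dist(vectors[i], vectors[j]))
        graph[i] = [j for j in nearest if j != i][:k]
    return graph


def best_first_search(graph, vectors, query, entries, ef, dims=None):
    """Best-first traversal that keeps the `ef` closest nodes seen so far."""
    visited = set(entries)
    frontier = [(sq_dist(vectors[e], query, dims), e) for e in entries]
    heapq.heapify(frontier)
    results = [(-d, e) for d, e in frontier]  # max-heap via negated distance
    heapq.heapify(results)
    while len(results) > ef:
        heapq.heappop(results)  # drop the farthest entry
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) >= ef and d > -results[0][0]:
            break  # closest unexpanded node is worse than everything kept
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(vectors[nb], query, dims)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]


def pilot_search(vectors, full_graph, sub_graph, query, k, reduced_dims, ef=16):
    """Hypothetical three-stage search; assumes ef >= k."""
    entry = next(iter(sub_graph))  # arbitrary fixed entry point (assumption)
    # Stage 1 (piloting): subgraph + reduced vectors. This is the GPU stage in PilotANN.
    pilot = best_first_search(sub_graph, vectors, query, [entry], ef, dims=reduced_dims)
    # Stage 2 (refinement): same subgraph, full vectors, seeded by the pilot candidates.
    refined = best_first_search(sub_graph, vectors, query, pilot, ef)
    # Stage 3 (final traversal): full graph, full vectors, seeded by the refined set.
    final = best_first_search(full_graph, vectors, query, refined, ef)
    return final[:k]


# Toy data: 300 random 16-dimensional vectors, with a quarter of them in the subgraph.
random.seed(0)
dim, n = 16, 300
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
all_ids = list(range(n))
sub_ids = random.sample(all_ids, n // 4)
full_graph = build_knn_graph(vectors, all_ids, k=8)
sub_graph = build_knn_graph(vectors, sub_ids, k=8)
query = [random.gauss(0, 1) for _ in range(dim)]
top5 = pilot_search(vectors, full_graph, sub_graph, query, k=5, reduced_dims=4)
print(top5)
```

In this sketch only the query and the short `pilot` list cross the boundary between the first and second stages, which mirrors the article's point that data transfer is confined to the query vector going in and a small candidate set coming back.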

Experimental results show PilotANN’s performance advantages across diverse large-scale datasets. PilotANN achieves a 3.9 times throughput speedup over the HNSW-CPU baseline on the 96-dimensional DEEP dataset, with larger gains of 5.1-5.4 times on higher-dimensional datasets. It delivers significant speedups even on the challenging T2I dataset, despite having no optimizations specific to that benchmark. It is also cost-effective despite using more expensive hardware: although the GPU-based platform costs 2.81 USD/hour compared with 1.69 USD/hour for the CPU-only solution, PilotANN achieves 2.3 times the cost-effectiveness of the CPU-only solution on DEEP and 3.0-3.2 times on the T2I, WIKI, and LAION datasets, measured as throughput per dollar.
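
The cost-effectiveness figures follow from the speedups and hourly prices quoted above. The short Python sketch below reproduces the arithmetic; the function name is made up for illustration, and pairing the 5.1-5.4 times speedup range with the T2I, WIKI, and LAION datasets is an assumption, since the article gives the two ranges separately.

```python
def cost_effectiveness_ratio(speedup, gpu_cost_per_hour, cpu_cost_per_hour):
    """Throughput per dollar of the GPU platform relative to the CPU-only one.

    Relative throughput per dollar = (speedup / gpu_cost) / (1 / cpu_cost).
    """
    return speedup * cpu_cost_per_hour / gpu_cost_per_hour


GPU_COST = 2.81  # USD/hour, from the article
CPU_COST = 1.69  # USD/hour, from the article

# Speedups quoted in the article; the labels for 5.1 and 5.4 are assumptions.
for label, speedup in [("DEEP", 3.9), ("range low end", 5.1), ("range high end", 5.4)]:
    ratio = cost_effectiveness_ratio(speedup, GPU_COST, CPU_COST)
    print(f"{label}: {ratio:.2f}x throughput per dollar")
```

For DEEP this gives roughly 2.35, in line with the reported 2.3 times. The 5.1-5.4 range maps to roughly 3.07-3.25, close to the reported 3.0-3.2 once rounding of the published speedups is taken into account.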

In conclusion, the researchers introduced PilotANN, an advance in graph-based ANNS that uses CPU and GPU resources together for emerging workloads. It outperforms existing CPU-only approaches by decomposing top-k search into a multi-stage CPU-GPU pipeline and implementing efficient entry selection. Because it achieves competitive results with a single commodity GPU, it makes high-performance nearest neighbor search accessible to researchers and organizations with limited computing resources. Unlike alternative solutions that require expensive high-end GPUs, PilotANN enables efficient ANNS deployment on common hardware configurations while maintaining search accuracy.



