Zilliz Teams Up with NVIDIA to Launch World's First GPU-Accelerated Vector Database!

Published on March 22, 2024
Tag: Vector Database, GPU Accelerated

As the 'memory' of large language models, the importance of vector databases is self-evident. At GTC 2024, the world's first GPU-accelerated vector database was born, powered by NVIDIA CUDA, achieving a performance boost of up to 50 times. A single line of code in a Shanghai factory five years ago unexpectedly heralded a new era.

On the morning of March 20, San Francisco time, Zilliz and NVIDIA jointly announced the release of Milvus 2.4 at the GTC 2024 conference. This revolutionary vector database system is the first in the industry to leverage the efficient parallel processing capabilities of NVIDIA GPUs and the newly introduced CAGRA (CUDA-Accelerated Graph Index for Vector Retrieval) technology in the RAPIDS cuVS library, providing GPU-based vector indexing and search acceleration capabilities.

Currently, the open-source version of Milvus 2.4 has been released.
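
For readers who want to try the new index, the following is a minimal sketch of how a CAGRA index might be requested through the pymilvus client. It is not taken from the announcement: the index type name GPU_CAGRA and the two graph-degree parameters follow the Milvus GPU index documentation as we understand it, while the collection name, field name, host, port, and parameter values are hypothetical placeholders. Running the second function assumes a GPU-enabled Milvus 2.4 deployment and the pymilvus package.

```python
def cagra_index_params(metric_type="L2", intermediate_graph_degree=64, graph_degree=32):
    """Build the index description for a GPU_CAGRA index.

    The parameter names are assumed from the Milvus GPU index documentation;
    the default values here are illustrative, not tuned recommendations.
    """
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "intermediate_graph_degree": intermediate_graph_degree,
            "graph_degree": graph_degree,
        },
    }


def build_cagra_index(collection_name="demo_collection", field_name="embedding"):
    """Create the index on an existing collection.

    Hypothetical names throughout; needs a running GPU build of Milvus.
    """
    # Imported lazily so the parameter helper above works without pymilvus.
    from pymilvus import Collection, connections

    connections.connect(host="localhost", port="19530")
    collection = Collection(collection_name)
    collection.create_index(field_name=field_name, index_params=cagra_index_params())
    return collection


if __name__ == "__main__":
    print(cagra_index_params())
```

Keeping the parameter dictionary in its own function makes it easy to inspect or log the index settings before anything is sent to the server.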

What is Milvus?

Milvus is an open-source vector database system designed for large-scale vector similarity search and AI application development. It was initially developed by Zilliz and open-sourced in 2019. In 2020, the project joined the LF AI & Data Foundation, part of the Linux Foundation, and has since graduated. Since its launch, Milvus has been widely adopted in the AI developer community. On GitHub, Milvus has garnered over 26,000 stars and more than 260 contributors, with over 20 million downloads and installations globally, making it one of the most widely used vector databases worldwide. Milvus has been adopted by over 5,000 companies across industries such as AIGC, e-commerce, media, finance, telecommunications, and healthcare.

Why Is GPU Acceleration Needed?

In the era of data-driven applications, fast and accurate retrieval of large amounts of unstructured data is crucial for supporting cutting-edge AI applications. Whether it's generative AI, similarity search, recommendation engines, or virtual drug discovery, vector databases have become the core technology for these advanced applications.

However, the demand for real-time indexing and high throughput continuously challenges traditional CPU-based solutions.

Real-Time Indexing

Vector databases typically need to ingest and index new vector data continuously and at high speed. Real-time indexing keeps the database synchronized with the latest data and avoids bottlenecks or backlogs.

High Throughput

Many applications using vector databases, such as recommendation systems, semantic search engines, and anomaly detection, require real-time or near-real-time query processing. High throughput ensures that vector databases can handle a large number of incoming queries simultaneously, providing high-performance services to end-users.

The core operations of vector databases, including similarity calculation and matrix operations, are highly parallel and computationally intensive. GPUs such as the NVIDIA GeForce RTX 2060, RTX 4060, RTX 4090, A5000, A6000, A40, V100, and A100, with their thousands of computing cores and powerful parallel processing capabilities, are well suited to accelerating these operations.
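
To make the parallelism concrete, here is a small pure-Python sketch of brute-force similarity search. It is an illustration rather than how Milvus or CAGRA is implemented: the function names and the toy vectors are made up for this example. The point is that every (query, vector) score is computed independently of all the others, which is the kind of work that maps naturally onto thousands of GPU cores (in practice as one large matrix multiplication).

```python
import heapq


def inner_product(a, b):
    """Similarity score between two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_search(queries, vectors, k):
    """For each query, return the indices of the k most similar vectors."""
    results = []
    for q in queries:
        # Each score depends only on one query and one stored vector,
        # so all of them could be computed at the same time.
        scores = [inner_product(q, v) for v in vectors]
        top = heapq.nlargest(k, range(len(vectors)), key=lambda i: scores[i])
        results.append(top)
    return results


vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
queries = [[1.0, 0.1], [0.0, 2.0]]
print(brute_force_search(queries, vectors, k=2))  # prints [[0, 2], [1, 2]]
```

Indexes such as HNSW and CAGRA avoid scoring every stored vector, but the inner loop of score computations keeps the same independent structure.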

Performance Evaluation

Index Build Time

On the Cohere-1M-768-dim dataset, building the index took 454 seconds on the CPU (HNSW), 66 seconds on a T4 GPU (CAGRA), and 42 seconds on an A10G GPU (CAGRA). On the OpenAI-500K-1536-dim dataset, it took 359 seconds on the CPU (HNSW), 45 seconds on the T4 GPU (CAGRA), and 22 seconds on the A10G GPU (CAGRA).
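
The build times above can be turned into speedup factors by dividing the CPU time by each GPU time. The short sketch below does that arithmetic using only the numbers quoted in this section; the dictionary layout and function name are our own.

```python
# Index build times in seconds, as reported above.
BUILD_SECONDS = {
    "Cohere-1M-768-dim": {"CPU (HNSW)": 454, "T4 (CAGRA)": 66, "A10G (CAGRA)": 42},
    "OpenAI-500K-1536-dim": {"CPU (HNSW)": 359, "T4 (CAGRA)": 45, "A10G (CAGRA)": 22},
}


def build_speedups(timings, baseline="CPU (HNSW)"):
    """Return baseline_time / time for every non-baseline configuration."""
    speedups = {}
    for dataset, by_hardware in timings.items():
        base = by_hardware[baseline]
        speedups[dataset] = {
            hardware: round(base / seconds, 1)
            for hardware, seconds in by_hardware.items()
            if hardware != baseline
        }
    return speedups


for dataset, result in build_speedups(BUILD_SECONDS).items():
    print(dataset, result)
```

This gives roughly 6.9x (T4) and 10.8x (A10G) on the Cohere dataset, and roughly 8.0x (T4) and 16.3x (A10G) on the OpenAI dataset.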


Throughput

In terms of throughput, we compared Milvus with CAGRA GPU acceleration against the standard Milvus implementation using the HNSW index on the CPU. The evaluation metric was queries per second (QPS), which measures the throughput of query execution. Because the batch size of queries (the number of queries sent in a single request) varies across vector database applications, we tested three batch sizes, 1, 10, and 100, to obtain comprehensive and realistic results.
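
As a rough illustration of how QPS at different batch sizes can be measured, the sketch below times a search function over the same set of queries sent in batches of 1, 10, and 100. This is not the benchmark harness used for the results in this article: `fake_search` is a hypothetical stand-in for a real vector search call, and the numbers it produces say nothing about Milvus.

```python
import time


def measure_qps(search_fn, queries, batch_size):
    """Send queries in batches of batch_size and return queries per second."""
    start = time.perf_counter()
    answered = 0
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i + batch_size]
        search_fn(batch)  # one request carrying batch_size queries
        answered += len(batch)
    elapsed = time.perf_counter() - start
    return answered / elapsed if elapsed > 0 else float("inf")


def fake_search(batch):
    """Hypothetical placeholder for a vector search call."""
    return [sum(q) for q in batch]


queries = [[float(i), float(i + 1)] for i in range(1000)]
qps_by_batch = {b: measure_qps(fake_search, queries, b) for b in (1, 10, 100)}
for batch_size, qps in qps_by_batch.items():
    print(f"batch size {batch_size}: {qps:.0f} QPS")
```

To benchmark a real deployment, `fake_search` would be replaced by a function that submits the whole batch to the database in a single request.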

In the evaluation results, at a batch size of 1 the NVIDIA T4 GPU was 6.4 to 6.7 times faster than the CPU, while the NVIDIA A10G GPU was 8.3 to 9 times faster. At a batch size of 10, the improvement was larger: the T4 was 16.8 to 18.7 times faster, and the A10G was 25.8 to 29.9 times faster. At a batch size of 100, the improvement continued to grow: the T4 was 21.9 to 23.3 times faster, and the A10G was 48.9 to 49.2 times faster.

These results indicate that GPU-accelerated vector database queries can achieve significant performance improvements, especially for larger batch sizes and higher-dimensional data. Milvus integrated with CAGRA unleashes the parallel processing power of the GPU, achieving significant throughput improvements, and making it ideal for vector database workloads in critical scenarios that demand ultimate performance.

The New Era of GPU and Milvus

The integration of the NVIDIA CAGRA GPU acceleration framework into Milvus 2.4 marks a significant breakthrough in the field of vector databases. By harnessing the massive parallel computing power of GPUs, Milvus has achieved unprecedented performance levels in vector indexing and search operations, ushering in a new era of real-time, high-throughput vector data processing.

Explore GPU Rental Services

High-performance GPUs such as the T4 and A10G are currently in short supply and expensive, so GPU rental services have become a practical choice for small and medium-sized enterprises and individual developers. Database Mart offers servers with the NVIDIA RTX 2060, RTX 4060, RTX 4090, A5000, A6000, A40, V100, and A100 for AI and other compute-heavy workloads.
