Quantized Index using FAISS

For anyone doing nearest-neighbor search in a vector space, scaling can quickly become a problem. Facebook has open-sourced a library it uses for this problem: FAISS (Facebook AI Similarity Search).

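Let’s see how we can build a large index and quantize it for fast lookup, using a GPU. If you haven’t already, go ahead and install faiss with GPU support, e.g. poetry add faiss-gpu (poetry install only installs dependencies already declared in pyproject.toml) or pip install faiss-gpu.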

Import the necessary libraries:

import numpy as np
import faiss

First, we’ll create a large set of vectors to index. Mirroring a project I did recently, I’ll use vectors of length 300 (random values stand in for real data here). Let’s create 5 million of them:

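dim = 300
n_vectors = int(5e6)  # np.random needs an integer size; the float 5e6 raises a TypeError

# Generate float32 directly (~6 GB) rather than float64 (~12 GB); FAISS works on float32
rng = np.random.default_rng()
vectors = rng.random((n_vectors, dim), dtype=np.float32)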
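It’s worth checking how much memory this array needs, since that motivates both generating float32 values and the compression that follows. A quick back-of-the-envelope helper (the name raw_size_bytes is just for illustration):

```python
def raw_size_bytes(n_vectors: int, dim: int, bytes_per_value: int) -> int:
    """Bytes needed to hold an uncompressed n_vectors x dim matrix."""
    return n_vectors * dim * bytes_per_value


n, dim = 5_000_000, 300
float64_gb = raw_size_bytes(n, dim, 8) / 1e9  # NumPy's default random dtype
float32_gb = raw_size_bytes(n, dim, 4) / 1e9  # the dtype FAISS consumes
print(f"float64: {float64_gb:.1f} GB, float32: {float32_gb:.1f} GB")
# float64: 12.0 GB, float32: 6.0 GB
```

Even at float32, a flat copy of the data needs several gigabytes of RAM, which is what the quantized index below avoids.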

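Now let’s create our index object. FAISS supports several distance metrics; I’ll search by L2 (Euclidean) distance, returning the vectors with the smallest distance. We’ll use a flat (full-precision) index as the coarse quantizer of an “Inverted File with Product Quantization” (IVFPQ) index. The IVFPQ index allows us to

compress each vector by splitting it into sub-vectors and quantizing each sub-vector to a small code (product quantization)
partition the space into Voronoi cells, so that a query scans only the vectors in a few nearby cells instead of the whole dataset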
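To make the Voronoi-cell idea concrete, here is a toy, pure-Python sketch of an inverted file. It is not how FAISS implements it internally: the helper names (nearest_centroids, build_inverted_lists, ivf_search) are hypothetical, and the centroids are picked by hand rather than learned by training. Each vector is filed under its nearest centroid, and a query scans only the nprobe closest cells (FAISS exposes the same knob as the nprobe attribute of IVF indexes).

```python
from collections import defaultdict


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(vec, centroids, k):
    """Indices of the k centroids closest to vec."""
    return sorted(range(len(centroids)), key=lambda c: l2_sq(vec, centroids[c]))[:k]


def build_inverted_lists(vectors, centroids):
    """Map each cell (centroid index) to the ids of the vectors assigned to it."""
    lists = defaultdict(list)
    for vid, vec in enumerate(vectors):
        lists[nearest_centroids(vec, centroids, 1)[0]].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=1):
    """Exhaustively scan only the nprobe cells nearest the query; return the best id."""
    candidates = [vid for c in nearest_centroids(query, centroids, nprobe) for vid in lists[c]]
    return min(candidates, key=lambda vid: l2_sq(query, vectors[vid]), default=None)


# Toy data: two well-separated clusters and two hand-picked centroids
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 9.0)]
lists = build_inverted_lists(vectors, centroids)
print(dict(lists))                                       # {0: [0, 1], 1: [2, 3]}
print(ivf_search((9.5, 9.9), vectors, centroids, lists))  # 2
```

With nprobe=1 a query can miss its true nearest neighbor when that neighbor sits in an adjacent cell; raising nprobe trades speed for recall.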

nlist = 10000  # Number of Voronoi cells (inverted lists) to create
bits = 8  # Bits per sub-quantizer code: each sub-vector maps to one of 2**8 = 256 centroids
sub_quantizers = 20  # Number of sub-vectors per vector; dim (300) must be divisible by this

index_flat = faiss.IndexFlatL2(dim)  # Coarse quantizer: assigns each vector to a Voronoi cell
index = faiss.IndexIVFPQ(index_flat, dim, nlist, sub_quantizers, bits)
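Product quantization is where the compression comes from. The toy, pure-Python sketch below shows the encode/decode idea with hand-picked codebooks; in FAISS the codebooks are learned during training (one per sub-quantizer, each with 2**bits codewords), and the helper names here are hypothetical. With the parameters above, each 300-dimensional vector becomes 20 one-byte codes, one per 15-dimensional sub-vector.

```python
def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    if len(vec) % m:
        raise ValueError("dim must be divisible by the number of sub-quantizers")
    step = len(vec) // m
    return [tuple(vec[i * step:(i + 1) * step]) for i in range(m)]


def pq_encode(vec, codebooks):
    """One small integer per sub-vector: the index of its nearest codeword."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda j: l2_sq(sub, cb[j])) for sub, cb in zip(subs, codebooks)]


def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen codewords."""
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out


def code_size_bytes(m, bits):
    """Storage per encoded vector: m codes of `bits` bits each, rounded up to bytes."""
    return (m * bits + 7) // 8


# Toy setup: dim=4, m=2 sub-quantizers, 2 codewords each (i.e. 1 bit per code)
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # codebook for dims 0-1
    [(0.0, 0.0), (5.0, 5.0)],  # codebook for dims 2-3
]
codes = pq_encode((0.9, 1.2, 4.0, 6.0), codebooks)
print(codes)                        # [1, 1]
print(pq_decode(codes, codebooks))  # [1.0, 1.0, 5.0, 5.0]
print(code_size_bytes(20, 8), "bytes per vector vs", 300 * 4, "for raw float32")
```

The decoded vector is only an approximation of the original, which is the price paid for storing 20 bytes instead of 1,200 per vector.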

Before we can add vectors, an IVFPQ index has to be trained. Training runs k-means to learn the nlist coarse centroids (the Voronoi cells) and the product-quantization codebooks; adding then assigns each vector to its nearest cell and stores its compressed code. k-means is iterative and, for an index this large, expensive on a CPU, so we’ll do this step on a GPU.
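The training step is essentially k-means clustering: FAISS runs it to find the nlist coarse centroids, and again within each subspace (on residuals relative to the cells, by default) to learn the PQ codebooks. The toy sketch below is plain Lloyd’s algorithm in pure Python, not FAISS’s implementation; the function name and the hand-picked starting centroids are assumptions for illustration. Every pass compares each training point against every centroid, which is why the cost grows with both the number of vectors and nlist.

```python
def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, n_iter=20):
    """Lloyd's algorithm: alternate assigning points and re-centering cells."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(n_iter):
        # Assignment step: put each point in the cell of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            best = min(range(len(centroids)), key=lambda c: l2_sq(p, centroids[c]))
            clusters[best].append(p)
        # Update step: move each centroid to the mean of its cell (empty cells stay put)
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids


print(kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)]))
# [(0.0, 1.0), (10.0, 11.0)]
```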

res = faiss.StandardGpuResources()  # GPU memory and stream resources used by FAISS

# Copy the (untrained) index to GPU device 0
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)

# Train the coarse centroids and PQ codebooks, then encode and add the vectors
xb = np.ascontiguousarray(vectors, dtype='float32')  # FAISS expects C-contiguous float32
gpu_index.train(xb)
gpu_index.add(xb)

Now that the index is built, let’s move it back to the CPU for lookup, in case the machine that will host the index doesn’t have a GPU. We’ll also build a direct map, which lets us reconstruct a stored vector from its id (approximately, since only its quantized code is kept). Finally, let’s save it to disk.

cpu_index = faiss.index_gpu_to_cpu(gpu_index)
cpu_index.make_direct_map()

faiss.write_index(cpu_index, 'index.ivfpq')
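Lookup on the quantized index stays fast because the stored vectors are never decompressed. For each query, a small table of distances from each query sub-vector to every codeword is computed once, and the distance to a stored vector is then approximated by m table lookups (asymmetric distance computation). The pure-Python toy below illustrates the idea with hand-picked codebooks and hypothetical helper names; FAISS’s IVFPQ additionally works with residuals relative to each Voronoi cell, which this sketch omits.

```python
def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [tuple(vec[i * step:(i + 1) * step]) for i in range(m)]


def adc_table(query, codebooks):
    """table[s][j]: squared distance from query sub-vector s to codeword j of subspace s."""
    subs = split(query, len(codebooks))
    return [[l2_sq(sub, cw) for cw in cb] for sub, cb in zip(subs, codebooks)]


def adc_distance(codes, table):
    """Approximate squared distance to an encoded vector: one lookup per subspace."""
    return sum(table[s][code] for s, code in enumerate(codes))


codebooks = [[(0, 0), (1, 1)], [(0, 0), (5, 5)]]
database_codes = [[1, 1], [0, 0], [1, 0]]  # three encoded vectors
table = adc_table((1, 1, 5, 5), codebooks)
dists = [adc_distance(c, table) for c in database_codes]
print(dists)                                          # [0, 52, 50]
print(min(range(len(dists)), key=dists.__getitem__))  # 0
```

The table costs one pass over the codebooks per query, after which scoring each candidate is just m additions, regardless of the original dimension.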

Using this approach, we’ve been able to reduce the on-disk size of the index from 5 GB to around 150 MB, without an appreciable decline in performance!
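A rough estimate shows where a figure like 150 MB comes from. This sketch assumes each inverted-list entry stores a 20-byte PQ code plus a 64-bit id, and counts the float32 coarse centroids and PQ codebooks; it leaves out the direct map and any serialization overhead, and the helper name is illustrative.

```python
def ivfpq_size_bytes(n, dim, nlist, m, bits, id_bytes=8):
    """Approximate IVFPQ storage: codes + ids + coarse centroids + PQ codebooks."""
    code_bytes = (m * bits + 7) // 8                 # one PQ code per vector
    inverted_lists = n * (code_bytes + id_bytes)     # code and id for every vector
    coarse_centroids = nlist * dim * 4               # float32 Voronoi centroids
    pq_codebooks = m * (2 ** bits) * (dim // m) * 4  # float32 codewords per subspace
    return inverted_lists + coarse_centroids + pq_codebooks


flat = 5_000_000 * 300 * 4  # full-precision float32 vectors
ivfpq = ivfpq_size_bytes(5_000_000, 300, 10_000, 20, 8)
print(f"flat: {flat / 1e9:.1f} GB, IVFPQ: {ivfpq / 1e6:.0f} MB")
# flat: 6.0 GB, IVFPQ: 152 MB
```

The flat size is 6 GB in decimal units (about 5.6 GiB), and the per-vector cost of 28 bytes dominates the compressed total.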

Gareth Middleton