What is vector search?

Published: 6 June 2024
Contributors: Meredith Syed, Erika Russi

Vector search is a search technique used to find similar items or data points, typically represented as vectors, in large collections. Vectors, or embeddings, are numerical representations of words, entities, documents, images or videos. Vectors capture the semantic relationships between elements, enabling effective processing by machine learning models and artificial intelligence applications.

Vector search vs. traditional search

In contrast to traditional search, which typically uses keyword search, vector search relies on vector similarity search techniques like k-nearest neighbor search (knn) to retrieve data points similar to a query vector based on some distance metric. Vectors capture semantic relationships and similarities between data points, enabling semantic search instead of simple keyword search.

To illustrate the difference between traditional keyword and vector search, let’s go through an example. Say you are looking for information on the best pizza restaurant and you search for “best pizza restaurant” in a traditional keyword search engine. The keyword search looks for pages that contain the exact words “best”, “pizza” and “restaurant” and only returns results like “Best Pizza Restaurant” or “Pizza restaurant near me”. Traditional keyword search focuses on matching the keywords rather than understanding the context or intent behind the search.

By contrast, in a semantic vector search, the search engine understands the intent behind the query. Semantic, by definition, means relating to meaning in language, that is, semantic search understands the meaning and context of a query. In this case, it would look for content that talks about top-rated or highly recommended pizza places, even if the exact words "best pizza restaurant" are not used in the content. The results are more contextually relevant and might include articles or guides that discuss high quality pizza places in various locations.

Traditional search methods typically represent data using discrete tokens or features, such as keywords, tags or metadata. As shown in our example above, these methods rely on exact matches to retrieve relevant results. By contrast, vector search represents data as dense vectors (a vector in which most or all of the elements are non-zero) in a continuous vector space, the mathematical space in which data is represented as vectors. Each dimension of the dense vector corresponds to a latent feature or aspect of the data, an underlying characteristic or attribute that is not directly observed but is inferred from the data through mathematical models or algorithms. These latent features capture the hidden patterns and relationships in the data, enabling more meaningful and accurate representations of items as vectors in a high-dimensional space.

Traditional search methods may struggle with scalability for large datasets or high-dimensional data due to computational and memory constraints. By contrast, vector embeddings are easier to scale to larger datasets and more complex models. Unlike sparse representations of data where most of the values are zeros across dimensions, embeddings are dense vector representations having non-zero values in most dimensions. This allows vector embeddings to store more information in a smaller, lower-dimensional space, requiring less memory.¹As a result, machine learning algorithms and models can use embeddings more efficiently with fewer compute resources.

How to choose the right AI foundation model

Use this model selection framework to choose the most appropriate model while balancing your performance requirements with cost, risks and deployment needs.

Related content

Vectorization process

For this explainer, we will focus on the vector representations applicable under natural language processing (NLP), that is, vectors that represent words, entities or documents.

We will illustrate the vectorization process by vectorizing a small corpus of sentences: “the cat sat on the mat”, “the dog played in the yard” and “birds chirped in the trees”.

The first step to building vector embeddings is to clean and process the raw dataset. This may involve the removal of noise and standardization of the text. For our example, we won’t do any cleaning since the text is already cleaned and standardized.

Next, an embedding model is chosen to be trained on the dataset. The trained embedding model is used to generate embeddings for each data point in the dataset. For text data, popular open-source embedding models include Word2Vec, GloVe, FastText or pre-trained transformer-based models like BERT or RoBERTa².

For our example, we’ll use Word2Vec to generate our embeddings.

Next, the embeddings are stored in a vector database or a vector search plugin for a search engine, like Elasticsearch, is used. In vector search, relevance of a search result is established by assessing the similarity between the query vector, which is generated by vectorizing the query, and the document vector, which is a representation of the data being queried. Indexes need to be created in the vector database to enable fast and efficient retrieval of embeddings based on similar queries. Techniques such as hierarchical navigable small world (HNSW) can be used to index the embeddings and facilitate similarity search at query time. HNSW organizes the dataset and enables rapid search for nearest neighbors by clustering similar vectors together during the index construction process.

Finally, a mechanism or procedure to generate vectors for new queries must be established. This typically involves creating an API or service that takes user search queries as input in real-time, processes it using the same vector model and generates a corresponding vector representation. This vector can then be used to search on the database to get the most relevant results.

Finding similarity with distance measurements and ANN algorithms

In vector search, relevance is determined by measuring the similarity between query and document vectors. To compare two vectors against each other and determine their similarity, some distance measurement may be used, such as Euclidean distance or cosine similarity³.

Euclidean distance

Euclidean distance is a measure of the straight-line distance between two points. It is calculated as the square root of the sum of the squared differences between the corresponding coordinates of the two points.

This formula can be extended to higher-dimensional spaces by adding more terms to account for additional dimensions.

Cosine similarity

Cosine similarity is a measure of similarity between two vectors in a multi-dimensional space. It calculates the cosine of the angle between the two vectors, indicating how closely the vectors align with each other.

Mathematically, the cosine similarity, cos(θ), between two vectors is calculated as the dot product of the two vectors divided by the product of their magnitudes.

Cosine similarity ranges from -1 to 1, where:

1 indicates that the vectors are perfectly aligned (pointing in the same direction),
0 indicates that the vectors are orthogonal (perpendicular to each other) and
-1 indicates that the vectors are pointing in opposite directions.

Cosine similarity is particularly useful when dealing with vectors, as it focuses on the directional relationship between vectors rather than their magnitudes.

Approximate-nearest neighbor (ANN)

Although the distance metrics mentioned previously can be used to measure vector similarity, it becomes inefficient and slow to compare all possible vectors against the query vector at query time for similarity search. To solve for this, we can use an approximate-nearest neighbor (ANN) search.

Instead of finding an exact match, ANN algorithms efficiently search for the vectors that are approximately closest to a given query based on some distance metric like Euclidean distance or cosine similarity. By allowing for some level of approximation, these algorithms can significantly reduce the computational cost of nearest neighbor search without the need to compute embedding similarities across an entire corpus.

One of the most popular ANN algorithms is HNSW graphs. The hierarchical navigable small world graph structure indexes the dataset and facilitates fast search for nearest neighbors by grouping similar vectors together as it builds the index. HNSW organizes data into neighborhoods, linking them with probable connections. When indexing a dense vector, it identifies the suitable neighborhood and its potential connections, storing them in a graph structure. During an HNSW search with a dense vector query, it locates the optimal neighborhood entry point and returns the nearest neighbors.

Applications of vector search

Vector search has numerous use cases across domains due to its ability to efficiently retrieve similar items based on their vector representations. Some common applications of vector search include:

Information retrieval

Vector search is used in search engines to retrieve documents, articles, web pages or other textual content based on their similarity to a query. It enables users to find relevant information even if the exact terms used in the query are not present in the documents.

Retrieval Augmented Generation (RAG)

Vector search is instrumental in the Retrieval Augmented Generation (RAG) framework for retrieving relevant context from a large corpus of text. RAG is a framework for generative AI that combines vector search with generative language models to generate responses.

In traditional language generation tasks, large language models (LLMs) like OpenAI’s GPT (Generative Pre-trained Transformer) or IBM’s Granite Models are used to construct responses based on the input prompt. However, these models may struggle to produce responses that are contextually relevant, factually accurate or up to date. RAG addresses this limitation by incorporating a retrieval step before response generation. During retrieval, vector search can be used to identify contextually pertinent information, such as relevant passages or documents from a large corpus of text, typically stored in a vector database. Next, an LLM is used to generate a response based on the retrieved context.

Beyond language generation, RAG and vector search have further applications in various other NLP tasks, including question answering, chatbots, summarization and content generation.

Hybrid search

Vector search can be integrated into hybrid search approaches to enhance the effectiveness and flexibility of the search process. Hybrid search combines vector search with other search techniques, such as keyword-based search or metadata-based search. Vector search may be used to retrieve items based on their similarity to a query, while other search methods may be used to retrieve items based on exact matches or specific criteria.

Video and image search

Vector stores are used in image and video search engines to index and retrieve visual content based on similarity. Image and video embeddings are stored as vectors, enabling users to search for visually similar images or videos across large datasets.

Recommendation systems

Recommendation engines in streaming services as well as e-commerce, social media and visual media platforms can be powered by vector search. Vector search allows for the recommendation of products, movies, music or other items based on their similarity to items that users have interacted with or liked previously.

Geospatial analysis

Vector search is used in geospatial data applications to to retrieve spatial data such as points of interest, geographic features or spatial trajectories based on their proximity or similarity to a query location or pattern. It enables efficient spatial search and analysis in geographic information systems and location-based services.

Related resources

How data stores and governance impact your AI initiatives

Organizations with fit-for-purpose data stores and AI governance are better prepared to maximize AI efforts and business benefits.

Enterprise-ready, IBM-developed watsonx Granite models now available

IBM announces the general availability of the first models in the watsonx Granite model series – a collection of generative AI models to advance the infusion of generative AI into business applications and workflows.

What is retrieval-augmented generation?

RAG is an AI framework for retrieving facts from an external knowledge base to ground LLMs on the most accurate, up-to-date information and to give users insight into LLMs' generative process.

Take the next step

Train, validate, tune and deploy generative AI, foundation models and machine learning capabilities with IBM watsonx.ai, a next-generation enterprise studio for AI builders. Build AI applications in a fraction of the time with a fraction of the data.

Explore watsonx.ai

Book a live demo

Footnotes

¹Bahaaldine Azarmi and Jeff Vestal, Vector Search for Practitioners with Elastic, Packt Publishing, 2023

² Vicki Boykis, “What are embeddings,” 2023, https://vickiboykis.com/what_are_embeddings/ (link resides outside ibm.com)

³ Trey Grainger, Doug Turnbull and Max Irwin, AI Powered Search, Manning Publications, 2024