Vector search in Azure AI Search

Vector search is an information retrieval method that uses numerical representations of content to perform searches. Because the content is represented as numbers rather than plain text, the search engine matches the vectors most similar to the query instead of matching exact words. This article provides an overview of vector support in Azure AI Search, explains its integration with other Azure services, and covers terms and concepts related to vector search development.

Follow TechLead for all-around knowledge of AI. The author has 10+ years of experience in Internet service architecture, AI product development, and team management; holds a master's degree, with degrees from Tongji University and Fudan University; is a member of the Fudan Robot Intelligence Laboratory; is an Alibaba Cloud certified senior architect and a certified project management professional; and has led R&D of AI products with revenues in the hundreds of millions.

What is vector search in Azure AI Search?

Vector search is a capability for indexing, storing, and retrieving vector embeddings in a search index. You can use it to power similarity search, multimodal search, recommendation engines, or applications that implement a Retrieval Augmented Generation (RAG) architecture.

The following diagram shows the indexing and query workflow for vector search.

[Diagram: indexing and query workflow for vector search]

On the indexing side, source documents can be prepared to include embeddings. Although integrated vectorization is currently available in public preview, the generally available release of Azure AI Search does not generate embeddings for you. If you need to comply with a non-preview feature policy, your solution should include calls to Azure OpenAI or other models that can convert images, audio, text, and other content into vector representations. Then add a vector field to the index definition in Azure AI Search, and load the index with a document payload containing the vectors. The index is then ready for querying.
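As a sketch of what such an index definition can look like, the following Python dict mirrors the shape of a 2023-11-01 REST API index payload. The index, field, and profile names ("my-demo-index", "contentVector", "hnsw-profile") are illustrative, not prescribed:

```python
# Illustrative index definition with a vector field alongside text content.
index_definition = {
    "name": "my-demo-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",  # vectors are arrays of single-precision floats
            "searchable": True,
            "dimensions": 1536,                # must match the embedding model's output size
            "vectorSearchProfile": "hnsw-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-config", "kind": "hnsw"}],
        "profiles": [{"name": "hnsw-profile", "algorithm": "hnsw-config"}],
    },
}
```

The `dimensions` value here assumes an embedding model such as text-embedding-ada-002, which outputs 1536-dimensional vectors; use your own model's output size.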

On the query side, query input is collected in the client application. Add a step that converts the input into a vector, and then send the vector query to the index in Azure AI Search for similarity search. Azure AI Search returns results containing the requested k nearest neighbors (kNN).
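The similarity search the service performs can be illustrated with a minimal, self-contained sketch. This is pure Python with toy 2-dimensional vectors standing in for real embeddings, not the service's actual implementation:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def k_nearest(query_vector, indexed_docs, k):
    """Return the ids of the k documents whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), doc_id)
              for doc_id, vec in indexed_docs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

docs = {"doc1": [1.0, 0.0], "doc2": [0.9, 0.1], "doc3": [0.0, 1.0]}
print(k_nearest([1.0, 0.05], docs, k=2))  # → ['doc1', 'doc2']
```

A real deployment would replace the toy vectors with model-generated embeddings and delegate the scoring to the search service.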

Vector data can be indexed as fields in documents alongside alphanumeric content. Vector queries can be issued alone or in combination with filters and other query types, including keyword queries (hybrid search) and semantic ranking, within the same search request.

Availability and pricing

Vector Search is available as part of all Azure AI Search tiers in all regions at no additional cost.

Note

Some older search services created before January 1, 2019 are deployed on infrastructure that doesn't support vector workloads. If you receive an error when you try to add a vector field to the schema, the service is running on that outdated infrastructure. In this case, you must create a new search service to try the vector functionality.

What scenarios does vector search support?

Suitable scenarios for vector search include:

  • Vector search on text. Encode the text using an embedding model (such as the OpenAI embeddings models) or an open source model (such as SBERT), and retrieve documents with queries that are also encoded as vectors.

  • Vector search across different data types (multimodal). Encode images, text, audio, and video, or even mixtures of them (for example, using models like CLIP), and perform similarity searches on them.

  • Multi-language search. Represent documents in multiple languages in a single vector space using a multilingual embedding model, and find documents regardless of the language they are in.

  • Hybrid search. Vector search is implemented at the field level, which means you can build queries that include both vector fields and searchable text fields. The queries execute in parallel, and the results are merged into a single response. Optionally, add [semantic ranking] to perform L2 reranking using the same language models that power Bing, for better accuracy.

  • Filtered vector search. Query requests can contain vector queries and [filter expressions]. Filters are available for text and numeric fields, can be used for metadata filtering, and are useful when including or excluding search documents based on filter criteria. Although vector fields themselves are not filterable, you can make text or numeric fields filterable. Search engines can process filters before or after executing vector queries.

  • Vector database. Use Azure AI Search as a vector store to serve as a long-term memory or external knowledge base for large language models (LLMs) or other applications. For example, for Retrieval Augmented Generation (RAG) applications, you can use Azure AI Search as a vector index in [Azure Machine Learning prompt flow].
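Putting the hybrid and filtered scenarios above together, a query request body might look like the following Python dict, mirroring the 2023-11-01 REST API shape. The field names, filter expression, and (truncated) vector values are illustrative:

```python
# Hypothetical request body for POST .../indexes/my-demo-index/docs/search.
query = {
    "search": "historic hotel with pool",      # keyword part of the hybrid query
    "filter": "category eq 'Luxury'",          # filter on a filterable text field
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.013, -0.021, 0.034],  # embedding of the query text (truncated)
            "fields": "contentVector",         # vector field(s) to search
            "k": 5,                            # number of nearest neighbors to retrieve
        }
    ],
    "select": "id, content",
    "top": 5,
}
```

The keyword search, the filter, and the vector query are all optional individually; combining them in one request is what makes the search hybrid and filtered.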

You can use other Azure services to provide embedding and data storage.

  • Azure OpenAI provides embedding models. Demonstrations and samples target [text-embedding-ada-002] and other models. We recommend Azure OpenAI for generating text embeddings.

  • [Image Retrieval Vectorize Image API (Preview)] supports vectorization of image content. We recommend this API for generating image embeddings.

  • Azure AI Search can automatically index vector data from two data sources: [Azure Blob Indexer] and [Azure Cosmos DB for NoSQL Indexer].

  • LangChain is a framework for developing applications powered by language models. Its Azure AI Search vector store integration simplifies building LLM applications that use Azure AI Search as a vector data store.

  • Semantic Kernel is a lightweight SDK for integrating AI large language models (LLMs) with traditional programming languages. It's great for chunking large documents in larger workflows that send input to an embedding model.

Vector search concepts

If you're new to vectors, this section explains some core concepts.

About vector search

Vector search is an information retrieval method in which documents and queries are represented as vectors rather than plain text. In vector search, a machine learning model generates a vector representation of a source input, which can be text, image, audio, or video content. Using a mathematical representation of the content provides a common basis for search scenarios. Once all content is represented as vectors, a query can find matches in vector space even if the associated original content is in a different medium or a different language than the query.

Why use vector search

Vector search overcomes the limitations of traditional keyword-based search by using machine learning models to capture the meaning of words and phrases in context, rather than relying solely on lexical analysis and matching of individual query terms. By capturing the intent of the query, vector search can return more relevant results that match the user's needs, even if the exact words are not present in the document.

Additionally, vector searches can be applied to different types of content, such as images and videos, not just text. This enables new search experiences, such as multimodal search in multilingual applications or cross-language search.

Embedding and vectorization

An embedding is a specific vector representation of content or a query that is created by a machine learning model that captures the semantics of text or a representation of other content, such as an image. Natural language machine learning models have been trained on large amounts of data to identify patterns and relationships between words. During training, they try to represent any input as a vector of real numbers in an intermediate step called the encoder. After training is complete, these language models can be modified so that the intermediate vector representation becomes the output of the model. The resulting embeddings are high-dimensional vectors where words with similar meanings are closer together in vector space, as described in Understanding embeddings (Azure OpenAI).

The effectiveness of vector search in retrieving relevant information depends on the effectiveness of the embedding model in extracting the meaning of documents and queries into the resulting vector. The best models have been thoroughly trained based on the type of data they represent. You can evaluate existing models (such as Azure OpenAI text-embedding-ada-002), introduce your own model that has been trained directly in the problem space, or fine-tune a general model. Azure AI Search doesn't impose restrictions on the model you choose, so choose the model that best fits your data.

To create efficient embeddings for vector searches, input size limitations must be taken into account. We recommend following data chunking guidelines before generating embeddings. This best practice ensures that embeddings accurately capture relevant information and enables more efficient vector searches.
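A minimal chunking sketch, using word counts as a rough stand-in for model tokens (real pipelines typically count tokens with the embedding model's own tokenizer, and the limits shown here are illustrative):

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping chunks of at most max_words words,
    so each chunk stays within the embedding model's input limit."""
    words = text.split()
    step = max_words - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already covers the end of the text
    return chunks

sample = " ".join(f"word{i}" for i in range(150))
pieces = chunk_text(sample, max_words=100, overlap=20)
print(len(pieces))  # → 2
```

The overlap keeps a sentence that straddles a chunk boundary represented in both neighboring embeddings, at the cost of some duplicated storage.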

What is an embedding space?

An _embedding space_ is the corpus for vector queries. In a search index, an embedding space is all the vector fields populated with embeddings from the same embedding model. Machine learning models create embedding spaces by mapping individual words, phrases, or documents (for natural language processing), images, or other forms of data into representations made up of vectors of real numbers that act as coordinates in a high-dimensional space. In this embedding space, similar items are located close together, and dissimilar items are located farther apart.

For example, documents that talk about different kinds of dogs would be clustered close together in the embedding space. Documents about cats would be close to each other, but farther from the dog cluster, while still being in the animal neighborhood. A dissimilar concept such as cloud computing would sit much farther away. In practice, embedding spaces are abstract and have no well-defined, human-interpretable meaning, but the core idea stays the same.
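The clustering intuition can be made concrete with hand-crafted toy vectors. The labeled dimensions below are purely an illustration; real embedding dimensions are not interpretable:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-crafted 3-dimensional "embeddings" (dimensions: animal, dog, technology).
# Real embeddings have hundreds or thousands of dimensions with no such labels.
beagle = [0.9, 0.8, 0.0]
terrier = [0.9, 0.7, 0.1]
cat = [0.9, 0.1, 0.0]
cloud_computing = [0.0, 0.0, 1.0]

# Dogs cluster tightly; the cat is nearby (still an animal); cloud computing is far away.
print(round(cosine(beagle, terrier), 2))          # → 0.99
print(round(cosine(beagle, cat), 2))              # → 0.82
print(round(cosine(beagle, cloud_computing), 2))  # → 0.0
```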

Nearest neighbor search

In vector search, the search engine scans the embedding space for vectors that are close to the query vector. This technique is called nearest neighbor search, and it provides a way to quantify similarity between items: high vector similarity indicates that the original data is also similar. To make nearest neighbor searches fast, the search engine performs optimizations, or employs data structures and data partitioning, to reduce the search space. Each vector search algorithm solves this problem in a different way, trading off characteristics such as latency, throughput, recall, and memory. A similarity metric provides the mechanism for computing the distance itself.

Azure AI Search currently supports the following algorithms:

  • Hierarchical Navigable Small Worlds (HNSW): HNSW is a leading ANN algorithm that has been optimized for high-recall, low-latency applications where the data distribution is unknown or may change frequently. It organizes high-dimensional data points into hierarchical graph structures, enabling fast and scalable similarity searches while allowing an optimizable trade-off between search accuracy and computational cost. Because the algorithm requires all data points to reside in memory to enable fast random access, it consumes the [vector index size] quota.

  • Exhaustive K-Nearest Neighbors (KNN): Computes the distance between the query vector and all data points. This is a computationally intensive algorithm, so it is best suited to smaller data sets. Because the algorithm does not require fast random access to data points, it does not consume the vector index size quota, and because it scans every data point, it always returns the true global set of nearest neighbors.

In the index definition, you can specify one or more algorithms, and then specify the algorithm to use for each vector field:

  • [Create vector index] to specify the algorithm in the index and fields.

  • For exhaustive KNN, use the [2023-11-01] or [2023-10-01-Preview] REST API versions, or an Azure SDK beta library.

The algorithm parameters used to initialize the index during index creation are immutable and cannot be changed after the index is built. However, parameters that affect query-time characteristics (such as efSearch) can be modified.
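An illustrative vectorSearch configuration, following the 2023-11-01 REST API shape; the parameter values shown are commonly cited defaults, and the configuration and profile names are made up:

```python
# "vectorSearch" section of an index definition with explicit HNSW parameters.
vector_search = {
    "algorithms": [
        {
            "name": "hnsw-config",
            "kind": "hnsw",
            "hnswParameters": {
                "m": 4,                 # graph connections per node; fixed once the index exists
                "efConstruction": 400,  # build-time candidate list size; fixed once the index exists
                "efSearch": 500,        # query-time candidate list size; can be updated later
                "metric": "cosine",
            },
        },
        {
            "name": "eknn-config",
            "kind": "exhaustiveKnn",
            "exhaustiveKnnParameters": {"metric": "cosine"},
        },
    ],
    "profiles": [
        {"name": "hnsw-profile", "algorithm": "hnsw-config"},
    ],
}
```

Each vector field then references one of the profiles, which binds the field to an algorithm configuration.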

Additionally, fields indexed with the HNSW algorithm also support exhaustive KNN search by using the [query request] parameter "exhaustive": true. The reverse does not apply, however: if a field is indexed for exhaustiveKnn, you cannot use HNSW in queries, because the additional data structures that enable efficient search don't exist.

Approximate nearest neighbor search

Approximate nearest neighbor (ANN) search is a class of algorithms for finding matches in vector space. Algorithms in this class employ different data structures or data partitioning methods to significantly reduce the search space and speed up query processing.

ANN algorithms sacrifice some accuracy but provide scalable and faster approximate nearest neighbor retrieval, making them ideal for balancing accuracy and efficiency in modern information retrieval applications. You can adjust the parameters of the algorithm to fine-tune the recall, latency, memory, and disk footprint requirements of your search application.

Azure AI Search uses HNSW for its ANN algorithm.

Origin blog.csdn.net/magicyangjay111/article/details/134477584