
OutSystems, OpenAI Embeddings and Qdrant Vector Database — Find Similar

By Stefan Weber, published in ITNEXT

In this first article of a three-article series, we’ll explore how you can use OpenAI Embeddings and the Qdrant Vector Database to search text by meaning in your OutSystems application.

This article, as well as the upcoming ones, includes a sample QnA application that I have published on Forge. I will explain the implementation details using this sample. See links below.

Part 1: “Find Similar” (this article)
In this article you will learn the basics of vector embeddings and vector databases. It also shows you how to use pre-built Forge components to create these vector embeddings, save them in the Qdrant Vector Database, and execute similarity queries.

Part 2: “Answer Right” (August 2023)
In the second part, we expand our application by combining the similarity search with OpenAI’s completions. We use the search results from our QnA application as context for an OpenAI prompt. This helps OpenAI answer users’ questions based only on the answers we’ve collected. It is a common way to pair generative AI with a reliable information source to avoid incorrect answers.

Part 3: “Plugged In” (September 2023)
The last part is an optional extension to our application where we expose our Question-and-Answer knowledge base as a ChatGPT Plugin. You will learn how easy it is to create a ChatGPT Plugin with OutSystems. You will need a ChatGPT subscription to follow along.

Embeddings are numerical representations (vectors) used to encode several types of information, such as text, images, audio, and video files. These representations are generated by a trained model, which allows them to capture much of the meaning of the input data.

At the time of writing, the OpenAI embeddings model text-embedding-ada-002 creates vectors with 1536 dimensions, meaning each embedding is an array of 1536 floating-point numbers.
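Because an embedding is just an array of floating-point numbers, comparing two embeddings is plain arithmetic. The following is a minimal sketch of cosine similarity, one of the distance measures Qdrant supports. The 4-dimensional vectors are made-up toy values, not real 1536-dimensional OpenAI embeddings, and the variable names are only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors (assumption: invented values standing in for real embeddings).
question_a = [0.9, 0.1, 0.0, 0.2]
question_b = [0.8, 0.2, 0.1, 0.3]   # points in nearly the same direction as question_a
unrelated = [0.0, 0.9, 0.8, 0.0]    # points in a very different direction

similar_score = cosine_similarity(question_a, question_b)
unrelated_score = cosine_similarity(question_a, unrelated)
print(f"similar: {similar_score:.3f}, unrelated: {unrelated_score:.3f}")
```

Vectors that point in nearly the same direction score close to 1.0, while vectors pointing in different directions score much lower. A vector database performs this kind of comparison efficiently across many stored vectors.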
The key advantage of embeddings is that items that are close together in vector space are also semantically similar. This means that if two vectors are numerically similar, the data they represent share a similar meaning or context.

A new type of database, the vector database, has emerged to store and retrieve vectors efficiently, particularly for similarity queries. Unlike traditional databases, which may not be optimized for high-dimensional numerical data like vectors, vector databases are designed for exactly these tasks.

Qdrant is both a vector database and a vector similarity search engine, created and maintained by Qdrant Solutions. The project is openly available on GitHub under the permissive Apache 2.0 license. In addition, Qdrant Solutions provides a fully managed Qdrant Database Cluster cloud service at cloud.qdrant.io.

Vector databases like Qdrant play an essential role in combination with large language models in natural language processing tasks. Together they enable applications such as information retrieval, recommendation systems, sentiment analysis and semantic search.

In the sample application, I am directly using the OpenAI API endpoints for vector embeddings. You might have noticed that OutSystems recently released an Azure OpenAI component on the Forge Marketplace, which also provides support for creating vector embeddings.
However, since not everyone has access to an Azure tenant and the process for obtaining Azure OpenAI access might still require an application at the time of writing, I opted to use the non-Azure endpoints instead. If you do have access to Azure OpenAI services, you can replace the embeddings part of the demo application with the official and supported component.

Besides OpenAI, you can use any other service for creating vector embeddings.

Before you can use the demo application, you need to perform the following tasks.

Go to OutSystems Forge and download the sample application. The sample application depends on the following Forge components.

OpenAI Embeddings is a small connector that implements only the Embeddings endpoint of OpenAI. It is used to create vector embeddings for questions and search terms in the sample application.

Qdrant Vector Database connects to a Qdrant database cluster, either self-hosted or running in the Qdrant Solutions cloud service. It provides server actions to list and create collections of vector embeddings, save vector embeddings (Points), and run queries.

Visit the OpenAI website and sign up for an account. OpenAI is a commercial offering, and you will be charged per usage. With registration you will get some free credits, which are more than sufficient for experimenting.

After signing up, go to View API Keys in your profile menu and create a new API key. Make sure to copy the key when it is displayed on the screen. The key is needed to authorize your requests to the OpenAI API.

Qdrant Solutions offers a free tier of its Qdrant vector database cloud service. At https://cloud.qdrant.io you can sign up with your GitHub or Google account.

After signing up, follow the wizard to create your free tier cluster.
Copy the Cluster URL and your API key.

Open OutSystems Service Center of the environment where you installed the Qdrant Vector Database Forge component. Under the Modules menu, locate the qdrant_IS module and open it. Select the Integrations tab and, under the Consumed REST APIs section, set the qdrant URL to the Qdrant cluster URL you copied.

Lastly, you need to configure some site properties in the semantic search demo application module VectorEmbeddingsDemo.

With all prerequisites done, you can now open the sample application.

I have added some sample data taken from the Munich Airport FAQs. Click the Bootstrap Sample Data button and wait until the Question-Answer pairs are displayed on the screen. Make sure that you have completed all the prerequisites above.

While adding the sample data, the application creates vector embeddings behind the scenes for all questions (not answers) and adds them to your Qdrant Cluster collection.

Now try your first search. Enter the search term “I am severely disabled. What do I need to know about this?” and click the Search button.

The application takes the search term and generates a vector embedding for it. It then runs a similarity search within your Qdrant Cluster collection. Qdrant returns results along with a score that reflects the proximity between the embedding of the entered query and the question embeddings stored in the Cluster collection. With the Cosine distance the score is at most 1.0, and higher values mean a closer match.

Please note that your Qdrant collection does not store the text of your questions, only their embeddings. It is up to your application to match Qdrant results to the Question-Answer pairs stored in your application's database.

Feel free to add additional Question-Answer pairs or create your own knowledge base from scratch. Try out different search terms and see how results and scoring change.

Open Service Studio and the sample application module.
In the Logic tab, open the Articles_SaveArticle server action.

The Articles_SaveArticle server action first adds or updates an article in the database. Then it calls the OpenAI_CreateEmbeddings server action of the OpenAI Embeddings Forge component. Embeddings are created for the question of our Question-Answer pair.

OpenAI_CreateEmbeddings takes an API key from the configured OpenAIKey site property and an OpenAI model that is suitable for creating embeddings. At the time of writing, you can only use the text-embedding-ada-002 model. Finally, OpenAI_CreateEmbeddings takes an array of texts (which later results in an array of generated embeddings). In our case that is the question.

Upon success, the embeddings are written to your Qdrant Vector Database cluster using the Qdrant_UpsertPoints server action from the Qdrant Vector Database Forge component.

Qdrant_UpsertPoints creates or updates vector embeddings. ApiKey and CollectionName are retrieved from the configured site properties.

A point in Qdrant represents a vector embedding and is identified by a unique identifier. We use LocalArticleId, which is set either to a new UUID (for new questions) or to the UUID of an existing Question-Answer pair (for existing questions).

The Vector property is set to the first element of the result array of the OpenAI_CreateEmbeddings server action.

Finally, you can optionally add Payload data: a MetadataId and one or more keywords. MetadataId can be used both for filtering query results and as a grouping identifier. Likewise, keywords can be used to filter query results.

Note the ValidateCollection server action at the top of the Articles_SaveArticle flow, which checks whether the Qdrant collection exists and creates it if it does not.

When creating a Qdrant collection, you need to specify the dimensions of the vector embeddings you want to store. The OpenAI embeddings model returns 1536 dimensions.
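To make the collection and point structure more concrete, here is a minimal sketch of the JSON bodies that Qdrant's REST API accepts for creating a collection (PUT /collections/{collection_name}) and for upserting points (PUT /collections/{collection_name}/points). It only builds the request bodies and sends nothing. The helper names and the payload keys metadata_id and keywords are assumptions for illustration; the Forge component may name and structure its payload differently.

```python
import json
import uuid

EMBEDDING_DIMENSIONS = 1536  # size returned by text-embedding-ada-002
SUPPORTED_DISTANCES = ("Cosine", "Dot", "Euclid")


def collection_config(size=EMBEDDING_DIMENSIONS, distance="Cosine"):
    """Build the body for PUT /collections/{collection_name}."""
    if distance not in SUPPORTED_DISTANCES:
        raise ValueError(f"unsupported distance: {distance}")
    return {"vectors": {"size": size, "distance": distance}}


def upsert_body(point_id, vector, metadata_id, keywords):
    """Build the body for PUT /collections/{collection_name}/points.

    The payload keys are hypothetical; they mirror the MetadataId and
    keywords described in the article, not the component's real schema.
    """
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError("vector length must match the collection size")
    return {
        "points": [
            {
                "id": point_id,      # Qdrant accepts a UUID string or an unsigned integer
                "vector": vector,    # the embedding of the question
                "payload": {"metadata_id": metadata_id, "keywords": keywords},
            }
        ]
    }


# Usage with a placeholder vector standing in for a real embedding.
article_id = str(uuid.uuid4())
placeholder_vector = [0.0] * EMBEDDING_DIMENSIONS
print(json.dumps(collection_config()))
print(json.dumps(upsert_body(article_id, placeholder_vector, "faq", ["parking"]))[:80])
```

Using the article's UUID as the point id is what lets the application map a Qdrant search hit back to the Question-Answer pair in its own database.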
If you want to use another embeddings service, check its documentation for the number of dimensions it returns.

You must also set the Distance property. This specifies how similarity queries are performed in that collection. Qdrant supports Cosine, Dot and Euclid distance queries. More on that can be found in the Qdrant documentation at qdrant.tech.

Next, open the Articles_SearchArticles server action in the Vector Embeddings demo application. This server action takes a single SearchTerm as input parameter. It first performs a query against your Qdrant Cluster collection and then matches the results with the Question-Answer pairs stored in your application database.

This server action flow has two main streams.

If no search term is given, it just returns all Question-Answer pairs from the database.

If a search term is given, Qdrant_SearchPoints runs the similarity query. In the sample application, Qdrant_SearchPoints is configured to return a maximum of 6 articles. You can also add a score threshold filter to return only records with a score greater than the value provided (for the Cosine distance, a value up to 1.0). If this leads to no results, the server action exits and returns an empty result list. If there is a result, the whole result is stored in a local variable for later use.

Articles_SearchArticles is used in the GetArticles data action in the Demo screen to retrieve all articles.

With OutSystems, vector embeddings and a Qdrant Vector Database, it is easy to add a semantic similarity text search capability to an application.

Thank you for reading. I hope you liked it and that I have explained the important parts well. Let me know if not 😊

If you have difficulties getting up and running, please use the OutSystems Forum to get help. Suggestions on how to improve this article are very welcome. Send me a message via my OutSystems Profile or respond directly here on Medium.

If you like my articles, please leave some claps. Follow me and subscribe to receive a notification whenever I publish a new article.
Happy Low Coding!

About the author: Digital Craftsman, OutSystems MVP and Senior Director at Telelink Business Services. Some rights reserved.



This post first appeared on VedVyas Articles; please read the original post: here
