
The Secret Sauce to let ChatGPT Rock your Customer Support

Lukas Gamper in Better Programming

ChatGPT is an awesome tool to answer annoying customers, write the outline the shareholders ordered, or handle the corporate junk that pops up on your desk. But answering annoying customer questions requires specific knowledge about your company. Most likely, the answer is already in the documentation, but ChatGPT didn't read it.

The easiest way would be to just pass the whole documentation to ChatGPT, but GPT-4 can only handle about 30 pages of input. Passing a lot of information to GPT-4 also makes it slow and gets expensive fast. It would be useful to have a tool that finds the important parts in the ocean of information. That's where embeddings enter the game. In this article, we explain what embeddings are and how we can use them to empower ChatGPT to answer customer questions with the knowledge from your documentation.

A text embedding, or simply an embedding, is a mathematical representation of the information contained in a text. It allows us to tell which texts are about the same topic and which don't have much in common.

As an example, let's look at three sentences:

Even though the first and the third sentences share many words, the embeddings of the first and second sentences are much closer than those of the first and third. This is because the first two sentences have almost the same subject, which differs from that of the third.

Embeddings are represented as large vectors of numbers. These numbers encode the information contained in the text. By measuring the distance between two embedding vectors, we can determine which texts contain similar information and which are about different topics. This is the tool we were looking for: it enables us to find the relevant parts in a huge amount of text.

First, we prepare the documentation so that we can find the parts relevant to answering a question. After that, we are ready to answer incoming questions.

There is a lot of research about how to chunk a large text.
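The distance comparison described above is usually done with cosine similarity. Here is a minimal sketch with made-up three-dimensional vectors standing in for real embeddings (ADA-002 embeddings actually have 1,536 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms:
    # values near 1.0 mean the same topic, values near 0 mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors standing in for the embeddings of three sentences.
sentence_1 = [0.9, 0.1, 0.0]
sentence_2 = [0.8, 0.2, 0.1]
sentence_3 = [0.1, 0.0, 0.9]

print(cosine_similarity(sentence_1, sentence_2))  # high: similar topics
print(cosine_similarity(sentence_1, sentence_3))  # low: different topics
```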
A common finding seems to be that a chunk should not be longer than 2,000 words. For splitting the documentation into chunks, we found that splitting at logical sections gives much better results than splitting at fixed character counts. Another finding was that sections that are too short do not contain enough information to create a good, reliable embedding. Since adjacent sections usually contain similar information, we can group several sections together until we reach at least 500 characters.

The Python code below shows how to parse HTML documentation into sections. First, we use the BeautifulSoup package to parse the HTML and traverse the elements, converting the HTML into text as we go. We assume that H1 and H2 tags represent section titles. When we reach one of these elements, we check whether the current chunk is long enough; if so, we start a new chunk at this section.

There is an infinite number of transformers that can create embeddings. The ADA-002 model from OpenAI outperformed all locally installable transformers we tested, which is why we use OpenAI's model to produce the embeddings. We loop over each chunk and send it to ADA-002; the embedding is saved directly in the record of the chunk.

Now we have a library of embeddings for the documentation. Let's save it to a JSON file. We could also use a vector database, but unless your documentation exceeds thousands of pages, a vector database is overkill. It makes sense to add a version to the filename so you can easily tell whether you have to regenerate the embeddings.

To compare the question with the documentation, we also need to calculate the embedding of the question, using the same ADA-002 model. First, we need to load the documentation with the corresponding embeddings from the JSON file created above.
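Sketched in code, the preparation step could look as follows. The helper names and the details of the HTML traversal are our assumptions, as is the pre-1.0 `openai` client interface; only the H1/H2 splitting rule, the 500-character minimum, the ADA-002 model, and the versioned JSON file come from the description above:

```python
import json
from bs4 import BeautifulSoup  # pip install beautifulsoup4

MIN_CHUNK_SIZE = 500  # group sections until we have at least 500 characters

def chunk_html(html: str) -> list[dict]:
    # Split the documentation at H1/H2 section titles, merging adjacent
    # sections until a chunk is long enough for a reliable embedding.
    soup = BeautifulSoup(html, "html.parser")
    chunks = []
    current = {"title": "", "text": ""}
    for element in soup.find_all(["h1", "h2", "p", "li"]):
        if element.name in ("h1", "h2"):
            if len(current["text"]) >= MIN_CHUNK_SIZE:
                chunks.append(current)
                current = {"title": element.get_text(strip=True), "text": ""}
            elif not current["title"]:
                current["title"] = element.get_text(strip=True)
        else:
            current["text"] += element.get_text(strip=True) + "\n"
    if current["text"]:
        chunks.append(current)
    return chunks

def embed_chunks(chunks: list[dict]) -> list[dict]:
    import openai  # pip install openai; needs OPENAI_API_KEY set
    for chunk in chunks:
        response = openai.Embedding.create(
            model="text-embedding-ada-002", input=chunk["text"])
        # Save the embedding directly in the record of the chunk.
        chunk["embedding"] = response["data"][0]["embedding"]
    return chunks

def save_chunks(chunks: list[dict], version: str = "v1") -> None:
    # The version in the filename tells us when to regenerate the embeddings.
    with open(f"embeddings-{version}.json", "w") as f:
        json.dump(chunks, f)
```

Embedding the question later works the same way: a single `Embedding.create` call with the question text.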
Afterward, we can get the three closest embeddings: we load the documentation chunks from disk and pick the three chunks whose embeddings are closest to the question's embedding.

Now we have all the information needed to answer the question. We only need someone to formulate an answer, and for this task, GPT-4 is a perfect fit. When ChatGPT is called through the API, we can describe the system role we want ChatGPT to take on. In our case, we instruct ChatGPT to behave like a support agent. For the content part, called the user message in the API, we pass the question and the sections from the documentation, together with the instruction "Answer the following question only using information from the prompt" and the documentation delimited by triple quotes.

Congratulations! We managed to answer a customer question using information extracted from the documentation. But the system suffers from two problems.

If you ask ChatGPT any question in the normal web interface, it will always give you an answer, unless the answer would be offensive. It doesn't matter whether the question has an answer at all: the reply will sound reasonable even if no answer exists or if, for whatever reason, ChatGPT decided to make up a story. This is called a hallucination.

In some cases, a hallucination is a good thing. If you want ChatGPT to create a new novel, you want it to hallucinate. But in our case, we want it to stick to the information in the prompt. The API exposes a temperature parameter that controls the level of hallucination. The default is 1, but to answer our questions, we call GPT-4 with a temperature of 0. This does not fully prevent GPT-4 from hallucinating, but it gets much better.

To compare the question with the documentation, we need a good embedding of the question.
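Before improving the question embedding, here is a sketch of the retrieval-and-answer step described above. The function names and the exact prompt wording are our assumptions (again with the pre-1.0 `openai` client); the three-chunk retrieval, the GPT-4 model, and temperature 0 follow the text:

```python
import json

def load_chunks(version: str = "v1") -> list[dict]:
    # Load the documentation chunks with their embeddings from disk.
    with open(f"embeddings-{version}.json") as f:
        return json.load(f)

def closest_chunks(question_embedding, chunks, n=3):
    # Rank all documentation chunks by cosine similarity to the
    # question embedding and keep the n closest ones.
    def similarity(chunk):
        a, b = question_embedding, chunk["embedding"]
        dot = sum(x * y for x, y in zip(a, b))
        norms = sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5
        return dot / norms
    return sorted(chunks, key=similarity, reverse=True)[:n]

def answer_question(question: str, context: list[dict]) -> str:
    import openai  # pip install openai; needs OPENAI_API_KEY set
    documentation = "\n\n".join(chunk["text"] for chunk in context)
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,  # suppress hallucination as far as possible
        messages=[
            {"role": "system",
             "content": "You are a support agent. Answer politely and concisely."},
            {"role": "user",
             "content": "Answer the following question only using information "
                        f'from the prompt.\n"""\n{documentation}\n"""\n'
                        f"Question: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]
```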
But normally, questions are quite short, which leads to noisy, almost random results when ranking the documentation chunks. This can be improved by letting GPT make up a hallucinated answer and calculating the embedding of the question together with that hallucinated answer. Even though the answer is made up, it expands the vocabulary of the question, which improves the embedding a lot.

In our specific case, we use GPT-3.5 to generate the hallucinated answer: it is made up anyway, and GPT-3.5 is much faster and cheaper than GPT-4. To make sure GPT makes up a random answer, we pass a temperature of 1.2. This is the sweet spot where GPT still answers in complete sentences but with maximum randomness. If we increase the temperature even more, GPT stops forming sentences and starts outputting random characters.

We are amazed to see how far AI technology has progressed. Using it for your own case takes a few steps and some logic, but we have had a great experience with embeddings so far.

Or, as ChatGPT would say: we are thrilled to see how our deepest dreams are coming true and how AI helps us get rid of the boring and repetitive work. This frees up our time and energy for more creative and demanding tasks. Like creating more AIs and removing more boring tasks from our table 🤣.
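For completeness, the question-plus-hallucinated-answer embedding described above might be sketched like this (the function name is ours; the models and temperatures follow the text):

```python
def embed_question(question: str) -> list[float]:
    import openai  # pip install openai; needs OPENAI_API_KEY set

    # Let the cheaper, faster GPT-3.5 hallucinate an answer. Temperature 1.2
    # gives maximum randomness while still producing complete sentences.
    hallucinated = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=1.2,
        messages=[{"role": "user", "content": question}],
    )["choices"][0]["message"]["content"]

    # Embed the question together with the made-up answer: the extra
    # vocabulary makes the embedding much more reliable for ranking.
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=question + "\n" + hallucinated)
    return response["data"][0]["embedding"]
```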



