
How to use ChatGPT and OpenAI API services in the Enterprise

A look at some of the latest developments in NLP and LLM-based AI solutions in enterprise cloud environments

This article is the second part of a series exploring how to use OpenAI and other Large Language Models (LLMs).

The first article of the series is here for your reference:

Understand how to use ChatGPT APIs by analyzing a few new open-source projects

Google’s new development and offerings

Google released a new solution that combines generative AI with enterprise search for companies that need to build AI-based applications.
At their recent Data and AI Summit, they demoed what the new AI-based search tool/app looks like.

This is their AI App builder tool called “Gen App Builder”.

We tell it which data sources to use, such as website URLs and documents we manually added/uploaded.

In the right panel, we can search for answers, and the tool responds based on the context of those sources. It also lists the reference links.

They haven't released the app builder and app sample code yet, but you can join their waiting list for further information.

AWS’s new development and offerings

AWS just announced Amazon Bedrock, a new service that makes foundation models (FMs) from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. According to AWS, Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders. Bedrock offers access to a range of powerful FMs for text and images, including Amazon's Titan FMs (two new LLMs announced at the same time), through a scalable, reliable, and secure AWS managed service. Titan Text is a generative LLM for tasks such as summarization, text generation (for example, creating a blog post), classification, open-ended Q&A, and information extraction.

Amazon Bedrock is still in preview, and you need to contact your AWS account manager to get access to it.
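Because Bedrock's API has not been published yet, any code is speculative. Based on AWS's usual boto3 patterns, invoking a Titan text model might look roughly like the sketch below; the client name, model ID, and request shape are all assumptions, not the final API.

import boto3
import json

# Hypothetical sketch of calling a Titan model through Bedrock once it is generally available.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed Titan Text model ID
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Summarize this blog post: ..."}),
)
print(json.loads(response["body"].read()))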

Azure’s new development and offerings

Microsoft has released the Azure OpenAI Service, and here is one of their demo applications that uses Azure OpenAI Service together with Azure Cognitive Search.

The demo app's code repo is here:

GitHub - Azure-Samples/azure-search-openai-demo: A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.

This app follows the same design as the tools we discussed in the Part 1 article: the frontend is a ReactJS-based app, and the backend uses Flask and LangChain.
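To give a feel for that design, here is a minimal Flask sketch of the backend's chat endpoint. The route and the run_approach stub are simplifications of the repo's "approach" classes, not its exact code.

from flask import Flask, request, jsonify

app = Flask(__name__)

def run_approach(history):
    # Stand-in for the repo's approach classes (e.g. ChatReadRetrieveReadApproach),
    # which query Cognitive Search and then call Azure OpenAI.
    return {"answer": "...", "data_points": []}

@app.route("/chat", methods=["POST"])
def chat():
    body = request.get_json()
    return jsonify(run_approach(body["history"]))

if __name__ == "__main__":
    app.run(port=5000)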

The app initializes its clients for Azure OpenAI and Azure Cognitive Search.
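A rough sketch of that initialization, assuming API-key authentication (the real app also supports Azure AD credentials; the endpoints, index name, and environment variables here are examples):

import os
import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Point the openai library at the Azure OpenAI endpoint (hypothetical values).
openai.api_type = "azure"
openai.api_base = "https://my-openai.openai.azure.com"
openai.api_version = "2023-03-15-preview"
openai.api_key = os.environ["AZURE_OPENAI_KEY"]

# Client for the Cognitive Search index that holds the ingested documents.
search_client = SearchClient(
    endpoint="https://my-search.search.windows.net",
    index_name="gptkbindex",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)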

When a user asks a question, the app uses prompts like the ones below to instruct the OpenAI model to answer with the search results as context.

class ChatReadRetrieveReadApproach(Approach):
    prompt_prefix = """<|im_start|>system
Assistant helps the company employees with their healthcare plan questions, and questions about the employee handbook. Be brief in your answers.
Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
For tabular information return it as an html table. Do not return markdown format.
Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, e.g. [info1.txt]. Don't combine sources, list each source separately, e.g. [info1.txt][info2.pdf].
{follow_up_questions_prompt}
{injected_prompt}
Sources:
{sources}
<|im_end|>
{chat_history}
"""

    follow_up_questions_prompt_content = """Generate three very brief follow-up questions that the user would likely ask next about their healthcare plan and employee handbook.
Use double angle brackets to reference the questions, e.g. <<Are there exclusions for prescriptions?>>.
Try not to repeat questions that have already been asked.
Only generate questions and do not generate any text before or after the questions, such as 'Next Questions'"""

    query_prompt_template = """Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base about employee healthcare plans and the employee handbook.
Generate a search query based on the conversation and the new question.
Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.
Do not include any text inside [] or <<>> in the search query terms.
If the question is not in English, translate the question to English before generating the search query.

Chat History:
{chat_history}

Question:
{question}

Search query:
"""

LangChain is a very popular and powerful tool for chaining all of these actions together.

For example, we can use LangChain’s adapters to work with many LLMs, including models from OpenAI, Hugging Face, and many others. It also supports most vector databases and “AI hubs”.
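For instance, swapping providers is a one-line change. A minimal sketch, assuming the relevant API keys are set in the environment and with model names as examples only:

from langchain.llms import OpenAI, HuggingFaceHub

llm = OpenAI(model_name="text-davinci-003")          # needs OPENAI_API_KEY
# llm = HuggingFaceHub(repo_id="google/flan-t5-xl")  # needs HUGGINGFACEHUB_API_TOKEN
print(llm("Summarize LangChain in one sentence."))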

We don’t need to hand-code text ingestion, splitting, and embedding; LangChain can handle all of them, as the sketch below shows.
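A minimal sketch of such a pipeline, assuming a placeholder handbook.txt file and a local FAISS index:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

docs = TextLoader("handbook.txt").load()  # ingestion
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(docs)  # splitting
index = FAISS.from_documents(chunks, OpenAIEmbeddings())  # embedding + storage
print(index.similarity_search("vacation policy", k=3))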

Here is a really cool example that shows how to analyze Twitter’s code using OpenAI and LangChain:

Analysis of Twitter the-algorithm source code with LangChain, GPT4 and Deep Lake

The code is very simple and clear. It uses git to clone Twitter’s newly open-sourced algorithm repo, runs os.walk to traverse all the files, and uses LangChain’s TextLoader to load and split the texts:

import os
from langchain.document_loaders import TextLoader

root_dir = './the-algorithm'
docs = []
for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        try:
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception as e:
            pass  # skip files that cannot be read as text

It then uses Activeloop’s Deep Lake library to store the texts and their embeddings. Note that the embedding function uses OpenAI’s embedding service:

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
embeddings = OpenAIEmbeddings()

...
...
# (the elided lines split `docs` into chunks, stored in `texts`)
db = DeepLake.from_documents(texts, embeddings, dataset_path="hub://davitbun/twitter-algorithm")

It then chains OpenAI’s model with Deep Lake’s retriever.
Deep Lake acts as a vector database here, just like in all the other applications we discussed before:

retriever = db.as_retriever()
retriever.search_kwargs['distance_metric'] = 'cos'  # cosine similarity
retriever.search_kwargs['fetch_k'] = 100  # candidates fetched before re-ranking
retriever.search_kwargs['maximal_marginal_relevance'] = True  # diversify the results
retriever.search_kwargs['k'] = 20  # documents finally passed to the LLM

...

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

model = ChatOpenAI(model='gpt-4')  # or 'gpt-3.5-turbo'
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

Once the chain is set up, it passes the questions one by one, along with the chat history, to OpenAI to get answers. The answers are based on the Twitter code we saved in Deep Lake:

questions = [
    "What does favCountParams do?",
    "is it Likes + Bookmarks, or not clear from the code?",
    "What are the major negative modifiers that lower your linear ranking parameters?",
    "How do you get assigned to SimClusters?",
    "What is needed to migrate from one SimClusters to another SimClusters?",
    "How much do I get boosted within my cluster?",
    "How does Heavy ranker work. what are its main inputs?",
    "How can one influence Heavy ranker?",
    "why threads and long tweets do so well on the platform?",
    "Are thread and long tweet creators building a following that reacts to only threads?",
    "Do you need to follow different strategies to get most followers vs to get most likes and bookmarks per tweet?",
    "Content meta data and how it impacts virality (e.g. ALT in images).",
    "What are some unexpected fingerprints for spam factors?",
    "Is there any difference between company verified checkmarks and blue verified individual checkmarks?",
]
chat_history = []

for question in questions:
    result = qa({"question": question, "chat_history": chat_history})
    chat_history.append((question, result['answer']))
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

This simple app shows us how easy it is to use LangChain to work with LLMs and other cloud services.

There is another good example demonstrating how to use LangChain to process PDF documents and answer questions about them:

GitHub - mayooear/gpt4-pdf-chatbot-langchain: GPT4 & LangChain Chatbot for large PDF docs
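That repo is written in TypeScript, but the same pattern in Python LangChain looks roughly like this sketch (the PDF file name and chain settings are assumptions):

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

pages = PyPDFLoader("report.pdf").load_and_split()  # placeholder PDF file
db = FAISS.from_documents(pages, OpenAIEmbeddings())
qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(model_name="gpt-3.5-turbo"), retriever=db.as_retriever())

chat_history = []
result = qa({"question": "What is the executive summary?", "chat_history": chat_history})
print(result["answer"])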


I hope this helps you get familiar with current LLM-based AI solutions.

Cheers, and happy coding.
