January 30th 2023

InstructGPT is the successor to the GPT-3 large language model (LLM) developed by OpenAI.

Understanding InstructGPT

InstructGPT is the result of an overhaul of the GPT-3 language model. Responding to user complaints about GPT-3, creator OpenAI made the new and improved model:

Better at following English instructions.
Less inclined to spread misinformation (more truthful), and
Less likely to produce toxic results or those that reflect harmful sentiments.

The problem with GPT-3 arose because it was trained to predict the next word from a large dataset and not to safely perform the task the user wanted. To address the problem, OpenAI used a technique known as reinforcement learning from human feedback (RLHF).

The three steps in InstructGPT training

The RLHF process can best be described as a 3-step feedback cycle between the person, reinforcement learning, and the model’s understanding of the goal.

To better understand this process, let’s explain each step.

Step 1 – Collect human-written demonstration data and train a supervised policy

Once a prompt has been sampled from a dataset, a labeler demonstrates the desired output behavior. The prompts can be submitted by GPT-3 users, and OpenAI researchers guide labelers through written instructions, informal conversation, and feedback on specific examples where necessary.

Then, the data are used to refine GPT-3 by training supervised learning baselines.
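
To make the supervised step more concrete, here is a minimal sketch of the kind of loss such training typically minimizes: the average negative log-likelihood of the tokens the labeler wrote. The function name and the probability values are hypothetical, and this is an illustration of standard supervised fine-tuning rather than OpenAI's actual training code.

```python
import math


def demonstration_loss(token_probs):
    """Average negative log-likelihood of the labeler's demonstration tokens.

    token_probs: the probability the model assigned to each token the
    labeler actually wrote (hypothetical values, one per token).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Hypothetical numbers: a model that already matches the demonstration
# gets a lower loss than one that assigns its tokens low probability.
good_fit = demonstration_loss([0.9, 0.8, 0.95])
poor_fit = demonstration_loss([0.2, 0.1, 0.3])
print(good_fit < poor_fit)
```

Training pushes this loss down, which means the model assigns higher probability to the outputs labelers demonstrated.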

Step 2 – Collect comparison data and train the reward model

Next, a dataset of human-labeled comparisons between two outputs on a larger set of prompts is collected. Several model outputs are sampled from a prompt, and the labeler ranks each output from best to worst.

The reward model (RM) is then trained on this dataset to predict which output OpenAI’s labelers prefer.
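
As a rough illustration of this step, the sketch below turns one best-to-worst ranking into pairwise comparisons and scores each pair with a commonly used pairwise preference loss, -log(sigmoid(r_preferred - r_rejected)). The helper names and reward values are assumptions made for the example, not OpenAI's code.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first element of each
    pair is always the one the labeler ranked higher.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + e^-d)."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


# Hypothetical ranking of three sampled outputs, best first.
pairs = ranking_to_pairs(["A", "B", "C"])

# The loss is small when the reward model already scores the preferred
# output higher, and large when it scores the rejected output higher.
agrees = pairwise_loss(2.0, 0.0)
disagrees = pairwise_loss(0.0, 2.0)
print(pairs, agrees < disagrees)
```

A ranking of K outputs yields K(K-1)/2 such pairs, so a single labeling task supplies several training comparisons.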

Step 3 – Use the reward model as a reward function to fine-tune the GPT-3 policy

In step three, a new prompt is sampled from the dataset. The policy generates an output, and the reward model calculates a reward for it. The policy is then updated to maximize that reward using OpenAI’s Proximal Policy Optimization (PPO) algorithm.
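
For readers curious about what PPO optimizes, here is a minimal sketch of its clipped surrogate objective for a single sample. The function name is hypothetical, the advantage is assumed to be derived from the reward model's score, and epsilon=0.2 is a commonly used default rather than a value taken from this article.

```python
def clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for one sample.

    ratio: new-policy probability of the output divided by the
        old-policy probability of the same output.
    advantage: how much better the output scored than expected
        (assumed here to come from the reward model's score).
    epsilon: clipping range; 0.2 is an assumed, commonly used default.
    """
    unclipped = ratio * advantage
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to move the policy
    # further than the clipping range in a single update.
    return min(unclipped, clipped_ratio * advantage)


# A good output (positive advantage) that is already much more likely
# under the new policy: the gain is capped at (1 + epsilon) * advantage.
capped_gain = clipped_objective(ratio=1.5, advantage=1.0)

# A bad output (negative advantage) that became more likely:
# the penalty is not clipped, so the full cost is kept.
full_penalty = clipped_objective(ratio=1.5, advantage=-1.0)
print(capped_gain, full_penalty)
```

The clipping is what keeps each update "proximal": the policy improves against the reward model without drifting too far from its previous version in one step.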

The result is that InstructGPT is much better at following instructions.

InstructGPT vs. GPT-3

OpenAI’s labelers prefer outputs from InstructGPT over outputs from GPT-3, even when the InstructGPT model has more than 100x fewer parameters (1.3 billion versus 175 billion). The company also noted that “at the same time, we show that we don’t have to compromise on GPT-3’s capabilities, as measured by our model’s performance on academic NLP evaluations.”

InstructGPT models were in beta mode on the API for over twelve months and are now its default language models. Moving forward, OpenAI believes that model refinement with humans in the loop is the most effective way to improve reliability and safety.

Key takeaways

InstructGPT is the successor to the GPT-3 large language model (LLM) developed by OpenAI. It was developed in response to user complaints about the toxic or harmful results generated by GPT-3.
To address the problem, OpenAI used a technique known as reinforcement learning from human feedback (RLHF). The process is best described as a 3-step feedback cycle between a human, reinforcement learning, and the model’s understanding of the goal.
It is worth noting that OpenAI’s labelers prefer InstructGPT’s outputs over GPT-3’s even though the InstructGPT model has 100x fewer parameters.

Connected AI Concepts

AGI

Generalized AI consists of devices or systems that can handle all sorts of tasks on their own. The pursuit of generalized AI contributed to the development of machine learning. As an extension of AI, machine learning (ML) uses algorithms that learn from data to automate actions. Without being explicitly programmed, systems can learn and improve over time. ML explores large sets of data to find common patterns and formulate analytical models through learning.

Deep Learning vs. Machine Learning

Machine learning is a subset of artificial intelligence where algorithms parse data, learn from experience, and make better decisions in the future. Deep learning is a subset of machine learning where numerous algorithms are structured into layers to create artificial neural networks (ANNs). These networks can solve complex problems and allow the machine to train itself to perform a task.

DevOps

DevOps refers to a set of practices for automating software development processes. The term combines “development” and “operations” to emphasize how functions integrate across IT teams. DevOps strategies promote seamless building, testing, and deployment of products. It aims to bridge the gap between development and operations teams to streamline development as a whole.

AIOps

AIOps is the application of artificial intelligence to IT operations. It has become particularly useful for modern IT management in hybridized, distributed, and dynamic environments. AIOps has become a key operational component of modern digital-based organizations, built around software and algorithms.

Machine Learning Ops

Machine Learning Ops (MLOps) describes a suite of best practices that successfully help a business run artificial intelligence. It consists of the skills, workflows, and processes to create, run, and maintain machine learning models to help various operational processes within organizations.

OpenAI Organizational Structure

OpenAI is an artificial intelligence research laboratory that transitioned into a for-profit organization in 2019. The corporate structure is organized around two entities: OpenAI, Inc., which is a single-member Delaware LLC controlled by the OpenAI non-profit, and OpenAI LP, which is a capped, for-profit organization. OpenAI LP is governed by the board of OpenAI, Inc. (the foundation), which acts as General Partner. The Limited Partners comprise employees of the LP, some of the board members, and other investors such as Reid Hoffman’s charitable foundation, Khosla Ventures, and Microsoft, the leading investor in the LP.

OpenAI Business Model

OpenAI has built the foundational layer of the AI industry. With large generative models like GPT-3 and DALL-E, OpenAI offers API access to businesses that want to develop applications on top of its foundational models, plug these models into their products, and customize them with proprietary data and additional AI features. OpenAI also released ChatGPT, which is built around a freemium model. Microsoft also commercializes OpenAI’s products through its commercial partnership.

OpenAI/Microsoft

OpenAI and Microsoft have a commercial partnership. It started in 2016 and was consolidated in 2019, when Microsoft invested a billion dollars. It’s now taking a leap forward, with Microsoft in talks to put $10 billion into the partnership. Microsoft, through OpenAI, is developing its Azure AI Supercomputer while enhancing its Azure Enterprise Platform and integrating OpenAI’s models into its business and consumer products (GitHub, Office, Bing).

Stability AI Business Model

Stability AI is the entity behind Stable Diffusion. Stability makes money from its AI products and from providing AI consulting services to businesses. Stability AI monetizes Stable Diffusion via DreamStudio’s APIs, while also releasing it as open source for anyone to download and use. Stability AI also makes money via enterprise services, where its core development team offers enterprise customers the chance to service, scale, and customize Stable Diffusion or other large generative models to their needs.

The post InstructGPT And Why It Matters For The Success Of ChatGPT appeared first on FourWeekMBA.
