
FINE-TUNING LARGE LANGUAGE MODELS

INTRODUCTION 

One of the benefits of large language models (LLMs) is that they are widely applicable across a range of different applications. Unlike traditional machine learning approaches, where a model has to be trained separately for each type of dataset, LLMs can often be used without any further training because of their architecture and the vast corpus of data they have been trained on.

However, there are still situations where LLMs need to be adapted to data specific to an organization. The RAG (Retrieval Augmented Generation) method, which suffices for most applications, is still constrained by the small context window that LLMs provide. Open-source LLMs typically offer 4K-token windows, while GPT models offer windows ranging from 4K to 32K tokens. This means that for organizations with large amounts of domain-specific data, responses generated using RAG may be incomplete, since the context size limits how much retrieved material can be passed to the model.
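To make the limitation concrete, here is a minimal, illustrative sketch of the retrieval step in a RAG pipeline: retrieved chunks are packed into the prompt until a fixed token budget is exhausted, and anything beyond it never reaches the model. The token-counting heuristic and the 4K budget are assumptions for illustration, not part of any particular framework.

```python
# Minimal RAG retrieval sketch: retrieved chunks must fit inside the model's
# context window, so with a 4K-token budget some relevant chunks are dropped.
# count_tokens() is a crude illustrative heuristic, not a real tokenizer.

def count_tokens(text: str) -> int:
    # Rough approximation: about 4 characters per token for English text.
    return len(text) // 4

def build_prompt(question: str, ranked_chunks: list[str], budget: int = 4096) -> str:
    used = count_tokens(question) + 256      # reserve room for the answer
    selected = []
    for chunk in ranked_chunks:              # chunks assumed sorted by similarity
        cost = count_tokens(chunk)
        if used + cost > budget:
            break                            # remaining chunks never reach the model
        selected.append(chunk)
        used += cost
    context = "\n\n".join(selected)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```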

FINE-TUNING

This is where model fine-tuning comes in. Another benefit of LLMs is their suitability for transfer learning: a pretrained model is fine-tuned on a smaller, task-specific dataset to achieve high performance on that task. However, many fine-tuning approaches are still computationally intensive or force a trade-off between efficiency and model quality.
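For contrast with the parameter-efficient methods below, a conventional full fine-tuning run using the Hugging Face transformers library looks roughly like this sketch; the checkpoint, dataset and hyperparameters are placeholder choices for illustration, not a recommended setup.

```python
# Sketch of conventional full fine-tuning: every pretrained weight is updated,
# which is exactly the cost that PEFT methods try to avoid.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"                      # placeholder pretrained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Small task-specific dataset (sentiment classification used purely as an example).
dataset = load_dataset("imdb")
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    tokenizer=tokenizer,                              # enables dynamic padding
)
trainer.train()                                       # updates all pretrained parameters
```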

In order to reduce the time, compute and memory complexity of fine-tuning, Houlsby et al. proposed a Parameter Efficient Fine Tuning (PEFT) approach. Rather than replacing the original transformer architecture, it augments it: the authors introduced small adapter modules that are interleaved with the layers of the transformer, as shown in the figure below:

The adapters are the only components whose weights are trained. All previously learned weights of the original transformer model remain frozen, ensuring that the pretrained knowledge is not lost.
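As a rough sketch of that idea, the PyTorch snippet below implements a Houlsby-style bottleneck adapter added after a frozen sub-layer. The hidden size, bottleneck size and the placeholder sub-layer are illustrative assumptions; the actual method inserts two such adapters into every transformer layer.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Houlsby-style adapter: project down, apply a nonlinearity, project back up,
    and add a residual connection. Only these small matrices are trained."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Freeze the pretrained weights and train only the adapter parameters.
# `pretrained_block` is a stand-in for one frozen transformer sub-layer.
pretrained_block = nn.Linear(768, 768)
for p in pretrained_block.parameters():
    p.requires_grad = False

adapter = BottleneckAdapter(hidden_dim=768)
x = torch.randn(2, 16, 768)                      # (batch, sequence, hidden)
out = adapter(pretrained_block(x))               # adapter output feeds the next layer
```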

LoRA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS 

To make this process even more efficient, Hu et al. proposed an approach called LoRA that "freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks". In other words, they reduce the number of trainable parameters in the adapters. They also change where the adaptation happens: whereas Houlsby and team used a serial approach, interleaving their adapters within the transformer modules, Hu and co-workers take a parallel approach, as shown below:

Figure credit: https://arxiv.org/pdf/2304.01933.pdf 

The low-rank matrices (A and B, on the far right of the figure) are placed in parallel with the pretrained weights, and their output is added to the pretrained layer's output. This leads to higher validation accuracies at much lower numbers of trainable parameters, as shown below:


The pink triangles show LoRA's performance: as you can see, higher validation accuracies are achieved with significantly fewer trainable parameters. This also means that the costs associated with fine-tuning can be significantly reduced.
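The core of LoRA is simple enough to sketch directly. The PyTorch snippet below wraps a frozen linear layer and adds the parallel low-rank update y = Wx + (alpha/r) * BAx described in the paper; the rank, scaling factor and layer size here are illustrative values, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer and add a parallel low-rank update:
    y = W x + (alpha / r) * B(A(x)). Only A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # pretrained weights stay frozen
            p.requires_grad = False
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.A.weight, std=0.01)  # A starts small and random
        nn.init.zeros_(self.B.weight)             # B starts at zero, so training
        self.scale = alpha / r                    # begins from the pretrained model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))

# Example: inject LoRA into a single 768x768 projection (placeholder layer).
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 2 * 768 * 8 = 12,288 trainable vs ~590K frozen parameters
```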

CONCLUSION 

For the vast majority of current use cases (around 95%, in our experience), the RAG approach suffices to provide solutions that work remarkably well. However, LLMs, however large they are, still have an inherent limitation in context size. In certain cases it is much better to fine-tune a model on task-specific data, but this can become quite expensive. Fortunately, researchers have been working on techniques to make this process more efficient. Approaches like PEFT and LoRA can make fine-tuning truly efficient and reduce time, computation and memory costs.

REFERENCES 

Hu et al., LoRA: Low-Rank Adaptation of Large Language Models. https://doi.org/10.48550/arXiv.2106.09685

Houlsby et al., Parameter-Efficient Transfer Learning for NLP. https://doi.org/10.48550/arXiv.1902.00751

Hu et al., LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models. https://doi.org/10.48550/arXiv.2304.01933

Arun Krishnan

SENIOR VICE PRESIDENT, DATA BU

About Author
Arun is the Sr. VP and Practice head for Analytics and AI at iLink Digital with over 24 years of cross-geographical experience in academia and industry. His exceptional proficiency lies in implementing cutting-edge algorithms and solutions for data analysis and pattern recognition in the domains of Engineering, Information Technology, and Biotechnology. His extensive wealth of knowledge and experience, coupled with a robust publication record, positions Arun as a key driving force in pioneering innovation and achieving transformative results within the field of Analytics and AI. 