FINE-TUNING LARGE LANGUAGE MODELS
INTRODUCTION
One of the benefits of large language models (LLMs) is that they are widely applicable across a range of different applications. Unlike traditional machine learning approaches, where a separate model must be trained for each type of dataset, LLMs can often be used without any further training because of their architecture and the vast corpus of data they have been trained on.
However, there are still situations where LLMs need to be adapted to data specific to an organization. The RAG (Retrieval Augmented Generation) method, which suffices for most applications, is still constrained by the limited context window that LLMs provide. Open-source LLMs typically offer 4K-token windows, while GPT models offer windows ranging from 4K to 32K tokens. This implies that for organizations with large amounts of domain-specific data, responses generated using RAG may be incomplete, since the context size is a hard limit on how much retrieved material can be supplied to the model.
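To make the limitation concrete, here is a minimal sketch (not a real RAG pipeline) of what happens when retrieved chunks must fit inside a fixed context window; the chunk sizes, the 500-token prompt reservation, and the `pack_chunks` helper are illustrative assumptions:

```python
# Toy illustration: with a fixed context window, only some of the
# retrieved domain-specific chunks fit into the prompt.

def pack_chunks(chunks, context_window, reserved_for_prompt=500):
    """Greedily add retrieved chunks until the token budget is exhausted."""
    budget = context_window - reserved_for_prompt
    included, used = [], 0
    for chunk_id, token_count in chunks:
        if used + token_count > budget:
            break  # remaining domain data is silently dropped
        included.append(chunk_id)
        used += token_count
    return included, used

# Five retrieved chunks of ~1,200 tokens each against a 4K window:
retrieved = [(f"doc_{i}", 1200) for i in range(5)]
kept, tokens = pack_chunks(retrieved, context_window=4096)
print(kept, tokens)  # only 2 of the 5 chunks fit -> incomplete answers
```

With a larger window more chunks would fit, but for truly large domain corpora some relevant material will always be left out, which is the motivation for fine-tuning below.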
FINE-TUNING
This is where model fine-tuning makes its appearance. Another benefit of LLMs is their suitability for transfer learning: fine-tuning a pretrained model on a smaller, task-specific dataset to achieve high performance on that task. However, many fine-tuning approaches remain computationally intensive or pose a trade-off between efficiency and model quality.
To reduce the time, compute, and memory costs of fine-tuning, Houlsby et al. proposed an approach called Parameter-Efficient Fine-Tuning (PEFT). Rather than replacing the transformer, PEFT augments the original architecture: the authors introduced small adapter modules that are interleaved with the transformer layers, as shown in the figure below:
The adapters are the only components whose weights are retrained. All previously trained weights of the original transformer model remain frozen, ensuring that the earlier learning is not lost.
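The payoff of this design is the parameter count. The following back-of-the-envelope sketch compares one frozen feed-forward weight matrix against a bottleneck adapter (down-projection plus up-projection); the hidden size and bottleneck dimension are illustrative assumptions, not figures from the paper:

```python
# Minimal sketch of the adapter idea: the large pretrained weight
# matrices stay frozen, and only a small bottleneck adapter
# (down-project, nonlinearity, up-project) is trained.

d_model = 768      # hidden size of the pretrained layer (assumed)
bottleneck = 64    # adapter bottleneck dimension (assumed)

# Parameters in one frozen d_model x d_model feed-forward matrix:
frozen_params = d_model * d_model

# The adapter adds a down-projection (d_model x bottleneck) and an
# up-projection (bottleneck x d_model); biases omitted for simplicity:
adapter_params = d_model * bottleneck + bottleneck * d_model

print(frozen_params)   # 589824 parameters stay frozen
print(adapter_params)  # 98304 parameters are actually trained
```

Only the adapter parameters receive gradient updates, which is what makes the approach "parameter-efficient": the optimizer state and gradients scale with the adapter size, not the full model.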
LoRA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
To make this process more efficient, Hu et al. proposed an approach called LoRA that "freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks". In other words, they reduce the number of trainable parameters in the adapters, and they also modify the architecture: whereas Houlsby and team used a serial approach, interleaving their adapters within the transformer modules, Hu and co-workers take a different approach, as shown below:
Figure credit: https://arxiv.org/pdf/2304.01933.pdf
The adapters (A and B, at the far right of the figure) are placed in parallel with the pretrained weights, and their output is added to the layer's output. This yields higher validation accuracies at much lower numbers of trainable parameters, as shown below:
The pink triangles show LoRA's performance: higher validation accuracies are achieved at significantly lower numbers of trainable parameters. This also implies that the costs associated with fine-tuning can be significantly reduced.
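The core idea can be sketched in a few lines of pure Python. Instead of updating a full d × d weight matrix W, LoRA trains two low-rank matrices B (d × r) and A (r × d) and uses W + BA at inference; the tiny dimensions, rank, and matrix values below are illustrative assumptions (a real implementation also applies a scaling factor and initializes B to zero):

```python
# Hedged sketch of the low-rank update: W stays frozen, only B and A
# are trained, and the effective weight is W + B @ A.

def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1                                   # tiny sizes for illustration
W = [[1.0 if i == j else 0.0 for j in range(d)]
     for i in range(d)]                       # frozen pretrained weight
B = [[0.5] for _ in range(d)]                 # trainable, d x r
A = [[0.1, 0.2, 0.3, 0.4]]                    # trainable, r x d

delta = matmul(B, A)                          # rank-r update B @ A
W_adapted = [[w + dw for w, dw in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]

full_params = d * d                           # updating W directly
lora_params = d * r + r * d                   # updating B and A instead
print(full_params, lora_params)               # 16 vs 8 even at this toy size
```

At transformer scale the gap becomes dramatic: for d = 4096 and r = 8, the low-rank pair holds roughly 65K parameters versus about 16.8M for the full matrix, which is why fine-tuning costs drop so sharply.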
CONCLUSION
For the vast majority of use cases currently being observed, the RAG approach provides solutions that work remarkably well. However, LLMs, no matter how large, still have an inherent limitation in context size. In certain cases it is much better to fine-tune a model on data specific to the task, but this can become quite expensive. Fortunately, researchers have been developing techniques to make this process more efficient. Approaches like PEFT and LoRA can make fine-tuning truly efficient, reducing time, computation, and memory costs.
REFERENCES
Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models." https://doi.org/10.48550/arXiv.2106.09685
Houlsby et al., "Parameter-Efficient Transfer Learning for NLP." https://doi.org/10.48550/arXiv.1902.00751
Hu et al., "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models." https://doi.org/10.48550/arXiv.2304.01933