Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)


Large Language Models (LLMs) are capable of extraordinary NLP feats, but their sheer size makes them too expensive for most organizations to train. One solution is Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA).

This discussion follows the introduction of models like Alpaca, Vicuña, GPT4All-J, and Dolly 2.0, which demonstrated the power of fine-tuning with thousands of instruction-response pairs.

Fine-tuning LLMs, particularly those with tens of billions of parameters, can be prohibitively expensive and technically challenging. One significant issue is "catastrophic forgetting," where a model retrained on new data loses its ability to perform tasks it had previously learned. These challenges call for a more efficient approach to fine-tuning.

PEFT

By reducing the memory footprint and the number of trainable parameters, PEFT methods like LoRA and AdaLoRA make it feasible to fine-tune large models on standard hardware. These techniques are space-efficient: the fine-tuned adapter weights take up only megabytes, rather than the gigabytes a full copy of the model would require. They also avoid catastrophic forgetting, perform better on small data sets, and generalize better to instructions outside the training set. And they apply beyond NLP to other A.I. use cases, such as machine vision.
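To make the recipe concrete, here is a minimal, illustrative PyTorch sketch of the core PEFT idea: freeze every pretrained weight and train only a tiny set of added parameters (here, a pair of low-rank matrices, which is essentially the mechanism LoRA formalizes below). The layer size and rank are arbitrary assumptions for illustration.

```python
# Illustrative sketch of the core PEFT idea (not production code):
# freeze the pretrained weights and train only a small added adapter.
import torch
import torch.nn as nn

base_layer = nn.Linear(1024, 1024)  # stand-in for one large pretrained layer

# Freeze every original parameter so it is never updated during fine-tuning.
for param in base_layer.parameters():
    param.requires_grad = False

# Add a tiny trainable adapter: a pair of low-rank matrices (rank 8).
rank = 8
adapter_down = nn.Linear(1024, rank, bias=False)  # trainable
adapter_up = nn.Linear(rank, 1024, bias=False)    # trainable

def adapted_forward(x):
    # Frozen base output plus the learned low-rank correction.
    return base_layer(x) + adapter_up(adapter_down(x))

y = adapted_forward(torch.randn(2, 1024))  # behaves like the original layer

trainable = sum(p.numel() for p in adapter_down.parameters()) + \
            sum(p.numel() for p in adapter_up.parameters())
frozen = sum(p.numel() for p in base_layer.parameters())
print(f"trainable: {trainable:,}  frozen: {frozen:,}")  # ~16 K vs. ~1 M
```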

LoRA

LoRA stands out as a particularly effective PEFT method. It injects pairs of low-rank decomposition matrices into the layers of a transformer model; these small matrices approximate the weight updates needed for fine-tuning in a much lower-dimensional space, so far fewer values have to be learned. The key to LoRA's efficiency is that all of the original model weights are frozen and only the new low-rank matrices are trained. In the original LoRA paper, this strategy reduced the number of trainable parameters by roughly 10,000 times and cut the GPU memory required for training by about three times, while matching, and in some scenarios outperforming, full fine-tuning. This efficiency does not come at the cost of effectiveness, making LoRA an attractive option for fine-tuning LLMs.
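In practice this is usually set up with a library rather than by hand. The following is a hedged sketch using Hugging Face's transformers and peft libraries; the base checkpoint ("gpt2"), the target modules, and the hyperparameter values are illustrative assumptions, not recommendations from the episode.

```python
# Hedged sketch: wrapping a pretrained causal LM with LoRA adapters via the
# Hugging Face `peft` library. All hyperparameter choices are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM checkpoint

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=32,               # scaling applied to the low-rank update
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection; differs by architecture
)

# The wrapped model freezes the original weights; only the LoRA matrices train.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

After training, only the small LoRA adapter weights need to be saved, which is why the resulting checkpoints occupy just megabytes.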

AdaLoRA

AdaLoRA, a recent innovation by researchers at Georgia Tech, Princeton, and Microsoft, builds on the foundations of LoRA. Rather than assigning every weight matrix the same rank, it adaptively allocates the parameter budget to the parts of the transformer that benefit most from fine-tuning, potentially offering enhanced performance over standard LoRA.
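Here is a hedged sketch of the same workflow with AdaLoRA, again via Hugging Face's peft library; the checkpoint, target modules, and every hyperparameter value below are illustrative assumptions. The configuration starts each adapted matrix at a higher rank and lets the method prune the budget down toward a target average rank during training.

```python
# Hedged sketch: AdaLoRA via the Hugging Face `peft` library, which adaptively
# re-allocates the low-rank budget across weight matrices during training.
from transformers import AutoModelForCausalLM
from peft import AdaLoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

adalora_config = AdaLoraConfig(
    task_type=TaskType.CAUSAL_LM,
    init_r=12,        # initial rank given to every adapted matrix
    target_r=4,       # average rank to converge to after re-allocation
    tinit=200,        # warm-up steps before rank pruning begins
    tfinal=500,       # final steps trained with the fixed, pruned budget
    deltaT=10,        # re-allocate the budget every deltaT steps
    total_step=3000,  # planned number of training steps for the schedule
    target_modules=["c_attn"],
)

model = get_peft_model(model, adalora_config)
model.print_trainable_parameters()
# During training, peft's AdaLoRA machinery periodically re-allocates rank
# between matrices according to their estimated importance.
```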


These developments in PEFT and the emergence of tools like LoRA and AdaLoRA mark an incredibly exciting and promising time for data scientists. With the ability to fine-tune large models efficiently, the potential for innovation and application in the field of AI is vast and continually expanding.


The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.


Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.


The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.
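As a rough, hedged illustration of that ratio: the commonly cited rule of thumb from the Chinchilla paper is on the order of 20 training tokens per model parameter, which a quick back-of-the-envelope calculation turns into concrete data budgets (the model sizes below are arbitrary examples).

```python
# Back-of-the-envelope sketch of the Chinchilla heuristic: compute-optimal
# training uses roughly 20 tokens per parameter (an approximate rule of thumb,
# not an exact law).
TOKENS_PER_PARAMETER = 20

for params_in_billions in (7, 13, 70):
    optimal_tokens = params_in_billions * 1e9 * TOKENS_PER_PARAMETER
    print(f"{params_in_billions}B parameters -> ~{optimal_tokens / 1e9:,.0f}B training tokens")
```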


StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released "StableLM", their first language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.
