Generative AI is everywhere. Its ascendancy has many wondering whether we’re on the cusp of artificial general intelligence. The truth is, we still have a way to go: while the large language models (LLMs) behind GenAI signal a new frontier in how machines understand and process natural language, they are far from perfect. GenAI has certainly been impressive at improving customer support, generating code to accelerate software development, and translating language. But LLMs are frozen at training time; they can’t ingest new data, which means they can’t answer prompts about information they weren’t trained on.
There’s no denying this very real limitation. There have been countless instances of GenAI producing inaccurate or irrelevant responses. These “hallucinations” are sometimes comical; other times they have very serious, real-world consequences. Two primary approaches have emerged to address the problem: fine-tuning and retrieval-augmented generation (RAG).
Fine-Tuning: Flawed, But Not Fruitless
Fine-tuning essentially means adapting a pre-trained model to a particular task or context by training it further on smaller, more specific datasets. Although performance can improve under specific circumstances, this approach opens a can of worms when it comes to keeping data current: every update to the underlying data demands another training run. Simply put, fine-tuning often enough to stay current is, at best, barely feasible, if not completely impossible.
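To make this concrete, here is a minimal sketch of a fine-tuning run, assuming the Hugging Face transformers and datasets libraries; the model (distilbert-base-uncased), the task (sentiment classification), and the dataset (IMDB) are illustrative stand-ins only.

```python
# Fine-tuning sketch: continue training a pre-trained model on a small,
# task-specific dataset. Model, task, and dataset are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A small labeled dataset standing in for the "smaller, more specific" data.
dataset = load_dataset("imdb", split="train[:1000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # keeping the model current means repeating this entire run
```

The final comment is the crux: absorbing new data means repeating the entire run, which is precisely the recency problem described above.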
Furthermore, we can’t forget that any data put into an LLM can surface in a response to a prompt. That raises very serious questions about privacy. Imagine if an LLM were retrained with personally identifiable information (PII): it could reveal highly sensitive information, in some cases even to malicious actors. So it’s no surprise that training LLMs with PII has already come under regulatory scrutiny – see, for example, Singapore’s efforts to foster global consensus on consistent principles to fortify trust in generative AI.
That said, there are situations where fine-tuning can be useful. Retraining embedding models is one. An embedding model takes text and generates a vector that represents its meaning, so that semantically similar inputs land close together in vector space. If an embedding model doesn’t recognize or understand a word, it will map that word to a vector unrelated to its actual meaning.
For example, if an embedding model was built before the arrival of the social media platform TikTok, it might not recognize the word and instead produce a vector associated with “clocks” or “sounds”. Arming the embedding model with the correct meaning of TikTok would require fine-tuning.
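One way to check for this is to probe the model directly: embed the term and compare it against candidate meanings. Below is a minimal sketch, assuming the sentence-transformers library; the model name is just one illustrative choice.

```python
# Probe whether an embedding model "understands" a term by comparing
# its vector against vectors for candidate meanings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Embed the term in question and two candidate interpretations.
vectors = model.encode([
    "TikTok",
    "the ticking sound of a clock",
    "a short-form video social media app",
])

# Cosine similarity reveals which meaning the model associates with the term.
print("clock-like meaning:", util.cos_sim(vectors[0], vectors[1]).item())
print("video-app meaning: ", util.cos_sim(vectors[0], vectors[2]).item())
```

If the clock-like meaning scores higher, the model predates the term’s current usage, and fine-tuning (or a newer embedding model) is the fix.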
Speed and Security With RAG
RAG takes the opposite approach: instead of modifying the model itself, it supplements the LLM with real-time external data, typically stored in vector databases. This means data can be instantly updated and made available for querying, with no need for a fine-tuning run to batch new data into the model.
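The core loop is straightforward: embed the user’s query, retrieve the most similar stored documents, and prepend them to the prompt. Here is a minimal sketch, again assuming sentence-transformers, with an in-memory list standing in for a real vector database and a print() standing in for the actual LLM call.

```python
# Minimal RAG sketch: embed documents, retrieve the closest match for a
# query, and build an augmented prompt for the LLM.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# "External data" the LLM was never trained on; can be updated at any time.
documents = [
    "Our refund window was extended to 60 days in March 2024.",
    "Support is available 24/7 via live chat.",
]
doc_vectors = model.encode(documents)

def retrieve(query: str) -> str:
    """Return the stored document most similar to the query."""
    query_vector = model.encode(query)
    scores = util.cos_sim(query_vector, doc_vectors)[0]
    return documents[int(scores.argmax())]

query = "How long do I have to request a refund?"
context = retrieve(query)

# The retrieved context is injected into the prompt before calling the LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # a real implementation would send this to an LLM API
```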
RAG also offers fine-grained access control. For example, with a chatbot, the developer can add query filters so that only personal data belonging to the person asking is ever retrieved. By keeping data separate from the model and granting access only when needed, RAG minimizes the risk of sensitive information leaking through external sources.
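In code, that filter can be as simple as scoping the search to records owned by the requester before any similarity ranking happens. A minimal sketch (the owner_id field and retrieve_for_user helper are hypothetical; real vector databases expose comparable metadata filters in their query APIs):

```python
# Sketch of per-user filtering at retrieval time: each record carries an
# owner_id, and a query searches only the records owned by the requester.
# Field and function names here are hypothetical.
records = [
    {"owner_id": "alice", "text": "Alice's order #1042 shipped on Tuesday."},
    {"owner_id": "bob", "text": "Bob's subscription renews on the 1st."},
]

def retrieve_for_user(user_id: str) -> list[str]:
    """Return only the documents the requesting user is allowed to see."""
    return [r["text"] for r in records if r["owner_id"] == user_id]

print(retrieve_for_user("alice"))  # Bob's data never enters the search scope
```

Because the filter runs before retrieval, out-of-scope data is excluded from the search itself rather than redacted after the fact.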
Not a Zero-Sum Game
In practical terms, there’s no need to dismiss fine-tuning. After all, it works well when the model needs to absorb specific datasets, or when dealing with publicly accessible data that doesn’t change frequently. That said, it has obvious limitations. Applications that demand real-time accuracy and protection of sensitive information require the qualities RAG brings. This is particularly pertinent for sectors like healthcare and finance, where accurate, up-to-date information is bread and butter.
In medical services, for instance, fine-tuning is likely to be time-consuming while also potentially breaching patient privacy. RAG is hampered by neither.
The same is also true for finance and e-commerce, where rapid decision-making and data security are paramount. By providing instant access to relevant data while maintaining strict privacy controls, RAG empowers organizations to leverage the full potential of LLMs without compromising on data integrity or security.
Looking ahead, as the digital landscape continues to evolve, the ability to harness real-time external data will be essential for creating more intelligent and adaptable AI systems. RAG is a game changer in this regard, enabling new possibilities in language processing that pave the way for a future where machines and humans are in sync.