RAG (Retrieval-Augmented Generation) is a well-understood pattern for connecting corporate/private data to LLMs, and simple implementations are everywhere.
However, the simplest implementation of RAG (vectorized chunks) discards much of the structured context of the original data/documents, which can lead to LLM hallucinations and poor-quality responses.
Preserving structured data when vectorizing, and using it to intelligently retrieve relevant context, can reduce RAG hallucinations and misunderstandings significantly.
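To make "preserving structure" concrete, here is a minimal sketch of chunking that keeps each chunk's heading path alongside its text. It assumes markdown-style "#" headings; the names Chunk and chunk_with_structure are hypothetical, made up for illustration, and not taken from the linked article.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    heading_path: tuple  # e.g. ("Benefits", "Vacation"); hypothetical field


def chunk_with_structure(lines):
    """Split a markdown-like document into chunks that remember
    which headings they appeared under (illustrative sketch only)."""
    path = []    # current stack of headings, outermost first
    chunks = []
    buffer = []  # body lines collected since the last heading

    def flush():
        # Emit the buffered text as one chunk tagged with the current path.
        if buffer:
            chunks.append(Chunk(" ".join(buffer), tuple(path)))
            buffer.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped.lstrip("#").strip()
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(title)
        elif stripped:
            buffer.append(stripped)
    flush()
    return chunks


doc = [
    "# Benefits",
    "## Vacation",
    "Employees get 20 days.",
    "## Health",
    "Plan A covers dental.",
    "# Payroll",
    "Paid monthly.",
]
chunks = chunk_with_structure(doc)
```

A plain chunker would store only "Employees get 20 days." and lose the fact that it sits under Benefits > Vacation; here that path travels with the chunk and can be stored as metadata next to the embedding.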
My take:
Amateur hour is over!
Hat tip to Ken Judy for the link. This is a great example of how the LLM conversation is evolving: people are looking more critically at tools and approaches, and proposing improved architectures derived from real-world experience.
Stride has implemented RAG in several different ways, and it’s true that the quick-and-dirty approach has limits. But it’s so easy to implement that you can get something up and running just by converting all your documents to an easy format (we often use PDF), vacuuming them in, and giving it a shot. Most of the time, the results are great! But the more structure in the data (and the more intricate the questions), the harder it is to get the right context to the LLM before it responds.
The proposed approach here is an evolution of the current stack, not a reinvention: it’s basically a smart index that sits in front of a vector DB and allows for more intelligent context retrieval. Not every use case will need it, but if yours does, the pattern is now established. As the market matures, we’ll see many more examples like this (less “wow, look, this is magic” and more “here’s what you should consider before you build”), which will be a big unlock for adoption going forward.
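For a rough feel of what "a smart index in front of a vector DB" can mean, here is a toy sketch: narrow the candidates by a structural field first, then rank what is left by vector similarity. Everything in it is an assumption for illustration (the Entry and StructuredIndex names, the section field, the hand-written two-dimensional vectors standing in for real embeddings); it is not the architecture from the linked article, and a real system would delegate the similarity search to an actual vector database.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    section: str  # structural metadata kept from the source document
    vector: list  # stand-in for a real embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class StructuredIndex:
    """Hypothetical index: filter by section, then rank by similarity."""

    def __init__(self):
        self._by_section = {}

    def add(self, entry):
        self._by_section.setdefault(entry.section, []).append(entry)

    def search(self, query_vector, section=None, k=3):
        if section is not None:
            # Structural filter: only consider entries from this section.
            candidates = self._by_section.get(section, [])
        else:
            # No filter: fall back to plain similarity search over everything.
            candidates = [e for es in self._by_section.values() for e in es]
        ranked = sorted(
            candidates,
            key=lambda e: cosine(query_vector, e.vector),
            reverse=True,
        )
        return ranked[:k]


index = StructuredIndex()
index.add(Entry("Vacation is 20 days", "HR", [1.0, 0.0]))
index.add(Entry("Vacation rental expenses", "Finance", [0.9, 0.1]))
index.add(Entry("Health plan", "HR", [0.0, 1.0]))

hr_hits = index.search([1.0, 0.0], section="HR", k=1)
finance_hits = index.search([0.9, 0.1], section="Finance")
all_hits = index.search([1.0, 0.0], k=2)
```

Without the section filter, the two "vacation" entries score almost the same and both come back; with it, a question about HR policy never sees the Finance document at all. That is the whole trick, and it is why this reads as an evolution of the stack rather than a replacement.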