Top Enterprise RAG predictions
How Enterprise RAG has evolved and what we can expect to see next...
2024: The year Enterprise RAG graduated from experimental
Looking back, Gartner predicted in 2022 that hyper-automation adoption would reach roughly 65% of enterprises by 2024. In 2023, Deloitte projected that enterprise spending on GenAI would grow by 30% in 2024.
In the first half of 2024, we saw first-hand how quickly enterprises moved from experimentation to production deployment of GenAI use cases, despite the ongoing and sometimes lively debate over the "how": how to make it scalable and how to make it safe. According to Deloitte's recent GenAI survey, 42% of organizations are seeing significant gains in productivity, efficiency, and cost. In the second half of 2024, the question changed to: what's next? How do we move from incremental, internal productivity gains to truly transformative industry use cases? How do we build GenAI that meets enterprises' needs, specifically over their own data? How do we reach real ROI, and how do we measure it?
Although the journey continues, 2024 has fortunately given us some answers. Enterprises are choosing Retrieval Augmented Generation (RAG) for 30-60% of their use cases. RAG comes into play whenever a use case demands high accuracy, transparency, and reliable outputs, particularly when the enterprise wants to use its own or custom data. Its ability to reduce hallucinations, provide explainability and transparency, and preserve the security and privacy of enterprise data sets has made RAG a de facto standard.
There’s been a remarkable evolution over the last 12 months. The rise of ‘Enterprise RAG’, combined with the significantly enhanced capabilities of models (e.g., GPT-4o, Gemini 2.0, Llama 3.3, and Claude 3.5), has enabled organizations to move from incremental, internal use cases to ROI-impacting ones, and to scale them into production.
Quick recap: 6 main improvements to RAG in 2024
Much of the readiness to start moving into production with more sophisticated use cases is thanks to the improvements seen in RAG itself over the past year. In 2024, we’ve seen it:
→ Get faster: LLMs have, in general, become 7x faster. This speed improves application response times and the end-user experience, opening new internal and external business opportunities.
→ Become more economical: Platforms like Vectara provide end-to-end RAG out of the box, reducing workload and maintenance costs; DIY RAG, by contrast, will most likely involve stitching together 20+ APIs and managing 5-10 vendors.
→ Allow larger contexts with higher accuracy: LLMs have generally extended their context windows. This lets the LLM use more facts and longer chunks in the generation step, resulting in more accurate responses with less risk of generating answers not grounded in facts.
→ Use your data, in your environment: Leading enterprise RAG platforms have moved to offer on-premises or in-your-VPC deployment, ensuring that your data never leaves your environment. This helps companies address both the safety and security of their data as well as the data gravity of where their workloads already exist.
→ Lower hallucination rates: LLMs continue to demonstrate lower intrinsic hallucination rates, while fast and effective hallucination evaluation models like Vectara’s HHEM model have become available to help enterprises prevent incorrect answers from reaching their application consumers.
→ Scale: As RAG adoption moved from PoC to production, it became increasingly clear that the “R” in RAG (i.e. Retrieval) is one of the biggest bottlenecks. In large enterprise production deployments, the grounding dataset is often quite large, and retrieving the most useful facts for the generation step becomes a challenge. More mature enterprise RAG platforms now integrate flexible and robust search techniques like hybrid search and reranking to address these scale challenges.
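The hybrid search and reranking idea above can be sketched in a few lines: blend a lexical (keyword) score with a vector-similarity score, then keep the top results. This is a minimal illustrative sketch, not Vectara's actual implementation; the toy 3-dimensional "embeddings", the `alpha` weight, and all function names are assumptions for demonstration.

```python
# Minimal hybrid-retrieval sketch: combine keyword overlap with
# vector similarity. Weights and embeddings are illustrative only.
import math

def keyword_score(query, doc):
    """Fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, query_vec, corpus, alpha=0.5, top_k=2):
    """Score each (text, embedding) pair by a weighted blend of
    lexical and semantic similarity; alpha balances the two."""
    scored = []
    for text, vec in corpus:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, text))
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Toy corpus with hand-made 3-d "embeddings" standing in for a real model.
corpus = [
    ("RAG reduces hallucinations with grounded facts", [0.9, 0.1, 0.0]),
    ("Quarterly revenue grew in the retail segment",   [0.1, 0.9, 0.2]),
    ("Hybrid search combines keywords and vectors",    [0.8, 0.2, 0.1]),
]
results = hybrid_search("hybrid search vectors", [0.85, 0.15, 0.05], corpus)
```

Production systems replace the toy scorers with BM25-style lexical ranking, real embedding models, and a dedicated reranker over the blended candidates, but the shape of the pipeline is the same.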
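The hallucination-detection pattern mentioned above (gating answers on a factual-consistency score before they reach users) can be sketched as follows. Note this is a toy illustration: a real deployment would call an evaluation model such as HHEM to produce the score, whereas `consistency_score` here is a naive word-overlap stand-in, and the threshold and fallback message are assumptions.

```python
# Illustrative sketch of gating responses on a factual-consistency score.
# consistency_score is a toy stand-in for a real hallucination
# evaluation model; it measures the share of response words that
# also appear in the source.

def consistency_score(source: str, response: str) -> float:
    """Toy proxy: fraction of response words grounded in the source."""
    src = set(source.lower().split())
    resp = response.lower().split()
    if not resp:
        return 0.0
    return sum(1 for w in resp if w in src) / len(resp)

def gate_response(source: str, response: str, threshold: float = 0.5) -> str:
    """Return the response only if it is sufficiently grounded;
    otherwise fall back to a safe refusal."""
    if consistency_score(source, response) >= threshold:
        return response
    return "I could not find a well-grounded answer in the provided documents."

source = "The warranty covers parts and labor for two years."
grounded = gate_response(source, "The warranty covers parts for two years.")
ungrounded = gate_response(source, "The warranty includes free international shipping.")
```

The key design point is that the gate sits between generation and the application consumer, so an ungrounded answer is replaced by a safe fallback rather than shown to the user.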
Top 7 predictions for RAG in 2025
So where does enterprise RAG evolve from here? Here are our predictions for 2025:
- RAG platforms will become the de facto choice over DIY. Enterprises have realized the costs and risks of going down the DIY path alone and want to avoid wasting scarce resources stitching all the required components together themselves. Instead, they will turn to mature enterprise RAG platform vendors and implementation partners to support the journey.
- The fight against hallucinations continues. Addressing hallucinations will become even more important for RAG in general, and with more intense focus as Agentic AI rises, since mistakes in a chain of actions have more significant ripple effects. More innovation will hence emerge around frontier models that hallucinate less by design, as well as helper models that analyze input or output.
- The quality of RAG responses will continue to improve. Better data parsing and pre-processing techniques will emerge, along with better query-intelligence mechanisms and helper models that align output with user expectations, values, and goals. At the same time, we expect the industry to gain a better understanding of the quality/cost tradeoff for techniques that promise improved quality, e.g., Graph RAG and Contextual Retrieval.
- Agentic RAG will be the new top-of-mind topic, and the next wave of hype. Every vendor in this market will loudly promise new levels of efficiency and more complex workflows. We predict, however, that adoption of Agentic RAG will follow its own trajectory and require much more careful conversations in 2025. Mistakes in an agentic chain have a more detrimental impact, which makes us believe enterprises will approach Agentic AI with even more caution than Assist AI. In 2025, we will therefore see basic AI agents take form for very domain-specific, easily grounded, and non-detrimental information workflows: for example, information retrieval from specific tools, parsing of legal documents, and updating fields in SaaS tool entries. This simple form of agent may ramp up significantly toward the second half of the year, while complex (and truly ROI-impacting) agentic workflows will see slower adoption (2026/2027 and beyond).
- Evaluation techniques and frameworks will rapidly emerge as enterprises continue to struggle to define and measure ROI and to choose the right model or RAG system for each use case. These frameworks will be essential for providing the confidence required in some use cases, especially in regulated industries, to keep accelerating the adoption of AI assistants and agents. End-to-end evaluation frameworks will be especially important for increasing understanding and transparency across more complex multi-agent systems.
- Multi-modal RAG and helper models will become the norm. RAG systems will natively support more data types, e.g., images, audio, and video.
- Multi-model approaches will keep trending. More models working together will continue to gain momentum. We’ve already seen general “critics” and additional reasoning models released in different forms by the hyperscalers, and more ideas on guardrails and side models that address common GenAI pitfalls have recently been published. We will see these kinds of mitigation models and ideas realized in 2025, along with more domain-specific “helper models”: think critics with expertise in legal, healthcare, chemical-processing, and other subject matters.
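The "simple agent" pattern predicted above, narrow agents that act only through specific, easily grounded tools, can be sketched as a whitelisted dispatcher. Everything here is hypothetical: the tool names, intents, and return values are illustrative, not any real product's API.

```python
# Hypothetical sketch of a low-risk, domain-specific agent: it can only
# dispatch to a small whitelist of tools, and refuses anything else.

def lookup_order_status(order_id: str) -> str:
    """Stand-in for a call to a specific enterprise tool."""
    return f"Order {order_id}: shipped"

def parse_contract_clause(text: str) -> str:
    """Stand-in for a narrow legal-document parsing tool."""
    return "termination clause found" if "termination" in text.lower() else "no termination clause"

TOOLS = {
    "order_status": lookup_order_status,
    "contract_parse": parse_contract_clause,
}

def simple_agent(intent: str, payload: str) -> str:
    """Dispatch to a whitelisted tool; refuse anything outside the list.
    Keeping the action space small is what makes this agent low-risk."""
    tool = TOOLS.get(intent)
    if tool is None:
        return "Unsupported action"
    return tool(payload)
```

Constraining the action space in this way is exactly why such agents are "non-detrimental": a mistake can at worst produce a wrong lookup, not an unbounded chain of actions.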
Conclusions
It's clear that GenAI is moving beyond its experimental phase, and the reliability of RAG is a big part of that. Success in 2025 will depend on choosing the right partners, utilizing robust evaluation frameworks, and taking a measured approach to emerging trends such as Agentic AI. Organizations that thoughtfully navigate these challenges while maintaining focus on security, accuracy, and scalability will be best positioned to capture the transformative potential of this technology, beyond the internal and incremental.
We at Vectara wake up every day with the mission of making the enterprise journey with Generative AI a safe and comfortable experience. We want to be the trusted advisor to enterprises along this journey because we have navigated a complex RAG journey ourselves. We continue to invest heavily in improving RAG quality and explainability with enterprise empathy in mind. This translates to features such as deployment-model flexibility, hallucination detection and prevention, fast and accurate retrieval, intelligent query filtering, access controls, and many other capabilities that make it possible for enterprises to do RAG responsibly, over their data, in their environment, and always get the best results. This is why our customers select us as a trusted partner in the adoption of GenAI. We are ready to help you realize the ROI of your unique AI assist and agentic applications in 2025!
Ready to start your enterprise RAG journey?
To learn more, you can sign up for Vectara’s free 30-day trial and explore the powerful features of our platform, or contact us for an educational workshop for your enterprise team.
Also, don’t miss the vectara-agentic Python library, which provides a simple-to-use approach to safely building AI assistants and agents using responsible enterprise RAG.
Thank you for reading our predictions. We would like to hear your thoughts! Perhaps you’d like to join our growing community of enterprise developers and architects. If so, you can start by joining the discussion in our discussion forums or Discord server.