Agentic RAG
See the architecture and components LlamaIndex used to build an agentic RAG system for customer support. Code snippets and notebook included.
AI agent orchestration
AI agents get instructions from human users or other programs. Once a goal is set, they determine which steps to take, like calling tools or querying databases. Frameworks like LangGraph or AutoGen abstract the lower-level details of creating and calling agents, or you can write your own code to set up agents and connect them to the data sources and tools they need to accomplish tasks.
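The orchestration loop above can be sketched in a few lines of plain Python. This is a minimal, illustrative sketch: `call_llm` is a hypothetical stand-in for a real model call that decides the next action, and the tool names are assumptions, not part of any framework's API.

```python
# Minimal agent loop sketch. `call_llm` stands in for a real model call
# that picks the next action; tool names here are illustrative.

def call_llm(goal, history):
    # Stand-in decision step: search first, then finish with the result.
    if not history:
        return {"action": "search_docs", "input": goal}
    return {"action": "finish", "input": history[-1]["result"]}

TOOLS = {
    "search_docs": lambda q: f"docs matching '{q}'",
}

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = call_llm(goal, history)
        if step["action"] == "finish":
            return step["input"]
        result = TOOLS[step["action"]](step["input"])
        history.append({"action": step["action"], "result": result})
    return None  # gave up after max_steps

print(run_agent("reset password"))
```

A real orchestrator adds error handling, retries, and a richer action schema, but the shape is the same: loop, decide, act, record.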
AI models
Agents often call multiple models depending on the task. To optimize speed and cost, you can call smaller, faster models for simple tasks and use more advanced models when necessary.
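A simple way to implement this routing is a heuristic gate in front of the model call. The model names and the complexity heuristic below are illustrative assumptions, not real model identifiers.

```python
# Sketch of cost-aware model routing: short, simple prompts go to a
# smaller model; long or multi-step prompts go to a larger one.
# Model names and the heuristic are illustrative assumptions.

SMALL_MODEL = "small-fast-model"
LARGE_MODEL = "large-capable-model"

def pick_model(prompt: str) -> str:
    # Naive heuristic: length or reasoning keywords suggest a harder task.
    complex_markers = ("explain", "compare", "plan", "step by step")
    if len(prompt) > 200 or any(m in prompt.lower() for m in complex_markers):
        return LARGE_MODEL
    return SMALL_MODEL

print(pick_model("What is my order status?"))        # small model
print(pick_model("Compare these two refund plans"))  # large model
```

In production, routing is often done by a lightweight classifier model rather than keywords, but the cost/latency trade-off is the same.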
Semantic caching
To return responses faster and reduce the cost of AI inference, AI apps and agents can use semantic caching to store LLM results for quick retrieval. This helps in use cases with redundant calls, such as customer support agents where many users ask similar questions like “How do I reset my password?”
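The key idea is that cache hits are decided by embedding similarity rather than exact string match. The sketch below uses a toy bag-of-words "embedding" as a stand-in for a real embedding model; the class and threshold are illustrative assumptions.

```python
# Toy semantic cache: a hit means the new question's embedding is close
# enough to a cached one. The bag-of-words "embedding" stands in for a
# real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # list of (embedding, answer)
        self.threshold = threshold

    def get(self, question):
        q = embed(question)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer  # similar question answered before
        return None  # cache miss: call the LLM, then put() the result

    def put(self, question, answer):
        self.entries.append((embed(question), answer))

cache = SemanticCache()
cache.put("how do i reset my password", "Use the 'Forgot password' link.")
print(cache.get("how do i reset my password please"))  # near-duplicate hits
```

A production cache would use a real embedding model and an approximate nearest-neighbor index instead of a linear scan.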
Tool calls
Agents can interact with multiple tools and decide which tool is best for a particular task. They can search the internet, call other internal tools, or write queries to search databases for specific info.
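Tool use typically comes down to a registry of callable tools plus a decision step. In practice the LLM chooses a tool from the registered descriptions; in this sketch a keyword heuristic stands in for that decision, and all tool names are illustrative.

```python
# Sketch of tool registration and dispatch. `choose_tool` stands in for
# the LLM's tool-choice step; tool names are illustrative.

def web_search(query: str) -> str:
    return f"web results for '{query}'"

def db_query(query: str) -> str:
    return f"rows matching '{query}'"

TOOL_REGISTRY = {
    "web_search": {"fn": web_search, "description": "Search the internet"},
    "db_query": {"fn": db_query, "description": "Query internal databases"},
}

def choose_tool(task: str) -> str:
    # Stand-in for the model's decision; real agents pick from the
    # registered descriptions.
    return "db_query" if "customer" in task.lower() else "web_search"

def run_tool(task: str) -> str:
    name = choose_tool(task)
    return TOOL_REGISTRY[name]["fn"](task)

print(run_tool("look up customer #123"))
print(run_tool("latest shipping regulations"))
```

The descriptions in the registry matter: they are what the model actually reads when deciding which tool fits the task.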
Agent memory (short-term and long-term)
While completing tasks, agents store short-term information (like user input and the results of tool calls) for the duration of the task, so it’s available for fast retrieval by later steps. Long-term memory stores persistent information that is retained and reused across tasks, sessions, and interactions, accumulating knowledge over time. This helps the agent maintain a coherent understanding of a user’s preferences, past queries, and evolving objectives across sessions.
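The split between the two memory types can be modeled as a task-scoped scratchpad next to a persistent store. This is an illustrative sketch; the structures and method names are assumptions, not any framework's API.

```python
# Sketch of short-term vs long-term agent memory: the short-term
# scratchpad is cleared when a task ends, while long-term memory
# persists across tasks and sessions. Structures are illustrative.

class AgentMemory:
    def __init__(self):
        self.short_term = []   # per-task scratchpad (inputs, tool results)
        self.long_term = {}    # persistent facts, e.g. user preferences

    def remember_step(self, item):
        self.short_term.append(item)

    def remember_fact(self, key, value):
        self.long_term[key] = value

    def end_task(self):
        self.short_term.clear()  # long_term survives across tasks

memory = AgentMemory()
memory.remember_step({"tool": "db_query", "result": "order shipped"})
memory.remember_fact("preferred_language", "en")
memory.end_task()
print(memory.short_term, memory.long_term)
```

In a real deployment, long-term memory lives in a database keyed by user or session so it survives process restarts.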
Data sources
To interact with existing information, AI agents connect to one or more databases to get the info they need to make decisions and provide accurate responses. Like any other app, agents do this through APIs, and they can interact intelligently with those APIs to get the data required, which can include generating the queries themselves.
Embedding models
A common technique for identifying relevant information is Retrieval-Augmented Generation (RAG). For RAG, structured and unstructured data is converted into vector embeddings that capture its semantic meaning, so the most relevant content can be retrieved and returned to the agent.
Vector database
Vector embeddings of available knowledge bases or context are stored in databases that support vector storage and vector search, capabilities many databases have recently added because of their usefulness for GenAI.
Hosting & AI chips
AI apps can be deployed into production using all the major cloud providers, on-prem solutions, and hybrid solutions. They use hardware like GPUs, TPUs, and CPUs to process tasks and meet their demand for computing capacity.
LLMOps, authorization, & dev tools
Beyond execution, you’ll need supporting frameworks and platforms to make sure your data flows properly and agents can be debugged if something goes wrong. There are also agent builder tools that let you design and build AI agents with little or no code. See demos and resources here.
Consider latency
AI agents serve various roles, from scrubbing and annotating data to answering real-time questions for users. For agents that interact with humans or streaming data, the components need to be fast. For background or asynchronous processes, slower responses may work. But as agentic systems grow more complex, latency adds up, so many teams optimize for real-time performance from the start rather than trying to reduce latency later.
Plan for future innovations
Innovation in GenAI is constant, and you want tools with ecosystem support. To adapt to the new tools, frameworks, and models coming out all the time, build your agent architecture so you can swap in the latest models and add new data sources or tooling, such as logging and observability.
Build for scale
You may want to build a prototype to establish proof of concept. As you move that prototype to production, make sure your agentic systems can handle messy production data, serve many users at the same time without slowdowns, and stand up to security considerations like authorization protocols and DDoS attacks.
Learn more about how to use AI agents in your app by talking to a Redis expert.