Knowledge graph retrieval-augmented generation (RAG): structured retrieval for AI agents
A user asks your support agent: "is the slow-sync bug from my last ticket fixed in the version you told me to upgrade to?" Answering means connecting three documents: the customer's earlier ticket, the new release notes, and the engineering issue the fix was tied to. Your vector search returns ten chunks that all sound relevant, but none of them answer the question. If you've built a retrieval-augmented generation (RAG) pipeline, you've probably watched this happen: the answer is spread across separate documents, and nearest-neighbor search hands the model semantically similar text without ever connecting those facts.
That disconnect is the problem knowledge graph RAG addresses. Vector search ranks chunks by how similar they sound; it has no notion of how the facts relate. Knowledge graph RAG models your data as entities and the relationships between them, so an agent can follow a chain of connections instead of hoping similarity scores line up. This guide covers what knowledge graph RAG is, where vector search falls short, why agents need multi-hop retrieval, the freshness problem that breaks stale graphs, and where structured retrieval fits in an agent's context path.
What knowledge graph RAG is & how it differs from vector RAG
RAG improves LLM responses by pulling relevant context from your own data before the model generates an answer. The difference between vector RAG and knowledge graph RAG is how they retrieve that context: vector RAG matches on semantic similarity, while knowledge graph RAG matches on structure, retrieving connected entities and relationships rather than isolated text chunks.
Vector RAG splits documents into chunks, embeds each chunk as a high-dimensional vector, and retrieves the top-k nearest neighbors by similarity. That captures semantic meaning but drops structure. It doesn't represent that "Jane Smith" and "J. Smith" are the same person, or that a 2025 policy supersedes the 2023 version.
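To make that concrete, here is a minimal sketch of top-k retrieval by cosine similarity. The three-dimensional vectors and chunk ids are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=2):
    """Return the k chunk ids whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda cid: cosine(query_vec, chunks[cid]), reverse=True)
    return ranked[:k]

# Hypothetical embeddings: the two policy versions sit almost on top of each other.
chunks = {
    "policy-2023": [0.90, 0.10, 0.00],
    "policy-2025": [0.88, 0.12, 0.00],
    "holiday-menu": [0.00, 0.20, 0.95],
}
query = [0.90, 0.10, 0.00]

print(top_k(query, chunks))
```

Both policy chunks come back with nearly identical scores. Nothing in the ranking says which one supersedes the other, because that relationship was never part of the index.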
Knowledge graph RAG, often called GraphRAG, takes a different path. It uses LLMs to build knowledge graphs from unstructured text, then uses those graphs to guide retrieval. The index unit isn't a chunk embedding; it's a set of nodes (entities) and edges (the relationships between them).
A common knowledge graph RAG pattern runs in four steps:
- Data extraction: an LLM pulls entities, relationships, and metadata from your source data.
- Query entity linking: the app extracts key entities from the query and uses vector search to find the matching nodes.
- Graph traversal: the app generates graph queries, for example in Cypher, to walk the relationships tied to the question.
- Response generation: the retrieved graph context augments the prompt.
The split matters: the app orchestrates extraction and traversal, the graph store holds entities and edges, and the model reasons over what it retrieves.
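The four steps can be sketched in a few lines of Python. This is a toy under stated assumptions: the triples stand in for what an LLM extraction pass might produce, the `ALIASES` lookup stands in for vector-based entity linking, and every id and relation name is hypothetical. A production system would typically run step 3 as a query against a graph store (for example in Cypher), not as a list scan.

```python
# Step 1 (data extraction) is assumed done: an LLM produced these triples.
TRIPLES = [
    ("ticket-481", "REPORTS", "bug-slow-sync"),
    ("bug-slow-sync", "FIXED_BY", "release-4.2"),
    ("release-4.2", "DOCUMENTED_IN", "release-notes-4.2"),
]

# Stand-in for vector-based entity linking: phrase -> node id.
ALIASES = {"slow-sync bug": "bug-slow-sync", "last ticket": "ticket-481"}

def link_entities(query):
    """Step 2: map phrases in the query to graph nodes."""
    lowered = query.lower()
    return [node for phrase, node in ALIASES.items() if phrase in lowered]

def neighbors(node):
    """Step 3: collect the edges that touch a node, in either direction."""
    return [t for t in TRIPLES if node in (t[0], t[2])]

def build_prompt(query):
    """Step 4: place the retrieved facts ahead of the question."""
    facts = []
    for node in link_entities(query):
        for triple in neighbors(node):
            if triple not in facts:
                facts.append(triple)
    lines = [f"{s} {r} {o}" for s, r, o in facts]
    return "Facts:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"

print(build_prompt("Is the slow-sync bug from my last ticket fixed?"))
```

The prompt now carries the ticket-to-bug and bug-to-release edges as explicit facts. Note that this sketch only looks one edge out from each linked entity; reaching the release notes would take another hop.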
The questions vector search struggles to answer
Vector search is great at one thing: finding text that means something similar to your query. That strength becomes a limit on three kinds of questions.
The first is relationship-path queries, like the support question above. The answer is split across documents that connect through relationships, not wording. Vector search can surface the ticket, the release notes, and the issue individually, but it can't follow the "reported in" and "fixed by" links that turn three documents into one answer.
The second is whole-corpus questions, like "what are the top 5 themes in our support tickets?" The answer is a property of the entire collection, so no single passage is "similar" to it. Nearest-neighbor search only sees the chunks closest to the query, so it can't reason about the shape of the whole corpus.
The third is dense, interconnected domains: code dependencies, org charts, citation networks. Here everything looks similar to everything else, so vector search returns plausible-but-wrong neighbors and floods the context window with near-misses. Retrieving more documents only adds noise, because the system treats each one as an isolated block, not a node in a network.
The common thread: similarity measures how alike two pieces of text are, not how they relate. The first pattern is the one agents hit most, and it has a name: multi-hop retrieval.
Why agents need multi-hop retrieval, not just nearest-neighbor matches
Multi-hop retrieval means assembling an answer from evidence found in sequence, where each step tells you what to look for next. The support question is a two-hop chain: go from the ticket to the engineering issue it reported, then from that issue to the release that fixed it. Standard single-pass RAG struggles with this. It retrieves once against the original query and stops, grabbing documents that resemble the question but never following the trail to the bridging fact, so the chain breaks before the answer forms.
Graph traversal closes that gap by making relationships first-class. A knowledge graph stores facts as triples, two entities joined by a relation, so the agent can answer a chained question by following those relations step by step: from the bug node, across the "fixed by" edge, to the release node. Some frameworks also break a complex question into focused sub-questions, then retrieve only the nodes and edges each one needs.
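Here is a minimal sketch of that step-by-step walk, assuming facts are stored as (subject, relation, object) triples. The node ids and relation names are hypothetical, and a real graph store would do the indexing and traversal for you.

```python
def build_index(triples):
    """Index triples as subject -> relation -> list of objects."""
    index = {}
    for subj, rel, obj in triples:
        index.setdefault(subj, {}).setdefault(rel, []).append(obj)
    return index

def follow(index, start, relations):
    """Walk one relation per hop and return the nodes reached at the end."""
    frontier = [start]
    for rel in relations:
        frontier = [
            obj
            for node in frontier
            for obj in index.get(node, {}).get(rel, [])
        ]
    return frontier

triples = [
    ("ticket-481", "REPORTS", "bug-slow-sync"),
    ("ticket-481", "OPENED_BY", "customer-17"),
    ("bug-slow-sync", "FIXED_BY", "release-4.2"),
]
index = build_index(triples)

# Two hops: ticket -> bug -> release.
print(follow(index, "ticket-481", ["REPORTS", "FIXED_BY"]))
```

Each hop narrows the frontier to nodes that are actually connected by the named relation. If a link in the chain is missing, the result is an empty list, not a plausible-sounding neighbor.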
This pays off twice. First, accuracy: across multi-hop datasets like MuSiQue, 2WikiMultiHopQA, and HotpotQA, one graph-based method reported 36.25% higher accuracy than a dense retriever baseline. Second, fewer hallucinations: fetching only the evidence each hop needs avoids dumping loosely-related text into the prompt, and excessive retrieval tends to increase the likelihood of hallucination. Explicit relationships also add resilience against poisoned content that scores high on similarity but carries the wrong answer.
Stale graphs break agents: the freshness problem batch indexing ignores
A multi-hop graph is only as good as how current it is, and that's where many knowledge graph RAG deployments quietly fall apart. Batch indexing leaves graphs stale, the failures don't surface in error logs, and agents reason over outdated facts.
The freshness problem tends to fail silently because semantic similarity doesn't prioritize time: a vector embedding of a deprecated API reference can score as highly as a current one. Staleness windows scale with your indexing architecture, so a nightly batch can leave your graph hours behind, and that window widens as document volume and churn grow.
Graphs make this harder. Standard GraphRAG often assumes a static corpus, which makes updates costly by design. Incremental updates exist, but adding new content can alter community structure enough that much of the index gets recomputed anyway, and those approaches tend to be lossy and still require periodic full re-indexing.
The resulting failure modes are easy to picture:
- Stale relationships: the graph still links an account to its owner six months after that employee left the company.
- Entity decay: facts that were true when the graph was built but are no longer accurate.
- Version blindness: on version-sensitive questions, standard GraphRAG reached only 64% accuracy, barely ahead of naive RAG, because it lacks explicit version-to-version connections.
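One way to picture an explicit version-to-version connection is an edge from each document to the one that replaced it. The sketch below is only an illustration of that idea, not how any particular GraphRAG implementation works; the `superseded_by` mapping and document ids are hypothetical.

```python
def latest_version(superseded_by, doc):
    """Follow SUPERSEDED_BY edges until reaching a document nothing replaces."""
    seen = set()  # guards against accidental cycles in the data
    while doc in superseded_by and doc not in seen:
        seen.add(doc)
        doc = superseded_by[doc]
    return doc

# Hypothetical edges: each key was replaced by its value.
superseded_by = {"api-v1": "api-v2", "api-v2": "api-v3"}

print(latest_version(superseded_by, "api-v1"))
```

With an edge like this in the graph, a retriever that lands on an old version can resolve to the current one before the text reaches the prompt.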
For live agents, stale data isn't just an annoyance. In fraud detection or content moderation, delayed updates can let malicious activity slip through before the system catches up. And when one agent updates a shared fact while another reads a stale version, you face consistency problems across your whole agent fleet.
The fix leans on change data capture and incremental, near-real-time updates rather than periodic rebuilds, keeping entities fresh as the data changes.
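As a rough sketch of the incremental approach, the function below applies change events to a graph one at a time, with no rebuild. The event shape (`op` plus `triple`) is an assumption made for this example; real change data capture tools emit their own formats.

```python
def apply_change(graph, event):
    """Apply one change event to a graph held as a set of (subject, relation, object) triples."""
    op = event["op"]
    triple = tuple(event["triple"])
    if op == "insert":
        graph.add(triple)
    elif op == "delete":
        graph.discard(triple)  # no error if the triple is already gone
    else:
        raise ValueError(f"unknown op: {op}")

# The stale-relationship case: the account owner has left the company.
graph = {("account-9", "OWNED_BY", "employee-alice")}
events = [
    {"op": "delete", "triple": ["account-9", "OWNED_BY", "employee-alice"]},
    {"op": "insert", "triple": ["account-9", "OWNED_BY", "employee-bob"]},
]
for event in events:
    apply_change(graph, event)

print(graph)
```

Only the edges named in the events change. Everything else in the graph stays untouched, which is what keeps the update cheap compared with a full re-index.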
Where structured retrieval fits in an agent's context path
Structured retrieval keeps the agent's context window focused on what it needs for the next step. This is the discipline of context engineering: filling the context window with the right information at the right time. Think of the LLM as a CPU and the context window as RAM; your job is the operating system's memory management, deciding what gets loaded and when.
Stuffing the window backfires. As context grows, models can miss relevant content buried in the middle of a long context rather than at the start or end. Teams call this decay "context rot": more history in, worse recall of the details that matter.
The cleaner answer is to give each kind of context its own home. The agent memory stack typically spans three tiers:
- Short-term working memory: the immediate conversation and current task, living in the context window and expiring with the session.
- Long-term memory: episodic histories and semantic facts, often split between vector stores for similarity search and structured stores for the agent's world model.
- Entity relationships and multi-hop queries: the domain modeled as a graph, where user profiles and active states map to low-latency stores and relationships map to knowledge graphs.
Each tier feeds the next step with only what it needs. The schema is one of the most consequential calls in the pipeline: modeling the domain as a graph pre-compiles the reasoning paths the agent needs, instead of forcing a broad vector search every time, and preserves how facts connect across sessions.
Security matters here too. Letting agents hit the database directly through text-to-SQL or bulk OpenAPI-to-Model Context Protocol (MCP) conversion breaks down at scale: generated SQL gets unreliable as schemas grow, and converting hundreds of endpoints into hundreds of tools burns context. The official GitHub MCP server alone exposes 90+ MCP tools consuming over 46,000 tokens.
How live entities & relationships support agent retrieval
Governed access, fresh data, and multi-hop retrieval all converge in Redis Iris, the context engine. It gives agents a single retrieval layer for all four jobs: vector search, structured entity retrieval, durable memory, and semantic caching. Keeping all four in one in-memory store means an agent isn't making a network hop to a different system at each step. That matters because agents make far more retrieval calls than human users, so latency compounds fast. In a 20-node Amazon Web Services (AWS) cluster benchmark, enterprise-grade Redis reported sub-millisecond latency.
The Redis Query Engine supports vector search, full-text search, hybrid retrieval, and metadata filtering. In a billion-vector benchmark, Redis reported 90% precision at roughly 200ms median latency including round-trip time, retrieving the top 100 neighbors under 50 concurrent queries. Against other vector databases at the same recall levels, Redis benchmarks reported up to 3.4x higher throughput and 4.7x lower latency.
On the freshness side, Redis Data Integration uses change data capture to sync data from relational databases, warehouses, and document stores into Redis in near real time, not the periodic rebuild that leaves batch-indexed graphs stale.
For the access layer, Redis Context Retriever (in preview) lets devs define a semantic model with Pydantic, then auto-generates MCP tools that agents call instead of querying databases directly. Agents authenticate with scoped keys, discover only permitted tools, and run indexed lookups with row-level filters enforced server-side. One wealth-advisor implementation generated 25 MCP tools, keeping the tool surface small enough to avoid context bloat.
The memory tiers map to the same platform. Redis Agent Memory (in preview) holds short-term conversational state and longer-term durable memory, so agents carry context across turns and sessions without re-deriving it.
Semantic caching rounds it out. It uses vector embeddings to recognize when queries mean the same thing despite different wording, so repeated intents serve cached responses instead of fresh LLM calls. Redis LangCache reported up to 73% lower LLM inference costs without code changes.
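The idea can be sketched with a toy cache. To stay self-contained, this example uses bag-of-words counts as a stand-in for real embeddings, and the `SemanticCache` class and its 0.8 threshold are assumptions for illustration. It does not reflect the Redis LangCache API.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real cache would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        """Return the cached response for the closest stored query, or None on a miss."""
        vec = embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("How do I reset my password?", "Use the reset link on the sign-in page.")

print(cache.get("how can I reset my password"))   # close wording: served from cache
print(cache.get("What is your refund policy?"))   # unrelated: falls through to the LLM
```

The threshold is the main tuning knob. Set it too low and different intents share an answer; set it too high and rephrasings miss the cache.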
Structured retrieval is agent infrastructure, not a feature
Agents reason over what you retrieve, and similarity alone often misses the connected facts complex questions need. Knowledge graph RAG models entities and relationships explicitly, so an agent can chain evidence across sources instead of grabbing the nearest match and guessing. But the structure only helps if it stays fresh, fits the latency budget, and surfaces the right context per step.
That's the layer Redis is built for. As a memory-first data platform, it combines vector search, near-real-time entity updates, governed tool access, and semantic caching, so you can run live entity and relationship retrieval without stitching together a separate vector database, cache, and access layer.
If you're building agents that reason over connected, changing data, treat structured retrieval as core infrastructure. Try Redis Iris to see how it handles your retrieval workload, or talk to our team about fitting it into your agent stack.
