Sub-agents: splitting context across specialized AI agents
If you've ever watched a single AI agent lose the plot on a complex task, you've probably wondered whether splitting the work across multiple agents would help. It can. But the split comes with its own headaches. Agents lose track of each other's work, duplicate steps, and act on stale state. This article covers what sub-agents are, why teams break one agent into many, the context problems that force the split, and why shared memory is a piece many teams underestimate.
What sub-agents are & why teams split one agent into many
Sub-agents are specialized components that each handle a narrowly scoped, role-specific task inside a larger system. Each one typically uses a large language model as its reasoning core to perceive inputs, reason about them, and act within clearly defined operational boundaries. Instead of one generalist agent juggling everything, you get a team of focused agents.
A common way to wire this up is the sub-agent-as-tools pattern, where a main orchestrator delegates a task to a sub-agent through an explicit tool call. The orchestrator stays focused on high-level reasoning while a sub-agent does detailed work. Sub-agents can also invoke other sub-agents, so a big problem can break into smaller pieces.
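Here's a minimal sketch of that delegation in plain Python. There's no real model behind it: the `handler` lambdas stand in for LLM calls, and `SubAgent`, `Orchestrator`, and `call_tool` are illustrative names, not any framework's API. It shows that the orchestrator's history records only the tool call and its result, while each sub-agent keeps its own working history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubAgent:
    """A narrowly scoped worker with its own private message history."""
    name: str
    handler: Callable[[str], str]  # hypothetical stand-in for an LLM call
    history: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history.append(f"task: {task}")
        result = self.handler(task)
        self.history.append(f"result: {result}")
        return result


@dataclass
class Orchestrator:
    """Exposes each sub-agent as a named tool and delegates by explicit call."""
    tools: Dict[str, SubAgent]
    history: List[str] = field(default_factory=list)

    def call_tool(self, tool_name: str, task: str) -> str:
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        result = self.tools[tool_name].run(task)
        # Only the call and its result enter the orchestrator's context.
        self.history.append(f"{tool_name}({task!r}) -> {result}")
        return result


calendar = SubAgent("calendar", lambda task: "free at 3pm")
email = SubAgent("email", lambda task: "draft saved")
orchestrator = Orchestrator(tools={"calendar": calendar, "email": email})

reply = orchestrator.call_tool("calendar", "find a free slot tomorrow")
print(reply)                      # free at 3pm
print(len(orchestrator.history))  # 1
print(len(calendar.history))      # 2
```

A sub-agent that invokes other sub-agents would simply hold its own `tools` mapping, which is how a big problem breaks into smaller pieces.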
A few coordination patterns come up most often, differing in how work gets routed and who stays in control:
- Orchestrator-worker: A central orchestrator splits a task into subtasks, assigns them to workers, and combines the results.
- Supervisor: A central supervisor takes the user's input and delegates each request to the sub-agent that owns that domain. It fits when domains are distinct, such as calendar, email, and customer relationship management (CRM), each with its own tools.
- Swarm: No central coordinator. Agents hand control to each other directly, one active at a time, so each needs awareness of the others.
- Router: A routing step classifies the input, sends it to the matching agent or agents, and combines their responses.
These patterns trade off control, parallelism, and complexity differently. A useful starting point: begin with a single agent, then improve its prompts, then add tools, and only then consider adding agents. Graduate to multi-agent patterns only when you hit clear limits.
The context window problem that forces the split
Teams split one agent into many because a single context window can't hold everything without breaking down on cost and quality. The context window is the amount of text, measured in tokens, that a model can consider at once. Modern models have stretched this a lot, but dumping everything into one giant prompt creates two problems.
Cost is the first. Self-attention computes scores between every pair of tokens, which means O(n²) computational complexity. Longer contexts get slower and pricier to run, and the bill adds up fast when an agent loops through many iterations.
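To make the quadratic growth concrete, here's a back-of-the-envelope sketch. It counts only pairwise attention scores and ignores everything else that affects real latency and pricing (caching, optimized attention kernels, per-token billing), so treat the ratio as an illustration of the scaling, not a cost estimate. The function names are made up for this example.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of token-pair scores full self-attention computes for n tokens."""
    return n_tokens * n_tokens


def relative_cost(short_ctx: int, long_ctx: int) -> float:
    """How many times more pair scores the longer context needs."""
    return attention_pairs(long_ctx) / attention_pairs(short_ctx)


# 8x the tokens means 64x the pairwise scores.
ratio = relative_cost(32_000, 256_000)
print(ratio)  # 64.0
```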
Quality is the sneakier problem. Models tend to suffer from a "lost in the middle" effect: performance is highest when relevant information sits at the start or end of the context and degrades for information buried in between. Performance also degrades as input length grows, even when the evidence sits in the best possible spots. On one benchmark, Claude 3.5 Sonnet accuracy dropped from 29% to 3% as context grew from 32K to 256K tokens. A bigger window doesn't guarantee better reasoning. It can do the opposite.
How specialized sub-agents keep each context window focused
Sub-agents help by giving each agent a clean, bounded context window instead of one shared window that fills with everything. The mechanism is delegation: a specialized sub-agent explores extensively, sifts through messy or irrelevant content, and returns only a condensed summary to the orchestrator. The detailed search context stays isolated inside the sub-agent while the lead agent focuses on synthesis, so the coordinator's window stays bounded even when the underlying work is sprawling.
This matters because context grows on its own. In a naive single-agent setup, the window climbs steadily across a multi-step task, since each step appends tool results to the message history. Splitting the work contains that growth. The payoff holds even when a sub-agent has the same capabilities as the main agent: you're not adding a smarter agent, you're giving a focused task its own clean room to work in.
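A toy calculation shows how much the growth differs. The token counts below are invented for illustration (a 500-token base prompt, 4,000-token raw tool results, 200-token summaries), and the model assumes each sub-agent starts from a fresh window for its one task.

```python
from typing import List


def context_sizes(steps: int, tokens_added_per_step: int, base_tokens: int) -> List[int]:
    """Context size after each step when every step appends a fixed amount."""
    sizes = []
    current = base_tokens
    for _ in range(steps):
        current += tokens_added_per_step
        sizes.append(current)
    return sizes


# Assumed numbers, chosen only to illustrate the shape of the growth.
BASE = 500               # system prompt plus task description
RAW_TOOL_RESULT = 4_000  # tokens appended per step in a single-agent loop
SUMMARY = 200            # tokens a sub-agent returns to the orchestrator
STEPS = 10

single_agent = context_sizes(STEPS, RAW_TOOL_RESULT, BASE)
orchestrator = context_sizes(STEPS, SUMMARY, BASE)
sub_agent_peak = BASE + RAW_TOOL_RESULT  # each sub-agent sees one task, then exits

print(single_agent[-1])  # 40500
print(orchestrator[-1])  # 2500
print(sub_agent_peak)    # 4500
```

Under these assumptions no window in the delegated setup ever holds more than 4,500 tokens, while the single agent ends the task carrying 40,500.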
The real cost of splitting: agents that lose the thread
Splitting context addresses one problem and creates another. When you fragment work across agents, they can lose track of what the others are doing. In one benchmark, multi-agent configurations degraded performance on sequential reasoning tasks by 39 to 70 percent relative to single-agent baselines, with inter-agent misalignment a major culprit. The same study found multi-agent setups can help on parallelizable tasks, so the effect depends on task structure.
The bigger surprise is where the failures come from. A study of 1,600+ execution traces across seven frameworks traced most of them to specification and coordination problems, not to the underlying model. A smarter model doesn't fix a system that's wired to lose track of itself.
These coordination failures show up as familiar, frustrating behaviors. Agents repeat steps that were already done, lose conversation history through unexpected truncation, derail from the intended objective, or withhold information that other agents need. Getting multi-agent coordination to feel coherent can take more context engineering than teams expect, partly because agents may assume they share state with their children when they don't.
There's a cost angle too. Coordination overhead can make multi-agent systems pricier than a single-agent path, because agents often need extra turns to share context and reconcile partial results.
Why isolated context isn't enough: shared memory is one important piece
Isolating context keeps each window clean, but it doesn't give agents a common ground truth. That gap is where coordination breaks down. Inter-agent misalignment is a structural memory problem, not just a model-quality one.
Picture a planning agent that decides to deprecate a module while the coding agent, never seeing that decision, rebuilds it from scratch. Without shared memory, the context payloads agents pass each other only grow on every turn, and decisions still go missing.
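The planning-versus-coding scenario is easy to sketch. The in-process dictionary below stands in for whatever shared store you'd actually use, and every name in it (`SharedDecisionLog`, `coding_agent_plan`, the `billing_v1` module) is hypothetical. The only point is that the coding agent consults a common record before acting.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Decision:
    subject: str     # what the decision is about, e.g. a module name
    action: str      # e.g. "deprecate"
    decided_by: str  # which agent made the call


class SharedDecisionLog:
    """In-process stand-in for a shared memory store (an assumption)."""

    def __init__(self) -> None:
        self._decisions: Dict[str, Decision] = {}

    def record(self, decision: Decision) -> None:
        self._decisions[decision.subject] = decision

    def lookup(self, subject: str) -> Optional[Decision]:
        return self._decisions.get(subject)


def coding_agent_plan(log: SharedDecisionLog, module: str) -> str:
    """Check shared decisions before doing work on a module."""
    decision = log.lookup(module)
    if decision is not None and decision.action == "deprecate":
        return f"skip {module}: deprecated by {decision.decided_by}"
    return f"rebuild {module}"


log = SharedDecisionLog()
before = coding_agent_plan(log, "billing_v1")
log.record(Decision("billing_v1", "deprecate", "planner"))
after = coding_agent_plan(log, "billing_v1")

print(before)  # rebuild billing_v1
print(after)   # skip billing_v1: deprecated by planner
```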
Without that common ground truth, the shared context itself degrades in four recognizable ways:
- Context poisoning: A hallucination or error makes it into the context and gets repeatedly referenced. In multi-step workflows, an early mistake hardens into an established fact and corrupts everything downstream.
- Context confusion: Superfluous information pushes the model toward a low-quality response. Tool-heavy setups make this worse, which is why OpenAI's function-calling guidance suggests aiming for fewer than 20 tools at a time.
- Context clash: New information contradicts what's already in the prompt, often because term definitions conflict across documents or sub-agents.
- Context rot: Output quality measurably degrades as input length grows, even when the window isn't full.
Each of these compounds when no single agent can see the authoritative version of what's true. The bottleneck stops being raw compute and becomes keeping shared meaning consistent across agents, which is harder than it sounds: two agents can write conflicting summaries of the same facts, and resolving that takes more than a timestamp.
How shared memory & retrieval keep sub-agents coherent
Shared memory gives sub-agents a common place to read and write what matters, so decisions made by one agent don't vanish before another needs them. But memory alone isn't the whole answer. Agents also need retrieval, and the two do different jobs. Retrieval answers "what's in my data," while memory answers "what happened before," and a production agent often needs both answered in the same turn.
Both memory and retrieval work better when memory is organized in layers. Short-term, or thread-scoped, memory tracks the ongoing conversation within a session. Long-term memory stores user-specific or application-level data across sessions, so any thread can recall it. There are finer-grained types too, including episodic memory for recalling specific events and semantic memory for structured factual knowledge.
That long-term layer is where vector search comes in. Memories get broken into semantic chunks, each turned into a vector embedding, a numerical representation of meaning, so two phrasings of the same idea land close together in vector space. Those embeddings are stored with metadata like timestamps and user IDs, which lets hybrid search combine vector similarity with keyword matching and metadata filters. At retrieval time, the agent embeds its current context and pulls back only the relevant prior exchanges, keeping windows lean instead of concatenating an endless transcript.
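A stripped-down version of that retrieval step might look like the sketch below. The three-dimensional vectors are hand-written toys, not output from a real embedding model, and the example covers only the metadata filter plus vector similarity; it leaves out the keyword-matching half of hybrid search. `Memory` and `recall` are names invented for this illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]  # toy vectors; a real system would call an embedding model
    user_id: str
    timestamp: int


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(store: List[Memory], query: Sequence[float], user_id: str, top_k: int = 2) -> List[Memory]:
    """Filter by metadata first, then rank the survivors by similarity."""
    candidates = [m for m in store if m.user_id == user_id]
    ranked = sorted(candidates, key=lambda m: cosine(m.embedding, query), reverse=True)
    return ranked[:top_k]


store = [
    Memory("prefers morning meetings", [0.9, 0.1, 0.0], "alice", 1),
    Memory("asked to deprecate billing_v1", [0.0, 0.2, 0.9], "alice", 2),
    Memory("likes early calls", [0.8, 0.2, 0.1], "bob", 3),
    Memory("uses dark mode", [0.1, 0.9, 0.1], "alice", 4),
]

query = [1.0, 0.0, 0.0]  # pretend this is the embedded current context
hits = recall(store, query, user_id="alice", top_k=1)
print(hits[0].text)  # prefers morning meetings
```

Bob's memory is close to the query in vector space but never reaches the ranking step, because the `user_id` filter removes it first.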
Pulling the right memories back still needs guardrails, or shared memory becomes a free-for-all. Memory tools can partition what's stored by user, agent, session, and application, so each agent reads only what's relevant to its role and avoids context pollution while still sharing user-level context when it should. The primitives for this exist, but a standard protocol for read/write permissions across agents is still an open problem most teams solve ad hoc.
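One ad hoc way teams handle this is to encode the partition into the key. The sketch below is one such scheme under stated assumptions: the `mem:app:user:agent:session` layout, the `shared` placeholder, and the `ScopedMemory` class are all invented for illustration and aren't a Redis or framework convention. An agent reads its own partition plus the user-level partition, and nothing from sibling agents.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Scope:
    app: str
    user: str
    agent: Optional[str] = None    # None means shared across agents for this user
    session: Optional[str] = None  # None means not tied to one session


def scope_key(scope: Scope) -> str:
    """Build a namespaced key; 'shared' is a placeholder chosen for this sketch."""
    return ":".join([
        "mem", scope.app, scope.user,
        scope.agent or "shared", scope.session or "shared",
    ])


class ScopedMemory:
    """In-process stand-in for a partitioned memory store."""

    def __init__(self) -> None:
        self._items: Dict[str, List[str]] = {}

    def write(self, scope: Scope, text: str) -> None:
        self._items.setdefault(scope_key(scope), []).append(text)

    def read(self, scope: Scope) -> List[str]:
        """Return user-level shared memories, then the caller's own partition."""
        user_level_key = scope_key(Scope(app=scope.app, user=scope.user))
        keys = [user_level_key]
        if scope_key(scope) != user_level_key:
            keys.append(scope_key(scope))
        results: List[str] = []
        for key in keys:
            results.extend(self._items.get(key, []))
        return results


memory = ScopedMemory()
memory.write(Scope("helpdesk", "alice"), "timezone: UTC+1")
memory.write(Scope("helpdesk", "alice", agent="billing"), "refund pending")
memory.write(Scope("helpdesk", "alice", agent="calendar"), "prefers mornings")

billing_view = memory.read(Scope("helpdesk", "alice", agent="billing"))
print(billing_view)  # ['timezone: UTC+1', 'refund pending']
```

Note that this only partitions reads. It doesn't enforce who may write where, which is the part the text above calls an open problem.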
Designing for sub-agents without rebuilding your data layer
Shared memory raises a practical infrastructure question: do you need a separate store for short-term memory, another for long-term memory, another for vector search, and yet another for operational state? That path leads to vendor sprawl and sync headaches. Even two stores mean two latency profiles, two freshness windows, and two places for things to drift out of sync.
Redis Iris is a real-time context engine for AI workloads that keeps operational data in memory and supports capabilities like vector search and semantic caching. The pieces a sub-agent system needs can live on one engine instead of a tool zoo.
Redis covers the main memory functions in one place: short-term memory through in-memory data structures, long-term memory through vector search, operational state through hashes and JSON, and coordination through Redis Streams. In a 20-node Amazon Web Services (AWS) cluster benchmark, Redis reported over 100 million operations per second with sub-millisecond latency. On vector search, Redis reported 90% precision at ~200ms median latency in a billion-vector benchmark using 50 concurrent queries retrieving the top 100 nearest neighbors; latency included round-trip time.
Coordination between agents has a home too. Redis supports agent coordination through publish/subscribe (pub/sub) and Streams, where pub/sub decouples message senders from receivers and Streams captures durable, ordered event logs. The right pattern depends on how tightly your agents need to coordinate. The point is you can pick shared memory or local memory with explicit sync without standing up a new database for each choice.
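The difference between the two is easiest to see with a late-joining agent. The sketch below is an in-memory model of the semantics only, not Redis client code: `PubSubChannel` mimics fire-and-forget delivery, and `EventStream` mimics an append-only log that readers resume from a known position. In Redis itself the corresponding commands are `PUBLISH`/`SUBSCRIBE` and `XADD`/`XREAD`, and stream entry IDs are not plain integers as they are here.

```python
from typing import Callable, List, Tuple


class PubSubChannel:
    """Fire-and-forget: only subscribers present at publish time get the message."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> int:
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)  # how many subscribers received it


class EventStream:
    """Append-only ordered log: readers resume from the last ID they saw."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def append(self, message: str) -> int:
        self._entries.append(message)
        return len(self._entries)  # simplified 1-based entry ID

    def read_after(self, last_id: int) -> List[Tuple[int, str]]:
        return [(i + 1, m) for i, m in enumerate(self._entries) if i + 1 > last_id]


channel = PubSubChannel()
stream = EventStream()

# The planner announces a decision before the coding agent is listening.
delivered = channel.publish("deprecate billing_v1")
stream.append("deprecate billing_v1")

# The coding agent joins late.
late_inbox: List[str] = []
channel.subscribe(late_inbox.append)
channel.publish("freeze schema")
stream.append("freeze schema")

print(delivered)             # 0
print(late_inbox)            # ['freeze schema']
print(stream.read_after(0))  # [(1, 'deprecate billing_v1'), (2, 'freeze schema')]
```

The late agent never sees the first pub/sub message but can replay both entries from the log, which is why decisions that must not go missing tend to belong in the durable option.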
It also helps that this slots into the frameworks you're probably already using. Redis offers integrations with 30+ agent frameworks including LangChain, LangGraph, and LlamaIndex, and the langgraph-checkpoint-redis package handles both thread-level session persistence and cross-thread long-term memory with vector search. You design for sub-agents at the application layer while the data layer stays consistent underneath.
Give your sub-agents a shared memory layer
Shared memory and retrieval are what make splitting one agent into many actually work. The split is a reasonable answer to bounded context windows and degrading reasoning at scale, but the trade-off is coordination: agents that can't see a shared ground truth duplicate work, act on stale state, and let errors compound hop by hop. Isolated context windows keep each agent focused, but they don't keep the team coherent.
Keeping short-term memory, long-term memory, vector search, and coordination on one fast layer reduces drift between stores. The Redis you may already run for caching can do double duty as your agent memory layer.
If you're building multi-agent systems and feeling the coordination pain, it's worth seeing how this works with your own workload. Try Redis free to experiment with agent memory and vector search, or talk to our team about designing your context layer for sub-agents.
