All eyes on AI: 2026 predictions – The shifts that will shape your stack.

Read now

Tutorial

What is Agent Memory? Example using LangGraph and Redis

February 26, 202651 minute read
TL;DR:
Agent memory is the mechanism that allows AI agents to retain and recall information across interactions. Short-term memory holds the current conversation, while long-term memory persists user preferences and knowledge between sessions. Redis provides the speed and data-structure flexibility to power both types of memory, with sub-millisecond reads for real-time agent workflows.
This notebook demonstrates how to manage short-term and long-term agent memory using LangGraph and Redis. We'll explore:
  1. Short-term memory management using LangGraph's checkpointer
  2. Long-term memory storage and retrieval using RedisVL
  3. Managing long-term memory manually vs. exposing tool access (AKA function-calling)
  4. Managing conversation history size with summarization
  5. Memory consolidation

#What you'll learn

  • What AI agent memory is and why it matters for stateful workflows
  • The difference between short-term, long-term, episodic, and semantic memory
  • How to use the LangGraph Redis checkpointer for persistent conversation state
  • How to store and retrieve long-term memories with vector search using RedisVL
  • Two strategies for managing memory: manual extraction vs. LLM tool-calling
  • How to consolidate and summarize memories to keep context windows efficient

#What we'll build

We're going to build two versions of a travel agent, one that manages long-term memory manually and one that does so using tools the LLM calls.
Here are two diagrams showing the components used in both agents:
Architecture diagram of AI agent memory showing short-term conversation history and long-term episodic and semantic memory stored in Redis

#Setup

#Required API keys

You must add an OpenAI API key with billing information for this lesson. You will also need a Tavily API key. Tavily API keys come with free credits at the time of this writing.

#Run Redis

#For Colab

Convert the following cell to Python to run it in Colab.

#For alternative environments

There are many ways to get the necessary redis instance running
  1. On cloud, deploy a FREE instance of Redis in the cloud. Or, if you have your own version of Redis Software running, that works too!
  2. Per OS, see the docs
With docker: docker run -d --name redis -p 6379:6379 redis:latest

#Test connection to Redis

#What types of agent memory exist?

AI agents typically use multiple types of memory, each serving a different purpose. This agent uses short-term memory and long-term memory. The implementations of short-term and long-term memory differ, as does how the agent uses them. Let's dig into the details. We'll return to code soon.

#Short-term memory

For short-term memory, the agent keeps track of conversation history with Redis. Because this is a LangGraph agent, we use the RedisSaver class to achieve this. RedisSaver is what LangGraph refers to as a checkpointer. You can read more about checkpointers in the LangGraph documentation. In short, they store state for each node in the graph, which for this agent includes conversation history.
Here's a diagram showing how the agent uses Redis for short-term memory. Each node in the graph (Retrieve Users, Respond, Summarize Conversation) persists its "state" to Redis. The state object contains the agent's message conversation history for the current thread.
LangGraph agent nodes persisting conversation state to Redis using the RedisSaver checkpointer for short-term memory
If Redis persistence is on, then Redis will persist short-term memory to disk. This means if you quit the agent and return with the same thread ID and user ID, you'll resume the same conversation.
Conversation histories can grow long and pollute an LLM's context window. To manage this, after every "turn" of a conversation, the agent summarizes messages when the conversation grows past a configurable threshold. Checkpointers do not do this by default, so we've created a node in the graph for summarization.
NOTE: We'll see example code for the summarization node later in this notebook.

#Long-term memory

Aside from conversation history, the agent stores long-term memories in a search index in Redis, using RedisVL. Here's a diagram showing the components involved:
Long-term agent memory architecture showing episodic and semantic memories stored and retrieved via vector search in Redis using RedisVL
The agent tracks two types of long-term memories:
  • Episodic: User-specific experiences and preferences
  • Semantic: General knowledge about travel destinations and requirements
NOTE: If you're familiar with the CoALA paper, the terms "episodic" and "semantic" here map to the same concepts in the paper. CoALA discusses a third type of memory, procedural. In our example, we consider logic encoded in Python in the agent codebase to be its procedural memory.

#Representing long-term memory in python

We use a couple of Pydantic models to represent long-term memories, both before and after they're stored in Redis:
We'll return to these models soon to see them in action.

#Short-term memory storage and retrieval

The RedisSaver class handles the basics of short-term memory storage for us, so we don't need to do anything here.

#How does Redis store agent memory?

Redis stores agent memory in two ways. For short-term memory (conversation history), the LangGraph Redis checkpointer serializes graph state into Redis data structures, giving you sub-millisecond reads and writes for every conversation turn. For long-term memory, we use RedisVL to store memories as JSON documents with vector embeddings, enabling semantic search of past experiences and knowledge.
Let's set up a new search index to store and query memories:

#Storage and retrieval functions

Now that we have a search index in Redis, we can write functions to store and retrieve memories. We can use RedisVL to write these.
First, we'll write a utility function to check if a memory similar to a given memory already exists in the index. Later, we can use this to avoid storing duplicate memories.

#Checking for similar memories

#Storing and retrieving long-term memories

We'll use the similar_memory_exists() function when we store memories:
And now that we're storing memories, we can retrieve them:

#Managing long-term memory manually vs. calling tools

While making LLM queries, agents can store and retrieve relevant long-term memories in one of two ways (and more, but these are the two we'll discuss):
  1. Expose memory retrieval and storage as "tools" that the LLM can decide to call contextually.
  2. Manually augment prompts with relevant memories, and manually extract and store relevant memories.
These approaches both have tradeoffs.
Tool-calling leaves the decision to store a memory or find relevant memories up to the LLM. This can add latency to requests. It will generally result in fewer calls to Redis but will also sometimes miss out on retrieving potentially relevant context and/or extracting relevant memories from a conversation.
Manual memory management will result in more calls to Redis but will produce fewer round-trip LLM requests, reducing latency. Manually extracting memories will generally extract more memories than tool calls, which will store more data in Redis and should result in more context added to LLM requests. More context means more contextual awareness but also higher token spend.
You can test both approaches with this agent by changing the memory_strategy variable.

#Managing memory manually

With the manual memory management strategy, we're going to extract memories after every interaction between the user and the agent. We're then going to retrieve those memories during future interactions before we send the query.

#Extracting memories

We'll call this extract_memories function manually after each interaction:
We'll use this function in a background thread. We'll start the thread in manual memory mode but not in tool mode, and we'll run it as a worker that pulls message histories from a Queue to process:

#Augmenting queries with relevant memories

For every user interaction with the agent, we'll query for relevant memories and add them to the LLM prompt with retrieve_relevant_memories().
NOTE: We only run this node in the "manual" memory management strategy. If using "tools," the LLM will decide when to retrieve memories.
This is the first function we've seen that represents a node in the LangGraph graph we'll build. As a node representation, this function receives a state object containing the runtime state of the graph, which is where conversation history resides. Its config parameter contains data like the user and thread IDs.
This will be the starting node in the graph we'll assemble later. When a user invokes the graph with a message, the first thing we'll do (when using the "manual" memory strategy) is augment that message with potentially related memories.

#Defining tools

Now that we have our storage functions defined, we can create tools. We'll need these to set up our agent in a moment. These tools will only be used when the agent is operating in "tools" memory management mode.

#Creating the agent

Because we're using different LLM objects configured for different purposes and a prebuilt ReAct agent, we need a node that invokes the agent and returns the response. But before we can invoke the agent, we need to set it up. This will involve defining the tools the agent will need.

#Responding to the user

Now we can write our node that invokes the agent and responds to the user:

#Summarizing conversation history

We've been focusing on long-term memory, but let's bounce back to short-term memory for a moment. With RedisSaver, LangGraph will manage our message history automatically. Still, the message history will continue to grow indefinitely, until it overwhelms the LLM's token context window.
To solve this problem, we'll add a node to the graph that summarizes the conversation if it's grown past a threshold.

#Assembling the graph

It's time to assemble our graph.

#Consolidating memories in a background thread

We're almost ready to create the main loop that runs our graph. First, though, let's create a worker that consolidates similar memories on a regular schedule, using semantic search. We'll run the worker in a background thread later, in the main loop.

#The main loop

Now we can put everything together and run the main loop.
Running this cell should ask for your OpenAI and Tavily keys, then a username and thread ID. You'll enter a loop in which you can enter queries and see responses from the agent printed below the following cell.

#Next steps

Now that you understand how to implement persistent agent memory with LangGraph and Redis, here are some ways to go further: