
Build smarter AI agents: Manage short-term and long-term memory with Redis

AI agents are systems capable of pursuing and achieving goals by harnessing the reasoning capabilities of large language models (LLMs) to plan, observe, and execute actions. In 2025, AI agents are anticipated to drive transformative changes across the workforce, impacting productivity and efficiency in significant ways, including supporting employees in their daily work, deploying digital humans for critical business functions, and even replacing enterprise SaaS as we know it.

Building agentic systems is still an evolving field, and it comes with challenges that researchers and industry experts are actively working to solve. One key challenge is developing models specialized in reasoning tasks, as opposed to the language-focused tasks like summarization that characterized the first wave of GenAI apps. Another critical hurdle is managing the memory of AI agents, which often requires adopting sophisticated methodologies to achieve the desired level of agentic performance.

Memory is key to making AI agents work. This guide covers why memory matters, the different types of memory, best practices for managing it, and why Redis stands out as the ideal data platform for agentic memory. We’ll also walk through a practical implementation to help you integrate agentic memory effectively.

Why does memory matter?

AI agent memory is crucial for enhancing efficiency and capabilities because LLMs do not inherently remember things; that is, they are stateless. Memory allows AI agents to learn from past interactions, retain information, and maintain context, leading to more coherent and personalized responses.

Imagine an AI agent designed to plan and book work trips. Without memory, it won’t remember personal preferences (e.g., “Do you like direct flights or flights with layovers?”), will make procedural mistakes due to a lack of understanding (e.g., booking a hotel that doesn’t offer the amenities business trips require, like meeting rooms or reliable Wi-Fi), and will fail to recall previously provided details like passport information. The result is a frustrating user experience with repetitive questions, inconsistent behavior, and a lack of personalization.

Short-term vs long-term memory

AI agents, like humans, rely on both short-term and long-term memory to function effectively.

Short-term memory works like a computer’s RAM, holding onto relevant details for an ongoing task or conversation. This working memory exists only briefly within a conversation thread and is usually limited, both by the constrained context windows of LLMs and by the need to filter out less relevant information. Agentic frameworks like LangGraph simplify short-term memory management by providing tools like checkpointers, which maintain thread-specific context and let agents store short-term memory efficiently in high-performance databases like Redis.
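As a concrete starting point, here is a minimal sketch of thread-scoped short-term memory using a Redis-backed LangGraph checkpointer. It assumes the langgraph-checkpoint-redis package, an OpenAI chat model via langchain-openai, and a local Redis instance; swap in whatever model and connection details fit your stack:

```python
# Minimal sketch: short-term (thread-scoped) memory with a Redis-backed
# LangGraph checkpointer. Assumes `langgraph-checkpoint-redis` is installed
# and Redis is running locally; the model choice is illustrative.
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.checkpoint.redis import RedisSaver

llm = ChatOpenAI(model="gpt-4o-mini")

def chatbot(state: MessagesState):
    # The checkpointer replays this thread's prior messages,
    # so the model sees the whole conversation so far.
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("chatbot", chatbot)
builder.add_edge(START, "chatbot")

with RedisSaver.from_conn_string("redis://localhost:6379") as checkpointer:
    checkpointer.setup()  # create the indices the saver needs in Redis
    graph = builder.compile(checkpointer=checkpointer)

    # All state for thread "trip-42" persists in Redis between invocations.
    config = {"configurable": {"thread_id": "trip-42"}}
    graph.invoke(
        {"messages": [("user", "Find me a direct flight to London.")]},
        config,
    )
```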

Long-term memory works more like a hard drive, storing vast amounts of information to be accessed later. This is information that persists across multiple task runs or conversations, allowing agents to learn from feedback and adapt to user preferences. These memories can further be divided into three types (for more information on the nuances of different memory types, we suggest reviewing the famous CoALA framework paper):

  • Episodic memory: Stores specific past events and experiences, like a personal diary of the AI’s interactions. For example, an AI might remember that a user previously booked a trip to London for a conference and prefers to stay in city centers.
  • Procedural memory: Stores learned skills, procedures, and “how-to” knowledge, forming the AI’s repertoire of actions. For instance, an AI could learn the optimal process for booking flights, such as ensuring proper layover times between connecting flights.
  • Semantic memory: Stores general knowledge, facts, concepts, and relationships, composing the AI’s knowledge base about the world. For example, an AI could store information about visa requirements, popular tourist destinations, or average hotel costs.
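To make these categories concrete, here is a hypothetical sketch of how each memory type could be stored as a JSON document in Redis, using redis-py with the RedisJSON module. The key names and fields are illustrative, not a fixed schema:

```python
# Hypothetical layout for the three long-term memory types as RedisJSON
# documents. Requires the RedisJSON module; all keys/fields are examples.
import redis

r = redis.Redis(host="localhost", port=6379)

r.json().set("memory:episodic:001", "$", {
    "type": "episodic",
    "user_id": "u42",
    "event": "Booked a trip to London for a conference",
    "preference": "city-center hotels",
})
r.json().set("memory:procedural:001", "$", {
    "type": "procedural",
    "skill": "flight booking",
    "rule": "Ensure at least 90 minutes between connecting flights",
})
r.json().set("memory:semantic:001", "$", {
    "type": "semantic",
    "fact": "Many city-center hotels offer meeting rooms and reliable Wi-Fi",
})
```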

Managing long-term memory is complex: you have to decide which types of memories to store, what exactly to store, how to decay older memories, and how to retrieve them effectively into working memory.

Key architectural decisions for managing long-term memory

There are four high-level decisions you need to make when planning your memory management architecture:

  1. Which types of memories to store?

The types of memories you need to store and manage depend on the type of application. For example, a conversational AI agent would be expected to remember user preferences across threads (and therefore store episodic memory). On the other hand, a retail AI assistant may need to store information about products and recall relevant facts from a product knowledge database (and therefore store semantic memory).

  2. How to store and update memories?

Given the constraints of LLM context windows and the risk of context pollution, it’s critical to store memories efficiently. We see developers use four common strategies. These techniques are not mutually exclusive, and for most production deployments we expect AI agents to combine several of them. A sketch of the summarization strategy follows the list.

  • Summarization: By far the simplest approach is to summarize previous conversations (usually with an LLM). The memory module incrementally summarizes conversations, updating and refining the summary as new data or experiences are added. The summarized conversations can then be stored as strings in Redis and retrieved to contextualize future queries to the LLM. The Motorhead team implemented an example of this approach in their open source project.
  • Vectorization: Vectorization lies at the heart of modern AI memory management. It transforms textual information into numerical representations that encapsulate the underlying meaning of words and concepts. By segmenting memories into discrete chunks—we recommend using semantic chunking—and vectorizing them, developers can leverage vector search to retrieve the most relevant memories with precision and efficiency. 
  • Extraction: An emerging alternative to summarization or simple chunking is to extract key facts from conversation history and store them, with surrounding context, in an external database. A document store like RedisJSON provides a perfect solution for this. The LangChain team recently shared an example of a memory agent that extracts and writes memories.
  • Graphication: Another approach is to store AI agent memories by mapping information as interconnected entities and relationships. This structured format enables dynamic, context-rich memory storage.
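Here is the promised sketch of the summarization strategy: one rolling summary per conversation thread, kept as a Redis string. The llm_summarize callable is a placeholder for whatever summarization prompt you run against your LLM; it is not a real API:

```python
# Minimal sketch of the summarization strategy: a rolling per-thread
# summary stored as a Redis string. `llm_summarize` is a placeholder
# for your LLM call, not a real library function.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def update_summary(thread_id: str, new_messages: str, llm_summarize) -> str:
    key = f"summary:{thread_id}"
    prior = r.get(key) or ""
    # Incrementally refine the summary as new turns arrive.
    updated = llm_summarize(
        f"Current summary:\n{prior}\n\nNew messages:\n{new_messages}\n\n"
        "Produce an updated, concise summary."
    )
    r.set(key, updated)
    return updated

def contextualize(thread_id: str, user_query: str) -> str:
    # Prepend the stored summary to the prompt for the next LLM call.
    summary = r.get(f"summary:{thread_id}") or "(no prior context)"
    return f"Conversation summary: {summary}\n\nUser: {user_query}"
```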
  3. How to retrieve relevant memories?

Imagine you have memory chunks stored in a database like Redis, along with their embeddings and text descriptions. How does the agent know which memories to retrieve? This is an emerging area of research, with researchers trying increasingly sophisticated approaches. For example, the MemGPT paper uses the LLM itself as a query generator: the model decides when to retrieve long-term memory, generates a search query (by emitting function-calling tokens), and then uses vector search to retrieve relevant chunks. For most applications, we recommend starting with a vector search over the memory database and layering on additional sophistication as needed.
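Here is a hedged sketch of that vector-search baseline using RedisVL, assuming a recent RedisVL release. The index name and field names are illustrative, and the embed() helper is a stand-in you should replace with a real embedding model of matching dimensions:

```python
# Sketch of memory retrieval via RedisVL vector search. Index/field names
# are illustrative; `embed` is a stand-in that only makes the sketch run.
import numpy as np
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

schema = {
    "index": {"name": "agent_memory", "prefix": "memory"},
    "fields": [
        {"name": "text", "type": "text"},
        {"name": "embedding", "type": "vector",
         "attrs": {"dims": 384, "algorithm": "flat",
                   "distance_metric": "cosine", "datatype": "float32"}},
    ],
}

index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)

def embed(text: str) -> list[float]:
    # Stand-in embedding so the sketch executes; swap in a real model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384, dtype=np.float32).tolist()

# Store a memory chunk with its embedding (vectors stored as raw bytes).
index.load([{
    "text": "User prefers direct flights and city-center hotels.",
    "embedding": np.array(
        embed("User prefers direct flights and city-center hotels."),
        dtype=np.float32,
    ).tobytes(),
}])

# Retrieve the memories most relevant to the current query.
query = VectorQuery(
    vector=embed("Book me a flight to London"),
    vector_field_name="embedding",
    return_fields=["text"],
    num_results=3,
)
for hit in index.query(query):
    print(hit["text"])
```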

  4. How to decay memories?

It’s crucial to decay stored memories in AI systems to prevent memory bloat and maintain efficiency. As an AI agent interacts over time, it accumulates a massive amount of information, some of which becomes irrelevant or outdated. Without a mechanism for forgetting, the AI’s memory would become overwhelmed with useless data, leading to slower retrieval times, decreased accuracy in responses, and inefficient use of resources. If you store memories in Redis, you can use its built-in eviction and expiration strategies to manage memory decay efficiently. You can also add a timestamp as another field on each memory object and factor recency into the final search ranking.
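A minimal sketch of both mechanisms, assuming redis-py and illustrative key names: a hard TTL so a memory expires automatically, plus a stored timestamp for recency weighting at retrieval time:

```python
# Two decay mechanisms: a hard TTL (Redis expiration) and a soft,
# timestamp-based recency weight. Key and field names are illustrative.
import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Hard decay: this memory disappears automatically after 30 days.
r.set("memory:session:u42:cache", "Prefers aisle seats", ex=30 * 24 * 3600)

# Soft decay: store a timestamp alongside the memory...
r.hset("memory:episodic:001", mapping={
    "text": "Booked a trip to London for a conference",
    "created_at": int(time.time()),
})

# ...then down-weight older memories when ranking retrieved candidates.
def recency_weight(created_at: int, half_life_days: float = 30.0) -> float:
    age_days = (time.time() - created_at) / 86400
    return 0.5 ** (age_days / half_life_days)  # exponential decay
```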

What makes Redis the right choice for handling long-term memory?

There are several reasons why developers prefer Redis as their platform to store and manage AI agent memories. These include: 

  • Fast performance: Reading and writing memories sits on the “hot path” of your application flow. Slow retrieval can significantly degrade the user experience or force developers to trade away performance elsewhere. The in-memory architecture of Redis delivers microsecond-level reads and writes, which is critical for this use case.
  • Fastest, fully featured vector search: Redis provides a native, fully featured vector database that delivers the fastest benchmarked vector search solution on the market. Given the need to vectorize memories and run semantic search over them, this is a critical requirement when selecting your data platform.
  • Integrated with your AI stack: Redis is fully integrated with popular AI frameworks including LangGraph, LlamaIndex, and AutoGen. In addition, developers can use RedisVL, a powerful, dedicated Python client library for building GenAI applications with Redis. RedisVL comes with built-in abstractions, including ones for managing conversational memory.
  • Scalability: When building agentic systems, predicting the storage requirements for AI agent memories, the number of clients needing access, and the frequency of data retrieval can be challenging. Redis offers a suite of features that facilitate large-scale deployment, including the ability to scale across multiple nodes, automatically tier less frequently accessed data to disk (using Redis Flex), and support for high availability and data persistence. Plus, Redis’ built-in eviction and expiration policies simplify memory decay, ensuring efficient handling of data over time.
  • Flexibility: Redis offers several data structures out of the box. These structures, like hashes (for streamlined efficiency) and JSON (for nested documents), give developers the flexibility to manage memory however they prefer; see the sketch after this list.
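For illustration, here is a small sketch of that hash-vs-JSON trade-off with redis-py: a flat hash for a compact preference record, and a JSON document (requires the RedisJSON module) when the record needs nesting. Field names are hypothetical:

```python
# Hash vs. JSON for memory records; all keys and fields are examples.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Hash: flat and lightweight, good for simple key-value preference records.
r.hset("memory:pref:u42", mapping={"seat": "aisle", "hotel": "city-center"})

# JSON: nested structure for richer memory documents (RedisJSON module).
r.json().set("memory:trip:u42:london", "$", {
    "destination": "London",
    "purpose": "conference",
    "hotel": {"area": "city-center", "amenities": ["wifi", "meeting rooms"]},
})
```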

We make managing memory simpler with our open source Redis Agent Memory Server.

Example of agent memory using LangGraph & Redis

This notebook demonstrates how to manage short-term and long-term agent memory using LangGraph and Redis. In it, we explore:

  1. Short-term memory management using LangGraph’s checkpointer
  2. Long-term memory storage and retrieval using RedisVL
  3. Managing long-term memory manually vs. exposing tool access (AKA function-calling)
  4. Managing conversation history size with summarization
  5. Memory consolidation

In the notebook, we build two versions of a travel agent, one that manages long-term memory manually and one that does so using tools the LLM calls.

Two diagrams in the notebook show the components used in each agent.

That’s a wrap. Let’s start building

Want to make your own agent? Try the LangGraph Quickstart. Then add our Redis checkpointer to give your agent fast, persistent memory. Redis Agent Memory Server is our open source tool for managing memory for agents and AI apps.

Using Redis to manage memory for your AI Agent lets you build a flexible and scalable system that can store and retrieve memories fast. Check out the resources below to start building with Redis today, or connect with our team to chat about AI Agents.

  • Redis Agent Memory Server: This repo manages both conversational context and long-term memories.
  • Redis AI resources: GitHub repo with code samples and notebooks to help you build AI apps. 
  • Redis AI docs: Quickstarts and tutorials to get you up and running fast.
  • Redis Cloud: The easiest way to deploy Redis—try it free on AWS, Azure, or GCP.