Tutorial

How to build agent memory with Redis Agent Memory and LangGraph

May 18, 2026 · 30 minute read
Ricardo Ferreira
William Johnston
TL;DR: In this tutorial, you will build a LangGraph travel agent that uses Redis Agent Memory for both short-term and long-term memory. Short-term session memory keeps the active conversation coherent, while long-term memory stores durable user facts and preferences across sessions.
Note: This tutorial uses the code from the following GitHub repository:
https://github.com/redis-developer/redis-agent-memory-with-langgraph-demo
Agent memory helps AI apps move beyond single-turn prompts. A useful agent should remember the current conversation, recall durable preferences, and avoid treating every detail as something it should store forever.
Redis fits this problem because memory retrieval needs to be fast and scoped correctly. In this demo, Redis Agent Memory handles session memory, long-term memory, and vector-backed recall behind a Python client API. You will build a small travel agent with Python, FastAPI, LangGraph, OpenAI, Docker, and the redis-agent-memory client.

#Memory model comparison

| Memory layer | Backed by | Lifetime | Used for | Demo UI panel | Key code |
| --- | --- | --- | --- | --- | --- |
| Short-term memory | Redis Agent Memory session APIs | One session ID | Current conversation continuity | Current Session | retrieve_session_context(), add_session_event() |
| Long-term memory | Redis Agent Memory long-term APIs | Across sessions for an owner and namespace | Durable facts and preferences | Retrieved Long-Term Memory | search_long_term_memory() |
| Extracted memory candidates | Structured LLM output before write | One turn unless accepted | Proposed durable memories | Extracted Long-Term Memory | MemoryExtraction, bulk_create_long_term_memories() |
| Transient task details | Kept in short-term memory, not extracted | Current session | Active itinerary details, dates, and requests | Current Session | Extraction prompt rules |
Store durable preferences in long-term memory. Keep active task context in short-term memory. Filter and dedupe candidates before you write new durable memories.

#Prerequisites

Note: The demo does not start a local Redis instance or deploy Redis Agent Memory. It expects an existing Redis Agent Memory service endpoint.

#Setup

Clone the demo repo and create an environment file:
Set these values in .env:
| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | Yes | API key used by the LangGraph agent. |
| AGENT_MEMORY_SERVER_URL | Yes | Redis Agent Memory data-plane base URL obtained from Redis Cloud. |
| AGENT_MEMORY_STORE_ID | Yes | Store ID used by the Redis Agent Memory API. |
| AGENT_MEMORY_API_KEY | Yes | API key used by the Redis Agent Memory API. |
| OPENAI_MODEL | No | OpenAI model used for responses and memory extraction. |
| DEMO_OWNER_ID | No | Stable user identifier for long-term memories. |
| DEMO_NAMESPACE | No | Logical namespace for this demo's memories. |
| DEMO_AGENT_ID | No | Actor ID used when writing assistant session events. |
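To make the required/optional split concrete, here is a minimal sketch of a fail-fast config loader. It is not the demo's actual `load_config()`: the `Config` field names and the fallback values for the optional variables are assumptions made for illustration, and `.env` file parsing is left out so the example stays standard-library only.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "AGENT_MEMORY_SERVER_URL",
    "AGENT_MEMORY_STORE_ID",
    "AGENT_MEMORY_API_KEY",
)

# Hypothetical fallbacks; the demo's real defaults may differ.
OPTIONAL_DEFAULTS = {
    "OPENAI_MODEL": "example-model",
    "DEMO_OWNER_ID": "demo-user",
    "DEMO_NAMESPACE": "demo",
    "DEMO_AGENT_ID": "demo-agent",
}


@dataclass(frozen=True)
class Config:
    openai_api_key: str
    server_url: str
    store_id: str
    memory_api_key: str
    openai_model: str
    owner_id: str
    namespace: str
    agent_id: str


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config, raising RuntimeError if any required variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    def optional(name: str) -> str:
        # Empty or absent optional values fall back to the illustrative default.
        return env.get(name) or OPTIONAL_DEFAULTS[name]

    return Config(
        openai_api_key=env["OPENAI_API_KEY"],
        server_url=env["AGENT_MEMORY_SERVER_URL"],
        store_id=env["AGENT_MEMORY_STORE_ID"],
        memory_api_key=env["AGENT_MEMORY_API_KEY"],
        openai_model=optional("OPENAI_MODEL"),
        owner_id=optional("DEMO_OWNER_ID"),
        namespace=optional("DEMO_NAMESPACE"),
        agent_id=optional("DEMO_AGENT_ID"),
    )
```

Passing the environment as a mapping keeps the loader easy to test with a plain dict.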
Build and run the app:
Open http://localhost:8080.

#How the demo works

The web app has three parts:
  1. Nginx serves the static frontend.
  2. FastAPI handles /api/* routes.
  3. The backend creates a Redis Agent Memory client per request, then calls RedisAgentMemoryService.run_turn() to run one LangGraph workflow.
Each chat request returns the assistant response and the memory state needed by the UI:
The UI renders the assistant output plus three memory panels: current session memory, retrieved long-term memory, and newly extracted long-term memory.
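As a rough picture of that response, the sketch below models one chat turn's result as a dataclass with one list per memory panel. Every field name here is hypothetical; check the repository for the demo's actual response schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ChatTurnResult:
    # All field names are illustrative, not the demo's actual schema.
    session_id: str
    response: str
    session_memory: List[str] = field(default_factory=list)      # Current Session panel
    retrieved_memories: List[str] = field(default_factory=list)  # Retrieved Long-Term Memory panel
    extracted_memories: List[str] = field(default_factory=list)  # Extracted Long-Term Memory panel


result = ChatTurnResult(
    session_id="session-1",
    response="Noted. I will look for Delta flights first.",
    session_memory=["user: Remember that I prefer flying Delta."],
    extracted_memories=["User prefers flying Delta."],
)
payload = asdict(result)  # plain dict, ready to serialize as JSON
```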
Architecture diagram showing Nginx, FastAPI, LangGraph, OpenAI, and Redis Agent Memory working together

#Why Redis Agent Memory?

Redis Agent Memory gives the app one client interface for two different memory scopes. The app stays responsible for deciding what to retrieve, what to pass to the LLM, and what to write.

#Session-scoped memory

Session memory stores events for one session_id. The demo uses get_session_memory() to read previous events, add_session_event() to append the latest user and assistant messages, and delete_session_memory() to clear the current session.
This gives the agent continuity without making the frontend resend the full chat history on every request.

#Durable long-term memory

Long-term memory stores owner- and namespace-scoped facts and preferences. The demo uses search_long_term_memory() to recall relevant memories before the LLM call and bulk_create_long_term_memories() to write accepted new memories after the LLM responds.
The app does not need to manage the Redis data structures or vector search implementation directly. Redis Agent Memory exposes memory operations at the agent layer.

#App-level control

The demo uses Redis Agent Memory session APIs rather than LangGraph's native checkpointer. LangGraph organizes the turn, while Redis Agent Memory stores and retrieves the memory.
That split gives the app clear control over:
  • Which session events go into the prompt.
  • Which long-term memories match the current user request.
  • Which extracted facts deserve durable storage.
  • Which owner and namespace isolate memory for this user and app.

#Trade-offs to keep in mind

  • Redis Agent Memory must be reachable from the backend.
  • LLM-based extraction can vary by model and phrasing.
  • Incorrect or shared owner IDs or namespaces can mix different users' memories.
  • Long-term memory should store durable preferences, not every conversational detail.

#1. Configure the FastAPI app and memory client

#How it works

The backend loads configuration once, builds a service around that configuration, and opens a Redis Agent Memory client inside each API request that needs memory.

#Data flow

  1. load_config() reads .env and environment variables.
  2. get_service() creates and caches RedisAgentMemoryService.
  3. agent_memory_client() builds the Redis Agent Memory client from the cached config.
  4. FastAPI routes use the client inside a context manager.

#Code walkthrough

The full implementation lives in backend/memory.py and backend/app.py.
The backend exposes these endpoints:
EndpointPurpose
POST /api/sessionsStart a new session.
POST /api/chatRun one agent turn.
GET /api/sessions/{id}/memoryRead current session short-term memory.
DELETE /api/sessions/{id}/memoryDelete current session short-term memory.
GET /api/healthCheck backend liveness.
GET /api/readyCheck backend readiness, including Redis Agent Memory.
The readiness endpoint calls Redis Agent Memory before it reports success:
Key details:
  1. Required variables fail fast. Missing Redis Agent Memory credentials raise a runtime error during config loading.
  2. The service is cached. FastAPI creates one service with stable config instead of rebuilding the LLM wrappers for every route.
  3. The client is request-scoped. Each route opens the Redis Agent Memory client inside a with agent_memory_client(service) block.
  4. Readiness checks the dependency. /api/ready verifies Redis Agent Memory availability before the frontend treats the backend as ready.
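The cached-service and request-scoped-client pattern described above can be sketched with the standard library alone. `FakeMemoryClient`, its `close()` method, and the placeholder URL are stand-ins invented for this example; the real client's constructor and cleanup behavior may look different.

```python
from contextlib import contextmanager
from functools import lru_cache
from typing import Iterator


class FakeMemoryClient:
    """Stand-in for the real Redis Agent Memory client (hypothetical)."""

    def __init__(self, server_url: str) -> None:
        self.server_url = server_url
        self.closed = False

    def close(self) -> None:
        self.closed = True


class RedisAgentMemoryService:
    """Holds stable configuration; the real service also wraps the LLM."""

    def __init__(self, server_url: str) -> None:
        self.server_url = server_url


@lru_cache(maxsize=1)
def get_service() -> RedisAgentMemoryService:
    # Built once per process, then reused by every route.
    return RedisAgentMemoryService(server_url="https://memory.example.invalid")


@contextmanager
def agent_memory_client(service: RedisAgentMemoryService) -> Iterator[FakeMemoryClient]:
    # A fresh client per request, always closed when the block exits.
    client = FakeMemoryClient(service.server_url)
    try:
        yield client
    finally:
        client.close()
```

A route would then wrap its memory calls in `with agent_memory_client(get_service()) as client:` so cleanup happens even when the handler raises.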

#Trade-offs

  • Simple request-scoped client usage is easy to reason about.
  • Readiness depends on external Redis Agent Memory availability.
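One way to express a dependency-aware readiness check is a small function that runs a single probe and maps the outcome to a status. In this sketch, `probe` is a hypothetical callable that would make one inexpensive Redis Agent Memory call, and the 200/503 status codes are an assumed convention rather than the demo's confirmed behavior.

```python
from typing import Callable, Dict, Tuple


def check_ready(probe: Callable[[], object]) -> Tuple[int, Dict[str, str]]:
    """Return an (HTTP status, body) pair based on one dependency probe."""
    try:
        probe()
    except Exception as exc:  # any probe failure means "not ready"
        return 503, {"status": "unavailable", "detail": type(exc).__name__}
    return 200, {"status": "ready"}
```

Keeping liveness (`/api/health`) free of this probe means a memory-service outage marks the backend as not ready without making it look dead.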

#2. Retrieve short-term memory for the current session

#How it works

Short-term memory is session-scoped. The backend reads memory by session_id, converts events into prompt-friendly lines, and returns an empty list when the session does not exist yet.

#Redis Agent Memory mapping

| App concept | Redis Agent Memory operation |
| --- | --- |
| Current conversation | get_session_memory(session_id=...) |
| Missing new session | Not-found response handled as empty memory |
| Prompt context window | Last SESSION_CONTEXT_LIMIT events |

#Code walkthrough

Inside the LangGraph node, the service reads session events and keeps the latest 12 entries:
The public helper uses the same pattern for the UI endpoint:
Key details:
  1. STM is session-scoped. The lookup uses only the current session_id.
  2. The frontend does not resend history. Redis Agent Memory stores prior events for the session.
  3. Deletes stay scoped. Deleting session memory clears only this session, not durable long-term memory.
  4. Prompt size stays bounded. SESSION_CONTEXT_LIMIT = 12 keeps the latest events instead of injecting the whole session.
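The read path described above (look up by session ID, treat a missing session as empty, keep only the newest events) might look roughly like this. `FakeSessionStore`, `SessionNotFound`, and the `SessionEvent` shape are in-memory stand-ins invented for the sketch; only the method names `get_session_memory()` and `add_session_event()` and the limit of 12 come from the tutorial, and their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

SESSION_CONTEXT_LIMIT = 12


class SessionNotFound(Exception):
    """Hypothetical error raised when a session has no stored memory yet."""


@dataclass
class SessionEvent:
    role: str
    text: str


class FakeSessionStore:
    """In-memory stand-in for the session side of the memory client."""

    def __init__(self) -> None:
        self._events: Dict[str, List[SessionEvent]] = {}

    def add_session_event(self, session_id: str, role: str, text: str) -> None:
        self._events.setdefault(session_id, []).append(SessionEvent(role, text))

    def get_session_memory(self, session_id: str) -> List[SessionEvent]:
        if session_id not in self._events:
            raise SessionNotFound(session_id)
        return list(self._events[session_id])


def read_session_context(store: FakeSessionStore, session_id: str) -> List[str]:
    """Return the newest events as "role: text" lines, or [] for a new session."""
    try:
        events = store.get_session_memory(session_id)
    except SessionNotFound:
        return []
    recent = events[-SESSION_CONTEXT_LIMIT:]  # simple truncation, no summarization
    return [f"{event.role}: {event.text}" for event in recent]
```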

#Trade-offs

  • Simple truncation is predictable, but it is less nuanced than summarization.
  • Session memory should not become the storage location for every durable fact.

#3. Search long-term memory before the LLM call

#How it works

Before the model responds, the graph searches long-term memory with the current user message. The search is filtered by owner and namespace, so the result set belongs to the right user and app context.

#Redis Agent Memory mapping

| App concept | Redis Agent Memory operation |
| --- | --- |
| Current user request | Search query text |
| User isolation | ownerId filter |
| App or environment scope | namespace filter |
| Relevant durable recall | search_long_term_memory() with limit: 5 |

#Code walkthrough

The retrieval node finds the latest human message and searches long-term memory:
Key details:
  1. The query comes from the current turn. The demo searches with the user's latest message.
  2. Owner ID separates users. Use a stable per-user value in real apps.
  3. Namespace separates apps and environments. This prevents unrelated memories from leaking into the demo.
  4. Recall is relevance-based. The app asks for up to five relevant memories, not a full dump of all memories.
  5. Redis Agent Memory hides plumbing. The app calls memory APIs instead of direct Redis vector search code.
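To illustrate why the owner and namespace filters matter, the sketch below applies both filters before ranking and then keeps at most five hits. It is a toy model only: a word-overlap count stands in for the vector similarity that Redis Agent Memory performs, and `Memory` and `search_scoped()` are hypothetical names, not part of the client API.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Memory:
    owner_id: str
    namespace: str
    text: str


def _words(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens, so punctuation does not block a match.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_scoped(
    memories: Sequence[Memory],
    query: str,
    owner_id: str,
    namespace: str,
    limit: int = 5,
) -> List[Memory]:
    """Filter by owner and namespace first, then rank by a toy word-overlap score."""
    query_words = _words(query)
    scored = []
    for memory in memories:
        if memory.owner_id != owner_id or memory.namespace != namespace:
            continue  # never consider another user's or another app's memories
        overlap = len(query_words & _words(memory.text))
        if overlap:
            scored.append((overlap, memory))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in scored[:limit]]
```

Because the scope check runs before scoring, a highly similar memory that belongs to a different owner or namespace can never appear in the results.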

#Trade-offs

  • Relevant recall depends on memory quality and query phrasing.
  • Shared demo defaults can pollute results. Use stable owner IDs and separate namespaces in production.

#4. Inject memory into a LangGraph agent turn

#How it works

LangGraph turns one request into a clear sequence of nodes. The graph retrieves session memory, searches long-term memory, calls the model, writes memory, and ends.

#Data flow

The graph state keeps short-term memory and long-term memory in separate fields:
The graph edges make each step explicit:
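As a rough illustration of that state and node order, here is a plain-Python stand-in that runs four nodes in sequence and merges each node's partial update into a shared state. It deliberately avoids LangGraph's API so it runs with the standard library alone; the node names, the `TurnState` fields other than `session_context` and `recalled_memories`, and the canned return values are assumptions.

```python
from typing import Callable, Dict, List, Tuple, TypedDict


class TurnState(TypedDict, total=False):
    user_message: str
    session_context: List[str]    # short-term memory lines
    recalled_memories: List[str]  # long-term memory hits
    response: str
    written: bool


def retrieve_session(state: TurnState) -> Dict:
    return {"session_context": ["user: My name is Ricardo."]}


def search_long_term(state: TurnState) -> Dict:
    return {"recalled_memories": ["User prefers flying Delta."]}


def call_model(state: TurnState) -> Dict:
    # A real node would call the LLM; this one only reports what it received.
    return {
        "response": (
            f"Using {len(state['session_context'])} session line(s) and "
            f"{len(state['recalled_memories'])} memory hit(s)."
        )
    }


def write_memory(state: TurnState) -> Dict:
    return {"written": True}


# The ordered list plays the role of the graph's edges.
NODES: List[Tuple[str, Callable[[TurnState], Dict]]] = [
    ("retrieve_session", retrieve_session),
    ("search_long_term", search_long_term),
    ("call_model", call_model),
    ("write_memory", write_memory),
]


def run_turn(user_message: str) -> Tuple[TurnState, List[str]]:
    state: TurnState = {"user_message": user_message}
    visited = []
    for name, node in NODES:
        state.update(node(state))  # merge the node's partial update
        visited.append(name)
    return state, visited
```

In LangGraph itself, each function would be registered as a node and the ordering would be declared as edges from the start marker to the end marker.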

#Code walkthrough

The model call assembles short-term and long-term context into the system prompt:
Key details:
  1. LangGraph makes the turn observable. Each node has one job.
  2. STM and LTM stay separate. The state has session_context and recalled_memories fields.
  3. The prompt tells the model how to use each scope. Short-term memory supports continuity. Long-term memory supports durable personalization.
  4. The assistant hides implementation details. The model is instructed not to mention the memory plumbing.
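A prompt builder along these lines would keep the two scopes in separate, labeled sections. The wording of the instructions and the `build_system_prompt()` name are illustrative; the demo's actual prompt text is in the repository.

```python
from typing import List


def build_system_prompt(session_context: List[str], recalled_memories: List[str]) -> str:
    """Assemble a system prompt with one labeled section per memory scope."""

    def section(title: str, lines: List[str]) -> str:
        body = "\n".join(f"- {line}" for line in lines) if lines else "- (none)"
        return f"{title}\n{body}"

    return "\n\n".join(
        [
            "You are a helpful travel agent. "
            "Do not mention how memory is stored or retrieved.",
            section("Current session (use for conversational continuity):", session_context),
            section("Long-term memory (use for durable personalization):", recalled_memories),
        ]
    )
```

Rendering an explicit "(none)" placeholder tells the model that a scope is empty instead of leaving it to guess.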

#Trade-offs

  • A small graph is easy to teach and debug.
  • More complex agents may add tool calls, routing, summarization, or human review nodes.

#5. Write session events and extract durable memories

#How it works

After the model responds, the graph writes the user and assistant messages to session memory. Then it asks the LLM for structured durable memory candidates and writes accepted candidates to long-term memory.

#Redis Agent Memory mapping

| App concept | Redis Agent Memory operation |
| --- | --- |
| User turn | add_session_event(... role=MessageRole.USER) |
| Assistant turn | add_session_event(... role=MessageRole.ASSISTANT) |
| Durable memory write | bulk_create_long_term_memories(memories=...) |
| Idempotent demo record ID | Deterministic memory_id(owner_id, namespace, text) |

#Code walkthrough

Structured output keeps the extraction result predictable:
The write node stores both sides of the turn as session events:
The extraction prompt tells the model what belongs in long-term memory and what should stay in the current session:
The demo normalizes candidate text, skips duplicates, and writes deterministic IDs:
The deterministic ID helper uses the user, namespace, and memory text:
Key details:
  1. Every turn updates STM. The backend writes both user and assistant events to the current session.
  2. The extractor is selective. Durable facts, stable preferences, and constraints can become long-term memory.
  3. Transient details stay in STM. Dates, destinations, and active booking requests should not become long-term memory unless the user asks the agent to remember them.
  4. Duplicates are filtered. The demo dedupes against recalled long-term memory and accepted candidates from the same turn.
  5. Deterministic IDs help idempotency. Repeating the same memory for the same owner and namespace produces the same ID.
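The normalize, dedupe, and deterministic-ID steps can be sketched as follows. Only the `memory_id(owner_id, namespace, text)` signature comes from the tutorial; the normalization rules, the SHA-256 hash, the 32-character truncation, and the `accept_candidates()` helper are assumptions chosen for the example.

```python
import hashlib
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace, trim, and lowercase so near-identical strings compare equal."""
    return " ".join(text.split()).lower()


def memory_id(owner_id: str, namespace: str, text: str) -> str:
    """Derive a stable ID; SHA-256 is an assumed choice, not necessarily the demo's."""
    key = "\x1f".join([owner_id, namespace, normalize(text)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


def accept_candidates(
    candidates: Iterable[str],
    recalled: Iterable[str],
    owner_id: str,
    namespace: str,
) -> List[Tuple[str, str]]:
    """Return (id, text) pairs for candidates not already recalled or accepted this turn."""
    seen = {normalize(text) for text in recalled}
    accepted = []
    for candidate in candidates:
        norm = normalize(candidate)
        if not norm or norm in seen:
            continue  # empty, already stored, or repeated within this turn
        seen.add(norm)
        cleaned = " ".join(candidate.split())
        accepted.append((memory_id(owner_id, namespace, candidate), cleaned))
    return accepted
```

Because the ID depends only on owner, namespace, and normalized text, writing the same memory twice targets the same record instead of creating a second one.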

#Trade-offs

  • LLM extraction can still vary between runs, models, and phrasings, so the same statement is not always extracted the same way.
  • Deduping only against retrieved memories can miss duplicates that were not recalled.
  • Real apps may need user confirmation, review queues, or deletion controls for long-term memory.

#6. Show memory behavior in the web UI

#How it works

The frontend keeps the demo visible and teachable. It shows chat messages, the current session ID, short-term memory, retrieved long-term memory, and newly written long-term memory.

#Data flow

  1. The browser sends session_id and message to /api/chat.
  2. FastAPI returns the assistant message and memory arrays.
  3. The UI renders each memory array in its panel.
  4. The + button creates a new session with the same owner ID.
  5. The × button deletes current session memory only.

#Code walkthrough

The sendMessage() function handles the chat response and memory panels:
The UI defines the three memory panels in frontend/index.html:
Nginx serves the frontend and proxies API requests to the backend container:
Key details:
  1. Memory is visible. The UI makes hidden agent memory behavior easy to inspect.
  2. New sessions demonstrate LTM reuse. The + button starts a new session but keeps the same configured owner ID.
  3. Deleting STM demonstrates scope separation. The × button clears the current session without deleting durable memory.
  4. The frontend stays minimal. Nginx serves static files and proxies /api/* to FastAPI.

#Trade-offs

  • The UI is intentionally minimal.
  • It is not an auth or multi-user production frontend.

#Running the demo

Use this flow to see both memory scopes in action:
  1. Start Docker:
  2. Open http://localhost:8080.
  3. Send: My name is Ricardo.
  4. Send: Remember that I prefer flying Delta.
  5. Send: I am planning a trip to Lisbon next month.
  6. Watch the memory panels. The name and airline preference can become long-term memory. The active trip detail should stay in short-term memory unless the user explicitly asks the agent to remember it.
  7. Click + to start a new session.
  8. Ask: What do you remember about me?
  9. Confirm that durable long-term memory persists across sessions.
  10. Click × to delete the current session memory and note that long-term memory remains.
Caveat: If the UI does not load, check /api/ready and verify your Redis Agent Memory credentials.
Caveat: If memory extraction looks different from the examples, remember that model output can vary by model version and phrasing.
Note: If long-term memory looks polluted, change DEMO_OWNER_ID or DEMO_NAMESPACE for a clean run.

#Running the tests

The demo tests mock Redis Agent Memory and OpenAI calls, so they do not need external services.
Install the test dependencies and run pytest:
The tests cover three areas:
  • tests/test_utils.py for pure helper functions.
  • tests/test_api.py for FastAPI endpoints with TestClient.
  • tests/test_service.py for service behavior and long-term memory deduplication.
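For a sense of how mocking removes the need for external services, the sketch below tests a tiny recall helper against a `MagicMock` client. The `recall_texts()` helper, the keyword arguments passed to `search_long_term_memory()`, and the dict-shaped results are all hypothetical; the demo's own tests and the real client signature may differ.

```python
from typing import List
from unittest.mock import MagicMock


def recall_texts(client, query: str, owner_id: str, namespace: str) -> List[str]:
    """Tiny function under test: search long-term memory and return the texts."""
    results = client.search_long_term_memory(
        query=query, owner_id=owner_id, namespace=namespace, limit=5
    )
    return [item["text"] for item in results]


def test_recall_uses_owner_and_namespace() -> None:
    # The mock replaces the network-backed client entirely.
    client = MagicMock()
    client.search_long_term_memory.return_value = [
        {"text": "User prefers flying Delta."}
    ]

    texts = recall_texts(client, "airline?", "ricardo", "travel-demo")

    assert texts == ["User prefers flying Delta."]
    client.search_long_term_memory.assert_called_once_with(
        query="airline?", owner_id="ricardo", namespace="travel-demo", limit=5
    )
```

Asserting on the call arguments checks that the owner and namespace scoping reaches the client, which is the behavior most likely to cause cross-user leaks if it regresses.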

#Production considerations

Before you adapt this pattern for production, account for memory as user data:
  • Use real authentication and stable user IDs instead of demo defaults.
  • Add delete, export, and review paths for long-term memory where needed.
  • Keep namespaces separate for each app and environment.
  • Monitor readiness, Redis Agent Memory latency, and downstream LLM latency.
  • Consider summarization when session memory grows beyond simple truncation.
  • Consider human-in-the-loop review before storing sensitive memories.
  • Avoid storing secrets or sensitive transient details as long-term memory.

#Next steps

  • Add a memory review screen before writing long-term memory.
  • Add user-specific auth and owner IDs.
  • Add summarization for older session events.
  • Add Redis Iris to bring in real travel data.
  • Inspect memory with Redis Insight if available.
  • Deploy with a managed Redis or Redis Agent Memory setup.

#References