Tutorial
How to build a document agent with Redis, RAG, and agent memory
March 18, 2026 · 14 minute read
TL;DR: Build a document agent that stores session state, chat history, long-term memory, and vector-indexed doc chunks in Redis. This tutorial uses Bun, Express, OpenAI, and Redis to load bundled docs, retrieve relevant chunks with vector search, and answer questions over those docs with a Redis-backed retrieval-augmented generation (RAG) flow.
Note: This tutorial uses the code from the following git repository:
#What you'll learn
- How to use Redis as the primary state layer for a document agent
- How to store projects and docs as JSON in Redis
- How to chunk docs and index embeddings for vector search
- How to keep short-term chat history and long-term memory in Redis
- How to combine vector search and memory to answer questions over docs
#What you'll build
You'll build a Bun-based document agent with two main workflows:
- A project workflow that loads docs into Redis and lets you edit them
- A chat workflow that answers questions using Redis-backed memory and doc retrieval
The default demo path uses bundled markdown docs in the repo, so you can run the app without live crawl dependencies. If you want live crawl later, you can switch the app to Tavily mode.
#What is a document agent?
A document agent is an app that can ingest docs, keep track of context across interactions, and answer or edit content based on those docs. It combines retrieval, memory, and app state so the user can move between indexing, questioning, and editing without losing context.
#What is retrieval-augmented generation (RAG)?
RAG is a pattern where an app retrieves relevant information from a data store and passes it to a large language model (LLM) as context before generating a response. Instead of relying only on the LLM's training data, the model gets grounded in your actual docs.
In this app, the RAG flow works like this:
- A user asks a question
- The app searches Redis for relevant doc chunks using vector search
- The app passes those chunks to the LLM as context
- The LLM generates an answer grounded in your docs
Redis fits RAG well because it can store the doc chunks, serve the vector search, and hold the chat history — all in one fast data layer.
#Why use Redis for a document agent?
Redis fits this use case because the app needs one fast system that can handle multiple kinds of state:
- Session storage for browser sessions
- JSON storage for projects and source docs
- Vector search for doc chunk retrieval
- Short-term chat history for active conversations
- Long-term, episodic, and semantic memory for agent behavior over time
- Streams for internal logging
That gives you one fast data layer for the full document agent loop instead of stitching together separate stores for sessions, vectors, chat history, and memory.
#How does the app work?
The app has two main flows:
- Project flow: create a project, load bundled docs, store them as JSON in Redis, chunk the docs, and index embeddings for vector search.
- Chat flow: read short-term chat history, search semantic memory first, fall back to doc chunk retrieval via vector search, generate an answer, and store new semantic memory.
Both flows use Redis as the single data layer.
#Prerequisites
- Bun runtime
- Docker for the Redis container
- An OpenAI API key
- Basic familiarity with Redis commands and TypeScript
If you need a Redis refresher first, start with the Redis quick start and Node.js client guide.
#Step 1. Clone the repo
#Step 2. Configure environment variables
Copy the sample file, for example with cp .env.example .env (check the repo for the exact filename).
Then add your OpenAI API key:
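A minimal .env for the local demo path, using the variable names referenced later in this tutorial; the placeholder values are examples:

```
OPENAI_API_KEY=sk-...
REDIS_URL=redis://localhost:6379
CRAWL_SOURCE=local
```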
#Step 3. Run Redis with Docker
The app includes a Docker Compose file that uses the redis:alpine image. Start Redis with docker compose up -d.
#Step 4. Install dependencies and run the app
Install dependencies with bun install, start the app, and open http://localhost:8080.
#Step 5. Ingest bundled docs into Redis
The default demo path uses:
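As the troubleshooting section notes, the local demo path is selected with one environment variable in .env:

```
CRAWL_SOURCE=local
```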
In this mode, the app loads bundled markdown files from data/documents instead of calling a live crawl provider. That keeps the happy path stable for local runs, tests, and tutorials.
Create a project in the UI, add a title, and enter a short working brief. In local mode, the app loads the bundled docs into Redis as soon as the project starts.
#How does the app store docs and chunks in Redis?
The app stores each doc as a JSON value and then creates doc chunks for vector search.
At a high level:
- Read markdown docs from disk
- Split each doc into chunks
- Generate an embedding for each chunk
- Store docs and chunks in Redis JSON
- Query chunks later with vector search
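The split step in the list above can be sketched as a fixed-size chunker with overlap; the sizes here are illustrative, not the repo's exact values:

```typescript
// A fixed-size chunker with overlap. chunkSize/overlap are
// illustrative defaults, not the repo's exact values.
export function chunkDocument(
  text: string,
  chunkSize = 1000,
  overlap = 200,
): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    // Take the next window; the last window may be shorter.
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    // Step forward, keeping `overlap` characters of shared context.
    start += chunkSize - overlap;
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.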
The code below creates two Redis Search indexes: one for full docs and one for doc chunks. The chunk index includes a vector field so you can run KNN queries against the embeddings.
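A sketch of those two index definitions, expressed as raw FT.CREATE argument arrays you could pass to a client's sendCommand; the index names, JSON paths, and 1536-dim embedding size are assumptions, not the repo's exact identifiers:

```typescript
// Full-doc index over documents:{id} JSON keys.
export function buildDocIndexCommand(): string[] {
  return [
    "FT.CREATE", "idx:documents",
    "ON", "JSON", "PREFIX", "1", "documents:",
    "SCHEMA",
    "$.title", "AS", "title", "TEXT",
    "$.content", "AS", "content", "TEXT",
  ];
}

// Chunk index over document-chunks:{id} keys, with a vector field.
export function buildChunkIndexCommand(dim = 1536): string[] {
  return [
    "FT.CREATE", "idx:document-chunks",
    "ON", "JSON", "PREFIX", "1", "document-chunks:",
    "SCHEMA",
    "$.userId", "AS", "userId", "TAG",
    "$.text", "AS", "text", "TEXT",
    // HNSW vector field so KNN queries can run against the embeddings;
    // "6" is the count of algorithm parameters that follow.
    "$.embedding", "AS", "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", String(dim),
    "DISTANCE_METRIC", "COSINE",
  ];
}
```

The userId TAG field on the chunk index is what lets the search step filter results per user.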
Once the indexes exist, the app splits each doc into chunks, generates embeddings, and writes everything to Redis with JSON.MSET. Each doc is keyed by documents:{id} and each chunk by document-chunks:{id}.
#How does the app search doc chunks with vector search?
When a user asks a question, the app generates an embedding for the query and runs a KNN vector search against the chunk index. The search returns the most relevant doc chunks, filtered by the user's ID and ranked by cosine distance.
The app filters results by a distance threshold (0.3) so only chunks that are semantically close to the query get returned.
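The pieces of that search can be sketched as follows, assuming RediSearch dialect-2 KNN syntax; the userId tag field and @embedding name are illustrative:

```typescript
// FT.SEARCH expects the query vector as a raw FLOAT32 byte buffer.
export function toFloat32Buffer(embedding: number[]): Buffer {
  return Buffer.from(new Float32Array(embedding).buffer);
}

// Filter by the user's ID, then run KNN over the vector field.
export function buildKnnQuery(userId: string, k = 5): string {
  return `(@userId:{${userId}})=>[KNN ${k} @embedding $vec AS score]`;
}

// Keep only chunks within the cosine-distance threshold (0.3).
export function filterByDistance<T extends { score: number }>(
  results: T[],
  threshold = 0.3,
): T[] {
  return results.filter((r) => r.score <= threshold);
}
```

The actual FT.SEARCH call would pass the buffer via PARAMS (as $vec), set DIALECT 2, and sort by score ascending before applying the threshold filter.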
#How does the app manage chat memory?
The app keeps multiple memory layers in Redis:
- Short-term memory stores the current chat history for a session
- Long-term memory stores user-specific preferences, such as editing rules
- Episodic memory stores summaries of past interactions
- Semantic memory stores reusable facts that can help any future session
Short-term memory uses Redis JSON with a list of messages. Each session is stored at a key like users:u{userId}:memory:shortterm:c{sessionId} and appended to with JSON.ARRAPPEND.
For long-term and semantic memory, the app uses Redis Search with vector embeddings. Each memory entry is stored as a JSON doc with an embedding field, so the app can search for relevant memories using KNN queries — the same pattern used for doc chunks.
This lets the agent answer questions with both recent chat context and retained knowledge.
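The short-term append can be sketched as a raw JSON.ARRAPPEND command built from the key pattern above; the $.messages path is an assumption about the session JSON shape:

```typescript
export interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Key layout: users:u{userId}:memory:shortterm:c{sessionId}
export function shortTermKey(userId: string, sessionId: string): string {
  return `users:u${userId}:memory:shortterm:c${sessionId}`;
}

// Raw JSON.ARRAPPEND arguments, e.g. for a client's sendCommand call.
export function appendMessageCommand(
  userId: string,
  sessionId: string,
  msg: ChatMessage,
): string[] {
  return [
    "JSON.ARRAPPEND",
    shortTermKey(userId, sessionId),
    "$.messages",
    JSON.stringify(msg),
  ];
}
```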
#How does the RAG flow work?
When a user sends a new question, the chat controller follows this sequence:
- Push the user's message to short-term memory in Redis
- Search semantic memory in Redis for a cached answer
- If semantic memory hits, return the cached answer
- If semantic memory misses, search doc chunks with vector search
- Pass the retrieved chunks plus the chat history to the LLM
- Store the new question-answer pair as semantic memory for future queries
The code below shows the core of this flow in the chat controller:
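A sketch of that controller logic with its Redis and LLM dependencies injected; the function names are illustrative, not the repo's exact identifiers:

```typescript
export interface ChatDeps {
  pushShortTerm(msg: string): Promise<void>;
  searchSemanticMemory(q: string): Promise<string | null>;
  searchDocChunks(q: string): Promise<string[]>;
  generateAnswer(q: string, chunks: string[]): Promise<string>;
  storeSemanticMemory(q: string, a: string): Promise<void>;
}

export async function answerQuestion(
  question: string,
  deps: ChatDeps,
): Promise<string> {
  // 1. Record the user's message in short-term memory
  await deps.pushShortTerm(question);

  // 2. Semantic-memory-first: return a cached answer if one exists
  const cached = await deps.searchSemanticMemory(question);
  if (cached) return cached;

  // 3. Fall back to doc-chunk retrieval via vector search
  const chunks = await deps.searchDocChunks(question);

  // 4. Generate an answer grounded in the retrieved chunks
  const answer = await deps.generateAnswer(question, chunks);

  // 5. Store the new Q&A pair as semantic memory for future queries
  await deps.storeSemanticMemory(question, answer);
  return answer;
}
```

Injecting the dependencies keeps the control flow readable and makes the semantic-memory short-circuit easy to test in isolation.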
That gives you a clean semantic-memory-first path before the app falls back to doc retrieval. Repeated questions are served from Redis instead of triggering another LLM generation — the query embedding still has to be computed, but the answer itself comes straight from the cache.
#How does the app reuse editing preferences with Redis memory?
When a user edits a doc and confirms the changes, the app summarizes the pattern of that edit and stores it as long-term memory. Later, when a new project starts, the app searches for existing editing preferences:
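A sketch of that lookup, assuming the same KNN-plus-distance-threshold pattern used for doc chunks; the MemoryEntry shape is illustrative, and the 0.3 threshold mirrors the value used for chunk search:

```typescript
export interface MemoryEntry {
  text: string;  // e.g. a stored rule like "Prefers sentence-case headings"
  score: number; // cosine distance returned by the KNN search
}

// Keep only preferences semantically close to the new project's brief.
export function relevantPreferences(
  entries: MemoryEntry[],
  threshold = 0.3,
): string[] {
  return entries.filter((e) => e.score <= threshold).map((e) => e.text);
}
```

The entries themselves would come from a KNN query over the long-term memory index, using the new project brief's embedding as the query vector; the surviving preferences are then included in the agent's prompt.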
That turns one-off edits into durable agent behavior without hard-coding the rule into the app.
#How does Redis store the data model?
Every piece of state in the app lives in Redis. Here is the full key structure:
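At minimum, the key patterns used in this tutorial look like this; the repo may define additional keys for sessions, streams, and long-term memory:

```
documents:{id}                                 JSON doc for each source document
document-chunks:{id}                           JSON chunk with text + embedding
users:u{userId}:memory:shortterm:c{sessionId}  JSON array of chat messages
```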
#Optional: live crawl mode
If you want live crawl instead of the bundled demo docs, switch the app to Tavily mode:
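Assuming Tavily mode mirrors the local setting (check the repo's .env.example for the exact value and for the Tavily API key variable):

```
CRAWL_SOURCE=tavily
```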
In that mode, the app uses the URL and crawl instructions extracted from the project prompt.
#Troubleshooting
#The app starts but returns a Redis error
Check that REDIS_URL in your .env file points to a running Redis instance. If you're using Docker, verify the container is running with docker ps.
#The app starts but OpenAI calls fail
Verify that OPENAI_API_KEY is set correctly in your .env file and that the key has active billing.
#No docs appear after creating a project
Make sure CRAWL_SOURCE=local is set in .env. In local mode, the app loads bundled markdown files from data/documents. If that directory is empty, no docs will appear.
#Docker Compose fails to start
Make sure Docker is running and that port 6379 is not already in use by another Redis instance.
#Next steps
- Explore how agent memory works in depth with the agent memory with LangGraph and Redis tutorial
- Build a RAG chatbot from scratch with the RAG GenAI chatbot with Redis tutorial
- Learn the basics of vector search with the vector search getting started guide
- Try Redis Cloud free to deploy Redis for your GenAI workloads
#FAQ
#What is retrieval-augmented generation (RAG)?
RAG is a pattern that combines information retrieval with LLM text generation. The app retrieves relevant doc chunks from Redis using vector search, passes them to the LLM as context, and generates an answer grounded in your actual docs rather than the model's training data alone.
#How does Redis store vector embeddings?
Redis stores vector embeddings as fields inside JSON docs. When you create a Redis Search index with a VECTOR field type, Redis indexes those embeddings and lets you run KNN (K-nearest neighbor) queries to find the most similar vectors. This app uses HNSW for doc chunks (fast approximate search) and FLAT for memory entries (exact search over a smaller dataset).
#What is agent memory?
Agent memory is the mechanism that lets an AI agent retain and recall information across interactions. This app uses four types: short-term memory (current chat history), long-term memory (user preferences), episodic memory (past interaction summaries), and semantic memory (reusable facts). All four live in Redis.
#What is a good Redis use case for this app?
This app is a good Redis use case because it needs fast retrieval across multiple kinds of state — sessions, docs, vectors, chat history, and memory — all in one data layer.
#Why not use a separate vector database?
You can, but Redis lets you keep doc storage, vector retrieval, and memory in one place. That reduces operational complexity and keeps the app architecture simpler.
#Can I use this with docs other than markdown?
The bundled demo uses markdown, but you can adapt the ingestion pipeline to handle other formats. The key requirement is that you can split the content into text chunks and generate embeddings for each chunk.
#How is this different from a chatbot?
A chatbot typically answers questions from a fixed knowledge base. This document agent goes further — it ingests docs on demand, tracks editing preferences across sessions, stores semantic memory for repeated questions, and lets you modify docs based on learned editing patterns.
#Do I need live crawl to follow this tutorial?
No. The default path uses bundled local docs so you can run the app without Tavily.
#Do I need Redis Cloud?
No. The local tutorial flow uses Docker with
redis:alpine. You can switch to Redis Cloud later by updating REDIS_URL in your .env file.