{
  "id": "redis-py",
  "title": "Redis agent memory with redis-py",
  "url": "https://redis.io/docs/latest/develop/use-cases/agent-memory/redis-py/",
  "summary": "Build a Redis-backed agent memory layer in Python with redis-py, sentence-transformers, and standard Redis commands — working memory in a Hash, long-term semantic recall as JSON with a vector index, and an event log in a Stream.",
  "tags": [
    "docs",
    "develop",
    "stack",
    "oss",
    "rs",
    "rc"
  ],
  "last_updated": "2026-06-11T16:10:09-04:00",
  "children": [],
  "page_type": "content",
  "content_hash": "30fcb5afd021a1c9beaec898430efc8eebb91f3bc49eb0c7119986ae6eda1845",
  "sections": [
    {
      "id": "overview",
      "title": "Overview",
      "role": "overview",
      "text": "This guide shows you how to build a small Redis-backed agent memory layer in Python with [`redis-py`](https://redis.io/docs/latest/develop/clients/redis-py) and the [`sentence-transformers`](https://www.sbert.net/) library, using only standard Redis commands — no agent-memory SDK, no managed service. It includes a local web server built with the Python standard library so you can send turns at the agent, watch working memory update in place, see semantically similar long-term memories recalled in real time, watch the write-time deduplication skip near-duplicates, and inspect the per-thread event log."
    },
    {
      "id": "overview",
      "title": "Overview",
      "role": "overview",
      "text": "The memory layer splits across three Redis primitives, each handling one tier:\n\n* **Working memory** for the active session is a [Hash](https://redis.io/docs/latest/develop/data-types/hashes) at `agent:session:<thread_id>` holding the goal, scratchpad, a rolling window of recent turns (as a JSON list inside one field), and a few audit timestamps. One [`HGETALL`](https://redis.io/docs/latest/commands/hgetall) returns the whole session in a single round trip; every write refreshes the key's [`EXPIRE`](https://redis.io/docs/latest/commands/expire) so idle sessions decay on their own.\n* **Long-term memory** is a set of [JSON](https://redis.io/docs/latest/develop/data-types/json) documents at `agent:mem:<id>`, each carrying the memory text, a 384-dimensional embedding vector, and tag fields for user, namespace, kind (episodic / semantic), and source thread. A single [Redis Search](https://redis.io/docs/latest/develop/ai/search-and-query) index covers the [HNSW vector field](https://redis.io/docs/latest/develop/ai/search-and-query/vectors) and every metadata field, so one [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search) call performs the KNN with the metadata pre-filter in the same round trip. Write-time deduplication runs the same KNN at insert time and skips a new memory whose nearest existing entry is within a tighter threshold.\n* **Event log** for the agent's actions and observations is a [Stream](https://redis.io/docs/latest/develop/data-types/streams) at `agent:events:<thread_id>`, appended with [`XADD MAXLEN ~`](https://redis.io/docs/latest/commands/xadd) so retention stays bounded automatically, replayed with [`XREVRANGE`](https://redis.io/docs/latest/commands/xrevrange).\n\nThat gives you:\n\n* A single round trip per tier: one [`HGETALL`](https://redis.io/docs/latest/commands/hgetall) for the session, one [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search) for recall, one [`XADD`](https://redis.io/docs/latest/commands/xadd) for the event log.\n* Sub-millisecond reads on every step of the agent loop, so the memory layer doesn't dominate the per-step latency.\n* Per-tier decay: short TTLs on working memory, longer on episodic memories, no TTL on semantic memories. Combined with a database-level [eviction policy](https://redis.io/docs/latest/develop/reference/eviction) (LFU is the common choice), memory stays bounded under pressure.\n* Scoping enforced inside the query: a recall query for `user=alice` will never see `user=bob`'s memories, because the TAG filter goes into the same [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search) call as the KNN."
    },
    {
      "id": "how-it-works",
      "title": "How it works",
      "role": "content",
      "text": "Each turn through the agent loop touches all three tiers in one pass: append to working memory, recall similar long-term memories, write the turn back as a new memory (with deduplication), and append one event to the log."
    },
    {
      "id": "per-turn-flow",
      "title": "Per-turn flow",
      "role": "content",
      "text": "1. The application calls `embedder.encode_one(text)` to turn the incoming turn into a 384-dimensional `float32` vector.\n2. `session.append_turn(thread_id, role, content)` reads the per-thread Hash with [`HGETALL`](https://redis.io/docs/latest/commands/hgetall), appends the new turn to the rolling window in application code, trims it back to the configured maximum, and writes the Hash back with an [`HSET`](https://redis.io/docs/latest/commands/hset) + [`EXPIRE`](https://redis.io/docs/latest/commands/expire) pipeline. The session TTL refreshes on every write so an active thread stays alive.\n3. `memory.recall(vec, user=..., namespace=..., k=5)` runs [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search) with a TAG pre-filter and a `KNN 5` clause. Redis returns the closest matching memories together with their cosine distances; memories beyond the recall threshold are dropped before they reach the agent so an unrelated query doesn't surface confident-looking false positives.\n4. `memory.remember(text, vec, user=..., namespace=..., kind=...)` runs the same KNN with a tighter dedup threshold. If an existing memory is within the threshold, the new write is skipped and the existing memory's `hit_count` is incremented with [`JSON.NUMINCRBY`](https://redis.io/docs/latest/commands/json.numincrby); otherwise a fresh JSON document is written with [`JSON.SET`](https://redis.io/docs/latest/commands/json.set) and a per-kind [`EXPIRE`](https://redis.io/docs/latest/commands/expire) — `episodic` defaults to seven days, `semantic` has no TTL by default.\n5. `event_log.record(thread_id, action, detail)` appends one entry to the per-thread Stream with [`XADD MAXLEN ~`](https://redis.io/docs/latest/commands/xadd), bounding retention to roughly a thousand entries per thread without an explicit cleanup job.\n\nThe embedding is computed once and reused for steps 3 and 4 — there's no point encoding the same text twice. Recall runs before the write, so the agent doesn't see its own just-written turn echoed back as a recalled memory."
    },
    {
      "id": "the-session-store",
      "title": "The session store",
      "role": "content",
      "text": "`AgentSession` wraps the working-memory Hash and the rolling turn window ([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/agent-memory/redis-py/session_store.py)):\n\n[code example]\n\nThe data model is one Hash per thread. The rolling turn window is stored as a JSON string in a single field so the whole session loads in one [`HGETALL`](https://redis.io/docs/latest/commands/hgetall) — the hash never grows in size or field count as the conversation goes on.\n\n[code example]\n\nEvery write — `start`, `append_turn`, `set_scratchpad` — runs the [`HSET`](https://redis.io/docs/latest/commands/hset) and [`EXPIRE`](https://redis.io/docs/latest/commands/expire) inside a `MULTI` / `EXEC` block, so a connection drop between the two writes can't leave the session without a TTL."
    },
    {
      "id": "the-long-term-memory-store",
      "title": "The long-term memory store",
      "role": "content",
      "text": "`LongTermMemory` owns the JSON documents, the vector index, the recall query, and the write-time deduplication ([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/agent-memory/redis-py/long_term_memory.py)):\n\n[code example]"
    },
    {
      "id": "data-model",
      "title": "Data model",
      "role": "content",
      "text": "Each memory is a JSON document at `agent:mem:<id>`. The embedding is a JSON array of floats so the document is human-readable from `redis-cli`; [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search) still expects the *query* vector as raw `float32` bytes, regardless of how the indexed document stores it.\n\n[code example]\n\nThe Redis Search index is declared on the JSON document type with `as_name` aliases so the query syntax stays compact:\n\n[code example]"
    },
    {
      "id": "the-query",
      "title": "The query",
      "role": "content",
      "text": "Both recall and dedup are the same hybrid query: a TAG pre-filter in parentheses followed by `=>[KNN k @embedding $vec]`. With `DIALECT 2`, Redis applies the filter first and KNN-ranks only the matching documents.\n\n[code example]\n\n`distance` is the cosine *distance* (0 means identical, 2 means opposite). Recall and dedup share the same query shape; only the threshold differs — strict at write time so the index doesn't fill with paraphrases of the same fact, looser at read time so the agent gets a wider net of relevant memories."
    },
    {
      "id": "per-kind-ttls",
      "title": "Per-kind TTLs",
      "role": "content",
      "text": "`remember` resolves the entry's TTL from the memory's `kind`:\n\n| Kind      | Default TTL | When to use it                                              |\n|-----------|-------------|-------------------------------------------------------------|\n| `episodic` | 7 days     | Snapshots from a specific session that should decay.        |\n| `semantic` | none       | Distilled facts and preferences the agent carries forward.  |\n\nYou can override per write with `ttl_seconds=...` on `remember`, or pass a different `ttl_by_kind={...}` map to the `LongTermMemory` constructor — for example, to give semantic memories a six-month TTL while leaving episodic memories at seven days."
    },
    {
      "id": "the-event-log",
      "title": "The event log",
      "role": "content",
      "text": "`AgentEventLog` is a thin wrapper over a per-thread Redis Stream ([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/agent-memory/redis-py/event_log.py)):\n\n[code example]\n\n`record` calls [`XADD`](https://redis.io/docs/latest/commands/xadd) with `maxlen=~1000`. The tilde lets Redis trim in whole-node units instead of exactly-N units, which is much cheaper at the cost of overshooting the bound by up to a node's worth — the right tradeoff for an audit log where exact length doesn't matter.\n\nThe Stream is independent of the session Hash and the long-term JSON documents: it answers \"what just happened\" without competing with either of those for indexing or memory budget. Consumer groups (not used in this demo) would let downstream workers — summarizers, consolidators, audit pipelines — replay the log without losing position."
    },
    {
      "id": "concurrency-caveats",
      "title": "Concurrency caveats",
      "role": "content",
      "text": "The three helpers above trade correctness under heavy concurrency for clarity. Each is fine on a single-process demo, but lifting the code into a real multi-worker agent surfaces three races worth knowing about:\n\n* **Working memory is read-modify-write.** `AgentSession.append_turn` calls [`HGETALL`](https://redis.io/docs/latest/commands/hgetall), mutates the `recent_turns` list in application code, and writes the Hash back with [`HSET`](https://redis.io/docs/latest/commands/hset). Two concurrent turns on the same thread can both read the same `recent_turns`, append different entries, and write back — last writer wins, the other turn is silently lost. The robust fix is either a [`WATCH`](https://redis.io/docs/latest/commands/watch) / [`MULTI`](https://redis.io/docs/latest/commands/multi) / [`EXEC`](https://redis.io/docs/latest/commands/exec) loop around the read-modify-write or a small [Lua script](https://redis.io/docs/latest/commands/eval) that does the append atomically server-side.\n\n* **Long-term dedup is not atomic.** `LongTermMemory.remember` runs a [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search) KNN lookup, decides whether the candidate is a duplicate, and (if not) calls [`JSON.SET`](https://redis.io/docs/latest/commands/json.set). Two workers seeing the same fact in flight can each fail to see the other's not-yet-committed write and both insert a new memory. The pragmatic fix is to accept that the index will occasionally hold near-duplicates and run a background consolidator that periodically scans for memory pairs within a tight distance and merges them, rather than trying to make the write itself atomic.\n\n* **The active thread is server state.** The demo server keeps a single `current_thread_id` that `/new_thread` and `/reset` mutate under a lock; `handle_turn` reads it outside that lock, so a turn racing with a thread rotation can apply to the previous thread. This is cosmetic for a one-user browser demo. A multi-user agent would carry the thread id on the request itself rather than as shared server state.\n\nThose caveats are deliberate. A more conservative implementation would obscure the Redis-shaped parts of the pattern; the demo prioritizes a small, readable code path that maps directly onto the commands in the prose above."
    },
    {
      "id": "pre-seeding-long-term-memory",
      "title": "Pre-seeding long-term memory",
      "role": "content",
      "text": "In a real deployment the memory store fills up organically as the agent reasons over user turns: each turn produces zero or more memories that flow into the store, with deduplication catching repeats. For the demo, `seed_memory.py` pre-loads a small set of mixed semantic and episodic memories so the very first recall query returns something useful ([source](https://github.com/redis/docs/blob/main/content/develop/use-cases/agent-memory/redis-py/seed_memory.py)):\n\n[code example]\n\nThe seed list mixes long-lived facts and preferences (`semantic`) with snapshots of past sessions (`episodic`), so the **Kind to write** control in the demo has something to switch between when a new turn is being remembered."
    },
    {
      "id": "the-interactive-demo",
      "title": "The interactive demo",
      "role": "content",
      "text": "`demo_server.py` runs a [`ThreadingHTTPServer`](https://docs.python.org/3/library/http.server.html) on port 8086. The HTML page exposes three live panels — working memory, recalled memories, event log — plus a memories table for admin actions. Endpoints:\n\n| Endpoint            | What it does                                                                    |\n|---------------------|---------------------------------------------------------------------------------|\n| `GET  /state`       | Index info, current session, in-scope long-term memories, and recent events.    |\n| `POST /turn`        | Embed the text, append to working memory, recall similar memories, optionally write a new memory (with dedup), append an event. |\n| `POST /new_thread`  | Start a fresh thread; long-term memory and other threads are untouched.         |\n| `POST /reset`       | Drop every long-term memory and re-seed the sample set.                         |\n| `POST /drop_memory` | Delete a single long-term memory by id.                                         |\n\nThe server holds one `LocalEmbedder`, one `AgentSession`, one `LongTermMemory`, and one `AgentEventLog` for the lifetime of the process. The \"current thread\" is a class attribute that the **New thread** button rotates — every browser session inherits the same thread until you explicitly start a new one."
    },
    {
      "id": "run-the-demo-locally",
      "title": "Run the demo locally",
      "role": "content",
      "text": "1.  Clone the [`redis/docs`](https://github.com/redis/docs) repository and change into the example\n    directory:\n\n    [code example]\n\n2.  Install the dependencies:\n\n    [code example]\n\n3.  Make sure a Redis instance with Redis Search and Redis JSON is running locally on\n    port 6379. [Redis Stack](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack) ships both,\n    or [Redis 8](https://redis.io/docs/latest/develop/ai/search-and-query) with the Search and JSON modules\n    enabled.\n\n4.  Start the demo server. The first run downloads the `all-MiniLM-L6-v2` model\n    (~80 MB) into the local Hugging Face cache:\n\n    [code example]\n\n5.  Open <http://localhost:8086> and try some turns:\n\n    *  **\"Remind me which theme I prefer in editors.\"** — paraphrase of a seeded\n       semantic memory (\"The user dislikes dark mode and prefers a high-contrast\n       light theme...\"). You should see that memory recalled with a cosine\n       distance around 0.47, comfortably under the 0.55 default recall\n       threshold.\n    *  **\"What did we discuss about the order-routing outage?\"** — paraphrase of\n       a seeded episodic memory; the postmortem memory should recall around\n       0.44. Switch the **Kind to write** dropdown to `skip` so the question\n       itself doesn't enter long-term memory.\n    *  **\"I prefer concise answers without filler phrases.\"** — paraphrase of\n       a seeded *semantic* memory. Switch the **Kind to write** dropdown to\n       `semantic` so the dedup KNN runs in the same kind as the seed (dedup\n       is scoped per kind, on purpose, so an episodic write can't collapse\n       onto a semantic memory). You should then see the write **deduped**\n       onto the existing memory at a cosine distance around 0.15, with\n       `hit_count` ticking up in the memories table.\n    *  **\"My favorite color is teal.\"** — unrelated to any seed; nothing\n       recalls above the threshold (every seed lands above 0.8), and the new\n       memory is written as `episodic` (or `semantic`, depending on the\n       dropdown) under a fresh id.\n    *  Switch the **User** field to `bob` and re-ask any of the above — recall\n       returns nothing because the seed memories live under `default`. That's\n       the TAG pre-filter at work inside [`FT.SEARCH`](https://redis.io/docs/latest/commands/ft.search).\n    *  Slide the **Recall threshold** down to 0.30 to see borderline paraphrases\n       drop out of the recall set, then back up to 0.70 to watch them return.\n\n    `all-MiniLM-L6-v2` puts a faithful paraphrase in the 0.15 – 0.50\n    cosine-distance range, a loose paraphrase or related topic in the\n    0.50 – 0.80 range, and unrelated queries above 0.8 — which is what\n    motivates the 0.55 default recall threshold and the 0.20 default\n    dedup threshold. A stricter embedding model (or a domain-tuned one)\n    would let you tighten both; a noisier one would push them up. The\n    right thresholds are always a function of the model, the corpus,\n    and how conservative the agent needs to be about accepting a memory\n    as a match.\n\nThe server is read/write against your local Redis. The default memory index is `agentmem:idx`, JSON keys live under `agent:mem:`, session Hashes under `agent:session:`, and event Streams under `agent:events:`. Useful flags:\n\n* `--no-reset` — keep the existing long-term memories across restarts instead of dropping and re-seeding.\n* `--session-ttl-seconds` — change the working-memory TTL (default 3600).\n* `--dedup-threshold` — change the cosine-distance cutoff for write-time deduplication.\n* `--recall-threshold` — change the default cosine-distance cutoff for recall."
    }
  ],
  "examples": [
    {
      "id": "the-session-store-ex0",
      "language": "python",
      "code": "import redis\nfrom session_store import AgentSession\n\nr = redis.Redis(host=\"localhost\", port=6379, decode_responses=False)\nsession = AgentSession(\n    redis_client=r,\n    key_prefix=\"agent:session:\",\n    default_ttl_seconds=3600,  # one hour\n    max_turns=20,              # rolling window per thread\n)\n\nthread_id = session.new_thread_id()\nsession.start(thread_id, user=\"alice\", agent=\"demo-agent\",\n              goal=\"Plan next week's meetings.\")\nsession.append_turn(thread_id, role=\"user\",\n                    content=\"Schedule a budget review with finance.\")\nstate = session.load(thread_id)\nprint(state.turn_count, len(state.recent_turns), state.ttl_seconds)",
      "section_id": "the-session-store"
    },
    {
      "id": "the-session-store-ex1",
      "language": "text",
      "code": "agent:session:9f3d2a4b8c61\n  thread_id=9f3d2a4b8c61\n  user=alice\n  agent=demo-agent\n  goal=Plan next week's meetings.\n  scratchpad=Need to confirm finance's availability.\n  turn_count=4\n  created_ts=1715990400.12\n  last_active_ts=1715990650.83\n  recent_turns=[{\"role\":\"user\",\"content\":\"...\",\"ts\":...}, ...]",
      "section_id": "the-session-store"
    },
    {
      "id": "the-long-term-memory-store-ex0",
      "language": "python",
      "code": "import numpy as np\nfrom long_term_memory import LongTermMemory\nfrom embeddings import LocalEmbedder\n\nmemory = LongTermMemory(\n    redis_client=r,\n    index_name=\"agentmem:idx\",\n    key_prefix=\"agent:mem:\",\n    dedup_threshold=0.20,   # cosine distance — tight at write time\n    recall_threshold=0.55,  # looser at read time\n)\nembedder = LocalEmbedder()\nmemory.create_index()  # idempotent\n\n# Write a memory. The same KNN that powers recall also runs here\n# at a tighter threshold so paraphrases of the same fact collapse.\nvec = embedder.encode_one(\"The user prefers light mode in editors.\")\nresult = memory.remember(\n    text=\"The user prefers light mode in editors.\",\n    embedding=np.asarray(vec, dtype=np.float32),\n    user=\"alice\",\n    namespace=\"default\",\n    kind=\"semantic\",\n    source_thread=\"9f3d2a4b8c61\",\n)\nprint(result.deduped, result.id, result.existing_distance)\n\n# Recall against a later question.\nq = embedder.encode_one(\"Which theme does this user like?\")\nhits = memory.recall(\n    query_embedding=np.asarray(q, dtype=np.float32),\n    user=\"alice\",\n    namespace=\"default\",\n    k=5,\n)\nfor h in hits:\n    print(f\"{h.distance:.3f} [{h.kind}] {h.text}\")",
      "section_id": "the-long-term-memory-store"
    },
    {
      "id": "data-model-ex0",
      "language": "json",
      "code": "agent:mem:7c3f8a1b9e02\n{\n  \"id\": \"7c3f8a1b9e02\",\n  \"user\": \"alice\",\n  \"namespace\": \"default\",\n  \"kind\": \"semantic\",\n  \"source_thread\": \"9f3d2a4b8c61\",\n  \"text\": \"The user prefers light mode in editors.\",\n  \"embedding\": [0.013, -0.041, ...],\n  \"created_ts\": 1715990400.12,\n  \"hit_count\": 0\n}",
      "section_id": "data-model"
    },
    {
      "id": "data-model-ex1",
      "language": "text",
      "code": "FT.CREATE agentmem:idx\n  ON JSON PREFIX 1 agent:mem:\n  SCHEMA\n    $.text          AS text          TEXT\n    $.user          AS user          TAG\n    $.namespace     AS namespace     TAG\n    $.kind          AS kind          TAG\n    $.source_thread AS source_thread TAG\n    $.created_ts    AS created_ts    NUMERIC SORTABLE\n    $.hit_count     AS hit_count     NUMERIC SORTABLE\n    $.embedding     AS embedding     VECTOR HNSW 6\n                                       TYPE FLOAT32 DIM 384\n                                       DISTANCE_METRIC COSINE",
      "section_id": "data-model"
    },
    {
      "id": "the-query-ex0",
      "language": "text",
      "code": "FT.SEARCH agentmem:idx\n  \"(@user:{alice} @namespace:{default} @kind:{semantic})\n     =>[KNN 5 @embedding $vec AS distance]\"\n  PARAMS 2 vec <384-float32-bytes>\n  SORTBY distance\n  RETURN 8 user namespace kind source_thread text created_ts hit_count distance\n  DIALECT 2",
      "section_id": "the-query"
    },
    {
      "id": "the-event-log-ex0",
      "language": "python",
      "code": "from event_log import AgentEventLog\n\nevents = AgentEventLog(redis_client=r, max_len=1000)\nevents.record(thread_id, action=\"turn_appended:user\",\n              detail=\"Schedule a budget review with finance.\")\nevents.record(thread_id, action=\"memory_written\",\n              detail=\"wrote 7c3f8a1b9e02 as semantic\")\n\nfor event in events.recent(thread_id, count=20):\n    print(event.action, event.detail)",
      "section_id": "the-event-log"
    },
    {
      "id": "pre-seeding-long-term-memory-ex0",
      "language": "python",
      "code": "from seed_memory import seed\nfrom long_term_memory import LongTermMemory\nfrom embeddings import LocalEmbedder\n\nmemory = LongTermMemory()\nembedder = LocalEmbedder()\nmemory.create_index()\nseed(memory, embedder, user=\"default\", namespace=\"default\")",
      "section_id": "pre-seeding-long-term-memory"
    },
    {
      "id": "run-the-demo-locally-ex0",
      "language": "bash",
      "code": "git clone https://github.com/redis/docs.git\n    cd docs/content/develop/use-cases/agent-memory/redis-py",
      "section_id": "run-the-demo-locally"
    },
    {
      "id": "run-the-demo-locally-ex1",
      "language": "bash",
      "code": "pip install redis sentence-transformers numpy",
      "section_id": "run-the-demo-locally"
    },
    {
      "id": "run-the-demo-locally-ex2",
      "language": "bash",
      "code": "python demo_server.py",
      "section_id": "run-the-demo-locally"
    }
  ]
}
