{
  "id": "embeddings_cache",
  "title": "Caching Embeddings",
  "url": "https://redis.io/docs/latest/develop/ai/redisvl/0.9.1/user_guide/embeddings_cache/",
  "summary": "",
  "content": "\n\nRedisVL provides an `EmbeddingsCache` that makes it easy to store and retrieve embedding vectors with their associated text and metadata. This cache is particularly useful for applications that frequently compute the same embeddings, enabling you to:\n\n- Reduce computational costs by reusing previously computed embeddings\n- Decrease latency in applications that rely on embeddings\n- Store additional metadata alongside embeddings for richer applications\n\nThis notebook will show you how to use the `EmbeddingsCache` effectively in your applications.\n\n## Setup\n\nFirst, let's import the necessary libraries. We'll use a text embedding model from HuggingFace to generate our embeddings.\n\n\n```python\nimport os\nimport time\nimport numpy as np\n\n# Disable tokenizers parallelism to avoid deadlocks\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"False\"\n\n# Import the EmbeddingsCache\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\nfrom redisvl.utils.vectorize import HFTextVectorizer\n```\n\nLet's create a vectorizer to generate embeddings for our texts:\n\n\n```python\n# Initialize the vectorizer\nvectorizer = HFTextVectorizer(\n    model=\"redis/langcache-embed-v1\",\n    cache_folder=os.getenv(\"SENTENCE_TRANSFORMERS_HOME\")\n)\n```\n\n    /Users/tyler.hutcherson/Documents/AppliedAI/redis-vl-python/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n      from .autonotebook import tqdm as notebook_tqdm\n\n\n    13:06:09 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps\n    13:06:09 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: redis/langcache-embed-v1\n    13:06:09 sentence_transformers.SentenceTransformer WARNING   You try to use a model that was created with version 4.1.0, however, your version is 3.4.1. 
This might cause unexpected behavior or errors. In that case, try to update to the latest version.\n    \n    \n    \n\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00,  4.09it/s]\n\n\n## Initializing the EmbeddingsCache\n\nNow let's initialize our `EmbeddingsCache`. The cache requires a Redis connection to store the embeddings and their associated data.\n\n\n```python\n# Initialize the embeddings cache\ncache = EmbeddingsCache(\n    name=\"embedcache\",                  # name prefix for Redis keys\n    redis_url=\"redis://localhost:6379\",  # Redis connection URL\n    ttl=None                            # Optional TTL in seconds (None means no expiration)\n)\n```\n\n## Basic Usage\n\n### Storing Embeddings\n\nLet's store some text with its embedding in the cache. The `set` method takes the following parameters:\n- `text`: The input text that was embedded\n- `model_name`: The name of the embedding model used\n- `embedding`: The embedding vector\n- `metadata`: Optional metadata associated with the embedding\n- `ttl`: Optional time-to-live override for this specific entry\n\n\n```python\n# Text to embed\ntext = \"What is machine learning?\"\nmodel_name = \"redis/langcache-embed-v1\"\n\n# Generate the embedding\nembedding = vectorizer.embed(text)\n\n# Optional metadata\nmetadata = {\"category\": \"ai\", \"source\": \"user_query\"}\n\n# Store in cache\nkey = cache.set(\n    text=text,\n    model_name=model_name,\n    embedding=embedding,\n    metadata=metadata\n)\n\nprint(f\"Stored with key: {key[:15]}...\")\n```\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00,  3.18it/s]\n\n    Stored with key: embedcache:909f...\n\n\n    \n\n\n### Retrieving Embeddings\n\nTo retrieve an embedding from the cache, use the `get` method with the original text and model name:\n\n\n```python\n# Retrieve from cache\n\nif result := cache.get(text=text, model_name=model_name):\n    print(f\"Found in cache: {result['text']}\")\n    print(f\"Model: {result['model_name']}\")\n    
print(f\"Metadata: {result['metadata']}\")\n    print(f\"Embedding shape: {np.array(result['embedding']).shape}\")\nelse:\n    print(\"Not found in cache.\")\n```\n\n    Found in cache: What is machine learning?\n    Model: redis/langcache-embed-v1\n    Metadata: {'category': 'ai', 'source': 'user_query'}\n    Embedding shape: (768,)\n\n\n### Checking Existence\n\nYou can check if an embedding exists in the cache without retrieving it using the `exists` method:\n\n\n```python\n# Check if existing text is in cache\nexists = cache.exists(text=text, model_name=model_name)\nprint(f\"First query exists in cache: {exists}\")\n\n# Check if a new text is in cache\nnew_text = \"What is deep learning?\"\nexists = cache.exists(text=new_text, model_name=model_name)\nprint(f\"New query exists in cache: {exists}\")\n```\n\n    First query exists in cache: True\n    New query exists in cache: False\n\n\n### Removing Entries\n\nTo remove an entry from the cache, use the `drop` method:\n\n\n```python\n# Remove from cache\ncache.drop(text=text, model_name=model_name)\n\n# Verify it's gone\nexists = cache.exists(text=text, model_name=model_name)\nprint(f\"After dropping: {exists}\")\n```\n\n    After dropping: False\n\n\n## Advanced Usage\n\n### Key-Based Operations\n\nThe `EmbeddingsCache` also provides methods that work directly with Redis keys, which can be useful for advanced use cases:\n\n\n```python\n# Store an entry again\nkey = cache.set(\n    text=text,\n    model_name=model_name,\n    embedding=embedding,\n    metadata=metadata\n)\nprint(f\"Stored with key: {key[:15]}...\")\n\n# Check existence by key\nexists_by_key = cache.exists_by_key(key)\nprint(f\"Exists by key: {exists_by_key}\")\n\n# Retrieve by key\nresult_by_key = cache.get_by_key(key)\nprint(f\"Retrieved by key: {result_by_key['text']}\")\n\n# Drop by key\ncache.drop_by_key(key)\n```\n\n    Stored with key: embedcache:909f...\n    Exists by key: True\n    Retrieved by key: What is machine learning?\n\n\n### Batch 
Operations\n\nWhen working with multiple embeddings, batch operations can significantly improve performance by reducing network roundtrips. The `EmbeddingsCache` provides methods prefixed with `m` (for \"multi\") that handle batches efficiently.\n\n\n```python\n# Create multiple embeddings\ntexts = [\n    \"What is machine learning?\",\n    \"How do neural networks work?\",\n    \"What is deep learning?\"\n]\nembeddings = [vectorizer.embed(t) for t in texts]\n\n# Prepare batch items as dictionaries\nbatch_items = [\n    {\n        \"text\": texts[0],\n        \"model_name\": model_name,\n        \"embedding\": embeddings[0],\n        \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n    },\n    {\n        \"text\": texts[1],\n        \"model_name\": model_name,\n        \"embedding\": embeddings[1],\n        \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n    },\n    {\n        \"text\": texts[2],\n        \"model_name\": model_name,\n        \"embedding\": embeddings[2],\n        \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n    }\n]\n\n# Store multiple embeddings in one operation\nkeys = cache.mset(batch_items)\nprint(f\"Stored {len(keys)} embeddings with batch operation\")\n\n# Check if multiple embeddings exist in one operation\nexist_results = cache.mexists(texts, model_name)\nprint(f\"All embeddings exist: {all(exist_results)}\")\n\n# Retrieve multiple embeddings in one operation\nresults = cache.mget(texts, model_name)\nprint(f\"Retrieved {len(results)} embeddings in one operation\")\n\n# Delete multiple embeddings in one operation\ncache.mdrop(texts, model_name)\n\n# Alternative: key-based batch operations\n# cache.mget_by_keys(keys)     # Retrieve by keys\n# cache.mexists_by_keys(keys)  # Check existence by keys\n# cache.mdrop_by_keys(keys)    # Delete by keys\n```\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 21.37it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00,  9.04it/s]\n    Batches: 
100%|██████████| 1/1 [00:00\u003c00:00, 20.84it/s]\n\n    Stored 3 embeddings with batch operation\n    All embeddings exist: True\n    Retrieved 3 embeddings in one operation\n\n\n    \n\n\nBatch operations are particularly beneficial when working with large numbers of embeddings. They provide the same functionality as individual operations but with better performance by reducing network roundtrips.\n\nFor asynchronous applications, async versions of all batch methods are also available with the `am` prefix (e.g., `amset`, `amget`, `amexists`, `amdrop`).\n\n### Working with TTL (Time-To-Live)\n\nYou can set a global TTL when initializing the cache, or specify TTL for individual entries:\n\n\n```python\n# Create a cache with a default 5-second TTL\nttl_cache = EmbeddingsCache(\n    name=\"ttl_cache\",\n    redis_url=\"redis://localhost:6379\",\n    ttl=5  # 5 second TTL\n)\n\n# Store an entry\nkey = ttl_cache.set(\n    text=text,\n    model_name=model_name,\n    embedding=embedding\n)\n\n# Check if it exists\nexists = ttl_cache.exists_by_key(key)\nprint(f\"Immediately after setting: {exists}\")\n\n# Wait for it to expire\ntime.sleep(6)\n\n# Check again\nexists = ttl_cache.exists_by_key(key)\nprint(f\"After waiting: {exists}\")\n```\n\n    Immediately after setting: True\n    After waiting: False\n\n\nYou can also override the default TTL for individual entries:\n\n\n```python\n# Store an entry with a custom 1-second TTL\nkey1 = ttl_cache.set(\n    text=\"Short-lived entry\",\n    model_name=model_name,\n    embedding=embedding,\n    ttl=1  # Override with 1 second TTL\n)\n\n# Store another entry with the default TTL (5 seconds)\nkey2 = ttl_cache.set(\n    text=\"Default TTL entry\",\n    model_name=model_name,\n    embedding=embedding\n    # No TTL specified = uses the default 5 seconds\n)\n\n# Wait for 2 seconds\ntime.sleep(2)\n\n# Check both entries\nexists1 = ttl_cache.exists_by_key(key1)\nexists2 = ttl_cache.exists_by_key(key2)\n\nprint(f\"Entry with custom TTL 
after 2 seconds: {exists1}\")\nprint(f\"Entry with default TTL after 2 seconds: {exists2}\")\n\n# Cleanup\nttl_cache.drop_by_key(key2)\n```\n\n    Entry with custom TTL after 2 seconds: False\n    Entry with default TTL after 2 seconds: True\n\n\n## Async Support\n\nThe `EmbeddingsCache` provides async versions of all methods for use in async applications. The async methods are prefixed with `a` (e.g., `aset`, `aget`, `aexists`, `adrop`).\n\n\n```python\nasync def async_cache_demo():\n    # Store an entry asynchronously\n    key = await cache.aset(\n        text=\"Async embedding\",\n        model_name=model_name,\n        embedding=embedding,\n        metadata={\"async\": True}\n    )\n    \n    # Check if it exists\n    exists = await cache.aexists_by_key(key)\n    print(f\"Async set successful? {exists}\")\n    \n    # Retrieve it\n    result = await cache.aget_by_key(key)\n    success = result is not None and result[\"text\"] == \"Async embedding\"\n    print(f\"Async get successful? {success}\")\n    \n    # Remove it\n    await cache.adrop_by_key(key)\n\n# Run the async demo\nawait async_cache_demo()\n```\n\n    Async set successful? True\n    Async get successful? True\n\n\n## Real-World Example\n\nLet's build a simple embeddings caching system for a text classification task. 
We'll check the cache before computing new embeddings to save computation time.\n\n\n```python\n# Create a fresh cache for this example\nexample_cache = EmbeddingsCache(\n    name=\"example_cache\",\n    redis_url=\"redis://localhost:6379\",\n    ttl=3600  # 1 hour TTL\n)\n\nvectorizer = HFTextVectorizer(\n    model=model_name,\n    cache=example_cache,\n    cache_folder=os.getenv(\"SENTENCE_TRANSFORMERS_HOME\")\n)\n\n# Simulate processing a stream of queries\nqueries = [\n    \"What is artificial intelligence?\",\n    \"How does machine learning work?\",\n    \"What is artificial intelligence?\",  # Repeated query\n    \"What are neural networks?\",\n    \"How does machine learning work?\"   # Repeated query\n]\n\n# Process the queries and track statistics\ntotal_queries = 0\ncache_hits = 0\n\nfor query in queries:\n    total_queries += 1\n    \n    # Check cache before computing\n    before = example_cache.exists(text=query, model_name=model_name)\n    if before:\n        cache_hits += 1\n    \n    # Get embedding (will compute or use cache)\n    embedding = vectorizer.embed(query)\n\n# Report statistics\ncache_misses = total_queries - cache_hits\nhit_rate = (cache_hits / total_queries) * 100\n\nprint(\"\\nStatistics:\")\nprint(f\"Total queries: {total_queries}\")\nprint(f\"Cache hits: {cache_hits}\")\nprint(f\"Cache misses: {cache_misses}\")\nprint(f\"Cache hit rate: {hit_rate:.1f}%\")\n\n# Cleanup\nfor query in set(queries):  # Use set to get unique queries\n    example_cache.drop(text=query, model_name=model_name)\n```\n\n    13:06:20 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps\n    13:06:20 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: redis/langcache-embed-v1\n    13:06:20 sentence_transformers.SentenceTransformer WARNING   You try to use a model that was created with version 4.1.0, however, your version is 3.4.1. This might cause unexpected behavior or errors. 
In that case, try to update to the latest version.\n    \n    \n    \n\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 21.84it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 22.04it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 22.62it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 22.71it/s]\n\n    \n    Statistics:\n    Total queries: 5\n    Cache hits: 2\n    Cache misses: 3\n    Cache hit rate: 40.0%\n\n\n    \n\n\n## Performance Benchmark\n\nLet's run a benchmark to compare the time taken to embed the same text repeatedly with and without caching.\n\n\n```python\n# Text to use for benchmarking\nbenchmark_text = \"This is a benchmark text to measure the performance of embedding caching.\"\n\n# Create a fresh cache for benchmarking\nbenchmark_cache = EmbeddingsCache(\n    name=\"benchmark_cache\",\n    redis_url=\"redis://localhost:6379\",\n    ttl=3600  # 1 hour TTL\n)\nvectorizer.cache = benchmark_cache\n\n# Number of iterations for the benchmark\nn_iterations = 10\n\n# Benchmark without caching (skip_cache=True forces a fresh computation each time)\nprint(\"Benchmarking without caching:\")\nstart_time = time.time()\nfor _ in range(n_iterations):\n    embedding = vectorizer.embed(benchmark_text, skip_cache=True)\nno_cache_time = time.time() - start_time\nprint(f\"Time taken without caching: {no_cache_time:.4f} seconds\")\nprint(f\"Average time per embedding: {no_cache_time/n_iterations:.4f} seconds\")\n\n# Benchmark with caching (the first call computes and stores, later calls hit the cache)\nprint(\"\\nBenchmarking with caching:\")\nstart_time = time.time()\nfor _ in range(n_iterations):\n    embedding = vectorizer.embed(benchmark_text)\ncache_time = time.time() - start_time\nprint(f\"Time taken with caching: {cache_time:.4f} seconds\")\nprint(f\"Average time per embedding: {cache_time/n_iterations:.4f} seconds\")\n\n# Compare performance\nspeedup = no_cache_time / cache_time\nlatency_reduction = (no_cache_time/n_iterations) - (cache_time/n_iterations)\nprint(\"\\nPerformance comparison:\")\nprint(f\"Speedup with caching: 
{speedup:.2f}x faster\")\nprint(f\"Time saved: {no_cache_time - cache_time:.4f} seconds ({(1 - cache_time/no_cache_time) * 100:.1f}%)\")\nprint(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")\n```\n\n    Benchmarking without caching:\n\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 21.51it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.21it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.96it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.28it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 22.69it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 22.98it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.17it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 24.12it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.37it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.24it/s]\n\n\n    Time taken without caching: 0.4549 seconds\n    Average time per embedding: 0.0455 seconds\n    \n    Benchmarking with caching:\n\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 23.69it/s]\n\n\n    Time taken with caching: 0.0664 seconds\n    Average time per embedding: 0.0066 seconds\n    \n    Performance comparison:\n    Speedup with caching: 6.86x faster\n    Time saved: 0.3885 seconds (85.4%)\n    Latency reduction: 0.0389 seconds per query\n\n\n## Common Use Cases for Embedding Caching\n\nEmbedding caching is particularly useful in the following scenarios:\n\n1. **Search applications**: Cache embeddings for frequently searched queries to reduce latency\n2. **Content recommendation systems**: Cache embeddings for content items to speed up similarity calculations\n3. **API services**: Reduce costs and improve response times when generating embeddings through paid APIs\n4. **Batch processing**: Speed up processing of datasets that contain duplicate texts\n5. 
**Chatbots and virtual assistants**: Cache embeddings for common user queries to provide faster responses\n6. **Development workflows**: Reuse cached embeddings across repeated test runs and iterations instead of recomputing them\n\n## Cleanup\n\nLet's clean up our caches to avoid leaving data in Redis:\n\n\n```python\n# Clean up all caches\ncache.clear()\nttl_cache.clear()\nexample_cache.clear()\nbenchmark_cache.clear()\n```\n\n## Summary\n\nThe `EmbeddingsCache` provides an efficient way to store and retrieve embeddings with their associated text and metadata. Key features include:\n\n- Simple API for storing and retrieving individual embeddings (`set`/`get`)\n- Batch operations for working with multiple embeddings efficiently (`mset`/`mget`/`mexists`/`mdrop`)\n- Support for metadata storage alongside embeddings\n- Configurable time-to-live (TTL) for cache entries\n- Key-based operations for advanced use cases\n- Async support for use in asynchronous applications\n- Significant performance improvements for repeated texts (about 6.9x faster than recomputing in the benchmark above)\n\nBy using the `EmbeddingsCache`, you can reduce computational costs and improve the performance of applications that rely on embeddings.\n",
  "tags": [],
  "last_updated": "2026-04-08T12:21:52-07:00"
}