{
  "id": "search-and-indexing",
  "title": "Search \u0026 Indexing",
  "url": "https://redis.io/docs/latest/develop/ai/redisvl/0.17.0/concepts/search-and-indexing/",
  "summary": "",
  "content": "\n\nVector search in Redis works differently from traditional databases. Understanding the underlying model helps you design better schemas and write more effective queries.\n\n## How Redis Indexes Work\n\nRedis search indexes are secondary structures that sit alongside your data. When you create an index, you’re telling Redis: \"Watch all keys that match this prefix, and build a searchable structure from these specific fields.\"\n\nThe index doesn’t store your data—it references it. Your documents live as Redis Hash or JSON objects, and the index maintains pointers and optimized structures for fast lookup. When you write a document, Redis automatically updates the index. When you delete a document, the index entry is removed.\n\nThis design means searches are fast (the index is optimized for queries) while writes remain efficient (only the affected index entries are updated). It also means you can have multiple indexes over the same data with different field configurations.\n\n## Field Types and Their Purpose\n\nEach field type serves a different search use case.\n\n**Text fields** enable full-text search. Redis tokenizes the content, applies stemming (so \"running\" matches \"run\"), and builds an inverted index. Text search finds documents containing specific words or phrases, ranked by relevance.\n\n**Tag fields** are for exact-match filtering. Unlike text fields, tags are not tokenized or stemmed. A tag value is treated as an atomic unit. This is ideal for categories, statuses, IDs, and other discrete values where you want exact matches.\n\n**Numeric fields** support range queries and sorting. You can filter for values greater than, less than, or between bounds. Numeric fields are also used for sorting results.\n\n**Geo fields** enable location-based queries. You can find documents within a radius of a point or within a bounding box.\n\n**Vector fields** enable similarity search. Each document has an embedding vector, and queries find documents whose vectors are closest to a query vector. This is the foundation of semantic search.\n\n## Vector Indexing Algorithms\n\nVector similarity search requires specialized data structures. Redis offers three algorithms, each with different trade-offs.\n\n**Flat indexing** performs exact nearest-neighbor search by comparing the query vector against every indexed vector. This guarantees finding the true closest matches, but search time grows linearly with dataset size. Use flat indexing for small datasets (under ~100K vectors) where exact results matter.\n\n**HNSW (Hierarchical Navigable Small World)** is an approximate algorithm that builds a multi-layer graph structure. Queries navigate this graph to find approximate nearest neighbors in logarithmic time. HNSW typically achieves 95-99% recall (meaning it finds 95-99% of the true nearest neighbors) while being orders of magnitude faster than flat search on large datasets. This is the default choice for most applications.\n\n**SVS (Scalable Vector Search)** is designed for very large datasets with memory constraints. It supports vector compression techniques that reduce memory footprint at the cost of some recall. SVS is useful when you have millions of vectors and memory is a limiting factor.\n\nThe algorithm choice is made at index creation and cannot be changed without rebuilding the index.\n\n## Distance Metrics\n\nWhen comparing vectors, you need a way to measure how \"close\" two vectors are. Redis supports three distance metrics.\n\n**Cosine distance** measures the angle between vectors, ignoring their magnitude. Two vectors pointing in the same direction have distance 0; opposite directions have distance 2. Cosine is widely used because most embedding models produce vectors where direction encodes meaning and magnitude is less important. Similarity equals 1 minus distance.\n\n**Euclidean distance (L2)** measures the straight-line distance between vector endpoints. Unlike cosine, it considers magnitude. Euclidean distance ranges from 0 to infinity.\n\n**Inner product (IP)** is the dot product of two vectors. It combines both direction and magnitude. When vectors are normalized (magnitude 1), inner product equals cosine similarity. Inner product can be negative and ranges from negative infinity to positive infinity.\n\nChoose your metric based on how your embedding model was trained. Most text embedding models use cosine.\n\n## Storage: Hash vs JSON\n\nRedis offers two storage formats for documents.\n\n**Hash storage** is a flat key-value structure where each field is a top-level key. It’s simple, fast, and works well when your documents don’t have nested structures. Field names in your schema map directly to hash field names.\n\n**JSON storage** supports nested documents. You can store complex objects and use JSONPath expressions to index nested fields. This is useful when your data is naturally hierarchical or when you want to store the original document structure without flattening it.\n\nThe choice affects how you structure data and how you reference fields in schemas. Hash is simpler; JSON is more flexible.\n\n## Index Lifecycle\n\nIndexes have a straightforward lifecycle: create, use, and eventually delete.\n\n**Creating an index** registers the schema with Redis. The `create()` method sends the schema definition to Redis, which builds the necessary data structures. If an index with the same name exists, you can choose to overwrite it (optionally dropping existing data) or raise an error.\n\n**Checking existence** with `exists()` tells you whether an index is registered in Redis. This is useful before creating (to avoid errors) or before querying (to ensure the index is ready).\n\n**Connecting to existing indexes** is possible with `from_existing()`. If an index was created elsewhere—by another application, a previous deployment, or the CLI—you can connect to it by name. RedisVL fetches the index metadata from Redis and reconstructs the schema automatically.\n\n```python\n# Connect to an index that already exists in Redis\nindex = SearchIndex.from_existing(\"my-index\", redis_url=\"redis://localhost:6379\")\n```\n\n**Deleting an index** with `delete()` removes the index definition from Redis. By default, this also deletes all documents associated with the index. Pass `drop=False` to keep the documents while removing only the index structure.\n\n**Clearing data** with `clear()` removes all documents from the index without deleting the index itself. The schema remains intact, ready for new data.\n\n## Data Validation\n\nRedisVL can validate data against your schema before loading it to Redis. This catches type mismatches, missing required fields, and invalid values early—before they cause problems in production.\n\nEnable validation by setting `validate_on_load=True` when creating the index:\n\n```python\nindex = SearchIndex.from_dict(schema, redis_url=\"redis://localhost:6379\", validate_on_load=True)\n```\n\nWhen validation is enabled, each object is checked against the schema during `load()`. If validation fails, a `SchemaValidationError` is raised with details about which field failed and why.\n\nValidation adds overhead, so it’s typically enabled during development and testing, then disabled in production once you’re confident in your data pipeline.\n\n## Schema Evolution\n\nRedis doesn’t support modifying an existing index schema. Once an index is created, its field definitions are fixed.\n\nTo change a schema, you create a new index with the updated configuration, reindex your data into it, update your application to use the new index, and then delete the old index. This pattern—create new, migrate, switch, drop old—is the standard approach for schema changes in production.\n\nPlanning your schema carefully upfront reduces the need for migrations, but the capability exists when requirements evolve.\n\n---\n\n**Related concepts:** [Field Attributes](field-attributes.md) explains how to configure field options like `sortable` and `index_missing`. [Query Types](queries.md) covers the different query types available.\n\n**Learn more:** [Getting Started](../user_guide/01_getting_started.md) walks through building your first index. [Choose a Storage Type](../user_guide/05_hash_vs_json.md) compares storage options in depth. [Query and Filter Data](../user_guide/02_complex_filtering.md) covers query composition.\n",
  "tags": [],
  "last_updated": "2026-04-21T14:39:33+02:00"
}
