{
  "id": "getting_started",
  "title": "Getting Started",
  "url": "https://redis.io/docs/latest/develop/ai/redisvl/0.17.0/user_guide/getting_started/",
  "summary": "",
  "content": "\n\nRedisVL is a Python library with an integrated CLI for building AI applications with Redis. This guide covers the core workflow:\n\n1. Defining an `IndexSchema`\n2. Preparing a sample dataset\n3. Creating a `SearchIndex`\n4. Using the `rvl` CLI\n5. Loading data into Redis\n6. Fetching and managing records\n7. Executing vector searches\n8. Updating an index\n\n## Prerequisites\n\nBefore you begin, ensure you have:\n- Installed RedisVL: `pip install redisvl`\n- A running Redis instance ([Redis 8+](https://redis.io/downloads/) or [Redis Cloud](https://redis.io/cloud))\n\n## What You'll Learn\n\nBy the end of this guide, you will be able to:\n- Create index schemas using Python dictionaries or YAML files\n- Build and manage `SearchIndex` objects\n- Use the `rvl` CLI for index management\n- Load data and execute vector similarity searches\n- Fetch individual records and list all keys in an index\n- Delete specific records by key or document ID\n- Update index schemas as your application evolves\n\n## Define an `IndexSchema`\n\nThe `IndexSchema` maintains crucial **index configuration** and **field definitions** to\nenable search with Redis. For ease of use, the schema can be constructed from a\npython dictionary or yaml file.\n\n### Example Schema Creation\nConsider a dataset with user information, including `job`, `age`, `credit_score`,\nand a 3-dimensional `user_embedding` vector.\n\nYou must also decide on a Redis index name and key prefix to use for this\ndataset. 
Below are example schema definitions in both YAML and Dict format.\n\n**YAML Definition:**\n\n```yaml\nversion: '0.1.0'\n\nindex:\n  name: user_simple\n  prefix: user_simple_docs\n\nfields:\n    - name: user\n      type: tag\n    - name: credit_score\n      type: tag\n    - name: job\n      type: text\n    - name: age\n      type: numeric\n    - name: user_embedding\n      type: vector\n      attrs:\n        algorithm: flat\n        dims: 3\n        distance_metric: cosine\n        datatype: float32\n```\nStore this in a local file, such as `schema.yaml`, for RedisVL usage.\n\n**Python Dictionary:**\n\n\n```python\nschema = {\n    \"index\": {\n        \"name\": \"user_simple\",\n        \"prefix\": \"user_simple_docs\",\n    },\n    \"fields\": [\n        {\"name\": \"user\", \"type\": \"tag\"},\n        {\"name\": \"credit_score\", \"type\": \"tag\"},\n        {\"name\": \"job\", \"type\": \"text\"},\n        {\"name\": \"age\", \"type\": \"numeric\"},\n        {\n            \"name\": \"user_embedding\",\n            \"type\": \"vector\",\n            \"attrs\": {\n                \"dims\": 3,\n                \"distance_metric\": \"cosine\",\n                \"algorithm\": \"flat\",\n                \"datatype\": \"float32\"\n            }\n        }\n    ]\n}\n```\n\n## Sample Dataset Preparation\n\nBelow, create a mock dataset with `user`, `job`, `age`, `credit_score`, and\n`user_embedding` fields. 
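Vector fields are passed to Redis as byte strings. As a minimal sketch (assuming NumPy, which the dataset code below also uses, and two nearby vectors drawn from the examples in this guide), here is how a `float32` vector round-trips through bytes and how cosine distance between two such vectors is computed:

```python
import numpy as np

# Serialize a float32 vector to bytes, as a vector field expects
vec = np.array([0.1, 0.1, 0.5], dtype=np.float32)
vec_bytes = vec.tobytes()

# The round trip back from bytes recovers the same values
restored = np.frombuffer(vec_bytes, dtype=np.float32)
assert np.array_equal(vec, restored)

# Cosine distance = 1 - cosine similarity; with distance_metric cosine,
# Redis reports this value as vector_distance in query results
other = np.array([0.1, 0.3, 0.5], dtype=np.float32)
cosine_distance = 1 - np.dot(vec, other) / (np.linalg.norm(vec) * np.linalg.norm(other))
print(round(float(cosine_distance), 4))  # ≈ 0.0566
```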
The `user_embedding` vectors are synthetic examples\nfor demonstration purposes.\n\nFor more information on creating real-world embeddings, refer to this\n[article](https://mlops.community/vector-similarity-search-from-basics-to-production/).\n\n\n```python\nimport numpy as np\n\n\ndata = [\n    {\n        'user': 'john',\n        'age': 1,\n        'job': 'engineer',\n        'credit_score': 'high',\n        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()\n    },\n    {\n        'user': 'mary',\n        'age': 2,\n        'job': 'doctor',\n        'credit_score': 'low',\n        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()\n    },\n    {\n        'user': 'joe',\n        'age': 3,\n        'job': 'dentist',\n        'credit_score': 'medium',\n        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()\n    }\n]\n```\n\nThe `user_embedding` vectors are converted to bytes using NumPy's `.tobytes()` method.\n\n## Create a `SearchIndex`\n\nWith the schema and sample dataset ready, create a `SearchIndex`.\n\n### Bring your own Redis connection instance\n\nThis is ideal in scenarios where you have custom settings on the connection instance or if your application will share a connection pool:\n\n\n```python\nfrom redisvl.index import SearchIndex\nfrom redis import Redis\n\nclient = Redis.from_url(\"redis://localhost:6379\")\nindex = SearchIndex.from_dict(schema, redis_client=client, validate_on_load=True)\n```\n\n### Let the index manage the connection instance\n\nThis is ideal for simple cases:\n\n\n```python\nindex = SearchIndex.from_dict(schema, redis_url=\"redis://localhost:6379\", validate_on_load=True)\n\n# If you don't specify a client or Redis URL, the index will attempt to\n# connect to Redis at the default address \"redis://localhost:6379\".\n```\n\n### Create the index\n\nNow that we are connected to Redis, we need to run the create 
command.\n\n\n```python\nindex.create(overwrite=True)\n```\n\nNote that at this point, the index has no entries. Data loading follows.\n\n## Inspect with the `rvl` CLI\nUse the `rvl` CLI to inspect the created index and its fields:\n\n\n```python\n!rvl index info -i user_simple\n```\n\n    \n    \n    Index Information:\n    ╭──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────╮\n    │ Index Name           │ Storage Type         │ Prefixes             │ Index Options        │ Indexing             │\n    ├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤\n    │ user_simple          │ HASH                 │ ['user_simple_docs'] │ []                   │ 0                    │\n    ╰──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────╯\n    Index Fields:\n    ╭─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────╮\n    │ Name            │ Attribute       │ Type            │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │\n    ├─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤\n    │ user            │ user            │ TAG             │ SEPARATOR       │ ,               │                 │                 │                 │                 │                 │                 │\n    │ credit_score    │ credit_score    │ TAG             │ SEPARATOR       │ ,               │                 │                 │                 │                 │                 │                 │\n    │ job             
│ job             │ TEXT            │ WEIGHT          │ 1               │                 │                 │                 │                 │                 │                 │\n    │ age             │ age             │ NUMERIC         │                 │                 │                 │                 │                 │                 │                 │                 │\n    │ user_embedding  │ user_embedding  │ VECTOR          │ algorithm       │ FLAT            │ data_type       │ FLOAT32         │ dim             │ 3               │ distance_metric │ COSINE          │\n    ╰─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────╯\n\n\n## Load Data to `SearchIndex`\n\nLoad the sample dataset to Redis.\n\n### Validate data entries on load\nRedisVL uses pydantic validation under the hood to ensure loaded data is valid and conforms to your schema. This validation is optional and is controlled by the `validate_on_load` flag on the `SearchIndex` class.\n\n\n```python\nkeys = index.load(data)\n\nprint(keys)\n```\n\n    ['user_simple_docs:01KHKHQYX95EDQN18FG8FRMRQ5', 'user_simple_docs:01KHKHQYXC97WY4ACG1V01GEPC', 'user_simple_docs:01KHKHQYXC97WY4ACG1V01GEPD']\n\n\nBy default, `load` will create a unique Redis key as a combination of the index key `prefix` and a random ULID. You can also customize the key by providing direct keys or pointing to a specified `id_field` on load.\n\n### Load INVALID data\nThis will raise a `SchemaValidationError` if `validate_on_load` is set to `True` in the `SearchIndex` class.\n\n\n```python\n# NBVAL_SKIP\n\ntry:\n    keys = index.load([{\"user_embedding\": True}])\nexcept Exception as e:\n    print(str(e))\n```\n\n    Schema validation failed for object at index 0. Field 'user_embedding' expects bytes (vector data), but got boolean value 'True'. If this should be a vector field, provide a list of numbers or bytes. 
If this should be a different field type, check your schema definition.\n    Object data: {\n      \"user_embedding\": true\n    }\n    Hint: Check that your data types match the schema field definitions. Use index.schema.fields to view expected field types.\n\n\n### Upsert the index with new data\nUpsert data by using the `load` method again:\n\n\n```python\n# Add more data\nnew_data = [{\n    'user': 'tyler',\n    'age': 9,\n    'job': 'engineer',\n    'credit_score': 'high',\n    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()\n}]\nkeys = index.load(new_data)\n\nprint(keys)\n```\n\n    ['user_simple_docs:01KHKHR37CD6143DNQ41G3ADNA']\n\n\n## Fetch and Manage Records\n\nRedisVL provides several methods to retrieve and manage individual records in your index.\n\n### Fetch a record by ID\n\nUse `fetch()` to retrieve a single record when you know its ID. The ID is the unique identifier you provided during load (via `id_field`) or the auto-generated ULID.\n\n\n```python\n# Fetch a record by its ID (e.g., the user field value if used as id_field)\n# First, let's reload data with a specific id_field\nindex.load(data, id_field=\"user\")\n\n# Now fetch by the user ID\nrecord = index.fetch(\"john\")\nprint(record)\n```\n\nYou can also construct the full Redis key from an ID using the `key()` method:\n\n\n```python\n# Get the full Redis key for a given ID\nfull_key = index.key(\"john\")\nprint(f\"Full Redis key: 
{full_key}\")\n```\n\n### List all keys in the index\n\nTo enumerate all keys in your index, use `paginate()` with a `FilterQuery`. This is useful for batch processing or auditing your data.\n\n\n```python\nfrom redisvl.query import FilterQuery\nfrom redisvl.query.filter import FilterExpression\n\n# Create a query that matches all documents\nquery = FilterQuery(\n    filter_expression=FilterExpression(\"*\"),\n    return_fields=[\"user\", \"age\", \"job\"]\n)\n\n# Paginate through all results\nfor batch in index.paginate(query, page_size=10):\n    for doc in batch:\n        print(f\"Key: {doc['id']}, User: {doc['user']}\")\n```\n\n### Delete specific records\n\nUse `drop_keys()` to remove specific records by their full Redis key, or `drop_documents()` to remove by document ID.\n\n\n```python\n# Delete by full Redis key\nfull_key = index.key(\"john\")\ndeleted_count = index.drop_keys(full_key)\nprint(f\"Deleted {deleted_count} record(s) by key\")\n\n# Delete multiple keys at once\n# index.drop_keys([\"key1\", \"key2\", \"key3\"])\n```\n\n\n```python\n# Delete by document ID (without the prefix)\ndeleted_count = 
index.drop_documents(\"mary\")\nprint(f\"Deleted {deleted_count} record(s) by document ID\")\n\n# Delete multiple documents at once\n# index.drop_documents([\"id1\", \"id2\", \"id3\"])\n```\n\n**Note:** `drop_keys()` expects the full Redis key (including prefix), while `drop_documents()` expects just the document ID.\n\n## Creating `VectorQuery` Objects\n\nNext we will create a vector query object for our newly populated index. This example will use a simple vector to demonstrate how vector similarity works. Vectors in production will likely be much larger than 3 floats and often require machine learning models (e.g., Hugging Face sentence transformers) or an embeddings API (e.g., Cohere, OpenAI). `redisvl` provides a set of [Vectorizers](https://redis.io/docs/latest/vectorizers#openai) to assist in vector creation.\n\n\n```python\nfrom redisvl.query import VectorQuery\nfrom jupyterutils import result_print\n\nquery = VectorQuery(\n    vector=[0.1, 0.1, 0.5],\n    vector_field_name=\"user_embedding\",\n    return_fields=[\"user\", \"age\", \"job\", \"credit_score\", \"vector_distance\"],\n    num_results=3\n)\n```\n\n\n\u003ctable\u003e\u003ctr\u003e\u003cth\u003evector_distance\u003c/th\u003e\u003cth\u003euser\u003c/th\u003e\u003cth\u003eage\u003c/th\u003e\u003cth\u003ejob\u003c/th\u003e\u003cth\u003ecredit_score\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0\u003c/td\u003e\u003ctd\u003ejohn\u003c/td\u003e\u003ctd\u003e1\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0\u003c/td\u003e\u003ctd\u003emary\u003c/td\u003e\u003ctd\u003e2\u003c/td\u003e\u003ctd\u003edoctor\u003c/td\u003e\u003ctd\u003elow\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0.0566298961639\u003c/td\u003e\u003ctd\u003etyler\u003c/td\u003e\u003ctd\u003e9\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\n\n**Note:** For HNSW and 
SVS-VAMANA indexes, you can tune search performance using runtime parameters:\n\n```python\n# Example with HNSW runtime parameters\nquery = VectorQuery(\n    vector=[0.1, 0.1, 0.5],\n    vector_field_name=\"user_embedding\",\n    return_fields=[\"user\", \"age\", \"job\"],\n    num_results=3,\n    ef_runtime=50  # Higher for better recall (HNSW only)\n)\n```\n\nSee the [SVS-VAMANA guide](09_svs_vamana.ipynb) and [Advanced Queries guide](11_advanced_queries.ipynb) for more details on runtime parameters.\n\n### Executing queries\nWith our `VectorQuery` object defined above, we can execute the query over the `SearchIndex` using the `query` method.\n\n\n```python\nresults = index.query(query)\nresult_print(results)\n```\n\n## Using an Asynchronous Redis Client\n\nThe `AsyncSearchIndex` class along with an async Redis Python client allows for queries, index creation, and data loading to be done asynchronously. This is the\nrecommended route for working with `redisvl` in production-like settings.\n\n\n```python\nfrom redisvl.index import AsyncSearchIndex\nfrom redis.asyncio import Redis\n\nclient = Redis.from_url(\"redis://localhost:6379\")\nindex = AsyncSearchIndex.from_dict(schema, redis_client=client)\n```\n\n\n```python\n# execute the vector query async\nresults = await index.query(query)\nresult_print(results)\n```\n\n## Updating a schema\nIn some scenarios, it makes sense to update the index schema. 
With Redis and `redisvl`, this is easy because Redis can keep the underlying data in place while you update the index configuration.\n\nFor our scenario, let's imagine we want to reindex this data in two ways:\n- by using a `Tag` type for the `job` field instead of `Text`\n- by using an `hnsw` vector index for the `user_embedding` field instead of a `flat` vector index\n\n\n```python\n# Modify this schema to have what we want\nindex.schema.remove_field(\"job\")\nindex.schema.remove_field(\"user_embedding\")\nindex.schema.add_fields([\n    {\"name\": \"job\", \"type\": \"tag\"},\n    {\n        \"name\": \"user_embedding\",\n        \"type\": \"vector\",\n        \"attrs\": {\n            \"dims\": 3,\n            \"distance_metric\": \"cosine\",\n            \"algorithm\": \"hnsw\",\n            \"datatype\": \"float32\"\n        }\n    }\n])\n```\n\n\n```python\n# Run the index update but keep underlying data in place\nawait index.create(overwrite=True, drop=False)\n```\n\n\n```python\n# Execute the vector query async\nresults = await index.query(query)\nresult_print(results)\n```\n\n## Check Index Stats\nUse the `rvl` CLI to check the stats for the index:\n\n\n```python\n!rvl stats -i user_simple\n```\n\n## Next Steps\n\nNow that you understand the basics of RedisVL, explore these related guides:\n\n- [Query and Filter Data](02_complex_filtering.ipynb) - Learn advanced filtering with tag, numeric, text, and geo filters\n- [Create Embeddings with Vectorizers](04_vectorizers.ipynb) - Generate embeddings using OpenAI, HuggingFace, Cohere, and more\n- [Choose a Storage Type](05_hash_vs_json.ipynb) - Understand when to use Hash vs JSON storage\n\n## Cleanup\n\nUse `.clear()` to flush all data from Redis associated with the index while leaving the index in place for future insertions.\n\nUse `.delete()` to remove both the index and the underlying data.\n\n\n```python\n# Clear all data from Redis associated with the index\nawait 
index.clear()\n```\n\n\n```python\n# But the index is still in place\nawait index.exists()\n```\n\n\n```python\n# Remove / delete the index in its entirety\nawait index.delete()\n```\n",
  "tags": [],
  "last_updated": "2026-04-21T14:39:33+02:00"
}
