{
  "id": "hash_vs_json",
  "title": "Hash vs JSON Storage",
  "url": "https://redis.io/docs/latest/develop/ai/redisvl/0.9.0/user_guide/hash_vs_json/",
  "summary": "",
  "content": "\n\n\nOut of the box, Redis provides a [variety of data structures](https://redis.com/redis-enterprise/data-structures/) that can adapt to your domain specific applications and use cases.\nIn this notebook, we will demonstrate how to use RedisVL with both [Hash](https://redis.io/docs/data-types/hashes/) and [JSON](https://redis.io/docs/data-types/json/) data.\n\n\nBefore running this notebook, be sure to\n1. Have installed ``redisvl`` and have that environment active for this notebook.\n2. Have a running Redis Stack or Redis Software instance with RediSearch \u003e 2.4 activated.\n\nFor example, you can run [Redis Stack](https://redis.io/docs/install/install-stack/) locally with Docker:\n\n```bash\ndocker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest\n```\n\nOr create a [FREE Redis Cloud](https://redis.io/cloud).\n\n\n```python\n# import necessary modules\nimport pickle\n\nfrom redisvl.redis.utils import buffer_to_array\nfrom redisvl.index import SearchIndex\n\n\n# load in the example data and printing utils\ndata = pickle.load(open(\"hybrid_example_data.pkl\", \"rb\"))\n```\n\n\n```python\nfrom jupyterutils import result_print, 
table_print\n\ntable_print(data)\n```\n\n\n\u003ctable\u003e\u003ctr\u003e\u003cth\u003euser\u003c/th\u003e\u003cth\u003eage\u003c/th\u003e\u003cth\u003ejob\u003c/th\u003e\u003cth\u003ecredit_score\u003c/th\u003e\u003cth\u003eoffice_location\u003c/th\u003e\u003cth\u003euser_embedding\u003c/th\u003e\u003cth\u003elast_updated\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003ejohn\u003c/td\u003e\u003ctd\u003e18\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e-122.4194,37.7749\u003c/td\u003e\u003ctd\u003eb'\\xcd\\xcc\\xcc=\\xcd\\xcc\\xcc=\\x00\\x00\\x00?'\u003c/td\u003e\u003ctd\u003e1741627789\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003ederrick\u003c/td\u003e\u003ctd\u003e14\u003c/td\u003e\u003ctd\u003edoctor\u003c/td\u003e\u003ctd\u003elow\u003c/td\u003e\u003ctd\u003e-122.4194,37.7749\u003c/td\u003e\u003ctd\u003eb'\\xcd\\xcc\\xcc=\\xcd\\xcc\\xcc=\\x00\\x00\\x00?'\u003c/td\u003e\u003ctd\u003e1741627789\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003enancy\u003c/td\u003e\u003ctd\u003e94\u003c/td\u003e\u003ctd\u003edoctor\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e-122.4194,37.7749\u003c/td\u003e\u003ctd\u003eb'333?\\xcd\\xcc\\xcc=\\x00\\x00\\x00?'\u003c/td\u003e\u003ctd\u003e1710696589\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003etyler\u003c/td\u003e\u003ctd\u003e100\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e-122.0839,37.3861\u003c/td\u003e\u003ctd\u003eb'\\xcd\\xcc\\xcc=\\xcd\\xcc\\xcc\u003e\\x00\\x00\\x00?'\u003c/td\u003e\u003ctd\u003e1742232589\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003etim\u003c/td\u003e\u003ctd\u003e12\u003c/td\u003e\u003ctd\u003edermatologist\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e-122.0839,37.3861\u003c/td\u003e\u003ctd\u003eb'\\xcd\\xcc\\xcc\u003e\\xcd\\xcc\\xcc\u003e\\x00\\x00\\x00?'\u003c/td\u003e\u003ctd\u003e1739644189\u003c/td\u003e\u003c/tr\u0
03e\u003ctr\u003e\u003ctd\u003etaimur\u003c/td\u003e\u003ctd\u003e15\u003c/td\u003e\u003ctd\u003eCEO\u003c/td\u003e\u003ctd\u003elow\u003c/td\u003e\u003ctd\u003e-122.0839,37.3861\u003c/td\u003e\u003ctd\u003eb'\\x9a\\x99\\x19?\\xcd\\xcc\\xcc=\\x00\\x00\\x00?'\u003c/td\u003e\u003ctd\u003e1742232589\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003ejoe\u003c/td\u003e\u003ctd\u003e35\u003c/td\u003e\u003ctd\u003edentist\u003c/td\u003e\u003ctd\u003emedium\u003c/td\u003e\u003ctd\u003e-122.0839,37.3861\u003c/td\u003e\u003ctd\u003eb'fff?fff?\\xcd\\xcc\\xcc='\u003c/td\u003e\u003ctd\u003e1742232589\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\n\n## Hash or JSON -- how to choose?\nBoth storage options offer a variety of features and tradeoffs. Below, we work through a small sample dataset to learn when and how to use each.\n\n### Working with Hashes\nHashes in Redis are simple collections of field-value pairs. Think of a hash as a mutable, single-level dictionary that contains multiple \"rows\" (field-value pairs):\n\n\n```python\n{\n    \"model\": \"Deimos\",\n    \"brand\": \"Ergonom\",\n    \"type\": \"Enduro bikes\",\n    \"price\": 4972,\n}\n```\n\nHashes are best suited for use cases with the following characteristics:\n- Performance (speed) and storage space (memory consumption) are top concerns\n- Data can be easily normalized and modeled as a single-level dict\n\nHashes are typically the default recommendation.\n\n\n```python\n# define the hash index schema\nhash_schema = {\n    \"index\": {\n        \"name\": \"user-hash\",\n        \"prefix\": \"user-hash-docs\",\n        \"storage_type\": \"hash\", # default setting -- HASH\n    },\n    \"fields\": [\n        {\"name\": \"user\", \"type\": \"tag\"},\n        {\"name\": \"credit_score\", \"type\": \"tag\"},\n        {\"name\": \"job\", \"type\": \"text\"},\n        {\"name\": \"age\", \"type\": \"numeric\"},\n        {\"name\": \"office_location\", \"type\": \"geo\"},\n        {\n            \"name\": \"user_embedding\",\n            
\"type\": \"vector\",\n            \"attrs\": {\n                \"dims\": 3,\n                \"distance_metric\": \"cosine\",\n                \"algorithm\": \"flat\",\n                \"datatype\": \"float32\"\n            }\n\n        }\n    ],\n}\n```\n\n\n```python\n# construct a search index from the hash schema\nhindex = SearchIndex.from_dict(hash_schema, redis_url=\"redis://localhost:6379\")\n\n# create the index (no data yet)\nhindex.create(overwrite=True)\n```\n\n\n```python\n# show the underlying storage type\nhindex.storage_type\n```\n\n\n\n\n    \u003cStorageType.HASH: 'hash'\u003e\n\n\n\n#### Vectors as byte strings\nOne nuance when working with Hashes in Redis is that all vector data must be passed as a byte string (for efficient storage, indexing, and processing). For example:\n\n\n```python\n# show a single entry from the data that will be loaded\ndata[0]\n```\n\n\n\n\n    {'user': 'john',\n     'age': 18,\n     'job': 'engineer',\n     'credit_score': 'high',\n     'office_location': '-122.4194,37.7749',\n     'user_embedding': b'\\xcd\\xcc\\xcc=\\xcd\\xcc\\xcc=\\x00\\x00\\x00?',\n     'last_updated': 1741627789}\n\n\n\n\n```python\n# load hash data\nkeys = hindex.load(data)\n```\n\n\n```python\n!rvl stats -i user-hash\n```\n\n    \n    Statistics:\n    ╭─────────────────────────────┬────────────╮\n    │ Stat Key                    │ Value      │\n    ├─────────────────────────────┼────────────┤\n    │ num_docs                    │ 7          │\n    │ num_terms                   │ 6          │\n    │ max_doc_id                  │ 7          │\n    │ num_records                 │ 44         │\n    │ percent_indexed             │ 1          │\n    │ hash_indexing_failures      │ 0          │\n    │ number_of_uses              │ 1          │\n    │ bytes_per_record_avg        │ 40.2954559 │\n    │ doc_table_size_mb           │ 7.27653503 │\n    │ inverted_sz_mb              │ 0.00169086 │\n    │ key_table_size_mb        
   │ 2.48908996 │\n    │ offset_bits_per_record_avg  │ 8          │\n    │ offset_vectors_sz_mb        │ 8.58306884 │\n    │ offsets_per_term_avg        │ 0.20454545 │\n    │ records_per_doc_avg         │ 6.28571414 │\n    │ sortable_values_size_mb     │ 0          │\n    │ total_indexing_time         │ 0.25799998 │\n    │ total_inverted_index_blocks │ 18         │\n    │ vector_index_sz_mb          │ 0.02023315 │\n    ╰─────────────────────────────┴────────────╯\n\n\n#### Performing Queries\nOnce our index is created and data is loaded into the right format, we can run queries against the index with RedisVL:\n\n\n```python\nfrom redisvl.query import VectorQuery\nfrom redisvl.query.filter import Tag, Text, Num\n\nt = (Tag(\"credit_score\") == \"high\") \u0026 (Text(\"job\") % \"enginee*\") \u0026 (Num(\"age\") \u003e 17)  # codespell:ignore enginee\n\nv = VectorQuery(\n    vector=[0.1, 0.1, 0.5],\n    vector_field_name=\"user_embedding\",\n    return_fields=[\"user\", \"credit_score\", \"age\", \"job\", \"office_location\"],\n    filter_expression=t\n)\n\n\nresults = hindex.query(v)\nresult_print(results)\n\n```\n\n\n\u003ctable\u003e\u003ctr\u003e\u003cth\u003evector_distance\u003c/th\u003e\u003cth\u003euser\u003c/th\u003e\u003cth\u003ecredit_score\u003c/th\u003e\u003cth\u003eage\u003c/th\u003e\u003cth\u003ejob\u003c/th\u003e\u003cth\u003eoffice_location\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0\u003c/td\u003e\u003ctd\u003ejohn\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e18\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003e-122.4194,37.7749\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0.109129190445\u003c/td\u003e\u003ctd\u003etyler\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e100\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003e-122.0839,37.3861\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\n\n\n```python\n# clean up\nhindex.delete()\n\n```\n\n### Working with 
JSON\n\nJSON is best suited for use cases with the following characteristics:\n- Ease of use and data model flexibility are top concerns\n- Application data is already native JSON\n- You are replacing another document storage/database solution\n\n\n```python\n# define the json index schema\njson_schema = {\n    \"index\": {\n        \"name\": \"user-json\",\n        \"prefix\": \"user-json-docs\",\n        \"storage_type\": \"json\", # JSON storage type\n    },\n    \"fields\": [\n        {\"name\": \"user\", \"type\": \"tag\"},\n        {\"name\": \"credit_score\", \"type\": \"tag\"},\n        {\"name\": \"job\", \"type\": \"text\"},\n        {\"name\": \"age\", \"type\": \"numeric\"},\n        {\"name\": \"office_location\", \"type\": \"geo\"},\n        {\n            \"name\": \"user_embedding\",\n            \"type\": \"vector\",\n            \"attrs\": {\n                \"dims\": 3,\n                \"distance_metric\": \"cosine\",\n                \"algorithm\": \"flat\",\n                \"datatype\": \"float32\"\n            }\n\n        }\n    ],\n}\n```\n\n\n```python\n# construct a search index from the json schema\njindex = SearchIndex.from_dict(json_schema, redis_url=\"redis://localhost:6379\")\n\n# create the index (no data yet)\njindex.create(overwrite=True)\n```\n\n\n```python\n# list the indices in the database (the hash index was deleted above)\n!rvl index listall\n```\n\n    13:02:56 [RedisVL] INFO   Indices:\n    13:02:56 [RedisVL] INFO   1. user-json\n\n\n#### Vectors as float arrays\nVectors stored in JSON must be plain arrays (Python lists) of floats. 
We will modify our sample data to account for this below:\n\n\n```python\n# copy each record so the original hash-formatted data is left unchanged\njson_data = [d.copy() for d in data]\n\n# convert each byte-string vector into a list of floats\nfor d in json_data:\n    d['user_embedding'] = buffer_to_array(d['user_embedding'], dtype='float32')\n```\n\n\n```python\n# inspect a single JSON record\njson_data[0]\n```\n\n\n\n\n    {'user': 'john',\n     'age': 18,\n     'job': 'engineer',\n     'credit_score': 'high',\n     'office_location': '-122.4194,37.7749',\n     'user_embedding': [0.10000000149011612, 0.10000000149011612, 0.5],\n     'last_updated': 1741627789}\n\n\n\n\n```python\nkeys = jindex.load(json_data)\n```\n\n\n```python\n# we can now run the exact same query as above\nresult_print(jindex.query(v))\n```\n\n\n\u003ctable\u003e\u003ctr\u003e\u003cth\u003evector_distance\u003c/th\u003e\u003cth\u003euser\u003c/th\u003e\u003cth\u003ecredit_score\u003c/th\u003e\u003cth\u003eage\u003c/th\u003e\u003cth\u003ejob\u003c/th\u003e\u003cth\u003eoffice_location\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0\u003c/td\u003e\u003ctd\u003ejohn\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e18\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003e-122.4194,37.7749\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd\u003e0.109129190445\u003c/td\u003e\u003ctd\u003etyler\u003c/td\u003e\u003ctd\u003ehigh\u003c/td\u003e\u003ctd\u003e100\u003c/td\u003e\u003ctd\u003eengineer\u003c/td\u003e\u003ctd\u003e-122.0839,37.3861\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\n\n## Cleanup\n\n\n```python\njindex.delete()\n```\n\n## Working with nested data in JSON\n\nRedis also supports native **JSON** objects. 
These can be multi-level (nested) objects, with full JSONPath support for updating and retrieving sub-elements:\n\n```json\n{\n    \"name\": \"Specialized Stump jumper\",\n    \"metadata\": {\n        \"model\": \"Stumpjumper\",\n        \"brand\": \"Specialized\",\n        \"type\": \"Enduro bikes\",\n        \"price\": 3000\n    }\n}\n```\n\n#### Full JSON Path support\nBecause Redis supports full JSONPath, each field in a JSON index schema is defined by a `name` and a `path` that points to where the data is located within the object.\n\nBy default, RedisVL assumes the path `$.{name}` when no path is provided for a field in a JSON schema. For nested data, provide the path as `$.object.attribute`.\n\n### As an example:\n\n\n```python\nfrom redisvl.utils.vectorize import HFTextVectorizer\n\nemb_model = HFTextVectorizer()\n\nbike_data = [\n    {\n        \"name\": \"Specialized Stump jumper\",\n        \"metadata\": {\n            \"model\": \"Stumpjumper\",\n            \"brand\": \"Specialized\",\n            \"type\": \"Enduro bikes\",\n            \"price\": 3000\n        },\n        \"description\": \"The Specialized Stumpjumper is a versatile enduro bike that dominates both climbs and descents. Features a FACT 11m carbon fiber frame, FOX FLOAT suspension with 160mm travel, and SRAM X01 Eagle drivetrain. The asymmetric frame design and internal storage compartment make it a practical choice for all-day adventures.\"\n    },\n    {\n        \"name\": \"bike_2\",\n        \"metadata\": {\n            \"model\": \"Slash\",\n            \"brand\": \"Trek\",\n            \"type\": \"Enduro bikes\",\n            \"price\": 5000\n        },\n        \"description\": \"Trek's Slash is built for aggressive enduro riding and racing. Featuring Trek's Alpha Aluminum frame with RE:aktiv suspension technology, 160mm travel, and Knock Block frame protection. 
Equipped with Bontrager components and a Shimano XT drivetrain, this bike excels on technical trails and enduro race courses.\"\n    }\n]\n\nbike_data = [{**d, \"bike_embedding\": emb_model.embed(d[\"description\"])} for d in bike_data]\n\nbike_schema = {\n    \"index\": {\n        \"name\": \"bike-json\",\n        \"prefix\": \"bike-json\",\n        \"storage_type\": \"json\", # JSON storage type\n    },\n    \"fields\": [\n        {\n            \"name\": \"model\",\n            \"type\": \"tag\",\n            \"path\": \"$.metadata.model\" # note the '$'\n        },\n        {\n            \"name\": \"brand\",\n            \"type\": \"tag\",\n            \"path\": \"$.metadata.brand\"\n        },\n        {\n            \"name\": \"price\",\n            \"type\": \"numeric\",\n            \"path\": \"$.metadata.price\"\n        },\n        {\n            \"name\": \"bike_embedding\",\n            \"type\": \"vector\",\n            \"attrs\": {\n                \"dims\": len(bike_data[0][\"bike_embedding\"]),\n                \"distance_metric\": \"cosine\",\n                \"algorithm\": \"flat\",\n                \"datatype\": \"float32\"\n            }\n\n        }\n    ],\n}\n```\n\n    /Users/tyler.hutcherson/Documents/AppliedAI/redis-vl-python/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n      from .autonotebook import tqdm as notebook_tqdm\n\n\n    13:02:58 sentence_transformers.SentenceTransformer INFO   Use pytorch device_name: mps\n    13:02:58 sentence_transformers.SentenceTransformer INFO   Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2\n\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00,  7.23it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 12.93it/s]\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 14.10it/s]\n\n\n\n```python\n# construct a search index from the json schema\nbike_index = SearchIndex.from_dict(bike_schema, redis_url=\"redis://localhost:6379\")\n\n# create the index (no data yet)\nbike_index.create(overwrite=True)\n```\n\n\n```python\nbike_index.load(bike_data)\n```\n\n\n\n\n    ['bike-json:01JY4J9M48CXF7F4Y6HRGEMT9B',\n     'bike-json:01JY4J9M48RRY6F80HR82CVZ5G']\n\n\n\n\n```python\nfrom redisvl.query import VectorQuery\n\nvec = emb_model.embed(\"I'd like a bike for aggressive riding\")\n\nv = VectorQuery(\n    vector=vec,\n    vector_field_name=\"bike_embedding\",\n    return_fields=[\n        \"brand\",\n        \"name\",\n        \"$.metadata.type\"\n    ]\n)\n\n\nresults = bike_index.query(v)\n```\n\n    Batches: 100%|██████████| 1/1 [00:00\u003c00:00, 11.72it/s]\n\n\n**Note:** As shown in the example, to retrieve a field from a JSON object that was not indexed, you must supply its full path, as with `$.metadata.type`.\n\n\n```python\nresults\n```\n\n\n\n\n    [{'id': 'bike-json:01JY4J9M48RRY6F80HR82CVZ5G',\n      'vector_distance': '0.519989132881',\n      'brand': 'Trek',\n      '$.metadata.type': 'Enduro bikes'},\n     {'id': 'bike-json:01JY4J9M48CXF7F4Y6HRGEMT9B',\n      'vector_distance': '0.657624304295',\n      'brand': 'Specialized',\n      '$.metadata.type': 'Enduro bikes'}]\n\n\n\n## Cleanup\n\n\n```python\nbike_index.delete()\n```\n",
  "tags": [],
  "last_updated": "2026-04-01T08:10:08-05:00"
}