Advanced Query Types
In this notebook, we will explore advanced query types available in RedisVL:
- TextQuery: Full text search with advanced scoring
- AggregateHybridQuery: Combines text and vector search for hybrid retrieval
- MultiVectorQuery: Search over multiple vector fields simultaneously
These query types are powerful tools for building sophisticated search applications that go beyond simple vector similarity search.
Prerequisites:
- Ensure redisvl is installed in your Python environment.
- Have a running instance of Redis Stack or Redis Cloud.
Setup and Data Preparation
First, let's create a schema and prepare sample data that includes text fields, numeric fields, and vector fields.
import numpy as np
from jupyterutils import result_print
# Sample data with text descriptions, categories, and vectors
data = [
{
'product_id': 'prod_1',
'brief_description': 'comfortable running shoes for athletes',
'full_description': 'Engineered with a dual-layer EVA foam midsole and FlexWeave breathable mesh upper, these running shoes deliver responsive cushioning for long-distance runs. The anatomical footbed adapts to your stride while the carbon rubber outsole provides superior traction on varied terrain.',
'category': 'footwear',
'price': 89.99,
'rating': 4.5,
'text_embedding': np.array([0.1, 0.2, 0.1], dtype=np.float32).tobytes(),
'image_embedding': np.array([0.8, 0.1], dtype=np.float32).tobytes(),
},
{
'product_id': 'prod_2',
'brief_description': 'lightweight running jacket with water resistance',
'full_description': 'Stay protected with this ultralight 2.5-layer DWR-coated shell featuring laser-cut ventilation zones and reflective piping for low-light visibility. Packs into its own chest pocket and weighs just 4.2 oz, making it ideal for unpredictable weather conditions.',
'category': 'outerwear',
'price': 129.99,
'rating': 4.8,
'text_embedding': np.array([0.2, 0.3, 0.2], dtype=np.float32).tobytes(),
'image_embedding': np.array([0.7, 0.2], dtype=np.float32).tobytes(),
},
{
'product_id': 'prod_3',
'brief_description': 'professional tennis racket for competitive players',
'full_description': 'Competition-grade racket featuring a 98 sq in head size, 16x19 string pattern, and aerospace-grade graphite frame that delivers explosive power with pinpoint control. Tournament-approved specs include 315g weight and 68 RA stiffness rating for advanced baseline play.',
'category': 'equipment',
'price': 199.99,
'rating': 4.9,
'text_embedding': np.array([0.9, 0.1, 0.05], dtype=np.float32).tobytes(),
'image_embedding': np.array([0.1, 0.9], dtype=np.float32).tobytes(),
},
{
'product_id': 'prod_4',
'brief_description': 'yoga mat with extra cushioning for comfort',
'full_description': 'Premium 8mm thick TPE yoga mat with dual-texture surface - smooth side for hot yoga flow and textured side for maximum grip during balancing poses. Closed-cell technology prevents moisture absorption while alignment markers guide proper positioning in asanas.',
'category': 'accessories',
'price': 39.99,
'rating': 4.3,
'text_embedding': np.array([0.15, 0.25, 0.15], dtype=np.float32).tobytes(),
'image_embedding': np.array([0.5, 0.5], dtype=np.float32).tobytes(),
},
{
'product_id': 'prod_5',
'brief_description': 'basketball shoes with excellent ankle support',
'full_description': 'High-top basketball sneakers with Zoom Air units in forefoot and heel, reinforced lateral sidewalls for explosive cuts, and herringbone traction pattern optimized for hardwood courts. The internal bootie construction and extended ankle collar provide lockdown support during aggressive drives.',
'category': 'footwear',
'price': 139.99,
'rating': 4.7,
'text_embedding': np.array([0.12, 0.18, 0.12], dtype=np.float32).tobytes(),
'image_embedding': np.array([0.75, 0.15], dtype=np.float32).tobytes(),
},
{
'product_id': 'prod_6',
'brief_description': 'swimming goggles with anti-fog coating',
'full_description': 'Low-profile competition goggles with curved polycarbonate lenses offering 180-degree peripheral vision and UV protection. Hydrophobic anti-fog coating lasts 10x longer than standard treatments, while the split silicone strap and interchangeable nose bridges ensure a watertight, custom fit.',
'category': 'accessories',
'price': 24.99,
'rating': 4.4,
'text_embedding': np.array([0.3, 0.1, 0.2], dtype=np.float32).tobytes(),
'image_embedding': np.array([0.2, 0.8], dtype=np.float32).tobytes(),
},
]
Define the Schema
Our schema includes:
- Tag fields: product_id, category
- Text fields: brief_description and full_description for full-text search
- Numeric fields: price, rating
- Vector fields: text_embedding (3 dimensions) and image_embedding (2 dimensions) for semantic search
schema = {
"index": {
"name": "advanced_queries",
"prefix": "products",
"storage_type": "hash",
},
"fields": [
{"name": "product_id", "type": "tag"},
{"name": "category", "type": "tag"},
{"name": "brief_description", "type": "text"},
{"name": "full_description", "type": "text"},
{"name": "price", "type": "numeric"},
{"name": "rating", "type": "numeric"},
{
"name": "text_embedding",
"type": "vector",
"attrs": {
"dims": 3,
"distance_metric": "cosine",
"algorithm": "flat",
"datatype": "float32"
}
},
{
"name": "image_embedding",
"type": "vector",
"attrs": {
"dims": 2,
"distance_metric": "cosine",
"algorithm": "flat",
"datatype": "float32"
}
}
],
}
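The schema declares text_embedding with 3 dimensions and image_embedding with 2, both as float32, so each stored vector should be exactly dims * 4 bytes long. The sketch below is a quick sanity check you could run before loading data. It uses the standard library struct module instead of NumPy and assumes a little-endian float32 layout, which is what np.float32 arrays produce on common platforms. The helper names pack_float32 and check_dims are hypothetical and not part of RedisVL.

```python
import struct

def pack_float32(values):
    """Pack a list of floats as little-endian float32 bytes (assumed layout)."""
    return struct.pack(f"<{len(values)}f", *values)

def check_dims(blob, dims):
    """Return True when blob holds exactly `dims` float32 values (4 bytes each)."""
    return len(blob) == dims * 4

text_blob = pack_float32([0.1, 0.2, 0.1])   # matches dims=3 in the schema
image_blob = pack_float32([0.8, 0.1])       # matches dims=2 in the schema

print(len(text_blob), check_dims(text_blob, 3))    # 12 True
print(len(image_blob), check_dims(image_blob, 2))  # 8 True
print(check_dims(image_blob, 3))                   # False: wrong field for this vector
```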
Create Index and Load Data
from redisvl.index import SearchIndex
# Create the search index
index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379")
# Create the index and load data
index.create(overwrite=True)
keys = index.load(data)
print(f"Loaded {len(keys)} products into the index")
Loaded 6 products into the index
1. TextQuery: Full Text Search
The TextQuery class enables full text search with advanced scoring algorithms. It's ideal for keyword-based search with relevance ranking.
Basic Text Search
Let's search for products related to "running shoes":
from redisvl.query import TextQuery
# Create a text query
text_query = TextQuery(
text="running shoes",
text_field_name="brief_description",
return_fields=["product_id", "brief_description", "category", "price"],
num_results=5
)
results = index.query(text_query)
result_print(results)
| score | product_id | brief_description | category | price |
|---|---|---|---|---|
| 5.953989333038773 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 |
| 2.0410082774474088 | prod_2 | lightweight running jacket with water resistance | outerwear | 129.99 |
Text Search with Different Scoring Algorithms
RedisVL supports multiple text scoring algorithms. Let's compare BM25STD and TFIDF:
# BM25 standard scoring (default)
bm25_query = TextQuery(
text="comfortable shoes",
text_field_name="brief_description",
text_scorer="BM25STD",
return_fields=["product_id", "brief_description", "price"],
num_results=3
)
print("Results with BM25 scoring:")
results = index.query(bm25_query)
result_print(results)
Results with BM25 scoring:
| score | product_id | brief_description | price |
|---|---|---|---|
| 6.031534703977659 | prod_1 | comfortable running shoes for athletes | 89.99 |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support | 139.99 |
| 1.5268074873573214 | prod_4 | yoga mat with extra cushioning for comfort | 39.99 |
# TFIDF scoring
tfidf_query = TextQuery(
text="comfortable shoes",
text_field_name="brief_description",
text_scorer="TFIDF",
return_fields=["product_id", "brief_description", "price"],
num_results=3
)
print("Results with TFIDF scoring:")
results = index.query(tfidf_query)
result_print(results)
Results with TFIDF scoring:
| score | product_id | brief_description | price |
|---|---|---|---|
| 2.3333333333333335 | prod_1 | comfortable running shoes for athletes | 89.99 |
| 2.0 | prod_5 | basketball shoes with excellent ankle support | 139.99 |
| 1.0 | prod_4 | yoga mat with extra cushioning for comfort | 39.99 |
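To build some intuition for why the two scorers agree on the ranking but report different magnitudes, here is a textbook sketch of TF-IDF and BM25 over the six brief descriptions. It is not Redis's implementation: the tokenizer is a plain whitespace split with no stemming, and the formulas and the k1 and b values are common textbook choices, so the numbers will not match the tables above. In particular, prod_4 scores zero here because "comfort" is not stemmed to match "comfortable".

```python
import math

docs = [
    "comfortable running shoes for athletes",
    "lightweight running jacket with water resistance",
    "professional tennis racket for competitive players",
    "yoga mat with extra cushioning for comfort",
    "basketball shoes with excellent ankle support",
    "swimming goggles with anti-fog coating",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avg_len = sum(len(toks) for toks in tokenized) / N

def doc_freq(term):
    """Number of documents that contain the term."""
    return sum(1 for toks in tokenized if term in toks)

def tfidf(query, toks):
    """Textbook TF-IDF: term frequency times log inverse document frequency."""
    score = 0.0
    for term in query.split():
        n = doc_freq(term)
        if n:
            score += toks.count(term) * math.log(N / n)
    return score

def bm25(query, toks, k1=1.2, b=0.75):
    """Textbook BM25 with saturation (k1) and length normalization (b)."""
    score = 0.0
    for term in query.split():
        n = doc_freq(term)
        tf = toks.count(term)
        idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
        norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

query = "comfortable shoes"
for i, toks in enumerate(tokenized, start=1):
    print(f"prod_{i}", round(tfidf(query, toks), 3), round(bm25(query, toks), 3))
```

Both functions reward rarer terms, but BM25 also dampens repeated terms and normalizes by document length, which is one reason its scores sit on a different scale.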
Text Search with Filters
Combine text search with filters to narrow results:
from redisvl.query.filter import Tag, Num
# Search for "shoes" only in the footwear category
filtered_text_query = TextQuery(
text="shoes",
text_field_name="brief_description",
filter_expression=Tag("category") == "footwear",
return_fields=["product_id", "brief_description", "category", "price"],
num_results=5
)
results = index.query(filtered_text_query)
result_print(results)
| score | product_id | brief_description | category | price |
|---|---|---|---|---|
| 3.9314935770863046 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 |
| 3.1279733904413027 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 |
# Search for products under $100
price_filtered_query = TextQuery(
text="comfortable",
text_field_name="brief_description",
filter_expression=Num("price") < 100,
return_fields=["product_id", "brief_description", "price"],
num_results=5
)
results = index.query(price_filtered_query)
result_print(results)
| score | product_id | brief_description | price |
|---|---|---|---|
| 3.1541404034996914 | prod_1 | comfortable running shoes for athletes | 89.99 |
| 1.5268074873573214 | prod_4 | yoga mat with extra cushioning for comfort | 39.99 |
Text Search with Multiple Fields and Weights
You can search across multiple text fields with different weights to prioritize certain fields.
Here we'll prioritize the brief_description field and make text similarity in that field twice as important as text similarity in full_description:
weighted_query = TextQuery(
text="shoes",
text_field_name={"brief_description": 1.0, "full_description": 0.5},
return_fields=["product_id", "brief_description"],
num_results=3
)
results = index.query(weighted_query)
result_print(results)
| score | product_id | brief_description |
|---|---|---|
| 5.035440025836444 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
Text Search with Custom Stopwords
Stopwords are common words that are filtered out before processing the query. You can specify which language's default stopwords should be filtered out, like english, french, or german. You can also define your own list of stopwords:
# Use English stopwords (default)
query_with_stopwords = TextQuery(
text="the best shoes for running",
text_field_name="brief_description",
stopwords="english", # Common words like "the", "for" will be removed
return_fields=["product_id", "brief_description"],
num_results=3
)
results = index.query(query_with_stopwords)
result_print(results)
| score | product_id | brief_description |
|---|---|---|
| 5.953989333038773 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
| 2.0410082774474088 | prod_2 | lightweight running jacket with water resistance |
# Use custom stopwords
custom_stopwords_query = TextQuery(
text="professional equipment for athletes",
text_field_name="brief_description",
stopwords=["for", "with"], # Only these words will be filtered
return_fields=["product_id", "brief_description"],
num_results=3
)
results = index.query(custom_stopwords_query)
result_print(results)
| score | product_id | brief_description |
|---|---|---|
| 3.1541404034996914 | prod_1 | comfortable running shoes for athletes |
| 3.0864038416103 | prod_3 | professional tennis racket for competitive players |
# No stopwords
no_stopwords_query = TextQuery(
text="the best shoes for running",
text_field_name="brief_description",
stopwords=None, # All words will be included
return_fields=["product_id", "brief_description"],
num_results=3
)
results = index.query(no_stopwords_query)
result_print(results)
| score | product_id | brief_description |
|---|---|---|
| 5.953989333038773 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
| 2.0410082774474088 | prod_2 | lightweight running jacket with water resistance |
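As a rough picture of what stopword filtering does to the query text before it reaches the index, the sketch below drops stopwords from a whitespace-tokenized query. The strip_stopwords helper and the ENGLISH_SUBSET set are hypothetical: the set is a small illustrative sample, not the list RedisVL applies for "english", and the library's own tokenization may differ.

```python
def strip_stopwords(text, stopwords):
    """Lowercase, split on whitespace, and drop any token found in `stopwords`."""
    blocked = {word.lower() for word in stopwords}
    return [tok for tok in text.lower().split() if tok not in blocked]

# Illustrative subset only; not the actual list used for stopwords="english".
ENGLISH_SUBSET = {"the", "for", "with", "a", "an", "of"}

query = "the best shoes for running"

# Language-style stopwords: "the" and "for" are removed.
print(strip_stopwords(query, ENGLISH_SUBSET))   # ['best', 'shoes', 'running']

# Custom list: only "for" and "with" are removed.
print(strip_stopwords("professional equipment for athletes", ["for", "with"]))
# ['professional', 'equipment', 'athletes']

# No stopwords: every token is kept.
print(strip_stopwords(query, []))
# ['the', 'best', 'shoes', 'for', 'running']
```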
2. AggregateHybridQuery: Combining Text and Vector Search
The AggregateHybridQuery combines text search and vector similarity to provide the best of both worlds:
- Text search: Finds exact keyword matches
- Vector search: Captures semantic similarity
Results are scored using a weighted combination:
hybrid_score = (alpha) * vector_score + (1 - alpha) * text_score
Where alpha controls the balance between vector and text search (default: 0.7).
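The formula is easy to check by hand. The sketch below applies it in plain Python to vector_similarity and text_score values copied from the basic hybrid query output in this section; the hybrid_score function is a hypothetical helper for illustration, not a RedisVL API.

```python
def hybrid_score(vector_similarity, text_score, alpha=0.7):
    """Weighted combination of the vector and text signals."""
    return alpha * vector_similarity + (1 - alpha) * text_score

# (product_id, vector_similarity, text_score) copied from the notebook output
rows = [
    ("prod_1", 0.999999970198, 5.95398933304),
    ("prod_5", 0.995073735714, 2.08531559363),
    ("prod_4", 0.998058259487, 0.0),
]

for product_id, similarity, text in rows:
    print(product_id, round(hybrid_score(similarity, text), 6))
# prod_1 2.486197
# prod_5 1.322146
# prod_4 0.698641
```

Note that a product with no keyword match (text_score of 0) can still appear in the results on the strength of its vector similarity alone.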
Basic Aggregate Hybrid Query
Let's search for "running shoes" with both text and semantic search:
from redisvl.query import AggregateHybridQuery
# Create a hybrid query
hybrid_query = AggregateHybridQuery(
text="running shoes",
text_field_name="brief_description",
vector=[0.1, 0.2, 0.1], # Query vector
vector_field_name="text_embedding",
return_fields=["product_id", "brief_description", "category", "price"],
num_results=5
)
results = index.query(hybrid_query)
result_print(results)
| vector_distance | product_id | brief_description | category | price | vector_similarity | text_score | hybrid_score |
|---|---|---|---|---|---|---|---|
| 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 | 0.999999970198 | 5.95398933304 | 2.48619677905 |
| 0.00985252857208 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 | 0.995073735714 | 2.08531559363 | 1.32214629309 |
| 0.00985252857208 | prod_2 | lightweight running jacket with water resistance | outerwear | 129.99 | 0.995073735714 | 2.04100827745 | 1.30885409823 |
| 0.0038834810257 | prod_4 | yoga mat with extra cushioning for comfort | accessories | 39.99 | 0.998058259487 | 0 | 0.698640781641 |
| 0.236237406731 | prod_6 | swimming goggles with anti-fog coating | accessories | 24.99 | 0.881881296635 | 0 | 0.617316907644 |
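The vector_distance and vector_similarity columns above are consistent with a cosine distance that is rescaled as similarity = (2 - distance) / 2. The sketch below reproduces the prod_5 row in plain Python under that assumption. Both helper functions are hypothetical, and because this sketch computes in float64 while the index stores float32, the trailing digits differ slightly from the table.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def similarity_from_distance(distance):
    """Map a cosine distance in [0, 2] to a similarity in [0, 1] (assumed rescaling)."""
    return (2.0 - distance) / 2.0

query_vector = [0.1, 0.2, 0.1]
prod_5_embedding = [0.12, 0.18, 0.12]

distance = cosine_distance(query_vector, prod_5_embedding)
print(round(distance, 6))                            # 0.009852
print(round(similarity_from_distance(distance), 6))  # 0.995074
```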
Adjusting the Alpha Parameter
The alpha parameter controls the weight between vector and text search:
- alpha=1.0: Pure vector search
- alpha=0.0: Pure text search
- alpha=0.7 (default): 70% vector, 30% text
# More emphasis on vector search (alpha=0.9)
vector_heavy_query = AggregateHybridQuery(
text="comfortable",
text_field_name="brief_description",
vector=[0.15, 0.25, 0.15],
vector_field_name="text_embedding",
alpha=0.9, # 90% vector, 10% text
return_fields=["product_id", "brief_description"],
num_results=3
)
print("Results with alpha=0.9 (vector-heavy):")
results = index.query(vector_heavy_query)
result_print(results)
Results with alpha=0.9 (vector-heavy):
| vector_distance | product_id | brief_description | vector_similarity | text_score | hybrid_score |
|---|---|---|---|---|---|
| -1.19209289551e-07 | prod_4 | yoga mat with extra cushioning for comfort | 1.0000000596 | 1.52680748736 | 1.05268080238 |
| 0.00136888027191 | prod_5 | basketball shoes with excellent ankle support | 0.999315559864 | 0 | 0.899384003878 |
| 0.00136888027191 | prod_2 | lightweight running jacket with water resistance | 0.999315559864 | 0 | 0.899384003878 |
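Changing alpha can reorder results, not just rescale scores. The sketch below re-ranks three rows taken from the basic hybrid query output earlier in this section at several alpha values. The rank_by_alpha helper is hypothetical and works on values copied from the notebook rather than querying Redis.

```python
def rank_by_alpha(rows, alpha):
    """Return product ids ordered by alpha * similarity + (1 - alpha) * text score."""
    scored = [
        (alpha * similarity + (1 - alpha) * text, product_id)
        for product_id, similarity, text in rows
    ]
    return [product_id for _, product_id in sorted(scored, reverse=True)]

# (product_id, vector_similarity, text_score) from the "running shoes" hybrid query
rows = [
    ("prod_1", 0.999999970198, 5.95398933304),
    ("prod_5", 0.995073735714, 2.08531559363),
    ("prod_4", 0.998058259487, 0.0),
]

for alpha in (0.0, 0.7, 1.0):
    print(alpha, rank_by_alpha(rows, alpha))
# 0.0 ['prod_1', 'prod_5', 'prod_4']
# 0.7 ['prod_1', 'prod_5', 'prod_4']
# 1.0 ['prod_1', 'prod_4', 'prod_5']
```

At alpha=1.0 prod_4 overtakes prod_5 because only vector similarity counts. The text scores in this notebook are not bounded by 1 while vector_similarity is, so even a 30% text weight can outweigh small differences in similarity.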
Aggregate Hybrid Query with Filters
You can also combine hybrid search with filters:
# Hybrid search with a price filter
filtered_hybrid_query = AggregateHybridQuery(
text="professional equipment",
text_field_name="brief_description",
vector=[0.9, 0.1, 0.05],
vector_field_name="text_embedding",
filter_expression=Num("price") > 100,
return_fields=["product_id", "brief_description", "category", "price"],
num_results=5
)
results = index.query(filtered_hybrid_query)
result_print(results)
| vector_distance | product_id | brief_description | category | price | vector_similarity | text_score | hybrid_score |
|---|---|---|---|---|---|---|---|
| -1.19209289551e-07 | prod_3 | professional tennis racket for competitive players | equipment | 199.99 | 1.0000000596 | 3.08640384161 | 1.62592119421 |
| 0.411657452583 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 | 0.794171273708 | 0 | 0.555919891596 |
| 0.411657452583 | prod_2 | lightweight running jacket with water resistance | outerwear | 129.99 | 0.794171273708 | 0 | 0.555919891596 |
Using Different Text Scorers
AggregateHybridQuery supports the same text scoring algorithms as TextQuery:
# Aggregate Hybrid query with TFIDF scorer
hybrid_tfidf = AggregateHybridQuery(
text="shoes support",
text_field_name="brief_description",
vector=[0.12, 0.18, 0.12],
vector_field_name="text_embedding",
text_scorer="TFIDF",
return_fields=["product_id", "brief_description"],
num_results=3
)
results = index.query(hybrid_tfidf)
result_print(results)
| vector_distance | product_id | brief_description | vector_similarity | text_score | hybrid_score |
|---|---|---|---|---|---|
| 0 | prod_5 | basketball shoes with excellent ankle support | 1 | 5 | 2.2 |
| 0 | prod_2 | lightweight running jacket with water resistance | 1 | 0 | 0.7 |
| 0.00136888027191 | prod_4 | yoga mat with extra cushioning for comfort | 0.999315559864 | 0 | 0.699520891905 |
3. MultiVectorQuery: Multi-Vector Search
The MultiVectorQuery allows you to search over multiple vector fields simultaneously. This is useful when you have different types of embeddings (e.g., text and image embeddings) and want to find results that match across multiple modalities.
The final score is calculated as a weighted combination:
combined_score = w_1 * score_1 + w_2 * score_2 + w_3 * score_3 + ...
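As a worked instance of this formula, the sketch below combines per-field scores for prod_5 using values that appear in the query outputs in this section. The combined_score function is a hypothetical helper, not part of RedisVL.

```python
def combined_score(scores, weights):
    """Weighted sum of per-vector-field similarity scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))

# score_0 (text_embedding) and score_1 (image_embedding) for prod_5
prod_5_scores = [0.995073735714, 0.998666852713]

print(round(combined_score(prod_5_scores, [0.7, 0.3]), 6))  # 0.996152
print(round(combined_score(prod_5_scores, [0.6, 0.4]), 6))  # 0.996511
print(round(combined_score(prod_5_scores, [0.5, 0.5]), 6))  # 0.99687
```

The per-field scores stay the same across the three calls; only the weights change how much each field contributes to the final ranking.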
Basic Multi-Vector Query
First, we need to import the Vector class to define our query vectors:
from redisvl.query import MultiVectorQuery, Vector
# Define multiple vectors for the query
text_vector = Vector(
vector=[0.1, 0.2, 0.1],
field_name="text_embedding",
dtype="float32",
weight=0.7 # 70% weight for text embedding
)
image_vector = Vector(
vector=[0.8, 0.1],
field_name="image_embedding",
dtype="float32",
weight=0.3 # 30% weight for image embedding
)
# Create a multi-vector query
multi_vector_query = MultiVectorQuery(
vectors=[text_vector, image_vector],
return_fields=["product_id", "brief_description", "category"],
num_results=5
)
results = index.query(multi_vector_query)
result_print(results)
| distance_0 | distance_1 | product_id | brief_description | category | score_0 | score_1 | combined_score |
|---|---|---|---|---|---|---|---|
| 5.96046447754e-08 | 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | footwear | 0.999999970198 | 0.999999970198 | 0.999999970198 |
| 0.00985252857208 | 0.00266629457474 | prod_5 | basketball shoes with excellent ankle support | footwear | 0.995073735714 | 0.998666852713 | 0.996151670814 |
| 0.00985252857208 | 0.0118260979652 | prod_2 | lightweight running jacket with water resistance | outerwear | 0.995073735714 | 0.994086951017 | 0.994777700305 |
| 0.0038834810257 | 0.210647821426 | prod_4 | yoga mat with extra cushioning for comfort | accessories | 0.998058259487 | 0.894676089287 | 0.967043608427 |
| 0.236237406731 | 0.639005899429 | prod_6 | swimming goggles with anti-fog coating | accessories | 0.881881296635 | 0.680497050285 | 0.82146602273 |
Adjusting Vector Weights
You can adjust the weights to prioritize different vector fields:
# More emphasis on image similarity
text_vec = Vector(
vector=[0.9, 0.1, 0.05],
field_name="text_embedding",
dtype="float32",
weight=0.2 # 20% weight
)
image_vec = Vector(
vector=[0.1, 0.9],
field_name="image_embedding",
dtype="float32",
weight=0.8 # 80% weight
)
image_heavy_query = MultiVectorQuery(
vectors=[text_vec, image_vec],
return_fields=["product_id", "brief_description", "category"],
num_results=3
)
print("Results with emphasis on image similarity:")
results = index.query(image_heavy_query)
result_print(results)
Results with emphasis on image similarity:
| distance_0 | distance_1 | product_id | brief_description | category | score_0 | score_1 | combined_score |
|---|---|---|---|---|---|---|---|
| -1.19209289551e-07 | 0 | prod_3 | professional tennis racket for competitive players | equipment | 1.0000000596 | 1 | 1.00000001192 |
| 0.14539372921 | 0.00900757312775 | prod_6 | swimming goggles with anti-fog coating | accessories | 0.927303135395 | 0.995496213436 | 0.981857597828 |
| 0.436696171761 | 0.219131231308 | prod_4 | yoga mat with extra cushioning for comfort | accessories | 0.78165191412 | 0.890434384346 | 0.868677890301 |
Multi-Vector Query with Filters
Combine multi-vector search with filters to narrow results:
# Multi-vector search with category filter
text_vec = Vector(
vector=[0.1, 0.2, 0.1],
field_name="text_embedding",
dtype="float32",
weight=0.6
)
image_vec = Vector(
vector=[0.8, 0.1],
field_name="image_embedding",
dtype="float32",
weight=0.4
)
filtered_multi_query = MultiVectorQuery(
vectors=[text_vec, image_vec],
filter_expression=Tag("category") == "footwear",
return_fields=["product_id", "brief_description", "category", "price"],
num_results=5
)
results = index.query(filtered_multi_query)
result_print(results)
| distance_0 | distance_1 | product_id | brief_description | category | price | score_0 | score_1 | combined_score |
|---|---|---|---|---|---|---|---|---|
| 5.96046447754e-08 | 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | footwear | 89.99 | 0.999999970198 | 0.999999970198 | 0.999999970198 |
| 0.00985252857208 | 0.00266629457474 | prod_5 | basketball shoes with excellent ankle support | footwear | 139.99 | 0.995073735714 | 0.998666852713 | 0.996510982513 |
Comparing Query Types
Let's compare the three query types side by side:
# TextQuery - keyword-based search
text_q = TextQuery(
text="shoes",
text_field_name="brief_description",
return_fields=["product_id", "brief_description"],
num_results=3
)
print("TextQuery Results (keyword-based):")
result_print(index.query(text_q))
print()
TextQuery Results (keyword-based):
| score | product_id | brief_description |
|---|---|---|
| 2.8773943004779676 | prod_1 | comfortable running shoes for athletes |
| 2.085315593627535 | prod_5 | basketball shoes with excellent ankle support |
# AggregateHybridQuery - combines text and vector search
hybrid_q = AggregateHybridQuery(
text="shoes",
text_field_name="brief_description",
vector=[0.1, 0.2, 0.1],
vector_field_name="text_embedding",
return_fields=["product_id", "brief_description"],
num_results=3
)
print("AggregateHybridQuery Results (text + vector):")
result_print(index.query(hybrid_q))
print()
AggregateHybridQuery Results (text + vector):
| vector_distance | product_id | brief_description | vector_similarity | text_score | hybrid_score |
|---|---|---|---|---|---|
| 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | 0.999999970198 | 2.87739430048 | 1.56321826928 |
| 0.0038834810257 | prod_4 | yoga mat with extra cushioning for comfort | 0.998058259487 | 0 | 0.698640781641 |
| 0.00985252857208 | prod_2 | lightweight running jacket with water resistance | 0.995073735714 | 0 | 0.696551615 |
# MultiVectorQuery - searches multiple vector fields
mv_text = Vector(
vector=[0.1, 0.2, 0.1],
field_name="text_embedding",
dtype="float32",
weight=0.5
)
mv_image = Vector(
vector=[0.8, 0.1],
field_name="image_embedding",
dtype="float32",
weight=0.5
)
multi_q = MultiVectorQuery(
vectors=[mv_text, mv_image],
return_fields=["product_id", "brief_description"],
num_results=3
)
print("MultiVectorQuery Results (multiple vectors):")
result_print(index.query(multi_q))
MultiVectorQuery Results (multiple vectors):
| distance_0 | distance_1 | product_id | brief_description | score_0 | score_1 | combined_score |
|---|---|---|---|---|---|---|
| 5.96046447754e-08 | 5.96046447754e-08 | prod_1 | comfortable running shoes for athletes | 0.999999970198 | 0.999999970198 | 0.999999970198 |
| 0.00985252857208 | 0.00266629457474 | prod_5 | basketball shoes with excellent ankle support | 0.995073735714 | 0.998666852713 | 0.996870294213 |
| 0.00985252857208 | 0.0118260979652 | prod_2 | lightweight running jacket with water resistance | 0.995073735714 | 0.994086951017 | 0.994580343366 |
Best Practices
When to Use Each Query Type:
- TextQuery:
  - When you need precise keyword matching
  - For traditional search engine functionality
  - When text relevance scoring is important
  - Example: Product search, document retrieval
- AggregateHybridQuery:
  - When you want to combine keyword and semantic search
  - For improved search quality over pure text or vector search
  - When you have both text and vector representations of your data
  - Example: E-commerce search, content recommendation
- MultiVectorQuery:
  - When you have multiple types of embeddings (text, image, audio, etc.)
  - For multi-modal search applications
  - When you want to balance multiple semantic signals
  - Example: Image-text search, cross-modal retrieval
# Cleanup
index.delete()