Query and filter
Query and filter with RedisVL
In this document, you will explore more complex queries that can be performed with RedisVL.
Before beginning, be sure of the following:
- You have installed RedisVL and have that environment activated.
- You have a running Redis instance with the Redis Query Engine capability.
The sample binary data is in this file on GitHub.
import pickle
from jupyterutils import table_print, result_print
# load in the example data and printing utils
with open("hybrid_example_data.pkl", "rb") as f:
    data = pickle.load(f)
table_print(data)
user | age | job | credit_score | office_location | user_embedding |
---|
john | 18 | engineer | high | -122.4194,37.7749 | b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?' |
derrick | 14 | doctor | low | -122.4194,37.7749 | b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?' |
nancy | 94 | doctor | high | -122.4194,37.7749 | b'333?\xcd\xcc\xcc=\x00\x00\x00?' |
tyler | 100 | engineer | high | -122.0839,37.3861 | b'\xcd\xcc\xcc=\xcd\xcc\xcc>\x00\x00\x00?' |
tim | 12 | dermatologist | high | -122.0839,37.3861 | b'\xcd\xcc\xcc>\xcd\xcc\xcc>\x00\x00\x00?' |
taimur | 15 | CEO | low | -122.0839,37.3861 | b'\x9a\x99\x19?\xcd\xcc\xcc=\x00\x00\x00?' |
joe | 35 | dentist | medium | -122.0839,37.3861 | b'fff?fff?\xcd\xcc\xcc=' |
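The user_embedding column holds raw bytes rather than readable numbers. Assuming each value is stored as a little-endian float32 (the index schema in this guide declares three float32 dimensions), a minimal sketch using only the standard library can turn john's embedding back into floats. The helper name decode_embedding is hypothetical and not part of RedisVL.

```python
import struct

def decode_embedding(raw: bytes, dims: int = 3) -> list:
    """Unpack `dims` little-endian float32 values from raw bytes.

    Assumption: the bytes came from a float32 array written on a
    little-endian machine, as the sample data appears to be.
    """
    return list(struct.unpack(f"<{dims}f", raw))

# john's embedding, copied from the table above
john = decode_embedding(b"\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?")
print([round(x, 4) for x in john])  # [0.1, 0.1, 0.5]
```

This is the same vector that the queries in this guide use as the query vector, which is why john appears with a distance of 0 in the results.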
schema = {
    "index": {
        "name": "user_queries",
        "prefix": "user_queries_docs",
        "storage_type": "hash",  # default setting -- HASH
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {"name": "office_location", "type": "geo"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ],
}
from redisvl.index import SearchIndex
# construct a search index from the schema
index = SearchIndex.from_dict(schema)
# connect to local redis instance
index.connect("redis://localhost:6379")
# create the index (no data yet)
index.create(overwrite=True)
# inspect the newly-created index
$ rvl index listall
18:26:34 [RedisVL] INFO Indices:
18:26:34 [RedisVL] INFO 1. user_queries
# load the sample data into the index
keys = index.load(data)
Hybrid queries
Hybrid queries are queries that combine multiple types of filters. For example, you may want to search for a user that is a certain age, has a certain job, and is within a certain distance of a location. This is a hybrid query that combines numeric, tag, and geographic filters.
Tag filters
Tag filters are filters that are applied to tag fields. These are fields that are not tokenized and are used to store a single categorical value.
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag
t = Tag("credit_score") == "high"
v = VectorQuery([0.1, 0.1, 0.5],
                "user_embedding",
                return_fields=["user", "credit_score", "age", "job", "office_location"],
                filter_expression=t)
results = index.query(v)
result_print(results)
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
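The vector_distance column can be checked by hand. Because the index uses the cosine metric, the reported value should equal 1 minus the cosine similarity between the query vector and the stored embedding. The sketch below is plain Python, not RedisVL code, and assumes that tyler's embedding bytes decode to approximately [0.1, 0.4, 0.5].

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [0.1, 0.1, 0.5]
tyler = [0.1, 0.4, 0.5]  # assumed decoding of tyler's float32 bytes

tyler_distance = cosine_distance(query, tyler)
print(round(tyler_distance, 4))  # close to the 0.1091 reported for tyler
```

Small differences in the last digits are expected, since Redis works with float32 values while this sketch uses Python's 64-bit floats.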
# negation
t = Tag("credit_score") != "high"
v.set_filter(t)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
# use multiple tags as a list
t = Tag("credit_score") == ["high", "medium"]
v.set_filter(t)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
# use multiple tags as a set (to enforce uniqueness)
t = Tag("credit_score") == set(["high", "high", "medium"])
v.set_filter(t)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
What about scenarios where you might want to dynamically generate a list of tags? RedisVL allows you to do this gracefully without having to check for the empty case. The empty case is when you attempt to run a tag filter on a field with no defined values to match. For example:
Tag("credit_score") == []
An empty filter like the one above yields a "*" Redis query filter, which implies the base case: no filter.
# gracefully fallback to "*" filter if empty case
empty_case = Tag("credit_score") == []
v.set_filter(empty_case)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
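To make the empty-case fallback concrete, here is a minimal sketch of how a tag filter might be rendered into a Redis query string. The function render_tag_filter is hypothetical and is not RedisVL's implementation; it only mirrors the behavior described above, where an empty value list produces the "*" wildcard.

```python
def render_tag_filter(field, values):
    """Render a tag filter as a Redis query string.

    Hypothetical helper for illustration; not RedisVL's implementation.
    """
    if isinstance(values, str):
        values = [values]
    # de-duplicate while keeping the original order
    unique = list(dict.fromkeys(v for v in (values or []) if v))
    if not unique:
        return "*"  # empty case: no filter, match everything
    return "@" + field + ":{" + "|".join(unique) + "}"

print(render_tag_filter("credit_score", ["high", "high", "medium"]))
print(render_tag_filter("credit_score", []))
```

A production version would also need to escape special characters in tag values, which this sketch leaves out.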
Numeric filters
Numeric filters are filters that are applied to numeric fields and can be used to isolate a range of values for a given field.
from redisvl.query.filter import Num
numeric_filter = Num("age") > 15
v.set_filter(numeric_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
# exact match query
numeric_filter = Num("age") == 14
v.set_filter(numeric_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
# negation
numeric_filter = Num("age") != 14
v.set_filter(numeric_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
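Numeric comparisons correspond to range expressions in the Redis query syntax, where a leading parenthesis marks an exclusive bound. The sketch below shows one plausible mapping from comparison operators to range strings. The helper render_numeric_filter is hypothetical, and the exact strings RedisVL emits may differ.

```python
def render_numeric_filter(field, op, value):
    """Map a comparison to a Redis numeric range (hypothetical helper)."""
    ranges = {
        "==": f"[{value} {value}]",
        ">": f"[({value} +inf]",   # "(" makes the lower bound exclusive
        ">=": f"[{value} +inf]",
        "<": f"[-inf ({value}]",   # "(" makes the upper bound exclusive
        "<=": f"[-inf {value}]",
    }
    if op not in ranges:
        raise ValueError(f"unsupported operator: {op}")
    return f"@{field}:{ranges[op]}"

print(render_numeric_filter("age", ">", 15))
print(render_numeric_filter("age", "==", 14))
```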
Text filters
Text filters are filters that are applied to text fields. These filters are applied to the entire text field. For example, if you have a text field that contains the text "The quick brown fox jumps over the lazy dog", a text filter of "quick" will match this text field.
from redisvl.query.filter import Text
# exact match filter -- document must contain the exact word doctor
text_filter = Text("job") == "doctor"
v.set_filter(text_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
# negation -- document must not contain the exact word doctor
negate_text_filter = Text("job") != "doctor"
v.set_filter(negate_text_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
# wildcard match filter
wildcard_filter = Text("job") % "doct*"
v.set_filter(wildcard_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
# fuzzy match filter
fuzzy_match = Text("job") % "%%engine%%"
v.set_filter(fuzzy_match)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
# conditional -- match documents with job field containing engineer OR doctor
conditional = Text("job") % "engineer|doctor"
v.set_filter(conditional)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
# gracefully fallback to "*" filter if empty case
empty_case = Text("job") % ""
v.set_filter(empty_case)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
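The fuzzy filter "%%engine%%" matches "engineer" because, in Redis query syntax, each pair of percent signs around a term allows one more edit between the term and an indexed word. The sketch below computes the Levenshtein (edit) distance in plain Python to show that "engine" and "engineer" are two edits apart. It illustrates the idea only and is not how Redis evaluates the query internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("engine", "engineer"))    # 2, within the %%...%% limit
print(levenshtein("engine", "doctor") > 2)  # True, so "doctor" is not matched
```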
Geographic filters
Geographic filters are applied to geographic fields. They find results that lie within a given radius of a point, where the radius can be specified in kilometers, miles, meters, or feet.
from redisvl.query.filter import Geo, GeoRadius
# within 10 km of San Francisco office
geo_filter = Geo("office_location") == GeoRadius(-122.4194, 37.7749, 10, "km")
v.set_filter(geo_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
# within a 100 km radius of the San Francisco office
geo_filter = Geo("office_location") == GeoRadius(-122.4194, 37.7749, 100, "km")
v.set_filter(geo_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
# not within a 10 km radius of the San Francisco office
geo_filter = Geo("office_location") != GeoRadius(-122.4194, 37.7749, 10, "km")
v.set_filter(geo_filter)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
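To see why the second office falls outside the 10 km radius but inside the 100 km radius, you can estimate the distance between the two office coordinates yourself. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. This is an approximation for illustration, and Redis may compute distances slightly differently.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Approximate great-circle distance in km (lon, lat in degrees)."""
    r = 6371.0  # mean Earth radius in km (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

office_a = (-122.4194, 37.7749)  # the San Francisco office
office_b = (-122.0839, 37.3861)  # the other office in the sample data

distance = haversine_km(*office_a, *office_b)
print(round(distance))  # roughly 52 km
```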
Combining Filters
In this example, you will combine a numeric filter with a tag filter, and search for users who are between the ages of 18 and 100 and have a credit score of "high".
Intersection ("and")
t = Tag("credit_score") == "high"
low = Num("age") >= 18
high = Num("age") <= 100
combined = t & low & high
v = VectorQuery([0.1, 0.1, 0.5],
                "user_embedding",
                return_fields=["user", "credit_score", "age", "job", "office_location"],
                filter_expression=combined)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
Union ("or")
The union of two queries is the set of all results that are returned by either of the two queries. The union of two queries is performed using the | operator.
low = Num("age") < 18
high = Num("age") > 93
combined = low | high
v.set_filter(combined)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
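The intersection and union results above can be cross-checked with ordinary Python sets. The sketch below copies the age and credit_score columns from the sample data and applies the same conditions. It is a plain-Python analogy, not RedisVL code.

```python
users = {
    "john": {"age": 18, "credit_score": "high"},
    "derrick": {"age": 14, "credit_score": "low"},
    "nancy": {"age": 94, "credit_score": "high"},
    "tyler": {"age": 100, "credit_score": "high"},
    "tim": {"age": 12, "credit_score": "high"},
    "taimur": {"age": 15, "credit_score": "low"},
    "joe": {"age": 35, "credit_score": "medium"},
}

def select(predicate):
    """Return the set of user names whose row satisfies the predicate."""
    return {name for name, row in users.items() if predicate(row)}

# intersection ("and"): high credit score AND 18 <= age <= 100
intersection = (select(lambda r: r["credit_score"] == "high")
                & select(lambda r: 18 <= r["age"] <= 100))

# union ("or"): age < 18 OR age > 93
union = select(lambda r: r["age"] < 18) | select(lambda r: r["age"] > 93)

print(sorted(intersection))
print(sorted(union))
```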
Dynamic combination
There are often situations where you may or may not want to use a filter in a
given query. As with the empty cases shown above, filters accept the None type
and revert to a wildcard filter that returns all results.
The same goes for filter combinations, which enable rapid reuse of filters in
requests with different parameters as shown below. This removes the need for
a number of "if-then" conditionals to test for the empty case.
def make_filter(age=None, credit=None, job=None):
    flexible_filter = (
        (Num("age") > age) &
        (Tag("credit_score") == credit) &
        (Text("job") % job)
    )
    return flexible_filter
# all parameters
combined = make_filter(age=18, credit="high", job="engineer")
v.set_filter(combined)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
# just age and credit_score
combined = make_filter(age=18, credit="high")
v.set_filter(combined)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
# just age
combined = make_filter(age=18)
v.set_filter(combined)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
# no filters
combined = make_filter()
v.set_filter(combined)
result_print(index.query(v))
vector_distance | user | credit_score | age | job | office_location |
---|
0 | john | high | 18 | engineer | -122.4194,37.7749 |
0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
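One way to picture how such combinations can tolerate missing values is a small expression class whose "&" operator skips wildcard operands. The sketch below is a hypothetical stand-in written for this guide; the names Expr, tag, and num_gt are illustrative, and RedisVL's actual implementation may work differently.

```python
class Expr:
    """Hypothetical stand-in for a filter expression (not RedisVL's class)."""

    def __init__(self, query=None):
        self.query = query or "*"  # None or "" becomes the wildcard

    def __and__(self, other):
        # a wildcard operand adds no constraint, so return the other side
        if self.query == "*":
            return other
        if other.query == "*":
            return self
        return Expr(f"({self.query} {other.query})")

    def __str__(self):
        return self.query

def tag(field, value):
    return Expr(f"@{field}:{{{value}}}" if value else None)

def num_gt(field, value):
    return Expr(f"@{field}:[({value} +inf]" if value is not None else None)

print(num_gt("age", 18) & tag("credit_score", "high"))
print(num_gt("age", 18) & tag("credit_score", None))
print(num_gt("age", None) & tag("credit_score", None))
```

With this design, the calling code never needs to test for missing parameters; the wildcard handling lives in one place.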
Filter queries
In some cases, you may not want to run a vector query, but just use a FilterExpression, similar to a SQL query. The FilterQuery class enables this functionality. It is similar to the VectorQuery class but solely takes a FilterExpression.
from redisvl.query import FilterQuery
has_low_credit = Tag("credit_score") == "low"
filter_query = FilterQuery(
    return_fields=["user", "credit_score", "age", "job", "location"],
    filter_expression=has_low_credit
)
results = index.query(filter_query)
result_print(results)
user | credit_score | age | job |
---|
derrick | low | 14 | doctor |
taimur | low | 15 | CEO |
Count queries
In some cases, you may need to use a FilterExpression to execute a CountQuery that simply returns the number of records that match the filter. It is similar to the FilterQuery class but does not return the values of the underlying data.
from redisvl.query import CountQuery
has_low_credit = Tag("credit_score") == "low"
count_query = CountQuery(filter_expression=has_low_credit)
count = index.query(count_query)
print(f"{count} records match the filter expression {str(has_low_credit)} for the given index.")
2 records match the filter expression @credit_score:{low} for the given index.
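The SQL comparison for filter and count queries can be made concrete with an in-memory SQLite table. The sketch below is an analogy only: the table layout and the column name "name" are assumptions made for this example, and nothing here touches Redis.

```python
import sqlite3

# rows copied from the sample data: (name, credit_score, age, job)
rows = [
    ("john", "high", 18, "engineer"),
    ("derrick", "low", 14, "doctor"),
    ("nancy", "high", 94, "doctor"),
    ("tyler", "high", 100, "engineer"),
    ("tim", "high", 12, "dermatologist"),
    ("taimur", "low", 15, "CEO"),
    ("joe", "medium", 35, "dentist"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (name TEXT, credit_score TEXT, age INTEGER, job TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)

# analogous to a FilterQuery: return the matching rows
filtered = conn.execute(
    "SELECT name, credit_score, age, job FROM users "
    "WHERE credit_score = ? ORDER BY rowid",
    ("low",),
).fetchall()

# analogous to a CountQuery: return only the number of matches
low_credit_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE credit_score = ?", ("low",)
).fetchone()[0]

print(filtered)
print(low_credit_count)
```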
Range queries
Range queries are useful for performing a vector search where only the results within a vector distance_threshold are returned. This enables the user to find all records within their dataset that are similar to a query vector where "similar" is defined by a quantitative value.
from redisvl.query import RangeQuery
range_query = RangeQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "credit_score", "age", "job", "location"],
    distance_threshold=0.2
)
# same as the vector query or filter query
results = index.query(range_query)
result_print(results)
vector_distance | user | credit_score | age | job |
---|
0 | john | high | 18 | engineer |
0 | derrick | low | 14 | doctor |
0.109129190445 | tyler | high | 100 | engineer |
0.158808946609 | tim | high | 12 | dermatologist |
You can also change the distance threshold of the query object between uses. Here, you will set distance_threshold to 0.1. This means that the query returns all matches whose vector distance to the query vector is at most 0.1. This is a small distance, so expect to get fewer matches than before.
range_query.set_distance_threshold(0.1)
result_print(index.query(range_query))
vector_distance | user | credit_score | age | job |
---|
0 | john | high | 18 | engineer |
0 | derrick | low | 14 | doctor |
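The effect of the distance threshold can be reproduced in plain Python. The sketch below assumes that the sample embeddings decode to the approximate values listed in the code, computes the cosine distance from the query vector to each one, and keeps only the users at or below the threshold. The function range_search is a hypothetical illustration and not part of RedisVL.

```python
import math

# assumed decodings of the float32 embeddings in the sample data
embeddings = {
    "john": [0.1, 0.1, 0.5],
    "derrick": [0.1, 0.1, 0.5],
    "nancy": [0.7, 0.1, 0.5],
    "tyler": [0.1, 0.4, 0.5],
    "tim": [0.4, 0.4, 0.5],
    "taimur": [0.6, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
}

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def range_search(query, threshold):
    """Return (name, distance) pairs within the threshold, nearest first."""
    scored = [(name, cosine_distance(query, vec))
              for name, vec in embeddings.items()]
    hits = [(name, dist) for name, dist in scored if dist <= threshold]
    return sorted(hits, key=lambda item: item[1])

query = [0.1, 0.1, 0.5]
print([name for name, _ in range_search(query, 0.2)])
print([name for name, _ in range_search(query, 0.1)])
```

Unlike a KNN query, the number of results is not fixed in advance; it depends entirely on how many vectors fall inside the threshold.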
Range queries can also be used with filters like any other query type. The following limits the results to only those records with a job of engineer while also being within the vector range (i.e., distance).
is_engineer = Text("job") == "engineer"
range_query.set_filter(is_engineer)
result_print(index.query(range_query))
vector_distance | user | credit_score | age | job |
---|
0 | john | high | 18 | engineer |
Other Redis queries
There may be cases where RedisVL does not cover the explicit functionality required by the query, either because of new releases that haven't been implemented in the client, or because of a very specific use case. In these cases, it is possible to use the SearchIndex.search method to execute queries with a redis-py Query object or through a raw Redis string.
redis-py
# Manipulate the redis-py Query object
redis_py_query = v.query
# choose to sort by age instead of vector distance
redis_py_query.sort_by("age", asc=False)
# run the query with the ``SearchIndex.search`` method
result = index.search(redis_py_query, v.params)
result_print(result)
vector_distance | age | user | credit_score | job | office_location |
---|
0.109129190445 | 100 | tyler | high | engineer | -122.0839,37.3861 |
0.266666650772 | 94 | nancy | high | doctor | -122.4194,37.7749 |
0.653301358223 | 35 | joe | medium | dentist | -122.0839,37.3861 |
0 | 18 | john | high | engineer | -122.4194,37.7749 |
0.217882037163 | 15 | taimur | low | CEO | -122.0839,37.3861 |
0 | 14 | derrick | low | doctor | -122.4194,37.7749 |
0.158808946609 | 12 | tim | high | dermatologist | -122.0839,37.3861 |
Raw Redis query string
For example, you might want a search that only filters on a tag field and doesn't need other functionality. Conversely, you may require a query that is more complex than what RedisVL currently supports. In these cases, you can use the SearchIndex.search method with a raw Redis query string.
t = Tag("credit_score") == "high"
str(t)
'@credit_score:{high}'
results = index.search(str(t))
for r in results.docs:
    print(r.__dict__)
{'id': 'user_queries_docs:0e511391dcf346639669bdba70a189c0', 'payload': None, 'user': 'john', 'age': '18', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '==\x00\x00\x00?'}
{'id': 'user_queries_docs:d204e8e5df90467dbff5b2fb6f800a78', 'payload': None, 'user': 'nancy', 'age': '94', 'job': 'doctor', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '333?=\x00\x00\x00?'}
{'id': 'user_queries_docs:7cf3d6b1a4044966b4f0c5d3725a5e03', 'payload': None, 'user': 'tyler', 'age': '100', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '=>\x00\x00\x00?'}
{'id': 'user_queries_docs:f6581edaaeaf432a85c1d1df8fdf5edc', 'payload': None, 'user': 'tim', 'age': '12', 'job': 'dermatologist', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '>>\x00\x00\x00?'}
Inspecting queries
In this example, you will learn how to inspect the query that is generated by RedisVL. This can be useful for debugging purposes or for understanding how the query is being executed.
Consider an example of a query that combines a numeric filter with a tag filter. The query searches for users who are between the ages of 18 and 100 and have a high credit score, and sorts the results by vector distance to the query vector.
t = Tag("credit_score") == "high"
low = Num("age") >= 18
high = Num("age") <= 100
combined = t & low & high
v.set_filter(combined)
# Using the str() method, you can see what Redis Query this will emit.
str(v)
'((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])=>[KNN 10 @user_embedding $vector AS vector_distance] RETURN 6 user credit_score age job office_location vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 10'
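The emitted string has two main parts: the filter expression in parentheses, and a KNN clause after the "=>" arrow that names the vector field, the query parameter, and the alias for the distance. The sketch below reassembles that first portion from its pieces. The helper knn_query_string is hypothetical and only illustrates the layout; RedisVL builds the query through redis-py rather than by string formatting like this.

```python
def knn_query_string(filter_str, k, vector_field,
                     alias="vector_distance", param="vector"):
    """Join a filter and a KNN clause in Redis query syntax (illustrative)."""
    return f"{filter_str}=>[KNN {k} @{vector_field} ${param} AS {alias}]"

filter_str = "((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])"
prefix = knn_query_string(filter_str, 10, "user_embedding")

# the query string shown above, copied from the str(v) output
emitted = (
    "((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])"
    "=>[KNN 10 @user_embedding $vector AS vector_distance] "
    "RETURN 6 user credit_score age job office_location vector_distance "
    "SORTBY vector_distance ASC DIALECT 2 LIMIT 0 10"
)
print(emitted.startswith(prefix))  # True
```

The remaining tokens (RETURN, SORTBY, DIALECT, LIMIT) describe which fields to return, how to order the results, and how many to fetch.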