Best practices for Redis Query Engine performance
Checklist
Below are some basic steps to ensure good performance of the Redis Query Engine (RQE).
- Create a Redis data model with your query patterns in mind.
- Ensure the Redis architecture has been sized for the expected load using the sizing calculator.
- Provision Redis nodes with sufficient resources (RAM, CPU, network) to support the expected maximum load.
- Review FT.INFO and FT.PROFILE outputs for anomalies and errors.
- Conduct load testing in a test environment with real-world queries and a load generated by either memtier_benchmark or a custom load application.
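As a rough illustration of reviewing FT.INFO output, the sketch below scans a parsed reply for a few warning signs. It assumes the reply arrives as a flat list of alternating names and values with string keys (for example, a RESP2 client configured to decode responses), and that the fields hash_indexing_failures, indexing, and num_docs are present. The sample reply is hand-written for illustration, and the helper names are hypothetical.

```python
def pairs_to_dict(flat_reply):
    """Turn a flat [name, value, name, value, ...] reply into a dict."""
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


def find_index_anomalies(info):
    """Return human-readable warnings for a parsed FT.INFO reply."""
    warnings = []
    if int(info.get("hash_indexing_failures", 0)) > 0:
        warnings.append("documents failed to index")
    if int(info.get("indexing", 0)) == 1:
        warnings.append("index is still being built")
    if int(info.get("num_docs", 0)) == 0:
        warnings.append("index is empty")
    return warnings


# Hypothetical, hand-written sample; a real reply has many more fields.
sample_reply = ["index_name", "jsonidx:profiles", "num_docs", "1500",
                "hash_indexing_failures", "3", "indexing", "0"]
print(find_index_anomalies(pairs_to_dict(sample_reply)))
# ['documents failed to index']
```

The thresholds here are deliberately simple. In practice you would decide which fields matter for your workload and compare them against a known-good baseline.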
Indexing considerations
General
- Favor TAG over NUMERIC for use cases that only require matching.
- Favor TAG over TEXT for use cases that don’t require full-text capabilities (pure match).
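To make the exact-match use of TAG concrete, here is a minimal sketch of building a TAG filter clause. It assumes that punctuation and whitespace inside a tag value must be backslash-escaped in the query string, and it escapes conservatively (everything except ASCII letters, digits, and underscore). The function names are hypothetical.

```python
import re


def escape_tag_value(value):
    """Backslash-escape every character that is not an ASCII letter, digit, or underscore."""
    return re.sub(r"([^0-9A-Za-z_])", r"\\\1", value)


def tag_equals(field, value):
    """Build an exact-match TAG clause for the given field alias."""
    return "@%s:{%s}" % (field, escape_tag_value(value))


print(tag_equals("ver", "1.2"))  # @ver:{1\.2}
print(tag_equals("id", "abc_1"))  # @id:{abc_1}
```

A clause like this replaces a NUMERIC range such as @t:[1299 1299] or a TEXT match when all the query needs is equality.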
Non-threaded search
- Put only those fields used in your queries in the index.
- Only make fields SORTABLE if they are used in SORTBY queries.
- Use DIALECT 4.
Threaded (query performance factor or QPF) search
- Put both query fields and any projected fields (RETURN or LOAD) in the index.
- Set all fields to SORTABLE.
- Set TAG fields to UNF.
- Optional: Set TEXT fields to NOSTEM if the use case will support it.
- Use DIALECT 4.
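The two sets of rules above differ mainly in when SORTABLE, UNF, and NOSTEM are applied. The sketch below encodes them as a small helper that emits the SCHEMA arguments for one field. The helper name and its parameters are hypothetical, and it only builds an argument list; it does not talk to a server.

```python
def field_args(path, alias, ftype, threaded, sort_fields=(), nostem=False):
    """Return the FT.CREATE SCHEMA arguments for one field.

    threaded=True follows the threaded (QPF) guidance: every field is
    SORTABLE and TAG fields also get UNF. threaded=False marks a field
    SORTABLE only if its alias appears in sort_fields (used in SORTBY).
    """
    args = [path, "AS", alias, ftype]
    if ftype == "TEXT" and nostem:
        args.append("NOSTEM")  # optional; only if the use case allows it
    if threaded or alias in sort_fields:
        args.append("SORTABLE")
        if threaded and ftype == "TAG":
            args.append("UNF")
    return args


print(field_args("$.id", "id", "TAG", threaded=True))
# ['$.id', 'AS', 'id', 'TAG', 'SORTABLE', 'UNF']
print(field_args("$.firstName", "name", "TEXT", threaded=False))
# ['$.firstName', 'AS', 'name', 'TEXT']
```

Concatenating the per-field lists after FT.CREATE ... SCHEMA gives a full command that a client could send, for example through redis-py's execute_command.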
Query optimization
- Avoid returning large result sets. Use CURSOR or LIMIT.
- Avoid wildcard searches.
- Avoid projecting all fields (e.g., LOAD *). Project only those fields that are part of the index schema.
- If queries are long-running, enable threading (query performance factor) to reduce contention for the main Redis thread.
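As a sketch of the first and third points, the helpers below build FT.AGGREGATE argument lists that project a fixed set of fields and bound the result size with either LIMIT or a cursor. The function names and default batch sizes are hypothetical; the index name and query in the usage lines are borrowed from the examples later in this document.

```python
def aggregate_page(index, query, fields, offset=0, count=10):
    """FT.AGGREGATE with an explicit projection and a LIMIT window."""
    return ["FT.AGGREGATE", index, query,
            "LOAD", str(len(fields)), *fields,
            "LIMIT", str(offset), str(count)]


def aggregate_with_cursor(index, query, fields, batch=100):
    """FT.AGGREGATE that returns results in cursor batches of `batch` rows."""
    return ["FT.AGGREGATE", index, query,
            "LOAD", str(len(fields)), *fields,
            "WITHCURSOR", "COUNT", str(batch)]


def cursor_read(index, cursor_id, batch=100):
    """FT.CURSOR READ for the next batch of an open cursor."""
    return ["FT.CURSOR", "READ", index, str(cursor_id), "COUNT", str(batch)]


print(aggregate_page("jsonidx:profiles", "@t:[1299 1299]", ["id", "name"]))
print(cursor_read("jsonidx:profiles", 12345))
```

LIMIT suits fixed-size pages, while a cursor suits streaming through a larger result in bounded batches. Either way the server never has to materialize the full result set in a single reply.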
Validate performance (FT.PROFILE)
You can analyze FT.PROFILE output to gain insights about query execution.
The following informational items are available for analysis:
- Total execution time
- Execution time per shard
- Coordination time (for multi-sharded environments)
- Breakdown of the query into fundamental components, such as UNION and INTERSECT
- Warnings, such as TIMEOUT
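To obtain this output for an existing query, the query can be wrapped in FT.PROFILE. The sketch below builds the argument list, assuming the general form FT.PROFILE index SEARCH|AGGREGATE [LIMITED] QUERY followed by the original query arguments. The helper name is hypothetical, and the shape of the reply varies by server version, so parsing is left out.

```python
def profile_command(index, kind, query_args, limited=False):
    """Wrap the arguments of an FT.SEARCH or FT.AGGREGATE query in FT.PROFILE."""
    if kind not in ("SEARCH", "AGGREGATE"):
        raise ValueError("kind must be 'SEARCH' or 'AGGREGATE'")
    args = ["FT.PROFILE", index, kind]
    if limited:
        args.append("LIMITED")  # asks for a reduced level of profiling detail
    return args + ["QUERY", *query_args]


print(profile_command("jsonidx:profiles", "AGGREGATE",
                      ["@t:[1299 1299]", "LIMIT", "0", "10"]))
```

Profiling the same query before and after a schema or query change gives a like-for-like comparison of total and per-shard time.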
Anti-patterns
When designing and querying indexes in RQE, certain practices can hinder performance, scalability, and maintainability. Below are some common anti-patterns to avoid:
- Large documents: storing excessively large documents in Redis makes data retrieval slower and increases memory usage. Break data into smaller, focused records whenever possible.
- Deeply-nested fields: retrieving or indexing deeply-nested JSON fields is computationally expensive. Use a flatter schema for better performance.
- Large result sets: fetching unnecessarily large result sets puts a strain on memory and network resources. Limit results to only what is needed.
- Wildcarding: using wildcard patterns indiscriminately in queries can lead to large and inefficient scans, especially if the index size is significant.
- Large projections: including excessive fields in query results increases memory overhead and slows down query execution. Limit projections to essential fields.
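For the deeply-nested fields item, one possible approach is to flatten documents on the client before writing them, so that every indexed value sits at the top level. The sketch below is a generic flattening helper. The document shape and the underscore separator are made up for illustration, and lists are left untouched.

```python
def flatten(doc, parent="", sep="_"):
    """Collapse nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in doc.items():
        name = parent + sep + key if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested profile document.
nested = {"id": "42", "profile": {"name": {"first": "Ada"}, "location": "-0.1,51.5"}}
print(flatten(nested))
# {'id': '42', 'profile_name_first': 'Ada', 'profile_location': '-0.1,51.5'}
```

After flattening, a schema path such as $.profile_name_first replaces a multi-level path, at the cost of longer key names.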
The following examples depict an anti-pattern index schema and query, followed by corrected versions designed for scalability with RQE.
Anti-pattern index schema
The following schema introduces challenges for scalability and performance:
FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles:
SCHEMA $.tags.* as t NUMERIC SORTABLE
$.firstName as name TEXT
$.location as loc GEO
Issues:
- Minimal schema definition: the schema is sparse and lacks fields like lastName, id, and version that might be frequently queried. This results in additional operations to fetch these fields separately, reducing efficiency.
- Missing SORTABLE flag for text fields: sorting operations on unsortable fields require full-text processing, which is slow.
- Wildcard indexing: $.tags.* creates a broad index that can lead to excessive memory usage and reduced query performance.
Anti-pattern query
The following query is inefficient and not optimized for vertical scaling:
FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' LOAD * LIMIT 0 10
Issues:
- Wildcard projection (LOAD *): retrieving all fields in the result set is inefficient and increases memory usage, especially if the documents are large.
- Unnecessary fields: fields that aren't required for the current operation are still fetched, slowing down execution.
- Lack of advanced query syntax: without specifying a query dialect or leveraging features like tagging, the query may perform unnecessary computations.
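Checks like these can be automated in a review step. The following sketch is a simple token scan over a query's argument list that flags a wildcard projection, an unbounded result set, and a missing DIALECT. It is a heuristic with a hypothetical function name, not a parser, and it would misfire if the query text itself contained one of these keywords as a separate argument.

```python
def lint_query(args):
    """Flag common anti-patterns in an FT.SEARCH / FT.AGGREGATE argument list."""
    upper = [str(a).upper() for a in args]
    warnings = []
    for i, token in enumerate(upper[:-1]):
        if token == "LOAD" and upper[i + 1] == "*":
            warnings.append("LOAD * projects every field")
    if "LIMIT" not in upper and "WITHCURSOR" not in upper:
        warnings.append("no LIMIT or cursor")
    if "DIALECT" not in upper:
        warnings.append("no explicit DIALECT")
    return warnings


bad = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]",
       "LOAD", "*", "LIMIT", "0", "10"]
print(lint_query(bad))
# ['LOAD * projects every field', 'no explicit DIALECT']
```

A query that names its projected fields, bounds its results, and sets a dialect returns an empty list.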
Improved index schema
Here’s an optimized schema that adheres to best practices for vertical scaling:
FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles:
SCHEMA $.tags.* as t NUMERIC SORTABLE
$.firstName as name TEXT NOSTEM SORTABLE
$.lastName as lastname TEXT NOSTEM SORTABLE
$.location as loc GEO SORTABLE
$.id as id TAG SORTABLE UNF
$.ver as ver TAG SORTABLE UNF
Improvements:
- NOSTEM for text fields: prevents stemming on fields like firstName and lastName to allow for exact matches (e.g., "Smith" stays "Smith").
- Expanded schema: adds commonly queried fields like lastName, id, and version, making queries more efficient by reducing the need for post-query data retrieval.
- TAG fields: id and ver are defined as TAG fields to support fast filtering with exact matches.
- SORTABLE for all relevant fields: ensures that sorting operations are efficient without requiring full-text scanning.
You might be wondering why $.tags.* as t NUMERIC SORTABLE is acceptable in the improved schema when it was flagged as an issue in the anti-pattern schema.
The inclusion of $.tags.* is acceptable when:
- It has a clear purpose: it is actively used in queries, such as filtering on numeric ranges or matching specific values.
- Other fields in the schema complement it: these fields reduce over-reliance on $.tags.* for all query operations, distributing the load more evenly.
- Projections and limits are managed carefully: queries that use $.tags.* should avoid loading unnecessary fields or returning excessively large result sets.
Improved query
The following query is better suited for vertical scaling:
FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]'
LOAD 6 id t name lastname loc ver
LIMIT 0 10
DIALECT 3
Improvements:
- Targeted projection: the LOAD clause specifies only essential fields (id, t, name, lastname, loc, ver), reducing memory and network overhead.
- Limited results: the LIMIT clause ensures the query retrieves only the first 10 results, avoiding large result sets.
- DIALECT 3: sets the query dialect explicitly instead of relying on the server default. The indexing guidance earlier in this document recommends DIALECT 4.