Best practices for Redis Query Engine performance

Note:
If you're using Redis Software or Redis Cloud, see the best practices for scalable Redis Query Engine page.

Checklist

Below are some basic steps to ensure good performance of the Redis Query Engine (RQE).

  • Create a Redis data model with your query patterns in mind.
  • Ensure the Redis architecture has been sized for the expected load using the sizing calculator.
  • Provision Redis nodes with sufficient resources (RAM, CPU, network) to support the expected maximum load.
  • Review FT.INFO and FT.PROFILE outputs for anomalies and/or errors.
  • Conduct load testing in a test environment with real-world queries and a load generated by either memtier_benchmark or a custom load application.

Indexing considerations

General

  • Favor TAG over NUMERIC for use cases that only require matching.
  • Favor TAG over TEXT for use cases that don’t require full-text capabilities (pure match).

Non-threaded search

  • Put only those fields used in your queries in the index.
  • Only make fields SORTABLE if they are used in SORTBY queries.
  • Use DIALECT 4.

Threaded search (query performance factor or QPF)

  • Put both query fields and any projected fields (RETURN or LOAD) in the index.
  • Set all fields to SORTABLE.
  • Set TAG fields to UNF.
  • Optional: Set TEXT fields to NOSTEM if the use case will support it.
  • Use DIALECT 4.
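
As a rough illustration of the field options listed above, the following Python sketch assembles an FT.CREATE command that uses TAG for exact-match fields, marks every field SORTABLE, sets TAG fields to UNF, and sets the TEXT field to NOSTEM. The index name idx:products, the product: prefix, the field names, and the helper build_ft_create are all hypothetical and not part of any Redis client library. The helper only builds the argument list; you could pass it to a client call such as redis-py's execute_command.

```python
def build_ft_create(index, prefix, fields):
    """Return the FT.CREATE argument list for a JSON index.

    fields is a list of (json_path, alias, field_type, options) tuples.
    """
    args = ["FT.CREATE", index, "ON", "JSON", "PREFIX", "1", prefix, "SCHEMA"]
    for path, alias, field_type, options in fields:
        args += [path, "AS", alias, field_type, *options]
    return args


# Hypothetical product documents: sku and status are only matched exactly,
# so they are TAG fields; description needs full-text search, so it is TEXT.
fields = [
    ("$.sku", "sku", "TAG", ["SORTABLE", "UNF"]),
    ("$.status", "status", "TAG", ["SORTABLE", "UNF"]),
    ("$.description", "description", "TEXT", ["NOSTEM", "SORTABLE"]),
]

command = build_ft_create("idx:products", "product:", fields)
print(" ".join(command))
```

With a redis-py client r, the command could be sent as r.execute_command(*command).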

Query optimization

  • Avoid returning large result sets. Use CURSOR or LIMIT.
  • Avoid wildcard searches.
  • Avoid projecting all fields (e.g., LOAD *). Project only those fields that are part of the index schema.
  • If queries are long-running, enable threading (query performance factor) to reduce contention for the main Redis thread.
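
One way to avoid pulling a large result set in a single reply is to read it in batches through a cursor. The sketch below assumes a client callable that sends a raw command and returns a RESP2-style two-element reply of [results, cursor_id], where a cursor ID of 0 means the cursor is exhausted. The helper iter_aggregate and the stub fake_execute are hypothetical; the stub stands in for a real client call (for example, redis-py's execute_command) so the example runs without a server.

```python
def iter_aggregate(execute, index, query, load_fields, batch_size=100):
    """Yield FT.AGGREGATE result batches using a server-side cursor."""
    reply, cursor_id = execute(
        "FT.AGGREGATE", index, query,
        "LOAD", len(load_fields), *load_fields,
        "WITHCURSOR", "COUNT", batch_size,
    )
    yield reply
    # A cursor ID of 0 means there is nothing left to read.
    while cursor_id != 0:
        reply, cursor_id = execute(
            "FT.CURSOR", "READ", index, cursor_id, "COUNT", batch_size
        )
        yield reply


calls = []


def fake_execute(*args):
    """Stub executor (assumption): records commands and fakes two batches."""
    calls.append(args)
    if args[0] == "FT.AGGREGATE":
        return [["first batch"], 42]
    return [["second batch"], 0]


batches = list(
    iter_aggregate(fake_execute, "jsonidx:profiles", "@t:[1299 1299]",
                   ["id", "name"], batch_size=2)
)
print(len(batches), "batches,", len(calls), "commands sent")
```

Because the function is a generator, a caller can stop iterating early and never request the remaining batches.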

Validate performance (FT.PROFILE)

You can analyze FT.PROFILE output to gain insights about query execution. The following informational items are available for analysis:

  • Total execution time
  • Execution time per shard
  • Coordination time (for multi-sharded environments)
  • Breakdown of the query into fundamental components, such as UNION and INTERSECT
  • Warnings, such as TIMEOUT
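
To collect this output for a query you already run, you can wrap the query in FT.PROFILE. The following sketch converts an FT.SEARCH or FT.AGGREGATE argument list into the corresponding FT.PROFILE argument list. The helper name to_profile is hypothetical, and the function only builds arguments; it doesn't contact a server.

```python
def to_profile(command, limited=False):
    """Wrap an FT.SEARCH or FT.AGGREGATE argument list in FT.PROFILE."""
    name, index, *rest = command
    kinds = {"FT.SEARCH": "SEARCH", "FT.AGGREGATE": "AGGREGATE"}
    kind = kinds.get(name.upper())
    if kind is None:
        raise ValueError(f"cannot profile {name}")
    args = ["FT.PROFILE", index, kind]
    if limited:
        # LIMITED trims the detail reported for result processors.
        args.append("LIMITED")
    # Everything after the index name (query string and options) follows QUERY.
    return args + ["QUERY", *rest]


query = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]", "LIMIT", "0", "10"]
profiled = to_profile(query)
print(" ".join(profiled))
```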

Anti-patterns

When designing and querying indexes in RQE, certain practices can hinder performance, scalability, and maintainability. Below are some common anti-patterns to avoid:

  • Large documents: storing excessively large documents in Redis makes data retrieval slower and increases memory usage. Break data into smaller, focused records whenever possible.
  • Deeply-nested fields: retrieving or indexing deeply-nested JSON fields is computationally expensive. Use a flatter schema for better performance.
  • Large result sets: fetching unnecessarily large result sets puts a strain on memory and network resources. Limit results to only what is needed.
  • Wildcarding: using wildcard patterns indiscriminately in queries can lead to large and inefficient scans, especially if the index size is significant.
  • Large projections: including excessive fields in query results increases memory overhead and slows down query execution. Limit projections to essential fields.
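
As one possible way to act on the advice about deeply-nested fields, the sketch below flattens a nested document before it is stored, so the index can reference a top-level path instead of a deep one. The document shape, the flatten helper, and the underscore separator are illustrative assumptions, not a Redis feature.

```python
def flatten(doc, parent="", sep="_"):
    """Collapse nested dictionaries into a single level of keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical profile document with a three-level nested field.
nested = {"id": "42", "address": {"geo": {"city": "Oslo"}}}
flat = flatten(nested)
print(flat)
```

After flattening, the schema could index $.address_geo_city rather than $.address.geo.city.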

The following examples depict an anti-pattern index schema and query, followed by corrected versions designed for scalability with RQE.

Anti-pattern index schema

The following schema introduces challenges for scalability and performance:

FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles: 
          SCHEMA $.tags.* as t NUMERIC SORTABLE 
                 $.firstName as name TEXT 
                 $.location as loc GEO

Issues:

  • Minimal schema definition: the schema is sparse and lacks fields like lastName, id, and version that might be frequently queried. This results in additional operations to fetch these fields separately, reducing efficiency.
  • Missing SORTABLE flag for text fields: sorting on a field that isn't SORTABLE forces the engine to load the value from each document at query time, which is slow.
  • Wildcard indexing: $.tags.* creates a broad index that can lead to excessive memory usage and reduced query performance.

Anti-pattern query

The following query is inefficient and not optimized for vertical scaling:

FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' LOAD * LIMIT 0 10

Issues:

  • Wildcard projection (LOAD *): retrieving all fields in the result set is inefficient and increases memory usage, especially if the documents are large.
  • Unnecessary fields: fields that aren't required for the current operation are still fetched, slowing down execution.
  • Lack of advanced query syntax: without specifying a query dialect or leveraging features like tagging, the query may perform unnecessary computations.

Improved index schema

Here’s an optimized schema that adheres to best practices for vertical scaling:

FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles: 
          SCHEMA $.tags.* as t NUMERIC SORTABLE 
                 $.firstName as name TEXT NOSTEM SORTABLE 
                 $.lastName as lastname TEXT NOSTEM SORTABLE 
                 $.location as loc GEO SORTABLE 
                 $.id as id TAG SORTABLE UNF 
                 $.ver as ver TAG SORTABLE UNF

Improvements:

  • NOSTEM for text fields: prevents stemming on fields like firstName and lastName to allow for exact matches (e.g., "Smith" stays "Smith").
  • Expanded schema: adds commonly queried fields like lastName, id, and version, making queries more efficient by reducing the need for post-query data retrieval.
  • TAG fields: id and ver are defined as TAG fields to support fast filtering with exact matches.
  • SORTABLE for all relevant fields: ensures that sorting operations are efficient without requiring full-text scanning.

You might wonder why $.tags.* as t NUMERIC SORTABLE is acceptable in the improved schema when it was flagged as an issue in the anti-pattern schema. Including $.tags.* is acceptable when:

  • It has a clear purpose: it is actively used in queries, such as filtering on numeric ranges or matching specific values.
  • Other fields in the schema complement it: these fields reduce over-reliance on $.tags.* for all query operations, distributing the load more evenly.
  • Projections and limits are managed carefully: queries that use $.tags.* should avoid loading unnecessary fields or returning excessively large result sets.

Improved query

The following query is better suited for vertical scaling:

FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' 
                LOAD 6 id t name lastname loc ver 
                LIMIT 0 10
                DIALECT 3

Improvements:

  • Targeted projection: the LOAD clause specifies only essential fields (id, t, name, lastname, loc, ver), reducing memory and network overhead.
  • Limited results: the LIMIT clause ensures the query retrieves only the first 10 results, avoiding large result sets.
  • DIALECT 3: enables the latest RQE syntax and features, ensuring compatibility with modern capabilities.
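
The differences between the anti-pattern query and the improved query can be checked mechanically. The following sketch is a simple heuristic, not an official tool: the helper query_warnings is hypothetical, and it only inspects the option tokens that follow the query string, so a projected field literally named LIMIT or DIALECT would confuse it.

```python
def query_warnings(command):
    """Return warnings for an FT.SEARCH or FT.AGGREGATE argument list."""
    # Skip the command name, index name, and query string.
    options = [str(token).upper() for token in command[3:]]
    warnings = []
    for position, token in enumerate(options[:-1]):
        if token == "LOAD" and options[position + 1] == "*":
            warnings.append("LOAD * projects every field")
    if "LIMIT" not in options and "WITHCURSOR" not in options:
        warnings.append("no LIMIT or cursor to bound the result set")
    if "DIALECT" not in options:
        warnings.append("no explicit DIALECT")
    return warnings


anti_pattern = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]",
                "LOAD", "*", "LIMIT", "0", "10"]
improved = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]",
            "LOAD", "6", "id", "t", "name", "lastname", "loc", "ver",
            "LIMIT", "0", "10", "DIALECT", "3"]

print(query_warnings(anti_pattern))
print(query_warnings(improved))
```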