How to improve the performance of searches over large datasets?

Last updated 22 Mar 2024

Question

How to improve the performance of searches over large datasets?

Answer

When using FT.SEARCH with sorting and paging (LIMIT <offset> <count>), the full search query is executed on every call: results are gathered, sorted, and then truncated to the requested page. Re-running the query for each page becomes expensive over large datasets.

When using FT.AGGREGATE with a similar query and the cursor API, the result set is constructed once, on the initial query, and a cursor ID is returned that can be read later with FT.CURSOR READ <index> <cursor-id>. Subsequent calls return further results from the same query until the cursor is exhausted. This scales better to larger datasets.

Avoid LOAD with FT.AGGREGATE commands where possible, as it hurts the performance of aggregated queries.

Using FT.SEARCH

FT.CREATE po ON JSON PREFIX 1 "original:" SCHEMA "$.datetime" as datetime NUMERIC SORTABLE
FT.SEARCH po "@datetime:[1619290184.788929 1682362184.788929]" WITHSCORES SORTBY datetime DESC LIMIT 0 50

Subsequent queries:

FT.SEARCH po "@datetime:[1619290184.788929 1682362184.788929]" WITHSCORES SORTBY datetime DESC LIMIT 50 50
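A client driving this paging programmatically just advances the LIMIT offset by the page size on each call. The sketch below illustrates the offset arithmetic only; page_limits is a hypothetical helper, not part of any Redis client library, and the index name po and page size of 50 are taken from the example above.

```python
# Offset-based paging: each (offset, count) pair below corresponds to one
# full FT.SEARCH execution on the server, e.g.
#   FT.SEARCH po "@datetime:[...]" SORTBY datetime DESC LIMIT <offset> <count>

def page_limits(page_size, max_results):
    """Yield (offset, count) pairs for successive LIMIT clauses."""
    for offset in range(0, max_results, page_size):
        yield offset, page_size

print(list(page_limits(50, 150)))  # [(0, 50), (50, 50), (100, 50)]
```

Note that the server repeats the whole search and sort for every one of these pages, which is exactly the cost the cursor API avoids.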

Using FT.AGGREGATE

FT.CREATE pp ON JSON PREFIX 1 "proposed:" SCHEMA "$.datetime" as datetime NUMERIC SORTABLE
FT.AGGREGATE pp "@datetime:[1619290321.959789 1682362321.959789]" SORTBY 2 @datetime DESC MAX 1000 WITHCURSOR COUNT 50

Subsequent reads are performed with the cursor ID, which is returned from the initial query and fetches the next batch of results from the same query.

FT.CURSOR READ pp <cursor-id> COUNT 50
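The read loop can be sketched in Python. This is a minimal illustration, not an official client API: client stands for any object exposing an execute_command method (redis-py provides one), and StubClient is a hypothetical stand-in that simulates server replies so the loop runs without a live Redis instance.

```python
# Drain an FT.AGGREGATE cursor until exhaustion (cursor ID 0 means done).

def read_all(client, index, query, count=50):
    """Run FT.AGGREGATE WITHCURSOR, then FT.CURSOR READ until the cursor is 0."""
    results, cursor = client.execute_command(
        "FT.AGGREGATE", index, query, "WITHCURSOR", "COUNT", count)
    rows = list(results[1:])  # first element of the reply is the result count
    while cursor != 0:
        results, cursor = client.execute_command(
            "FT.CURSOR", "READ", index, cursor, "COUNT", count)
        rows.extend(results[1:])
    return rows

class StubClient:
    """Simulates a server returning two batches, then an exhausted cursor."""
    def __init__(self):
        self.batches = [([2, "row1", "row2"], 99), ([1, "row3"], 0)]
    def execute_command(self, *args):
        return self.batches.pop(0)

print(read_all(StubClient(), "pp", "@datetime:[0 +inf]"))
# ['row1', 'row2', 'row3']
```

Against a live server the reply shape is the same pair (a results array followed by the cursor ID), so only the stub would change; the sorting and batching work stay on the server, done once per query rather than once per page.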

Benchmark both approaches against your own large datasets and compare average latency, CPU usage, and throughput.

References

Refer to the documentation to learn more about aggregations.