FT.AGGREGATE

Syntax text Syntax diagram API methods

FT.AGGREGATE index query 
  [VERBATIM] 
  [LOAD count field [field ...]] 
  [TIMEOUT timeout] 
  [GROUPBY nargs property [property ...] [REDUCE function nargs arg [arg ...] [AS name] [REDUCE function nargs arg [arg ...] [AS name] ...]] ...]] 
  [SORTBY nargs [property ASC | DESC [property ASC | DESC ...]] [MAX num] [WITHCOUNT | WITHOUTCOUNT]] 
  [APPLY expression AS name [APPLY expression AS name ...]] 
  [LIMIT offset num] 
  [FILTER filter] 
  [WITHCURSOR [COUNT read_size] [MAXIDLE idle_time]] 
  [PARAMS nargs name value [name value ...]] 
  [SCORER scorer]
  [ADDSCORES] 
  [DIALECT dialect]

Client:

aggregate(
    query: Union[str, Query],  // The aggregation query
    query_params: Optional[Dict[str, Union[str, int, float]]]  // Optional query parameters
) → AggregateResult  // The aggregation results

AGGREGATE(
    index: RedisArgument,  // The index name
    query: RedisArgument,  // The aggregation query
    options: FTAggregateOptions  // Optional aggregation options
) → Promise<AggregateResult>  // The aggregation results

ftAggregate(
    indexName: String,  // The index name
    aggr: AggregationBuilder  // The aggregation builder
) → AggregationResult  // The aggregation results

FTAggregate(
    ctx: context.Context,
    index: string,  // The index name
    query: string  // The aggregation query
) → *MapStringInterfaceCmd  // The aggregation results

FTAggregateWithArgs(
    ctx: context.Context,
    index: string,  // The index name
    query: string,  // The aggregation query
    options: *FTAggregateOptions  // Aggregation options
) → *AggregateCmd  // The aggregation results

Aggregate(
    index: string,  // The index name
    query: AggregationRequest  // The aggregation query
) → AggregationResult  // The aggregation results

AggregateAsync(
    index: string,  // The index name
    query: AggregationRequest  // The aggregation query
) → Task<AggregationResult>  // The aggregation results

ftaggregate(
    $index: string,  // The index name
    $query: string,  // The aggregation query
    $arguments: ?AggregateArguments  // Optional aggregation arguments
) → array  // The aggregation results

Available in:: Redis Open Source / Search 1.1.0
Time complexity:: O(1)
ACL categories:: @search, @read, @fast,
Compatibility:: Redis Software and Redis Cloud compatibility

Run a search query on an index and perform aggregate transformations on the results.

Note:

This command will only return keys to which the user has read access.

Examples

Required arguments

index

is index name against which the query is executed. You must first create the index using FT.CREATE.

query

is base filtering query that retrieves the documents. It follows the exact same syntax as the search query, including filters, unions, not, optional, and so on.

Optional arguments

VERBATIM

if set, does not try to use stemming for query expansion but searches the query terms verbatim.

LOAD {nargs} {identifier} AS {property} …

loads document attributes from the source document.

identifier is either an attribute name for hashes and JSON or a JSON Path expression for JSON.
property is the optional name used in the result. If it is not provided, the identifier is used. This should be avoided.
If * is used as nargs, all attributes in a document are loaded.

Attributes needed for aggregations should be stored as SORTABLE, where they are available to the aggregation pipeline with very low latency. LOAD hurts the performance of aggregate queries considerably because every processed record needs to execute the equivalent of HMGET against a Redis key, which when executed over millions of keys, amounts to high processing times.

GROUPBY {nargs} {property}

groups the results in the pipeline based on one or more properties. Each group should have at least one reducer, a function that handles the group entries, either counting them, or performing multiple aggregate operations (see below).

REDUCE {func} {nargs} {arg} … [AS {name}]

reduces the matching results in each group into a single record, using a reduction function. For example, COUNT counts the number of records in the group. The reducers can have their own property names using the AS {name} optional argument. If a name is not given, the resulting name will be the name of the reduce function and the group properties. For example, if a name is not given to COUNT_DISTINCT by property @foo, the resulting name will be count_distinct(@foo).

See Supported GROUPBY reducers for more details.

SORTBY {nargs} {property} {ASC|DESC} [MAX {num}]

sorts the pipeline up until the point of SORTBY, using a list of properties.

By default, sorting is ascending, but ASC or DESC can be added for each property.
nargs is the number of sorting parameters, including ASC and DESC, for example, SORTBY 4 @foo ASC @bar DESC.
MAX is used to optimized sorting, by sorting only for the n-largest elements. Although it is not connected to LIMIT, you usually need just SORTBY … MAX for common queries.

Attributes needed for SORTBY should be stored as SORTABLE to be available with very low latency.

Sorting Optimizations: performance is optimized for sorting operations on DIALECT 4 in different scenarios:

Skip Sorter - applied when there is no sort of any kind. The query can return after it reaches the LIMIT requested results.
Partial Range - applied when there is a SORTBY clause over a numeric field, with no filter or filter by the same numeric field, the query iterate on a range large enough to satisfy the LIMIT requested results.
Hybrid - applied when there is a SORTBY clause over a numeric field and another non-numeric filter. Some results will get filtered, and the initial range may not be large enough. The iterator is then rewinding with the following ranges, and an additional iteration takes place to collect the LIMIT requested results.
No optimization - If there is a sort by score or by non-numeric field, there is no other option but to retrieve all results and compare their values.

Counts behavior: optional WITHCOUNT argument returns accurate counts for the query results with sorting. This operation processes all results in order to get an accurate count, being less performant than the optimized option (default behavior on DIALECT 4)

You can also use WITHOUTCOUNT in place of DIALECT 4 when used with either FT.SEARCH or FT.AGGREGATE.

APPLY {expr} AS {name}

applies a 1-to-1 transformation on one or more properties and either stores the result as a new property down the pipeline or replaces any property using this transformation.

expr is an expression that can be used to perform arithmetic operations on numeric properties, or functions that can be applied on properties depending on their types (see below), or any combination thereof. For example, APPLY "sqrt(@foo)/log(@bar) + 5" AS baz evaluates this expression dynamically for each record in the pipeline and store the result as a new property called baz, which can be referenced by further APPLY/SORTBY/GROUPBY/REDUCE operations down the pipeline.

See APPLY expressions for details.

LIMIT {offset} {num}

limits the number of results to return just num results starting at index offset (zero-based). It is much more efficient to use SORTBY … MAX if you are interested in just limiting the output of a sort operation. If a key expires during the query, an attempt to load the key's value will return a null array.

However, limit can be used to limit results without sorting, or for paging the n-largest results as determined by SORTBY MAX. For example, getting results 50-100 of the top 100 results is most efficiently expressed as SORTBY 1 @foo MAX 100 LIMIT 50 50. Removing the MAX from SORTBY results in the pipeline sorting all the records and then paging over results 50-100.

FILTER {expr}

filters the results using predicate expressions relating to values in each result. They are applied post query and relate to the current state of the pipeline.

WITHCURSOR {COUNT} {read_size} [MAXIDLE {idle_time}]

Scan part of the results with a quicker alternative than LIMIT. See Cursor API for more details.

TIMEOUT {milliseconds}

if set, overrides the timeout parameter of the module.

PARAMS {nargs} {name} {value}

defines one or more value parameters. Each parameter has a name and a value.

You can reference parameters in the query by a $, followed by the parameter name, for example, $user. Each such reference in the search query to a parameter name is substituted by the corresponding parameter value. For example, with parameter definition PARAMS 4 lon 29.69465 lat 34.95126, the expression @loc:[$lon $lat 10 km] is evaluated to @loc:[29.69465 34.95126 10 km]. You cannot reference parameters in the query string where concrete values are not allowed, such as in field names, for example, @loc. To use PARAMS, set DIALECT to 2 or greater than 2.

SCORER {scorer}

uses a built-in or a user-provided scoring function.

ADDSCORES

The ADDSCORES option exposes the full-text score values to the aggregation pipeline. You can use @__score in a pipeline as shown in the following example:

FT.AGGREGATE idx 'hello' ADDSCORES SORTBY 2 @__score DESC

DIALECT {dialect_version}

selects the dialect version under which to execute the query. If not specified, the query will execute under the default dialect version set during module initial loading or via FT.CONFIG SET command.

Return multiple values

See Return multiple values in FT.SEARCH The DIALECT can be specified as a parameter in the FT.AGGREGATE command. If it is not specified, the DEFAULT_DIALECT is used, which can be set using FT.CONFIG SET or by passing it as an argument to the redisearch module when it is loaded. For example, with the following document and index:

127.0.0.1:6379> JSON.SET doc:1 $ '[{"arr": [1, 2, 3]}, {"val": "hello"}, {"val": "world"}]'
OK
127.0.0.1:6379> FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $..arr AS arr NUMERIC $..val AS val TEXT
OK

Notice the different replies, with and without DIALECT 3:

127.0.0.1:6379> FT.AGGREGATE idx * LOAD 2 arr val 
1) (integer) 1
2) 1) "arr"
   2) "[1,2,3]"
   3) "val"
   4) "hello"

127.0.0.1:6379> FT.AGGREGATE idx * LOAD 2 arr val DIALECT 3
1) (integer) 1
2) 1) "arr"
   2) "[[1,2,3]]"
   3) "val"
   4) "[\"hello\",\"world\"]"

Complexity

Non-deterministic. Depends on the query and aggregations performed, but it is usually linear to the number of results returned.

Examples

Sort page visits by day

Find visits to the page about.html, group them by the day of the visit, count the number of visits, and sort them by day.

FT.AGGREGATE idx "@url:\"about.html\""
    APPLY "day(@timestamp)" AS day
    GROUPBY 2 @day @country
      REDUCE count 0 AS num_visits
    SORTBY 1 @day

Find most books ever published

Find most books ever published in a single year.

FT.AGGREGATE books-idx *
    GROUPBY 1 @published_year
      REDUCE COUNT 0 AS num_published
    GROUPBY 0
      REDUCE MAX 1 @num_published AS max_books_published_per_year

Reduce all results

The last example used GROUPBY 0. Use GROUPBY 0 to apply a REDUCE function over all results from the last step of an aggregation pipeline -- this works on both the initial query and subsequent GROUPBY operations.

Search for libraries within 10 kilometers of the longitude -73.982254 and latitude 40.753181 then annotate them with the distance between their location and those coordinates.

 FT.AGGREGATE libraries-idx "@location:[-73.982254 40.753181 10 km]"
    LOAD 1 @location
    APPLY "geodistance(@location, -73.982254, 40.753181)"

Here, notice the required use of LOAD to pre-load the @location attribute because it is a GEO attribute.

Next, count GitHub events by user (actor), to produce the most active users.

127.0.0.1:6379> FT.AGGREGATE gh "*" GROUPBY 1 @actor REDUCE COUNT 0 AS num SORTBY 2 @num DESC MAX 10
 1) (integer) 284784
 2) 1) "actor"
    2) "lombiqbot"
    3) "num"
    4) "22197"
 3) 1) "actor"
    2) "codepipeline-test"
    3) "num"
    4) "17746"
 4) 1) "actor"
    2) "direwolf-github"
    3) "num"
    4) "10683"
 5) 1) "actor"
    2) "ogate"
    3) "num"
    4) "6449"
 6) 1) "actor"
    2) "openlocalizationtest"
    3) "num"
    4) "4759"
 7) 1) "actor"
    2) "digimatic"
    3) "num"
    4) "3809"
 8) 1) "actor"
    2) "gugod"
    3) "num"
    4) "3512"
 9) 1) "actor"
    2) "xdzou"
    3) "num"
    4) "3216"
[10](10)) 1) "actor"
    2) "opstest"
    3) "num"
    4) "2863"
11) 1) "actor"
    2) "jikker"
    3) "num"
    4) "2794"
(0.59s)

Use the case function for conditional logic

//Simple mapping
FT.AGGREGATE products "*"
APPLY case(@price > 100, "premium", "standard") AS category

//Nested conditions where an error should be returned
FT.AGGREGATE orders "*"
APPLY case(@status == "pending", 
           case(@priority == "high", 1, 2), 
           case(@status == "completed", 3, 4)) AS status_code

//Mapped approach
FT.AGGREGATE orders "*"
APPLY case(@status == "pending", 1, 0) AS is_pending
APPLY case(@is_pending == 1 && @priority == "high", 1,2) AS status_high
APPLY case(@is_pending == 0 && @priority == "high", 3,4) AS status_completed

Collect top documents per group

Group movies by genre and collect the top 5 movies per group, sorted by rating in descending order. Return the top 50 groups.

FT.AGGREGATE idx:movies "*"
    LOAD 4 @genre @title @rating @year
    GROUPBY 1 @genre
      REDUCE COLLECT 12 FIELDS 3 @title @rating @year SORTBY 2 @rating DESC LIMIT 0 5 AS top_movies
    SORTBY 1 @genre LIMIT 0 50

In this example:

LOAD brings @genre, @title, @rating, and @year into the pipeline. COLLECT projects from fields already available in the pipeline; it does not trigger an implicit document load. Using FIELDS * in place of the explicit field list would return only the fields the pipeline already has, as controlled by LOAD and any field-generating operations such as APPLY.
FIELDS 3 @title @rating @year projects those three fields into each collected entry.
Duplicates are retained (default behavior, no DISTINCT flag).
Documents within each group are sorted by @rating descending.
Only the top 5 documents per group are returned.
Only the top 50 groups are returned.

Redis Software and Redis Cloud compatibility

Redis Software	Redis Cloud Flexible & Annual	Redis Cloud Free & Fixed	Notes
✅ Supported	✅ Supported	✅ Supported

Return information

RESP2 RESP3

One of the following:

Array reply where each row is an array reply representing a single aggregate result. The integer reply at position 1 does not represent a valid value.
Simple error reply in these cases: incorrect number of arguments, non-existent index, invalid query syntax.

FT.AGGREGATE

Required arguments

Optional arguments

Return multiple values

Complexity

Examples

Redis Software and Redis Cloud compatibility

Return information

See also

Related topics