HyperLogLog is a probabilistic data structure that estimates the cardinality of a set.
HyperLogLog is a probabilistic data structure that estimates the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization.
The Redis HyperLogLog implementation uses up to 12 KB and provides a standard error of 0.81%.
Counting unique items usually requires an amount of memory
proportional to the number of items you want to count, because you need
to remember the elements you have already seen in the past in order to avoid
counting them multiple times. However, a set of algorithms exist that trade
memory for precision: they return an estimated measure with a standard error,
which, in the case of the Redis implementation for HyperLogLog, is less than 1%.
The magic of this algorithm is that you no longer need to use an amount of memory
proportional to the number of items counted, and instead can use a
constant amount of memory; 12k bytes in the worst case, or a lot less if your
HyperLogLog (We'll just call them HLL from now) has seen very few elements.
HLLs in Redis, while technically a different data structure, are encoded
as a Redis string, so you can call GET to serialize a HLL, and SET
to deserialize it back to the server.
Conceptually the HLL API is like using Sets to do the same task. You would
SADD every observed element into a set, and would use SCARD to check the
number of elements inside the set, which are unique since SADD will not
re-add an existing element.
While you don't really add items into an HLL, because the data structure
only contains a state that does not include actual elements, the API is the
same:
Every time you see a new element, you add it to the count with PFADD.
When you want to retrieve the current approximation of unique elements added using the PFADD command, you can use the PFCOUNT command. If you need to merge two different HLLs, the PFMERGE command is available. Since HLLs provide approximate counts of unique elements, the result of the merge will give you an approximation of the number of unique elements across both source HLLs.
packageexample_commands_testimport("context""fmt""github.com/redis/go-redis/v9")funcExampleClient_pfadd(){ctx:=context.Background()rdb:=redis.NewClient(&redis.Options{Addr:"localhost:6379",Password:"",// no password docs
DB:0,// use default DB
})res1,err:=rdb.PFAdd(ctx,"bikes","Hyperion","Deimos","Phoebe","Quaoar").Result()iferr!=nil{panic(err)}fmt.Println(res1)// 1
res2,err:=rdb.PFCount(ctx,"bikes").Result()iferr!=nil{panic(err)}fmt.Println(res2)// 4
res3,err:=rdb.PFAdd(ctx,"commuter_bikes","Salacia","Mimas","Quaoar").Result()iferr!=nil{panic(err)}fmt.Println(res3)// 1
res4,err:=rdb.PFMerge(ctx,"all_bikes","bikes","commuter_bikes").Result()iferr!=nil{panic(err)}fmt.Println(res4)// OK
res5,err:=rdb.PFCount(ctx,"all_bikes").Result()iferr!=nil{panic(err)}fmt.Println(res5)// 6
}
Some examples of use cases for this data structure is counting unique queries
performed by users in a search form every day, number of unique visitors to a web page and other similar cases.
Redis is also able to perform the union of HLLs, please check the
full documentation for more information.
Use cases
Anonymous unique visits of a web page (SaaS, analytics tools)
This application answers these questions:
How many unique visits has this page had on this day?
How many unique users have played this song?
How many unique users have viewed this video?
Note:
Storing the IP address or any other kind of personal identifier is against the law in some countries, which makes it impossible to get unique visitor statistics on your website.
One HyperLogLog is created per page (video/song) per period, and every IP/identifier is added to it on every visit.
Writing (PFADD) to and reading from (PFCOUNT) the HyperLogLog is done in constant time and space.
Merging HLLs is O(n), where n is the number of sketches.
Limits
The HyperLogLog can estimate the cardinality of sets with up to 18,446,744,073,709,551,616 (2^64) members.