Probabilistic data types

Learn how to use HyperLogLog to estimate set cardinality with redis-rs.

Redis supports several probabilistic data types that let you calculate values approximately rather than exactly. The redis-rs high-level command traits (Commands and AsyncCommands) support HyperLogLog cardinality estimation.

Note:
This page covers HyperLogLog because redis-rs provides high-level pfadd, pfcount, and pfmerge methods for it. Other probabilistic data types, such as Bloom filter, Count-min sketch, t-digest, and Top-K, don't currently have dedicated high-level redis-rs methods, but you can still use them by sending their Redis commands directly with the low-level redis::cmd API.

Set cardinality

A HyperLogLog object calculates the approximate cardinality of a set. As you add items, the HyperLogLog tracks the number of distinct set members, but it doesn't let you retrieve those members or test whether a specific item was added.

You can also merge two or more HyperLogLogs to find the approximate cardinality of the union of the sets they represent.
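To see why a HyperLogLog can count distinct items without storing them, and why merging works, here is a toy, std-only Rust sketch of the classic algorithm. It is an illustration under stated assumptions, not how Redis or redis-rs implement it: it uses 16,384 registers as Redis does, but the hash (FNV-1a plus a SplitMix64 finalizer) stands in for Redis's MurmurHash64A, each register takes a whole byte instead of 6 packed bits, and there is no sparse encoding. The names HyperLogLog, add, count, and merge are hypothetical.

```rust
// Toy HyperLogLog for illustration only (NOT Redis's implementation).
// Assumptions: p = 14 (16,384 registers, as in Redis), one byte per register
// instead of packed 6-bit registers, no sparse encoding, and a stand-in hash.

const P: u32 = 14;
const M: usize = 1 << P; // number of registers

/// Stand-in 64-bit hash: FNV-1a, then the SplitMix64 finalizer to mix all bits.
/// (Redis itself uses MurmurHash64A.)
fn hash64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h ^= h >> 30;
    h = h.wrapping_mul(0xbf58_476d_1ce4_e5b9);
    h ^= h >> 27;
    h = h.wrapping_mul(0x94d0_49bb_1331_11eb);
    h ^ (h >> 31)
}

/// Only the registers are stored; the added items themselves are discarded,
/// which is why members can't be listed or looked up afterwards.
#[derive(Clone, Debug, PartialEq)]
struct HyperLogLog {
    registers: Vec<u8>,
}

impl HyperLogLog {
    fn new() -> Self {
        HyperLogLog { registers: vec![0; M] }
    }

    /// The low P bits of the hash choose a register; the register keeps the
    /// largest "position of the first 1-bit" seen in the remaining 50 bits.
    fn add(&mut self, item: &str) {
        let h = hash64(item.as_bytes());
        let idx = (h & (M as u64 - 1)) as usize;
        let rest = h >> P;
        let rho = (rest.trailing_zeros().min(64 - P) + 1) as u8;
        let reg = &mut self.registers[idx];
        *reg = (*reg).max(rho);
    }

    /// Harmonic-mean estimate, with linear counting for small cardinalities.
    /// (No large-range correction: with a 64-bit hash it isn't needed here.)
    fn count(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }

    /// Register-wise maximum: the result equals the sketch of the union.
    fn merge(&mut self, other: &HyperLogLog) {
        for (a, &b) in self.registers.iter_mut().zip(&other.registers) {
            *a = (*a).max(b);
        }
    }
}

fn main() {
    // Mirrors the group:1 / group:2 example in this section.
    let mut group1 = HyperLogLog::new();
    for name in ["andy", "cameron", "david"] {
        group1.add(name);
    }
    let mut group2 = HyperLogLog::new();
    for name in ["kaitlyn", "michelle", "paolo", "rachel"] {
        group2.add(name);
    }
    let mut both = group1.clone();
    both.merge(&group2);
    println!("group:1 ~ {:.0}", group1.count());
    println!("group:2 ~ {:.0}", group2.count());
    println!("both_groups ~ {:.0}", both.count());
}
```

Because merge only takes per-register maxima, the merged registers are exactly the ones you would get by adding every item to a single sketch, so HyperLogLogs built separately can be combined without losing accuracy.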

Set cardinality: Estimate distinct item count using HyperLogLog with minimal memory usage
// Synchronous version, using the blocking redis::Commands trait.
mod home_prob_dts_tests {
    use redis::Commands;

    pub fn run() {
        let mut r = match redis::Client::open("redis://127.0.0.1") {
            Ok(client) => match client.get_connection() {
                Ok(conn) => conn,
                Err(e) => {
                    println!("Failed to connect to Redis: {e}");
                    return;
                }
            },
            Err(e) => {
                println!("Failed to create Redis client: {e}");
                return;
            }
        };

        // Delete keys left over from earlier runs. Otherwise the PFADD calls
        // print false (nothing new was added), and stale data could skew the counts.
        let _: usize = r
            .del(&["group:1", "group:2", "both_groups"])
            .expect("Failed to delete existing keys");

        // PFADD returns true if at least one internal register changed.
        let group1_added: bool = r
            .pfadd("group:1", &["andy", "cameron", "david"])
            .expect("Failed to add items to group:1");
        println!("{group1_added}"); // >>> true

        // PFCOUNT returns the estimated number of distinct items.
        let group1: usize = r.pfcount("group:1").expect("Failed to count group:1");
        println!("{group1}"); // >>> 3

        let group2_added: bool = r
            .pfadd("group:2", &["kaitlyn", "michelle", "paolo", "rachel"])
            .expect("Failed to add items to group:2");
        println!("{group2_added}"); // >>> true

        let group2: usize = r.pfcount("group:2").expect("Failed to count group:2");
        println!("{group2}"); // >>> 4

        // PFMERGE stores the union of the source HyperLogLogs in both_groups.
        let _: () = r
            .pfmerge("both_groups", &["group:1", "group:2"])
            .expect("Failed to merge HyperLogLogs");
        println!("OK"); // >>> OK

        let both_groups: usize = r
            .pfcount("both_groups")
            .expect("Failed to count both_groups");
        println!("{both_groups}"); // >>> 7
    }
}

fn main() {
    home_prob_dts_tests::run();
}
// Asynchronous version, using redis::AsyncCommands. It is a separate program
// from the synchronous version, which is why the module name is the same.
mod home_prob_dts_tests {
    use redis::AsyncCommands;

    pub async fn run() {
        let mut r = match redis::Client::open("redis://127.0.0.1") {
            Ok(client) => match client.get_multiplexed_async_connection().await {
                Ok(conn) => conn,
                Err(e) => {
                    println!("Failed to connect to Redis: {e}");
                    return;
                }
            },
            Err(e) => {
                println!("Failed to create Redis client: {e}");
                return;
            }
        };

        // Delete keys left over from earlier runs so the outputs below match.
        let _: usize = r
            .del(&["group:1", "group:2", "both_groups"])
            .await
            .expect("Failed to delete existing keys");

        let group1_added: bool = r
            .pfadd("group:1", &["andy", "cameron", "david"])
            .await
            .expect("Failed to add items to group:1");
        println!("{group1_added}"); // >>> true

        let group1: usize = r.pfcount("group:1").await.expect("Failed to count group:1");
        println!("{group1}"); // >>> 3

        let group2_added: bool = r
            .pfadd("group:2", &["kaitlyn", "michelle", "paolo", "rachel"])
            .await
            .expect("Failed to add items to group:2");
        println!("{group2_added}"); // >>> true

        let group2: usize = r.pfcount("group:2").await.expect("Failed to count group:2");
        println!("{group2}"); // >>> 4

        let _: () = r
            .pfmerge("both_groups", &["group:1", "group:2"])
            .await
            .expect("Failed to merge HyperLogLogs");
        println!("OK"); // >>> OK

        let both_groups: usize = r
            .pfcount("both_groups")
            .await
            .expect("Failed to count both_groups");
        println!("{both_groups}"); // >>> 7
    }
}

// Requires an async runtime, for example redis-rs's "tokio-comp" feature plus
// tokio with the "macros" and "rt-multi-thread" features.
#[tokio::main]
async fn main() {
    home_prob_dts_tests::run().await;
}

The main benefit of HyperLogLogs is their very low memory usage. A Redis HyperLogLog can estimate the cardinality of sets with up to 2^64 members, with a standard error of 0.81%, using at most 12 KB of memory per key.
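The 12 KB and 0.81% figures follow from the layout of Redis's dense HyperLogLog encoding, which uses 2^14 registers of 6 bits each (plus a small header, not counted below), combined with the standard HyperLogLog error bound of about 1.04/√m for m registers:

```latex
m = 2^{14} = 16384 \text{ registers}, \qquad
\text{memory} = \frac{16384 \times 6}{8}\ \text{bytes} = 12288\ \text{bytes} = 12\ \text{KB}

\text{standard error} \approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{128} \approx 0.0081 = 0.81\%
```

A 6-bit register can hold values up to 63, which is enough for the longest run of zeros in the 50 hash bits left after the 14 index bits, so the encoding doesn't limit the 2^64 range.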

More information

See the following pages to learn more:
