Count-min sketch

Count-min sketch is a probabilistic data structure that estimates the frequency of an element in a data stream.

Count-Min Sketch is a probabilistic data structure in Redis Open Source that can be used to estimate the frequency of events/elements in a stream of data.

It uses a sub-linear space at the expense of over-counting some events due to collisions. It consumes a stream of events/elements and keeps estimated counters of their frequency.

It is very important to know that the results coming from a Count-Min sketch lower than a certain threshold (determined by the error_rate) should be ignored and often even approximated to zero. So Count-Min sketch is indeed a data-structure for counting frequencies of elements in a stream, but it's only useful for higher counts. Very low counts should be ignored as noise.

Use cases

Products (retail, online shops)

This application answers this question: What was the sales volume (on a certain day) for a product?

Use one Count-Min sketch created per day (period). Every product sale goes into the CMS. The CMS give reasonably accurate results for the products that contribute the most toward the sales. Products with low percentage of the total sales are ignored.

Examples

Assume you select an error rate of 0.1% (0.001) with a certainty of 99.8% (0.998). This means you have an error probability of 0.2% (0.002). Your sketch strives to keep the error within 0.1% of the total count of all elements you've added. There's a 0.2% chance the error might exceed this—like when an element below the threshold overlaps with one above it. When you add a few items to the CMS and evaluate their frequency, remember that in such a small sample, collisions are rare, as seen with other probabilistic data structures.

Language:

Count-min sketch operations: Use CMS.INITBYPROB to create a sketch, CMS.INCRBY to increment element counts, CMS.QUERY to estimate frequencies, and CMS.INFO to inspect sketch properties when you need to estimate element frequencies in a data stream

> CMS.INITBYPROB bikes:profit 0.001 0.002
OK
> CMS.INCRBY bikes:profit "Smokey Mountain Striker" 100
(integer) 100
> CMS.INCRBY bikes:profit "Rocky Mountain Racer" 200 "Cloudy City Cruiser" 150
1) (integer) 200
2) (integer) 150
> CMS.QUERY bikes:profit "Smokey Mountain Striker" "Rocky Mountain Racer" "Cloudy City Cruiser" "Terrible Bike Name"
1) (integer) 100
2) (integer) 200
3) (integer) 150
4) (integer) 0
> CMS.INFO bikes:profit
1) width
2) (integer) 2000
3) depth
4) (integer) 9
5) count
6) (integer) 450

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
CMS.INCRBY
Increases the count of one or more items by increment
CMS.QUERY
Returns the count for one or more items in a sketch
CMS.INFO
Returns information about a sketch

Redis CLI guide

Also, check out our other client tools Redis Insight and Redis for VS Code.

"""
Code samples for Count-min sketch doc pages:
    https://redis.io/docs/latest/develop/data-types/probabilistic/count-min-sketch/
"""
import redis

r = redis.Redis(decode_responses=True)

res1 = r.cms().initbyprob("bikes:profit", 0.001, 0.002)
print(res1)  # >>> True

res2 = r.cms().incrby("bikes:profit", ["Smoky Mountain Striker"], [100])
print(res2)  # >>> [100]

res3 = r.cms().incrby(
    "bikes:profit", ["Rocky Mountain Racer", "Cloudy City Cruiser"], [200, 150]
)
print(res3)  # >>> [200, 150]

res4 = r.cms().query("bikes:profit", "Smoky Mountain Striker")
print(res4)  # >>> [100]

res5 = r.cms().info("bikes:profit")
print(res5.width, res5.depth, res5.count)  # >>> 2000 9 450

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
- initbyprob(
  
  key: str,
  
  error: float, // Estimate size of error as percent of total
  
  probability: float // Desired probability for inflated count
  
  ) → bool // True if created successfully
CMS.INCRBY
Increases the count of one or more items by increment
- incrby(
  
  key: str,
  
  items: List, // Items to increment
  
  increments: List // Increment values for each item
  
  ) → List[int] // Count of each item after increment
CMS.QUERY
Returns the count for one or more items in a sketch
- query(
  
  key: str,
  
  items: *args
  
  ) → List[int] // Estimated count for each item
CMS.INFO
Returns information about a sketch
- info(
  
  key: str
  
  ) → dict // Width, depth, and total count

Python Quick-Start

import assert from 'assert';
import { createClient } from 'redis';

const client = createClient();
await client.connect();


const res1 = await client.cms.initByProb('bikes:profit', 0.001, 0.002);
console.log(res1);  // >>> OK

const res2 = await client.cms.incrBy('bikes:profit', {
    item: 'Smoky Mountain Striker',
    incrementBy: 100
});
console.log(res2);  // >>> [100]

const res3 = await client.cms.incrBy('bikes:profit', [
  {
    item: 'Rocky Mountain Racer',
    incrementBy: 200
  },
  {
    item: 'Cloudy City Cruiser',
    incrementBy: 150
  }
]);
console.log(res3);  // >>> [200, 150]

const res4 = await client.cms.query('bikes:profit', 'Smoky Mountain Striker');
console.log(res4);  // >>> [100]

const res5 = await client.cms.info('bikes:profit');
console.log(res5.width, res5.depth, res5.count);  // >>> 2000 9 450

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
- CMS.INITBYPROB(
  
  key: RedisArgument,
  
  error: number, // Error rate as decimal between 0 and 1
  
  probability: number // Probability for inflated count
  
  ) → SimpleStringReply<'OK'> // OK on success
CMS.INCRBY
Increases the count of one or more items by increment
- CMS.INCRBY(
  
  key: RedisArgument,
  
  items: BfIncrByItem | Array<BfIncrByItem> // { item, incrementBy }
  
  ) → ArrayReply<NumberReply> // Counts after increment
CMS.QUERY
Returns the count for one or more items in a sketch
- CMS.QUERY(
  
  key: RedisArgument,
  
  items: RedisVariadicArgument
  
  ) → ArrayReply<NumberReply> // Counts for each item
CMS.INFO
Returns information about a sketch
- CMS.INFO(
  
  key: RedisArgument
  
  ) → CmsInfoReply // { width, depth, count }

Node.js Quick-Start

package io.redis.examples;


public class CMSExample {

  public void run() {
    RedisClient jedis = RedisClient.create("redis://localhost:6379");


    String res1 = jedis.cmsInitByProb("bikes:profit", 0.001d, 0.002d);
    System.out.println(res1); // >>> OK

    long res2 = jedis.cmsIncrBy("bikes:profit", "Smoky Mountain Striker", 100L);
    System.out.println(res2); // >>> 100

    List<Long> res3 = jedis.cmsIncrBy("bikes:profit", new HashMap<String, Long>() {{
      put("Rocky Mountain Racer", 200L);
      put("Cloudy City Cruiser", 150L);
    }});
    System.out.println(res3); // >>> [200, 150]

    List<Long> res4 = jedis.cmsQuery("bikes:profit", "Smoky Mountain Striker");
    System.out.println(res4); // >>> [100]

    Map<String, Object> res5 = jedis.cmsInfo("bikes:profit");
    System.out.println(res5.get("width") + " " + res5.get("depth") + " " + res5.get("count")); // >>> 2000 9 450

    jedis.close();
  }

}

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
- cmsInitByProb(
  
  key: String,
  
  error: double,
  
  probability: double
  
  ) → String // OK
CMS.INCRBY
Increases the count of one or more items by increment
- cmsIncrBy(
  
  key: String,
  
  item: String,
  
  increment: long
  
  ) → long // Count for item after increment
- cmsIncrBy(
  
  key: String,
  
  itemIncrements: Map<String, Long>
  
  ) → List<Long> // Count of each item after increment
CMS.QUERY
Returns the count for one or more items in a sketch
- cmsQuery(
  
  key: String,
  
  items: String...
  
  ) → List<Long> // Count for one or more items
CMS.INFO
Returns information about a sketch
- cmsInfo(
  
  key: String
  
  ) → Map<String, Object> // Width, depth, and total count

Java-Sync Quick-Start

package example_commands_test

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)


func ExampleClient_cms() {
	ctx := context.Background()

	rdb := redis.NewClient(&redis.Options{
		Addr:     "localhost:6379",
		Password: "", // no password docs
		DB:       0,  // use default DB
	})


	res1, err := rdb.CMSInitByProb(ctx, "bikes:profit", 0.001, 0.002).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res1) // >>> OK

	res2, err := rdb.CMSIncrBy(ctx, "bikes:profit",
		"Smoky Mountain Striker", 100,
	).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res2) // >>> [100]

	res3, err := rdb.CMSIncrBy(ctx, "bikes:profit",
		"Rocky Mountain Racer", 200,
		"Cloudy City Cruiser", 150,
	).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res3) // >>> [200 150]

	res4, err := rdb.CMSQuery(ctx, "bikes:profit",
		"Smoky Mountain Striker",
	).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res4) // >>> [100]

	res5, err := rdb.CMSInfo(ctx, "bikes:profit").Result()

	if err != nil {
		panic(err)
	}

	fmt.Printf("Width: %v, Depth: %v, Count: %v",
		res5.Width, res5.Depth, res5.Count)
	// >>> Width: 2000, Depth: 9, Count: 450

}

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
- CMSInitByProb(
  
  ctx: context.Context,
  
  key: string,
  
  errorRate: float64,
  
  probability: float64
  
  ) → *StatusCmd // Status command result
CMS.INCRBY
Increases the count of one or more items by increment
- CMSIncrBy(
  
  ctx: context.Context,
  
  key: string,
  
  elements: ...interface{} // Alternating items and increments
  
  ) → *IntSliceCmd // Counts after increment
CMS.QUERY
Returns the count for one or more items in a sketch
- CMSQuery(
  
  ctx: context.Context,
  
  key: string,
  
  elements: ...interface{}
  
  ) → *IntSliceCmd // Estimated counts
CMS.INFO
Returns information about a sketch
- CMSInfo(
  
  ctx: context.Context,
  
  key: string
  
  ) → *CMSInfoCmd // CMSInfo with width, depth, count

Go Quick-Start


using NRedisStack.CountMinSketch.DataTypes;
using NRedisStack.RedisStackCommands;
using NRedisStack.Tests;
using StackExchange.Redis;



public class CmsTutorial
{
    public void Run()
    {
        var muxer = ConnectionMultiplexer.Connect("localhost:6379");
        var db = muxer.GetDatabase();


        bool res1 = db.CMS().InitByProb("bikes:profit", 0.001, 0.002);
        Console.WriteLine(res1);    // >>> True

        long res2 = db.CMS().IncrBy("bikes:profit", "Smoky Mountain Striker", 100);
        Console.WriteLine(res2);    // >>> 100

        long[] res3 = db.CMS().IncrBy("bikes:profit",
            [
                new("Rocky Mountain Racer", 200),
                new("Cloudy City Cruiser", 150)
            ]
        );
        Console.WriteLine(string.Join(", ", res3)); // >>> 200, 150

        long[] res4 = db.CMS().Query("bikes:profit", new RedisValue[] { "Smoky Mountain Striker" });
        Console.WriteLine(string.Join(", ", res4)); // >>> 100

        CmsInformation res5 = db.CMS().Info("bikes:profit");
        Console.WriteLine($"Width: {res5.Width}, Depth: {res5.Depth}, Count: {res5.Count}");
        // >>> Width: 2000, Depth: 9, Count: 450

        // Tests for 'cms' step.


    }
}

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
- InitByProb(
  
  key: RedisKey,
  
  error: double,
  
  probability: double
  
  ) → bool // True if created successfully
CMS.INCRBY
Increases the count of one or more items by increment
- IncrBy(
  
  key: RedisKey,
  
  item: RedisValue,
  
  increment: long
  
  ) → long // Count after increment
- IncrBy(
  
  key: RedisKey,
  
  itemIncrements: Tuple<RedisValue, long>[]
  
  ) → long[] // Counts after increment
CMS.QUERY
Returns the count for one or more items in a sketch
- Query(
  
  key: RedisKey,
  
  items: params RedisValue[]
  
  ) → long[] // Estimated counts
CMS.INFO
Returns information about a sketch
- Info(
  
  key: RedisKey
  
  ) → CmsInformation // Sketch information

C#-Sync Quick-Start

<?php

require 'vendor/autoload.php';

use Predis\Client as PredisClient;

class DtCmsTest
{
    public function testDtCms() {
        $r = new PredisClient([
            'scheme'   => 'tcp',
            'host'     => '127.0.0.1',
            'port'     => 6379,
            'password' => '',
            'database' => 0,
        ]);

        $res1 = $r->cmsinitbyprob('bikes:profit', 0.001, 0.002);
        echo $res1 . PHP_EOL;
        // >>> OK

        $res2 = $r->cmsincrby('bikes:profit', 'Smoky Mountain Striker', 100);
        echo json_encode($res2) . PHP_EOL;
        // >>> [100]

        $res3 = $r->cmsincrby(
            'bikes:profit',
            'Rocky Mountain Racer', 200,
            'Cloudy City Cruiser', 150
        );
        echo json_encode($res3) . PHP_EOL;
        // >>> [200,150]

        $res4 = $r->cmsquery('bikes:profit', 'Smoky Mountain Striker');
        echo json_encode($res4) . PHP_EOL;
        // >>> [100]

        $res5 = $r->cmsinfo('bikes:profit');
        echo $res5['width'] . ' ' . $res5['depth'] . ' ' . $res5['count'] . PHP_EOL;
        // >>> 2000 9 450
    }
}

CMS.INITBYPROB
Initializes a Count-Min Sketch to accommodate requested tolerances.
- cmsinitbyprob(
  
  key: string,
  
  error: float,
  
  probability: float
  
  ) → string // OK
CMS.INCRBY
Increases the count of one or more items by increment
- cmsincrby(
  
  key: string,
  
  itemsAndIncrements: mixed... // Alternating items and increments
  
  ) → array // Counts after increment
CMS.QUERY
Returns the count for one or more items in a sketch
- cmsquery(
  
  key: string,
  
  items: string...
  
  ) → array // Estimated counts
CMS.INFO
Returns information about a sketch
- cmsinfo(
  
  key: string
  
  ) → array // Sketch information

PHP Quick-Start

Example 1:

If we had a uniform distribution of 1000 elements where each has a count of around 500 the threshold would be 500:

threshold = error * total_count  = 0.001 * (1000*500) = 500

This shows that a CMS is maybe not the best data structure to count frequency of a uniformly distributed stream. Let's try decreasing the error to 0.01%:

threshold = error * total_count  = 0.0001 * (1000*500) = 100

This threshold looks more acceptable already, but it means we will need a bigger sketch width w = 2/error = 20 000 and consequently - more memory.

Example 2:

In another example let's imagine a normal (gaussian) distribution where we have 1000 elements, out of which 800 will have a summed count of 400K (with an average count of 500) and 200 elements will have a much higher summed count of 1.6M (with an average count of 8000), making them the heavy hitters (elephant flow). The threshold after "populating" the sketch with all the 1000 elements would be:

threshold = error * total_count = 0.001 * 2M = 2000

This threshold seems to be sitting comfortably between the 2 average counts 500 and 8000 so the initial chosen error rate should be working well for this case.

Sizing

Even though the Count-Min sketch is similar to Bloom filter in many ways, its sizing is considerably more complex. The initialisation command receives only two sizing parameters, but you have to understand them thoroughly if you want to have a usable sketch.

CMS.INITBYPROB key error probability

1. Error

The error parameter will determine the width w of your sketch and the probability will determine the number of hash functions (depth d). The error rate we choose will determine the threshold above which we can trust the result from the sketch. The correlation is:

threshold = error * total_count

error = threshold/total_count

where total_count is the sum of the count of all elements that can be obtained from the count key of the result of the CMS.INFO command and is of course dynamic - it changes with every new increment in the sketch. At creation time you can approximate the total_count ratio as a product of the average count you'll be expecting in the sketch and the average number of elements.

Since the threshold is a function of the total count in the filter it's very important to note that it will grow as the count grows, but knowing the total count we can always dynamically calculate the threshold. If a result is below it - it can be discarded.

2. Probability

probability in this data structure represents the chance of an element that has a count below the threshold to collide with elements that had a count above the threshold on all sketches/depths thus returning a min-count of a frequently occurring element instead of its own.

Performance

Adding, updating and querying for elements in a CMS has a time complexity O(1).

Academic sources

An Improved Data Stream Summary: The Count-Min Sketch and its Applications

References

Count-Min Sketch: The Art and Science of Estimating Stuff

Products

Tools

Get Redis

Connect

Learn

Latest

See how it works