Cuckoo filter

Cuckoo filters are a probabilistic data structure that checks for the presence of an element in a set.

A Cuckoo filter, just like a Bloom filter, is a probabilistic data structure in Redis Open Source that lets you check quickly and space-efficiently whether an element is present in a set. Unlike a Bloom filter, it also allows deletions, and it performs better than a Bloom filter in some scenarios.

While the Bloom filter is a bit array with flipped bits at positions decided by the hash function, a Cuckoo filter is an array of buckets. It stores the fingerprint of each value in one of two candidate buckets, at positions decided by the two hash functions. A membership query for item x searches the possible buckets for the fingerprint of x, and returns true if an identical fingerprint is found. A cuckoo filter's fingerprint size directly determines the false positive rate.
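To make the bucket-and-fingerprint idea concrete, here is a minimal Python sketch of a cuckoo filter. It is an illustration only and not the Redis implementation: the class name `ToyCuckooFilter`, the SHA-256 based hashing, the XOR-based alternate bucket and the default sizes are all assumptions chosen for readability.

```python
import hashlib
import random


class ToyCuckooFilter:
    """Illustrative cuckoo filter (hypothetical, not how Redis implements it)."""

    def __init__(self, num_buckets=64, bucket_size=2, max_kicks=20):
        # A power-of-two bucket count keeps the XOR trick below in range.
        if num_buckets < 1 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # 8-bit fingerprint derived from the item.
        return self._hash(b"fp:" + item.encode()) & 0xFF

    def _alt_index(self, index: int, fp: int) -> int:
        # The other candidate bucket depends only on the current bucket and
        # the fingerprint, so it can be found without the original item.
        return index ^ (self._hash(bytes([fp])) & (self.num_buckets - 1))

    def _indexes(self, item: str, fp: int):
        i1 = self._hash(item.encode()) & (self.num_buckets - 1)
        return i1, self._alt_index(i1, fp)

    def add(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint to its
        # alternate bucket, and repeat up to max_kicks times.
        path = []
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            path.append((i, slot, self.buckets[i][slot]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Give up and undo the swaps so that no stored fingerprint is lost.
        for i, slot, old in reversed(path):
            self.buckets[i][slot] = old
        return False

    def exists(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1, i2 = self._indexes(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp = self._fingerprint(item)
        for i in self._indexes(item, fp):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


cf = ToyCuckooFilter()
cf.add("Smoky Mountain Striker")
print(cf.exists("Smoky Mountain Striker"))  # True
cf.delete("Smoky Mountain Striker")
print(cf.exists("Smoky Mountain Striker"))  # False (the filter is empty again)
```

Because only the fingerprint is stored, two different items that share a fingerprint and a bucket are indistinguishable, which is where false positives come from. The `max_kicks` limit in this sketch plays a role similar to the one described for MAXITERATIONS later on this page.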

Use cases

Targeted ad campaigns (advertising, retail)

This application answers this question: Has the user signed up for this campaign yet?

Use a Cuckoo filter for every campaign, populated with targeted users' ids. On every visit, the user id is checked against one of the Cuckoo filters.

If yes, the user has not signed up for the campaign. Show the ad.
If the user clicks the ad and signs up, remove the user id from that Cuckoo filter.
If no, the user has signed up for that campaign. Try the next ad/Cuckoo filter.
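The flow above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the `campaign:` key prefix, the function names `next_ad` and `record_signup`, and the `InMemoryFilters` stand-in are hypothetical. With redis-py you would pass `r.cf()` (the object used in the Python examples on this page) instead of the stand-in.

```python
class InMemoryFilters:
    """Hypothetical stand-in offering the add/exists/delete calls used by the
    Python examples on this page. It uses exact sets, so unlike a real
    Cuckoo filter it never returns false positives."""

    def __init__(self):
        self._sets = {}

    def add(self, key, item):
        self._sets.setdefault(key, set()).add(item)
        return 1

    def exists(self, key, item):
        return int(item in self._sets.get(key, ()))

    def delete(self, key, item):
        items = self._sets.get(key, set())
        if item in items:
            items.remove(item)
            return 1
        return 0


def next_ad(cf, user_id, campaigns):
    """Return the first campaign whose filter still contains the user."""
    for campaign in campaigns:
        if cf.exists(f"campaign:{campaign}", user_id):
            return campaign  # targeted and not signed up yet: show this ad
    return None  # signed up for (or not targeted by) every campaign


def record_signup(cf, user_id, campaign):
    """Remove the user from the campaign's filter after they sign up."""
    return bool(cf.delete(f"campaign:{campaign}", user_id))


cf = InMemoryFilters()
cf.add("campaign:spring-sale", "user:42")
cf.add("campaign:new-bikes", "user:42")

print(next_ad(cf, "user:42", ["spring-sale", "new-bikes"]))  # spring-sale
record_signup(cf, "user:42", "spring-sale")
print(next_ad(cf, "user:42", ["spring-sale", "new-bikes"]))  # new-bikes
```

With a real Cuckoo filter, a "yes" can occasionally be a false positive, so a user who already signed up may still see an ad once in a while.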

Discount code/coupon validation (retail, online shops)

This application answers this question: Has this discount code/coupon been used yet?

Use a Cuckoo filter populated with all discount codes/coupons. On every try, the entered code is checked against the filter.

If no, the coupon is not valid.
If yes, the coupon can be valid. Check the main database. If valid, remove from Cuckoo filter as used.
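A rough Python sketch of this check is shown below. It assumes a hypothetical `coupons` key, a hypothetical `redeem` helper, and a plain Python set standing in for the main database. `SetBackedFilter` is an exact stand-in for illustration only; with redis-py you would pass `r.cf()` instead.

```python
class SetBackedFilter:
    """Hypothetical exact stand-in for a Cuckoo filter client
    (it never returns false positives, unlike the real filter)."""

    def __init__(self):
        self._items = {}

    def add(self, key, item):
        self._items.setdefault(key, set()).add(item)
        return 1

    def exists(self, key, item):
        return int(item in self._items.get(key, ()))

    def delete(self, key, item):
        if self.exists(key, item):
            self._items[key].remove(item)
            return 1
        return 0


def redeem(cf, unused_codes, code, key="coupons"):
    """Return True if the code was valid and has now been marked as used."""
    if not cf.exists(key, code):
        return False  # "no" is definitive: skip the database lookup
    if code not in unused_codes:
        return False  # filter false positive, or the code was already used
    unused_codes.remove(code)  # stands in for the main database update
    cf.delete(key, code)       # the filter no longer reports this code
    return True


unused_codes = {"SPRING10"}
cf = SetBackedFilter()
cf.add("coupons", "SPRING10")

print(redeem(cf, unused_codes, "SPRING10"))  # True
print(redeem(cf, unused_codes, "SPRING10"))  # False
print(redeem(cf, unused_codes, "BOGUS"))     # False
```

The filter only saves work on the "no" path. Every "yes" still has to be confirmed against the main database.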

Note: In addition to these two cases, Cuckoo filters also serve all the Bloom filter use cases very well.

Examples

You'll learn how to create an empty cuckoo filter with an initial capacity for 1,000 items, add items, check their existence, and remove them. Even though the CF.ADD command can create a new filter if one isn't present, it might not be optimally sized for your needs. It's better to use the CF.RESERVE command to set up a filter with your preferred capacity.

Cuckoo filter operations: Use CF.RESERVE to create a filter, CF.ADD to add items, CF.EXISTS to check membership, and CF.DEL to remove items when you need space-efficient probabilistic set membership testing with deletion support.

> CF.RESERVE bikes:models 1000
OK
> CF.ADD bikes:models "Smoky Mountain Striker"
(integer) 1
> CF.EXISTS bikes:models "Smoky Mountain Striker"
(integer) 1
> CF.EXISTS bikes:models "Terrible Bike Name"
(integer) 0
> CF.DEL bikes:models "Smoky Mountain Striker"
(integer) 1

CF.RESERVE
Creates a new Cuckoo Filter
CF.ADD
Adds an item to a Cuckoo Filter
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
CF.DEL
Deletes an item from a Cuckoo Filter

Redis CLI guide

res1 = r.cf().reserve("bikes:models", 1000000)
print(res1)  # >>> True

res2 = r.cf().add("bikes:models", "Smoky Mountain Striker")
print(res2)  # >>> 1

res3 = r.cf().exists("bikes:models", "Smoky Mountain Striker")
print(res3)  # >>> 1

res4 = r.cf().exists("bikes:models", "Terrible Bike Name")
print(res4)  # >>> 0

res5 = r.cf().delete("bikes:models", "Smoky Mountain Striker")
print(res5)  # >>> 1

CF.RESERVE
Creates a new Cuckoo Filter
- create(
  
  key: str, // The name of the Cuckoo filter
  
  capacity: int, // Number of entries intended to be added
  
  expansion: int, // Optional expansion rate
  
  bucket_size: int, // Optional bucket size
  
  max_iterations: int // Optional max iterations
  
  ) → str // OK on success
- reserve(
  
  key: str, // The name of the Cuckoo filter
  
  capacity: int, // Number of entries intended to be added
  
  expansion: int, // Optional expansion rate
  
  bucket_size: int, // Optional bucket size
  
  max_iterations: int // Optional max iterations
  
  ) → str // OK on success (alias for create)
CF.ADD
Adds an item to a Cuckoo Filter
- add(
  
  key: str, // The name of the Cuckoo filter
  
  item: str // The item to add
  
  ) → bool // True on success
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
- exists(
  
  key: str, // The name of the Cuckoo filter
  
  item: str // The item to check
  
  ) → bool // True if item may exist, False if certainly doesn't
CF.DEL
Deletes an item from a Cuckoo Filter
- delete(
  
  key: str, // The name of the Cuckoo filter
  
  item: str // The item to delete
  
  ) → bool // True if deleted, False if not found

Python Quick-Start

const res1 = await client.cf.reserve('bikes:models', 1000000);
console.log(res1);  // >>> OK

const res2 = await client.cf.add('bikes:models', 'Smoky Mountain Striker');
console.log(res2);  // >>> true

const res3 = await client.cf.exists('bikes:models', 'Smoky Mountain Striker');
console.log(res3);  // >>> true

const res4 = await client.cf.exists('bikes:models', 'Terrible Bike Name');
console.log(res4);  // >>> false

const res5 = await client.cf.del('bikes:models', 'Smoky Mountain Striker');
console.log(res5);  // >>> true

CF.RESERVE
Creates a new Cuckoo Filter
- CF.RESERVE(
  
  key: RedisArgument, // The name of the Cuckoo filter
  
  capacity: number, // Initial capacity
  
  options: CfReserveOptions // Optional: BUCKETSIZE, MAXITERATIONS, EXPANSION
  
  ) → SimpleStringReply<'OK'> // OK on success
CF.ADD
Adds an item to a Cuckoo Filter
- CF.ADD(
  
  key: RedisArgument, // The name of the Cuckoo filter
  
  item: RedisArgument // The item to add
  
  ) → boolean // true on success
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
- CF.EXISTS(
  
  key: RedisArgument, // The name of the Cuckoo filter
  
  item: RedisArgument // The item to check
  
  ) → boolean // true if item may exist
CF.DEL
Deletes an item from a Cuckoo Filter
- CF.DEL(
  
  key: RedisArgument, // The name of the Cuckoo filter
  
  item: RedisArgument // The item to delete
  
  ) → boolean // true if deleted, false if not found

Node.js Quick-Start

        String res1 = jedis.cfReserve("bikes:models", 1000000);
        System.out.println(res1); // >>> OK


        boolean res2 = jedis.cfAdd("bikes:models", "Smoky Mountain Striker");
        System.out.println(res2); // >>> true

        boolean res3 = jedis.cfExists("bikes:models", "Smoky Mountain Striker");
        System.out.println(res3); // >>> true

        boolean res4 = jedis.cfExists("bikes:models", "Terrible Bike Name");
        System.out.println(res4); // >>> false

        boolean res5 = jedis.cfDel("bikes:models", "Smoky Mountain Striker");
        System.out.println(res5); // >>> true

CF.RESERVE
Creates a new Cuckoo Filter
- cfReserve(
  
  key: String, // The name of the Cuckoo filter
  
  capacity: long // Initial capacity
  
  ) → String // OK on success
- cfReserve(
  
  key: String, // The name of the Cuckoo filter
  
  capacity: long, // Initial capacity
  
  reserveParams: CFReserveParams // Reserve parameters (bucketSize, maxIterations, expansion)
  
  ) → String // OK on success
CF.ADD
Adds an item to a Cuckoo Filter
- cfAdd(
  
  key: String, // The name of the Cuckoo filter
  
  item: String // The item to add
  
  ) → boolean // true on success
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
- cfExists(
  
  key: String, // The name of the Cuckoo filter
  
  item: String // The item to check
  
  ) → boolean // true if item may exist
CF.DEL
Deletes an item from a Cuckoo Filter
- cfDel(
  
  key: String, // The name of the Cuckoo filter
  
  item: String // The item to delete
  
  ) → boolean // true if deleted, false if not found

Java-Sync Quick-Start

	res1, err := rdb.CFReserve(ctx, "bikes:models", 1000000).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res1) // >>> OK

	res2, err := rdb.CFAdd(ctx, "bikes:models", "Smoky Mountain Striker").Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res2) // >>> true

	res3, err := rdb.CFExists(ctx, "bikes:models", "Smoky Mountain Striker").Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res3) // >>> true

	res4, err := rdb.CFExists(ctx, "bikes:models", "Terrible Bike Name").Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res4) // >>> false

	res5, err := rdb.CFDel(ctx, "bikes:models", "Smoky Mountain Striker").Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res5) // >>> true

CF.RESERVE
Creates a new Cuckoo Filter
- CFReserve(
  
  ctx: context.Context, // Context
  
  key: string, // The name of the Cuckoo filter
  
  capacity: int64 // Initial capacity
  
  ) → *StatusCmd // Status command result
- CFReserveWithArgs(
  
  ctx: context.Context, // Context
  
  key: string, // The name of the Cuckoo filter
  
  options: *CFReserveOptions // Reserve options (Capacity, BucketSize, MaxIterations, Expansion)
  
  ) → *StatusCmd // Status command result
CF.ADD
Adds an item to a Cuckoo Filter
- CFAdd(
  
  ctx: context.Context, // Context
  
  key: string, // The name of the Cuckoo filter
  
  element: interface{} // The item to add
  
  ) → *BoolCmd // Boolean command result
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
- CFExists(
  
  ctx: context.Context, // Context
  
  key: string, // The name of the Cuckoo filter
  
  element: interface{} // The item to check
  
  ) → *BoolCmd // Boolean command result
CF.DEL
Deletes an item from a Cuckoo Filter
- CFDel(
  
  ctx: context.Context, // Context
  
  key: string, // The name of the Cuckoo filter
  
  element: interface{} // The item to delete
  
  ) → *BoolCmd // Boolean command result

Go Quick-Start

        bool res1 = db.CF().Reserve("bikes:models", 1000000);
        Console.WriteLine(res1);    // >>> True

        bool res2 = db.CF().Add("bikes:models", "Smoky Mountain Striker");
        Console.WriteLine(res2);    // >>> True

        bool res3 = db.CF().Exists("bikes:models", "Smoky Mountain Striker");
        Console.WriteLine(res3);    // >>> True

        bool res4 = db.CF().Exists("bikes:models", "Terrible Bike Name");
        Console.WriteLine(res4);    // >>> False

        bool res5 = db.CF().Del("bikes:models", "Smoky Mountain Striker");
        Console.WriteLine(res5);    // >>> True

CF.RESERVE
Creates a new Cuckoo Filter
- Reserve(
  
  key: RedisKey, // The name of the Cuckoo filter
  
  capacity: long, // Initial capacity
  
  bucketSize: long?, // Optional bucket size
  
  maxIterations: int?, // Optional max iterations
  
  expansion: int? // Optional expansion rate
  
  ) → bool // true on success
CF.ADD
Adds an item to a Cuckoo Filter
- Add(
  
  key: RedisKey, // The name of the Cuckoo filter
  
  item: RedisValue // The item to add
  
  ) → bool // true on success
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
- Exists(
  
  key: RedisKey, // The name of the Cuckoo filter
  
  item: RedisValue // The item to check
  
  ) → bool // true if item may exist
CF.DEL
Deletes an item from a Cuckoo Filter
- Del(
  
  key: RedisKey, // The name of the Cuckoo filter
  
  item: RedisValue // The item to delete
  
  ) → bool // true if deleted, false if not found

C#-Sync (NRedisStack) Quick-Start

        $res1 = $r->cfreserve('bikes:models', 1000000);
        echo $res1 . PHP_EOL;
        // >>> OK

        $res2 = $r->cfadd('bikes:models', 'Smoky Mountain Striker');
        echo $res2 . PHP_EOL;
        // >>> 1

        $res3 = $r->cfexists('bikes:models', 'Smoky Mountain Striker');
        echo $res3 . PHP_EOL;
        // >>> 1

        $res4 = $r->cfexists('bikes:models', 'Terrible Bike Name');
        echo $res4 . PHP_EOL;
        // >>> 0

        $res5 = $r->cfdel('bikes:models', 'Smoky Mountain Striker');
        echo $res5 . PHP_EOL;
        // >>> 1

CF.RESERVE
Creates a new Cuckoo Filter
- cfreserve(
  
  $key: string, // The name of the Cuckoo filter
  
  $capacity: int, // Initial capacity
  
  $bucketSize: int, // Optional bucket size (-1 for default)
  
  $maxIterations: int, // Optional max iterations (-1 for default)
  
  $expansion: int // Optional expansion rate (-1 for default)
  
  ) → Status // OK on success
CF.ADD
Adds an item to a Cuckoo Filter
- cfadd(
  
  $key: string, // The name of the Cuckoo filter
  
  $item: mixed // The item to add
  
  ) → mixed // Result of the operation
CF.EXISTS
Checks whether one or more items exist in a Cuckoo Filter
- cfexists(
  
  $key: string, // The name of the Cuckoo filter
  
  $item: mixed // The item to check
  
  ) → mixed // Result of the operation
CF.DEL
Deletes an item from a Cuckoo Filter
- cfdel(
  
  $key: string, // The name of the Cuckoo filter
  
  $item: mixed // The item to delete
  
  ) → mixed // Result of the operation

PHP Quick-Start

Bloom vs. Cuckoo filters

Bloom filters typically exhibit better performance and scalability when inserting items (so if you're often adding items to your dataset, then a Bloom filter may be ideal). Cuckoo filters are quicker on check operations and also allow deletions.

Sizing Cuckoo filters

These are the main parameters and features of a cuckoo filter:

p target false positive rate
f fingerprint length in bits
α fill rate or load factor (0≤α≤1)
b number of entries per bucket
m number of buckets
n number of items
C average bits per item

Let's start by remembering that a cuckoo filter bucket can have multiple entries (where each entry stores one fingerprint). If all entries end up occupied by a fingerprint, there are no empty slots left for new elements and the filter is declared full. That's why you should always keep a certain percentage of your cuckoo filter free.
As a result, the "real" memory cost of an item should include that overhead in addition to the fingerprint size. If α is the load factor (space occupied by fingerprints / total filter size) and f is the number of bits in an entry, the amortised space cost is f/α bits per item.
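As a quick worked example of the f/α cost, the following Python snippet evaluates it for the 8-bit fingerprint and the fill rates quoted elsewhere on this page. The helper name `bits_per_item` is made up for illustration.

```python
def bits_per_item(fingerprint_bits: int, load_factor: float) -> float:
    """Amortised space cost f/α in bits per stored item."""
    if not 0 < load_factor <= 1:
        raise ValueError("load_factor must be in the range (0, 1]")
    return fingerprint_bits / load_factor


# f = 8 bits; fill rates of 55%, 80% and 95%
for alpha in (0.55, 0.80, 0.95):
    print(f"load factor {alpha:.2f}: {bits_per_item(8, alpha):.2f} bits per item")
```

The lower the fill rate, the more of each item's cost is empty space: at a fill rate of 80%, an 8-bit fingerprint costs 10 bits per item.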

When you initialise a new filter you are asked to choose its capacity and bucket size.

CF.RESERVE {key} {capacity} [BUCKETSIZE bucketSize] [MAXITERATIONS maxIterations] [EXPANSION expansion]

Choosing the capacity (`capacity`)

The capacity of a Cuckoo filter is calculated as

capacity = n*f/α

where n is the number of elements you expect to have in your filter, f is the fingerprint length in bits (which is set to 8), and α is the fill factor. So, to get your filter capacity, you must first choose a fill factor. The fill factor determines the density of your data and therefore the memory use. The capacity is rounded up to the next power of two.
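The rounding step can be illustrated with a small Python helper. This is a sketch: `next_power_of_two` is a hypothetical name, and it assumes that a value that is already a power of two is left unchanged.

```python
def next_power_of_two(value: int) -> int:
    """Round a positive integer up to the nearest power of two."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    return 1 << (value - 1).bit_length()


print(next_power_of_two(1000))     # 1024
print(next_power_of_two(1000000))  # 1048576
```

Under this assumption, a requested capacity just above a power of two roughly doubles the reserved size, which is worth keeping in mind when you choose the value.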

Please note that inserting repeated items in a cuckoo filter adds them multiple times, which causes your filter to fill up.

Because of how Cuckoo Filters work, the filter is likely to declare itself full before capacity is reached and therefore fill rate will likely never reach 100%.

Choosing the bucket size (`BUCKETSIZE`)

Number of items in each bucket. A higher bucket size value improves the fill rate but also causes a higher error rate and slightly slower performance.

error_rate = (bucket_size * hash_functions)/2^fingerprint_size = (bucket_size*2)/256

With a bucket size of 1, the fill rate is 55% and the false positive error rate is 2/256 ≈ 0.78%, which is the minimal false positive rate you can achieve. Larger buckets increase the error rate linearly but improve the fill rate of the filter. For example, a bucket size of 3 yields a 2.34% error rate and an 80% fill rate. A bucket size of 4 yields a 3.12% error rate and a 95% fill rate.
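The error-rate formula is easy to evaluate for other bucket sizes. The snippet below is a small sketch using the values given above (two hash functions, 8-bit fingerprints); the function name `false_positive_rate` is an assumption.

```python
def false_positive_rate(bucket_size: int,
                        fingerprint_bits: int = 8,
                        hash_functions: int = 2) -> float:
    """error_rate = (bucket_size * hash_functions) / 2^fingerprint_bits"""
    return bucket_size * hash_functions / 2 ** fingerprint_bits


for bucket_size in (1, 2, 3, 4):
    rate = false_positive_rate(bucket_size)
    print(f"bucket size {bucket_size}: {rate:.4f} ({bucket_size * 2}/256)")
```

Each extra entry per bucket adds another 2/256 to the error rate, which is the linear growth described above.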

Choosing the scaling factor (`EXPANSION`)

When the filter declares itself full, it auto-expands by creating additional sub-filters, at the cost of reduced performance and an increased error rate. Each new sub-filter has the size of the previous sub-filter multiplied by EXPANSION (chosen when the filter is created), which is important to keep in mind when sizing. Like bucket size, additional sub-filters grow the error rate linearly (the compound error is the sum of all sub-filters' errors). If you know you'll have to scale at some point, it's better to choose a higher expansion value. The default is the value of the cf-expansion-factor configuration parameter.

Maybe you're wondering "Why would I create a smaller filter with a high expansion rate if I know I'm going to scale anyway?"; the answer is: for cases where you need to keep many filters (let's say a filter per user, or per product) and most of them will stay small, but some with more activity will have to scale.

The expansion factor is rounded up to the next power of two.
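To see how the expansion factor affects growth, the following Python sketch lists the sub-filter sizes after a number of expansions, taking literally the rule that each new sub-filter is the previous one multiplied by EXPANSION. The function names are hypothetical, and the sketch ignores any rounding that Redis applies. The compound error simply adds one sub-filter's error rate per sub-filter, assuming they all share the same bucket size.

```python
def sub_filter_sizes(capacity: int, expansion: int, expansions: int) -> list:
    """Sizes of the initial filter and of each sub-filter added afterwards."""
    sizes = [capacity]
    for _ in range(expansions):
        sizes.append(sizes[-1] * expansion)
    return sizes


def compound_error(sub_filters: int, error_per_filter: float) -> float:
    """Sum of the sub-filters' errors when each has the same error rate."""
    return sub_filters * error_per_filter


for expansion in (1, 2, 4):
    sizes = sub_filter_sizes(1024, expansion, expansions=3)
    print(f"expansion {expansion}: sizes {sizes}, total {sum(sizes)}")

# Four sub-filters with bucket size 1 (2/256 each)
print(compound_error(4, 2 / 256))  # 0.03125
```

With the same number of sub-filters, and therefore the same compound error, a higher expansion factor reaches a much larger total capacity. This is why a higher value suits filters you expect to grow.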

Choosing the maximum number of iterations (`MAXITERATIONS`)

MAXITERATIONS dictates the number of attempts to find a slot for the incoming fingerprint. Once the filter gets full, a high MAXITERATIONS value will slow down insertions. The default is the value of the cf-max-iterations configuration parameter.

Interesting facts:

Unused capacity in prior sub-filters is automatically used when possible.
The filter can grow up to cf-max-expansions times.
You can delete items to stay within filter limits instead of rebuilding.
Adding the same element multiple times will create multiple entries, thus filling up your filter.

Performance

Adding an element to a Cuckoo filter has a time complexity of O(1).

Similarly, checking for an element and deleting an element also has a time complexity of O(1).

Academic sources

Cuckoo Filter: Practically Better Than Bloom