What

A probabilistic data structure to count unique items.

Provides a very good approximation of the cardinality of a set even using a very small amount of memory. In the Redis implementation it only uses 12kbytes per key to count with a standard error of 0.81%, and there is no limit to the number of items you can count, unless you approach 2^64 items

How

  • b: Number of initial bits in a binary representation of a hash value
  • m: Number of registers / buckets (= )
  • p: MSB position of 1 (leftmost position)
  1. Hash a data, convert it to binary
  2. From the hashed value in binary form, b bits are extracted from the MSB.
    • This is the index of register (in binary)
  3. Find the leftmost 1 from the remaining set of bits (MSB)
    • p = the number of leading zeros + 1
  4. reg[b] = max(reg[b], p)
  5. image