Roaring
Introduction
Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps.
Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, roaring bitmaps can be hundreds of times faster and they often offer significantly better compression. They can even be faster than uncompressed bitmaps.
Implementation
You can work with Roaring bitmaps in YDB using a set of user-defined functions (UDFs) in the Roaring
module. These functions provide the ability to work with 32-bit Roaring bitmaps. To do this, the data must be serialized in the format for 32-bit bitmaps described in the specification. This can be done using a function available in the Roaring bitmap library itself.
Such libraries exist for many programming languages, such as Go. If the serialization happened on the client side, the application can then save the serialized bitmap in a column with the String
type.
To work with Roaring bitmaps in a query, data from the String
type must be deserialized into the Resource<roaring_bitmap> type. To save the data, you need to perform the reverse operation. After that, the application can read the updated bitmap from YDB and deserialize it.
Available methods
Roaring::Deserialize(String{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::FromUint32List(List<Uint32>{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::Serialize(Resource<roaring_bitmap>{Flags:AutoMap})->String
Roaring::Uint32List(Resource<roaring_bitmap>{Flags:AutoMap})->List<Uint32>
Roaring::Cardinality(Resource<roaring_bitmap>{Flags:AutoMap})->Uint32
Roaring::Or(Resource<roaring_bitmap>{Flags:AutoMap}, Resource<roaring_bitmap>{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::OrWithBinary(Resource<roaring_bitmap>{Flags:AutoMap}, String{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::And(Resource<roaring_bitmap>{Flags:AutoMap}, Resource<roaring_bitmap>{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::AndWithBinary(Resource<roaring_bitmap>{Flags:AutoMap}, String{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::AndNot(Resource<roaring_bitmap>{Flags:AutoMap}, Resource<roaring_bitmap>{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::AndNotWithBinary(Resource<roaring_bitmap>{Flags:AutoMap}, String{Flags:AutoMap})->Resource<roaring_bitmap>
Roaring::RunOptimize(Resource<roaring_bitmap>{Flags:AutoMap})->Resource<roaring_bitmap>
Serialization and deserialization
Two functions, Deserialize
and FromUint32List
, are available for creating Resource<roaring_bitmap>
. The second function allows creating a Roaring bitmap from a list of unsigned integers, i.e., without the need to use the Roaring bitmap library code to create a binary representation.
YDB does not store data with the Resource
type, so the created bitmap must be converted to a binary representation using the Serialize
method.
To use the resulting bitmap, for example, in a WHERE
condition, the Uint32List
method is provided. This method returns a list of unsigned integers from the Resource<roaring_bitmap>
.
Bitwise operations
Currently, three modifying binary operations with bitmaps are supported:
Or
And
AndNot
The operations are modifying, meaning that they modify the Resource<roaring_bitmap>
passed as the first argument. Each of these operations has a version with the WithBinary
suffix, which allows working with the binary representation without having to deserialize it into the Resource<roaring_bitmap>
type. The implementation of these methods still has to deserialize the data to perform the operation, but it does not create an intermediate Resource
, thereby saving resources.
Other operations
The Cardinality
function is provided to obtain the number of bits set to 1 in the Resource<roaring_bitmap>
.
After the bitmap has been constructed or modified, it can be optimized using the RunOptimize
method. The internal format of a Roaring bitmap can use containers with better representations for different bit sequences.
Examples
$b = Roaring::FromUint32List(AsList(42));
$b = Roaring::Or($b, Roaring::FromUint32List(AsList(56)));
SELECT Roaring::Uint32List($b) AS `Or`; -- [42, 56]
$b1 = Roaring::FromUint32List(AsList(10, 567, 42));
$b2 = Roaring::FromUint32List(AsList(42));
$b2ser = Roaring::Serialize($b2); -- save this to String column
SELECT Roaring::Cardinality(Roaring::AndWithBinary($b1, $b2ser)) AS Cardinality; -- 1
SELECT Roaring::Uint32List(Roaring::And($b1, $b2)) AS `And`; -- [42]
SELECT Roaring::Uint32List(Roaring::AndWithBinary($b1, $b2ser)) AS AndWithBinary; -- [42]
$b1 = Roaring::FromUint32List(AsList(10, 567, 42));
$b2 = Roaring::FromUint32List(AsList(42));
$b2ser = Roaring::Serialize($b2); -- save this to String column
SELECT Roaring::Cardinality(Roaring::AndNotWithBinary($b1, $b2ser)) AS Cardinality; -- 2
SELECT Roaring::Uint32List(Roaring::AndNot($b1, $b2)) AS AndNot; -- [10,567]
SELECT Roaring::Uint32List(Roaring::AndNotWithBinary($b1, $b2ser)) AS AndNotWithBinary; -- [10,567]