I have two or more HLLs that are unioned, and I want to get the intersection count of those unions. I have used the hll-python example from here. Following is my code:
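```python
from aerospike_helpers.operations import hll_operations as hll_ops
# client, getKey(), HLL_BIN, records, records2, value and value2 are defined elsewhere.

# Union of the HLL stored in record `value` with the HLLs in records
ops = [hll_ops.hll_get_union(HLL_BIN, records)]
_, _, result1 = client.operate(getKey(value), ops)
# Union of the HLL stored in record `value2` with the HLLs in records2
ops = [hll_ops.hll_get_union(HLL_BIN, records2)]
_, _, result2 = client.operate(getKey(value2), ops)
# Intersection count of both unions, evaluated against two different records
ops = [hll_ops.hll_get_intersect_count(HLL_BIN, [result1[HLL_BIN], result2[HLL_BIN]])]
_, _, resultVal = client.operate(getKey(value), ops)
print(f'intersectAll={resultVal}')
_, _, resultVal2 = client.operate(getKey(value2), ops)
print(f'intersectAll={resultVal2}')
```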
I get two different results when I use different keys for the intersection with hll_get_intersect_count, i.e. resultVal and resultVal2 are not the same. This does not happen for the union count with hll_get_union_count. I would expect the intersection value to be the same regardless of the key.
Can anyone tell me why this is happening and what the right way to do it is?
I was able to figure out the solution with the help of Aerospike support. The same question was posted and discussed in more detail on the Aerospike forum.
Posting my code for others having the same issue.
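Intersection of arbitrary HLLs (independent of a record) is not supported in Aerospike. The key passed to client.operate for hll_get_intersect_count matters: the HLL stored in that record's bin is itself part of the intersection, together with the HLLs in the list. In my code the record under value is already contained in result1 and the record under value2 is already contained in result2, so the two calls intersected different sets and returned different counts. The union count is not affected, because adding a bin that is already inside one of the unions does not change the union.

So, to get the intersection count of multiple unions, I have to save one union into Aerospike (for example with hll_set_union) and then get the intersection count of that stored union against the rest of the unions, using the key of the record that holds the stored union in client.operate. Following is the code I came up with: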
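The behaviour described above can be checked without a server. The sketch below is an illustrative model, not the original code from this answer: it replaces HLL bins with exact Python sets (real HLLs only estimate these cardinalities), and the helper names get_union, get_union_count, get_intersect_count and set_union, as well as the sample data, are hypothetical. It assumes, as explained above, that the record's own bin takes part in hll_get_union and hll_get_intersect_count.

```python
# Exact-set model of the Aerospike HLL operations discussed above.
# Real HLL bins only *estimate* these counts; sets keep the arithmetic exact.

def get_union(bin_set, hll_list):
    """Models hll_get_union: the record's bin is unioned with every item in the list."""
    result = set(bin_set)
    for hll in hll_list:
        result |= hll
    return result

def get_union_count(bin_set, hll_list):
    """Models hll_get_union_count."""
    return len(get_union(bin_set, hll_list))

def get_intersect_count(bin_set, hll_list):
    """Models hll_get_intersect_count: the record's bin also takes part."""
    result = set(bin_set)
    for hll in hll_list:
        result &= hll
    return len(result)

def set_union(bin_set, hll_list):
    """Models hll_set_union: the bin is overwritten with bin union list."""
    return get_union(bin_set, hll_list)

# Hypothetical data: the HLL stored under each key, plus the extra HLLs.
bins = {"value": {1, 2, 3}, "value2": {4, 5, 6}}
records = [{4, 5}]
records2 = [{3, 7}]

union1 = get_union(bins["value"], records)    # {1, 2, 3, 4, 5}
union2 = get_union(bins["value2"], records2)  # {3, 4, 5, 6, 7}

# Question's approach: the answer depends on which record's bin is used.
via_value = get_intersect_count(bins["value"], [union1, union2])    # {3} -> 1
via_value2 = get_intersect_count(bins["value2"], [union1, union2])  # {4, 5} -> 2

# Union counts agree, because each bin is already inside one of the unions.
union_via_value = get_union_count(bins["value"], [union1, union2])    # 7
union_via_value2 = get_union_count(bins["value2"], [union1, union2])  # 7

# Fix: store union1 in a fresh (empty) record, then intersect it with union2.
scratch_bin = set_union(set(), [union1])
fixed = get_intersect_count(scratch_bin, [union2])  # {3, 4, 5} -> 3

print(via_value, via_value2, union_via_value, union_via_value2, fixed)
```

With the Python client, the fix would roughly correspond to writing result1[HLL_BIN] into a separate record with hll_ops.hll_set_union(HLL_BIN, [result1[HLL_BIN]]) and then running hll_ops.hll_get_intersect_count(HLL_BIN, [result2[HLL_BIN]]) through client.operate on that record's key; which key to use for that record is an assumption left to the reader.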
For more details, you can look here for the hll_set_union reference.
A more detailed discussion can be found here.