Generate same hashcode for vectors that have jaccard similarity above a certain threshold

Question


Asked by dydy on 24 October 2022 at 18:11

I want to modify the hashCode() method in Java so that vectors whose Jaccard similarity is above a certain threshold generate the same hash code, with good accuracy.

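Example:

vector 1: [1,1,0,0,1,0]
vector 2: [1,1,0,0,0,0]

They have a Jaccard similarity of 2/3 ≈ 0.67: positions 1, 2 and 5 are 1 in at least one vector, and positions 1 and 2 are 1 in both. That is above a 0.5 threshold.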
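To check the arithmetic, here is a minimal sketch that computes Jaccard similarity for binary vectors. It assumes the vectors are stored as `int[]` of equal length (the question does not show its vector class), and the class name `JaccardExample` is hypothetical. For the example pair it yields 2/3 (two shared 1-positions out of three positions that are 1 in either vector).

```java
public class JaccardExample {

    // Jaccard similarity of two equal-length binary vectors:
    // (positions where both are 1) / (positions where at least one is 1).
    public static double jaccard(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int intersection = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            boolean x = a[i] != 0;
            boolean y = b[i] != 0;
            if (x && y) intersection++;
            if (x || y) union++;
        }
        // Assumed convention: two all-zero vectors count as identical.
        return union == 0 ? 1.0 : (double) intersection / union;
    }

    public static void main(String[] args) {
        int[] v1 = {1, 1, 0, 0, 1, 0};
        int[] v2 = {1, 1, 0, 0, 0, 0};
        System.out.println(jaccard(v1, v2)); // prints 0.6666666666666666
    }
}
```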

How can I modify the hashCode() method in Java so that vectors with a similarity of 0.5 or above go into the same bucket, i.e. get the same hash code?

Note: I am not using the MinHash LSH / candidate-pair approach. The hash code has to be generated from the vector itself.

The goal is not to do it perfectly (which is impossible), but as accurately as possible.

There will be situations where A and B can go together, and B and C can go together, but A and C cannot. In that case the hash function has to map A with B, B with C, or A, B and C all together.
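One well-known approximation that uses only the vector itself is a single MinHash value, sketched below under stated assumptions. This is a swapped-in technique, not something from the thread: it is MinHash with one fixed permutation, not the multi-band LSH and candidate-pair pipeline the question rules out. The class `BinaryVector`, the fixed `DIMENSION` of 6 and the `SEED` are hypothetical. Over a random choice of permutation, two non-empty vectors get the same value with probability equal to their Jaccard similarity, so at 0.5 a pair is still split about half the time and dissimilar pairs can collide; it gives probabilistic bucketing, not a threshold guarantee. The A/B/C case is settled by whichever grouping the chosen permutation happens to produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical wrapper for a fixed-length binary vector.
public final class BinaryVector {
    // Assumptions: every vector has this dimension, and the seed is fixed
    // so that all instances share the same permutation.
    static final int DIMENSION = 6;
    private static final long SEED = 42L;
    private static final int[] RANK = randomPermutation(DIMENSION, SEED);

    private final int[] bits;

    public BinaryVector(int... bits) {
        if (bits.length != DIMENSION) {
            throw new IllegalArgumentException("expected " + DIMENSION + " entries");
        }
        this.bits = bits.clone();
    }

    // RANK[i] is the position of index i in a seeded random permutation of 0..n-1.
    private static int[] randomPermutation(int n, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        int[] rank = new int[n];
        for (int pos = 0; pos < n; pos++) rank[order.get(pos)] = pos;
        return rank;
    }

    // Single MinHash value: the smallest permutation rank among the 1-positions.
    @Override
    public int hashCode() {
        int min = Integer.MAX_VALUE; // returned for the all-zero vector (arbitrary sentinel)
        for (int i = 0; i < bits.length; i++) {
            if (bits[i] != 0 && RANK[i] < min) min = RANK[i];
        }
        return min;
    }

    // Equal vectors always produce equal hash codes, so the Object contract holds.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BinaryVector)) return false;
        return Arrays.equals(bits, ((BinaryVector) o).bits);
    }
}
```

Note that a HashMap keyed on such vectors still keeps non-equal vectors as separate keys even when their hash codes match; to actually group similar vectors, one option is to key a map on the hashCode() value itself.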

Original Q&A

There is 1 answer

**Jim Garrison** · Answer 1 · 2022-10-24T18:19:08+00:00

This is impossible. Jaccard similarity is calculated among two or more vectors, while the hash code must be dependent only on the contents of a single vector.

You can easily construct three vectors A, B and C such that (A,B) and (B,C) satisfy your criteria, meaning all three generate the same hash code, but (A,C) does not.
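To make the construction concrete, here is one such triple for a 0.5 threshold (the vectors are illustrative choices, not from the thread, and `NonTransitiveExample` is a hypothetical class name). Because "has the same hash code" is transitive, any hashCode() that puts A with B and B with C must also give A and C the same value, even though they share no 1-positions at all.

```java
public class NonTransitiveExample {

    // Jaccard similarity of two equal-length binary vectors:
    // shared 1-positions divided by positions that are 1 in either vector.
    public static double jaccard(int[] a, int[] b) {
        int intersection = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0 && b[i] != 0) intersection++;
            if (a[i] != 0 || b[i] != 0) union++;
        }
        return union == 0 ? 1.0 : (double) intersection / union;
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 0, 0};
        int[] b = {1, 1, 1, 1};
        int[] c = {0, 0, 1, 1};
        System.out.println(jaccard(a, b)); // prints 0.5 (meets the threshold)
        System.out.println(jaccard(b, c)); // prints 0.5 (meets the threshold)
        System.out.println(jaccard(a, c)); // prints 0.0 (no overlap)
    }
}
```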
