perfect hash function for random integer

527 views Asked by At

Here's the problem:

X is a positive integer (include 0) set which has n different elements I know in advance. All of them is less equal than m. And I want to have an occ-free hash function as simple as possible to map them to 0-n-1.

For example:

X = [31,223,121,100,123,71], so n = 6, m = 223.

I want to find a hash function to map them to [0, 1, 2, 3, 4, 5].

If mapping to 0-n-1 is too difficult, then how to mapping X to a small range is also a problem.

Finding such a function is not too difficult, but to be simple and easy to be generated is hard.

It's better to preserve the order of the X.

Any clues?

2

There are 2 answers

1
Matt Timmermans On

My favorite perfect hash is pretty easy.

The hash function you generate has the form:

hash = table1[h1(key)%N] + table2[h2(key)%N]

h1 and h2 are randomly generated hash functions. In your case, you can generate random constants and then have h1(key)=key*C1/m and h2(key)=key*C2/m or something similarly simple

To generated the perfect hash:

  1. Generate random constants C1 and C2
  2. Imagine the bipartite graph, with table1 slots and table2 slots as vertices and an edge for each key between table1[h1(key)%N] and table2[h2(key)%N]. Run a DFS to see if the graph is acyclic. If not, go back to step 1.
  3. Now that you have an acyclic graph, start at any key/edge in each connected component, and set its slots in table1 and table2 however you like to give it whatever hash you like.
  4. Traverse the tree starting at the vertices adjacent to the edge you just set. For every edge you traverse, one of its slots will already be set. Set the other one to make the hash value come out however you like.

That's it. All of steps (2), (3) and (4) can be combined into a single DFS traversal pretty easily.

The complete description and analysis is in this paper.

4
Thomas Mueller On

As the number of entries is small, you can use brute force. I found this (Java: long is 64 bit signed, int is 32 bit signed):

private static int hashReduce(long x, long seed, int n) {
    long x1 = ((long) x + seed);
    int x2 = (int) ((x1 >>> 32) ^ x1);
    int x3 = ((x2 >>> 16) ^ x2) * 0x45d9f3b;
    return (int) (((x3 & 0xffffffffL) * n) >>> 32);
}

public static void main(String... args) throws InterruptedException {
    long[] data = new long[] { 31, 223, 121, 100, 123, 71 };
    for (int j = 0; j < 6; j++) {
        System.out.println(data[j] + " -> " + hashReduce(data[j], -2126115507L, 6));
    } 
}

x: 31 -> 0
x: 223 -> 1
x: 121 -> 2
x: 100 -> 3
x: 123 -> 4
x: 71 -> 5

To find the seed value, one need to iterate, for example as follows:

for (int seed = Integer.MIN_VALUE; seed < Integer.MAX_VALUE; seed++) {
    testWithSeed(seed);
}

where testWithSeed is user defined and verifies that the data is mapped as expected.