Why does Java HashMap treeify() compare hash values of nodes in the same bucket?

90 views Asked by At

I'm currently studying the internals of the HashMap implementation in Java. I noticed something that I can't quite wrap my head around in the treeify() method. Here's the portion of the treeify() method that I'm referring to:

if ((ph = p.hash) > h)
    dir = -1;
else if (ph < h)
    dir = 1;

My understanding is that items within the same bucket in a HashMap tend to have the same internal hash code as this is what was used to assign them to that bucket in the first place. If that's the case, then these lines of code which compare the internal hash codes of nodes within the same bucket would never evaluate to true.

So my question is: Why does treeify() need to compare these hash codes? Are there any scenarios where items in the same bucket can have different internal hash codes? In other words, are there any situations where the above condition can be true? If not, wouldn't comparing internal hash codes be unnecessary?

In the end, treeify() method primarily uses Comparable interface and tieBreakOrder() method for sorting the items within the tree, so it seems strange to have lines of code for comparing what appear to be identical hash values. Any insights into this would greatly be appreciated. Thanks in advance!

1

There are 1 answers

9
Thomas Kläger On

The bucket is selected using not using the hashcode (*) itself but using the hashcode modulo number of buckets. For example in putVal() the bucket number i is calculated as (n - 1) & hash:

    if ((p = tab[i = (n - 1) & hash]) == null)
       tab[i] = newNode(hash, key, value, null);

For the default starting number of buckets of 16 keys with hashcodes of 0, 16, 32, and so on all end up in the same bucket 0.

Since treeifying starts when there are many entries in the same bucket and those entries can have different hashcodes it makes sense to build the tree by sorting by hashcode first before starting to compare (which may take longer, for example for long strings).


*) Actually it doesn't use the result of calling key.hashCode() directly, it uses the following function:

 static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

In this function for a non-null key h is assigned the keys hashcode and then the upper 16 bits are xored into the lower 16 bits which modifies the lower 16 bits of the result but leaves the upper 16 bits intact.

Thus the values of hash in the first code snippet contains still 2^32 distinct values, which is still larger then the size of the bucket array which still means that more than one hashcode value maps to the same bucket.