Is swapping uint32 for uint64 benign other than (possibly) memory usage?


Background

I recently found that MD5 hashes of large R objects, computed with the digest package, did not change when I made small changes to the objects. This appears to be caused by some 32-bit counter variables overflowing, so that the algorithm misses the changed portion of the file.
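To illustrate the kind of overflow suspected here, the following is a minimal sketch, not code from the digest package. The helper names `add_len32` and `add_len64` are hypothetical; they only show how a 32-bit running length silently wraps modulo 2^32 while a 64-bit one does not.

```c
#include <stdint.h>

/* Hypothetical counter update: add a chunk length to a 32-bit running total.
   Unsigned arithmetic wraps modulo 2^32, so no error is ever signalled. */
uint32_t add_len32(uint32_t total, uint32_t chunk)
{
    return (uint32_t)(total + chunk);
}

/* The same update with a 64-bit running total, which has room for lengths
   far beyond 4 GiB. */
uint64_t add_len64(uint64_t total, uint32_t chunk)
{
    return total + chunk;
}
```

With these helpers, adding 6 to a 32-bit total of 4294967295 yields 5, whereas the 64-bit version yields 4294967301. Any code that trusts the wrapped value would behave as if the input were much shorter than it really is.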

With the current development version of digest, the hashes reflect these small changes to large files on Linux, whereas on Windows the changes are missed.

I made the following changes to the current dev version, which swap a few unsigned long int (uint32) variables for unsigned long long int (uint64) variables:

https://github.com/eddelbuettel/digest/compare/master...kendonB:testmd5

and now on Windows the problem is fixed and the hashes notice the changes.
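One plausible explanation for the Linux/Windows difference, offered as an assumption rather than something verified against the digest source, is that the width of unsigned long is platform-dependent. C only guarantees at least 32 bits for it; 64-bit Windows keeps it at 32 bits, while typical 64-bit Linux systems make it 64 bits. The sketch below uses two hypothetical helpers to report the widths on whatever platform it is compiled for.

```c
#include <limits.h>

/* Width in bits of unsigned long on the platform this is compiled for.
   Typically 32 on Windows and 64 on 64-bit Linux. */
unsigned unsigned_long_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width in bits of unsigned long long, which C99 requires to be at least 64. */
unsigned unsigned_long_long_bits(void)
{
    return (unsigned)(sizeof(unsigned long long) * CHAR_BIT);
}
```

If the two functions return different values on one platform and the same value on another, code that stores large lengths in unsigned long can overflow on the first platform only.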

Question

Is swapping out these 32-bit integer variables for 64-bit integer variables benign? Will anything get ruined on 32-bit systems? On obscure systems? Can anything go wrong?

Further background

https://github.com/eddelbuettel/digest/issues/97


There are 2 answers

P.W On

On a 32-bit system, a 64-bit integer is usually implemented using two 32-bit registers. Loading or storing such an integer takes two instructions, one per 32-bit half. For an operation like addition, the low halves are added first and the high halves are then added together with the carry. The compiler takes care of all of this.
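The add-with-carry scheme described above can be sketched in portable C. This is only an illustration of what the compiler generates for you; the `u64_pair` type and the helper names are hypothetical and are not needed in real code, where you would simply use a 64-bit type.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as on a 32-bit machine. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Split a 64-bit value into its halves. */
u64_pair from_u64(uint64_t v)
{
    u64_pair p;
    p.lo = (uint32_t)(v & 0xFFFFFFFFu);
    p.hi = (uint32_t)(v >> 32);
    return p;
}

/* Join the halves back into a 64-bit value. */
uint64_t to_u64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Add two 64-bit values using only 32-bit additions plus a carry. */
u64_pair add_with_carry(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t carry;
    r.lo = (uint32_t)(a.lo + b.lo);   /* low halves, wraps modulo 2^32 */
    carry = (r.lo < a.lo) ? 1u : 0u;  /* wrapped result means a carry out */
    r.hi = (uint32_t)(a.hi + b.hi + carry);
    return r;
}
```

For example, adding 1 to 0xFFFFFFFF overflows the low half, sets the carry, and produces 0x100000000 once the halves are joined again.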

You only need to make sure that the compiler you are using supports such a type.

For example, the signed and unsigned versions of long long int (which are guaranteed to be at least 64 bits wide) were introduced in C99. So you should use a compiler that supports this feature of the C99 standard.
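As a hedged sketch of how that requirement might be checked and used, the snippet below fails at compile time if `ULLONG_MAX` is missing (which suggests the compiler predates C99) and otherwise uses the fixed-width `uint64_t` from `<stdint.h>`. The function `format_length` is a hypothetical helper, and `uint64_t` is assumed to be available, which holds on all mainstream platforms.

```c
#include <limits.h>

#if !defined(ULLONG_MAX)
#error "unsigned long long is unavailable; a C99 (or later) compiler is needed"
#endif

#include <inttypes.h>  /* PRIu64 */
#include <stdint.h>    /* uint64_t */
#include <stdio.h>     /* snprintf, size_t */

/* Hypothetical helper: write a 64-bit length into buf as decimal text.
   Returns the number of characters snprintf wanted to write. */
int format_length(char *buf, size_t size, uint64_t length)
{
    /* PRIu64 expands to the correct printf conversion on every platform,
       so the same source works where long is 32 or 64 bits. */
    return snprintf(buf, size, "%" PRIu64, length);
}
```

Using `uint64_t` and `PRIu64` rather than `unsigned long` and `%lu` keeps the width explicit, so the code does not depend on how wide long happens to be.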

rici On

The MD5 hash of a string is a single, well-defined value: a given input always produces the same hash. That is a vital feature of MD5, because it allows the hash to be used for verification. (Cryptographic use of MD5 has, however, been deprecated for some time.)

So if a particular library produces the wrong hash value, that's a bug, and a pretty serious one, and it should be reported as such to the package author. There are reference implementations of the MD5 hash which can be used to obtain the correct hash value, but the md5sum command is highly likely to be correct as well, which might be a simpler check.

It is certainly possible that the bug in question, if you can verify that it is a bug, is the result of an unexpected 32-bit integer overflow. But modifying crypto libraries is not a casual activity, even when they are buggy implementations of deprecated algorithms. "It seems to work" is usually not adequate validation of an algorithm. I would caution against using an unvalidated modification. But it's a useful hint for the library maintainer.