I tried to compute a float multiplication and observed that the value saturates at 65536 and stops updating.
The issue occurs only with the code below.
I tried this with an online GCC compiler and the issue was still the same.
Does this have anything to do with float precision? Is the compiler optimizing away my float precision during the operation?
Are there any compiler flags that I can add to overcome this issue?
Can anyone please guide me on how to solve this issue?
The code is attached for reference:
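#include <stdio.h>

int main(void)
{
    float dummy1, dummy2;
    unsigned int i = 0;

    printf("Hello World\n");
    /* sizeof yields a size_t, so the matching format is %zu */
    printf("size of float = %zu\n", sizeof(dummy1));

    dummy2 = 0.0;
    dummy1 = 65535.5;
    dummy2 = 60.00 * 0.00005;   /* double product, rounded to float on assignment */

    for (i = 0; i < 300; i++)
    {
        /* each addition is rounded to the nearest float */
        dummy1 = dummy1 + dummy2;
        printf("dummy1 = %f %f\n", dummy1, dummy2);
    }
    return 0;
}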
(This answer presumes the IEEE-754 single- and double-precision binary formats are used for float and double.)

60.00 * 0.00005 is computed with double arithmetic and produces 0.003000000000000000062450045135165055398829281330108642578125. When this is stored in dummy2, it is converted to 0.0030000000260770320892333984375.

In the loop, dummy1 eventually reaches the value 65535.99609375. Then, when dummy1 and dummy2 are added, the result computed with real-number arithmetic would be 65535.9990937500260770320892333984375. This value is not representable in the float format, so it is rounded to the nearest value representable in the float format, and that is the result that the + operator produces.

The nearest representable values in the float format are 65535.99609375 and 65536. Since 65536 is closer to 65535.9990937500260770320892333984375, it is the result.

In the next iteration, 65536 and 0.0030000000260770320892333984375 are added. The real-arithmetic result would be 65536.0030000000260770320892333984375. This is also not representable in float. The nearest representable values are 65536 and 65536.0078125. Again 65536 is closer, so it is the computed result.

From then on, the loop always produces 65536 as a result.
You can get better results either by using double arithmetic or by computing dummy1 afresh in each iteration instead of accumulating rounding errors from iteration to iteration.

Note that because dummy1 is a float, it does not have the precision required to distinguish some successive values of the sequence. For example, just below 65536 adjacent float values are 0.00390625 apart, which is more than the step of about 0.003, so some consecutive iterations produce the same float value.