Two successive calls to strcmp return different values given the same inputs

I'm using strcmp to compare several strings, usually on the order of hundreds or thousands of comparisons per program run.

I was noticing that, occasionally, strcmp would return non-zero when I knew that the strings were exactly the same. This would happen only once every few runs, and not predictably. Confused, I set a breakpoint when the return value of strcmp was non-zero using GDB so I could compare the strings manually immediately after the call to strcmp. Sure enough, it hit the breakpoint and I manually printed out both strings in GDB. They were exactly the same, including the null terminating character.

Even more confused, I modified the code to the following, to try to pinpoint the problem by running strcmp twice on the exact same strings. In this case, I know that the name parameter should always match R.ecos[0].name:

Eco *get_eco(const char *name)
{
    int r1 = strcmp(R.ecos[0].name, name);
    int r2 = strcmp(R.ecos[0].name, name);
    if (r1) {
        //break here with GDB
    } else {
        return &R.ecos[0];
    }
    return 0;
}

Again I ran the program several times to try to reproduce the error. When the breakpoint hit, I printed both r1 and r2 and saw that r1 = 1 and r2 = 0.

My experience leads me to discount the possibility of a compiler bug or a bug within strcmp itself. This code is part of a game engine, so other threads are running for FMOD (audio) and GLFW3 (input), but none of these touch the memory here, so I have no reason to think there is something fishy going on with multithreading/race-conditions. What else could cause such a strange bug?

I'm using mingw32 on Windows, compiled with gcc, and I've tried two different versions of gcc and gdb, both producing the same results. This has NOT occurred on my Linux build. Because the bug is sporadic, I can't say with certainty that it isn't present on Linux, but I have not been able to reproduce it after dozens of runs on that OS.

Compiler flags:

-static -ggdb -g -O0 -std=c11 -Wall -Wextra -pedantic -Wshadow -Wpointer-arith \
                    -Wcast-align -Wwrite-strings -Wmissing-prototypes \
                    -Wmissing-declarations -Wredundant-decls -Wnested-externs \
                    -Winline -Wno-long-long -Wuninitialized \
                    -Wstrict-prototypes

EDIT: I've also run the program with ASAN on Linux and not encountered any errors.

EDIT 2: After setting a hardware watchpoint on the address of both strings, e.g. watch -l R.ecos[0].name[32] I still hit the breakpoint in the code WITHOUT triggering the hardware watchpoints at all.

There is 1 answer

Answered by Employed Russian

My experience leads me to discount the possibility of a compiler bug or a bug within strcmp itself.

There was a bug in GLIBC strstr a long time ago, so such bugs do sometimes exist. But you are correct that this is very unlikely.

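I would suggest making local copies of the strings before calling strcmp, and comparing the copies. That should provide a definitive answer as to whether you have a data race or a bug in strcmp: the copies cannot be changed by another thread, so if strcmp still misreports on them, the problem is not in the original buffers.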

It's fairly easy to do if there is a reasonable length limit:

Eco *get_eco(const char *name)
{
    char s1[MAXLEN], s2[MAXLEN];
    strcpy(s1, R.ecos[0].name);
    strcpy(s2, name);
    int r = strcmp(s1, s2);
    if (r) {
       // break here, examine s1, s2, name and R.ecos[0].name
    } else ...
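
For anyone who wants to try this outside the engine, here is a minimal self-contained sketch of the same copy-then-compare idea. The names SNAP_MAX, SnapCmp, snap_copy, snapshot_compare and snapshot_disagrees are hypothetical helpers invented for illustration, and the 64-byte limit is an assumption; nothing here comes from the original code base. Unlike the strcpy version above, the copy is bounded and reports truncation, so an over-long name cannot overflow the local buffers.

```c
#include <assert.h>
#include <string.h>

/* Assumed upper bound on a name, including the terminating NUL. */
#define SNAP_MAX 64

/* Hypothetical result record for one snapshot comparison. */
typedef struct {
    int cmp_copies;     /* strcmp on the private copies */
    int cmp_originals;  /* strcmp on the original pointers, read afterwards */
    int truncated;      /* nonzero if either input did not fit in SNAP_MAX */
    char s1[SNAP_MAX];  /* private copy of the stored name */
    char s2[SNAP_MAX];  /* private copy of the queried name */
} SnapCmp;

/* Copy at most SNAP_MAX - 1 bytes and always NUL-terminate.
   Returns 1 if the source was longer than the buffer. */
static int snap_copy(char dst[SNAP_MAX], const char *src)
{
    size_t n = 0;
    while (n < SNAP_MAX - 1 && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return src[n] != '\0';
}

/* Snapshot both strings first, then compare the snapshots and the originals. */
static SnapCmp snapshot_compare(const char *stored, const char *query)
{
    SnapCmp r;
    assert(stored != NULL && query != NULL);
    r.truncated = snap_copy(r.s1, stored) | snap_copy(r.s2, query);
    r.cmp_copies = strcmp(r.s1, r.s2);
    r.cmp_originals = strcmp(stored, query);
    return r;
}

/* Returns 1 when the copies and the originals disagree about equality.
   Truncated snapshots are ignored because they cannot be compared fairly. */
static int snapshot_disagrees(const SnapCmp *r)
{
    return !r->truncated && ((r->cmp_copies == 0) != (r->cmp_originals == 0));
}
```

A call such as snapshot_compare(R.ecos[0].name, name) inside get_eco would then give a place to break when snapshot_disagrees returns 1 or cmp_copies is non-zero. If the copies s1 and s2 themselves differ, some writer probably changed an original buffer around the time of the copy; if the copies are byte-for-byte identical and strcmp still reports a difference on them, suspicion shifts toward strcmp or the runtime environment. This is only a diagnostic aid under those assumptions, not a fix.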