hashmap not recognizing identical keys

64 views Asked by At

I've been trying to use a hashmap for a while now, using code that my instructor provided, but it just simply cannot recognize two identical characters as being the same.

I'm really not sure what more I can say about it-I've tried modifying my code and the instructors, tried various different ways of sending the parameters, all sorts of other things that I can't quite remember off of the top of my head. It just doesn't seem to recognize that 'a' = 'a'.

My relevant code:

#include <stdio.h>
#include <string.h>
#include "hash.h"

int main(int argc, char* argv[]) {
    int len = strlen(argv[1]);

    hashmap map = hash_init();

    for (int i = 0; i < len; i++) {
        char cha = argv[1][i];

        put(&map, cha, 1);
    }

    char* mapStr = to_str(map);
    printf("%s", mapStr);

    return 0;
}

my instructor's relevant code:

hashmap.c

#include "hash.h"

hashmap hash_init(void) {
    hashmap h = {(map_node **)malloc(BINS * sizeof(map_node *)), 0, BINS};
    for (int i = 0; i < BINS; i++)
        h.table[i] = NULL;
    return h;
}

static int hash(char *key, int bins) {
    unsigned hashval = 0;
    for (int i = 0; i < strlen(key); i++)
        hashval = 31 * hashval + key[i];
    return hashval % bins;
}

int put(hashmap *h, char *key, int value) {
    int index = hash(key, h->bins);
    for (map_node *iterator = h->table[index]; iterator; iterator = iterator->next)
        if (!strcmp(iterator->key, key)){// I found the key
            int old_val = iterator->value;
            iterator->value = value;
            return old_val;//return old value
        }
    if (h->size >= h->bins)//load factor >= 100%
        rehash(h);
    map_node *new_element = (map_node *)malloc(sizeof(map_node));
    new_element->next = h->table[index];
    new_element->key = strdup(key);
    new_element->value = value;
    h->table[index] = new_element;
    h->size++;
    return 0;//return old value
}

char* to_str(hashmap h){
    char* rv = (char*) malloc(h.size * 100 + 1);
    for(int i = 0; i < h.bins;i++)
        for (map_node *it = h.table[i]; it; it = it->next)
            sprintf(rv, "%s\n%s: %d", rv, it->key, it->value);
    return rv;
}
Input: abcba
expected output: a:2 b:2 c:1
actual output: b:1 b:1 c:1 a:1 a:1

Thank you for at least reading this.

1

There are 1 answers

1
Allan Wind On
  1. Your code is incomplete without hash.h. This means that I had to add missing includes, reverse engineer the typedef map_node, hashmap. Make up a value for BINS. Create a placeholder for the function rehash().

  2. (Not fixed) Prefer using a consistent namespace prefix on all symbols that might be exported (types, functions, global variables). hash_ xor hashmap_?

  3. Avoid global variables (BINS). In this case just add that as argument to hash_init().

  4. (Not fixed) hash_init(): returns the hashmap by value. As the return value could be large you are better off returning a hashmap *. This means you need to allocate both a hashmap and hashmap.table.

  5. hash_init(): Prefer calloc() to malloc() + for().

  6. Don't cast void * like from malloc().

  7. Prefer sizeof var to sizeof(type). The former is a purely mechanical transformation, and variable is usually shorter (i.e. no namespace prefix) than the type name. For example:

    map_node *new_element = malloc(sizeof *new_element)
    
  8. Instead of calculating the length of a string with strlen(), particular a loop, use the fact that str[i] is false on terminating \0. So instead of for(int i = 0; i < strlen(s); i++) do for(int i = 0; s[i]; i++).

  9. put(): Either lookup the existing value of a key and call put(..., old_value + 1) with your existing implementation, allow collisions for the same key to be chained via the next pointer and rework hashmap_print() to sum them up per unique keys, or update the value for same key. I fixed put() to achieve the latter here as that seems to be the approach you were using.

  10. put(): If you rehash() then index is no longer valid for the subsequent insert. Either do it first or last, or call hash() after rehash() if you really want to do it in the the middle.

  11. to_str(): sprintf(rv, ..., rv, ...) is undefined behavior as you just print it anyways changed it to hashmap_print().

  12. main(): check argc before you deference argv.

  13. main(): put() expects a string but argv[1][i] is a char. It's undefined behavior to pass a non-\0 terminated string to strlen(). Construct a string, for example, using an initializer.

  14. (Not fixed) Make sure you free all memory allocated. If nothing else so you have a clean slate for when you need to look for leaks.

#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct map_node {
    char *key;
    int value;
    struct map_node *next;
} map_node;

typedef struct {
    map_node **table;
    int size;
    int bins;
} hashmap;

int BINS = 13;

hashmap hash_init(void) {
    return (hashmap) { calloc(BINS, sizeof(map_node *)), 0, BINS };
}

static int hash(char *key, int bins) {
    unsigned hashval = 0;
    for (int i = 0; key[i]; i++)
        hashval = 31 * hashval + key[i];
    return hashval % bins;
}

void rehash(hashmap *h) {
    assert("TBD" && 0);
}

int put(hashmap *h, char *key, int value) {
    if (h->size >= h->bins)
        rehash(h);
    int index = hash(key, h->bins);
    for (map_node *iterator = h->table[index]; iterator; iterator = iterator->next)
        if (!strcmp(iterator->key, key)){
            int old_value = iterator->value;
            iterator->value += value;
            return old_value;
        }
    map_node *new_element = malloc(sizeof *new_element);
    new_element->next = h->table[index];
    new_element->key = strdup(key);
    new_element->value = value;
    h->table[index] = new_element;
    h->size++;
    return 0;
}

void hashmap_print(hashmap *h){
    for(int i = 0; i < h->bins;i++)
        for (map_node *it = h->table[i]; it; it = it->next)
            printf("%s: %d\n", it->key, it->value);
}

int main(int argc, char* argv[]) {
    if(argc < 2) {
        printf("usage: program keys\n");
        return 1;
    }
    hashmap map = hash_init();
    for (int i = 0; argv[1][i]; i++)
        put(&map, (char []) { argv[1][i], '\0' }, 1);
    hashmap_print(&map);
}

and example run:

$ ./a.out abcba
a: 2
b: 2
c: 1