Hash Function

A hash function is any function that can be used to map data of arbitrary size onto data of a fixed size.

Hash Functions

1. DJB2

This algorithm (k = 33) was first reported by Dan Bernstein many years ago in comp.lang.c. Another version of this algorithm (now favored by Bernstein) uses XOR: hash(i) = hash(i - 1) * 33 ^ str[i]. The magic of the number 33 (why it works better than many other constants, prime or not) has never been adequately explained.

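unsigned long long djb2(const char *str) {
    unsigned long long hash = 5381;
    int c;

    /* Read bytes as unsigned char so values above 127 are not sign-extended. */
    while ((c = (unsigned char)*str++)) {
        hash = (hash << 5) + hash + c; /* hash * 33 + c */
    }

    return hash;
}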
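
The XOR variant mentioned above, hash(i) = hash(i - 1) * 33 ^ str[i], can be sketched as follows. This is only an illustration: the name djb2_xor is hypothetical, and the seed 5381 is assumed to carry over unchanged from the additive version.

```c
/* Hypothetical name: XOR variant of djb2, hash(i) = hash(i - 1) * 33 ^ str[i]. */
unsigned long long djb2_xor(const char *str) {
    unsigned long long hash = 5381; /* assumed: same seed as the additive version */
    const unsigned char *p = (const unsigned char *)str;

    while (*p) {
        /* (hash << 5) + hash is hash * 33; the XOR then mixes in the next byte. */
        hash = ((hash << 5) + hash) ^ *p++;
    }

    return hash;
}
```

For an empty string the loop never runs, so djb2_xor("") returns the seed, 5381. The only difference from the additive version is the operator used to combine the scaled hash with the next byte.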

2. sdbm

This algorithm was created for the sdbm (a public-domain reimplementation of ndbm) database library. It was found to do well in scrambling bits, causing better distribution of the keys and fewer splits. It also happens to be a good general hashing function with good distribution. The actual function is hash(i) = hash(i - 1) * 65599 + str[i]; what is included below is the faster version used in gawk. (There is even a faster, Duff's device version.) The magic constant 65599 was picked out of thin air while experimenting with different constants, and turns out to be a prime. This is one of the algorithms used in Berkeley DB (see Sleepycat) and elsewhere.

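unsigned long long sdbm(const char *str) {
    unsigned long long hash = 0; /* the reference sdbm starts from 0, not djb2's 5381 */
    int c;

    /* Read bytes as unsigned char so values above 127 are not sign-extended. */
    while ((c = (unsigned char)*str++)) {
        /* Shift form of hash * 65599 + c, since 65599 = 2^6 + 2^16 - 1. */
        hash = c + (hash << 6) + (hash << 16) - hash;
    }

    return hash;
}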
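
The shift expression is the multiplication by 65599 in disguise: (hash << 6) + (hash << 16) - hash equals hash * (64 + 65536 - 1). The sketch below puts the two forms side by side so the equivalence can be checked. The names sdbm_mul and sdbm_shift are hypothetical, and the seed is passed as a parameter (an assumption made here) so the comparison holds for any starting value.

```c
/* Hypothetical name: direct form, hash(i) = hash(i - 1) * 65599 + str[i]. */
unsigned long long sdbm_mul(const char *str, unsigned long long seed) {
    unsigned long long hash = seed;
    const unsigned char *p = (const unsigned char *)str;

    while (*p) {
        hash = hash * 65599ULL + *p++;
    }

    return hash;
}

/* Hypothetical name: shift form, using 65599 = 2^6 + 2^16 - 1. */
unsigned long long sdbm_shift(const char *str, unsigned long long seed) {
    unsigned long long hash = seed;
    const unsigned char *p = (const unsigned char *)str;

    while (*p) {
        hash = *p++ + (hash << 6) + (hash << 16) - hash;
    }

    return hash;
}
```

Because unsigned arithmetic in C wraps modulo 2^64 for a 64-bit unsigned long long, both functions return the same value for every input, including after overflow. The shift form simply avoids a multiply.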

3. lose lose

This hash function appeared in K&R (1st ed.), but at least the reader was warned: "This is not the best possible algorithm, but it has the merit of extreme simplicity." This is an understatement; it is a terrible hashing algorithm, and it could have been much better without sacrificing its "extreme simplicity." Many C programmers use this function without actually testing it, or checking something like Knuth's Sorting and Searching, so it stuck. It is now found mixed with other respectable code, e.g. C News.

Warning

Don't use this algorithm, it's terrible.

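unsigned long long loseLose(const char *str) {
    unsigned long long hash = 0;
    int c;

    /* Read bytes as unsigned char so values above 127 are not sign-extended. */
    while ((c = (unsigned char)*str++)) {
        hash += c; /* plain byte sum: the order of the bytes is lost */
    }

    return hash;
}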
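
One concrete reason the function behaves so badly is that a sum ignores byte order, so any two strings made of the same bytes collide. The sketch below repeats the same computation under the hypothetical name byte_sum_hash so that it stands alone.

```c
/* Hypothetical name: the same byte-sum computation as loseLose. */
unsigned long long byte_sum_hash(const char *str) {
    unsigned long long hash = 0;
    const unsigned char *p = (const unsigned char *)str;

    while (*p) {
        hash += *p++; /* addition is commutative, so byte order does not matter */
    }

    return hash;
}
```

In ASCII, byte_sum_hash("abc") and byte_sum_hash("cba") both return 294 (97 + 98 + 99), and "ad" and "bc" both return 197. Short keys also land in a narrow range of values, which leaves most of a large table unused.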
