badd10de.dev

Book: The Art of Computer Programming, by Donald Knuth


Hashing (Vol. 3, p. 519)

If we have a stream of characters that we want to hash, we can hash each character with a different hash function and add them together.

Be K := {x1, x2, ... , xn} we want to map to M buckets:

h(K) = (h1(x1) + h2(x2) + ... + hn(xn)) mod M

The thing that makes this interesting is that for a limited set of numbers, we can precompute these hashes and store them in a lookup table, so the function becomes:

hash_lookup[255] = {0xDEADBEEF, 0xFEEDD011, ...}
h(K) = (hash_lookup[x1] + hash_lookup[x2] + ... + hash_lookup[xn]) mod M

If M is a power of 2, we can avoid the mod by subsituting XOR in place of addition:

h(K) = (hash_lookup[x1] ^ hash_lookup[x2] ^ ... ^ hash_lookup[xn]) & M

If the hash_lookup values are chosen at random, this method should minimize the collisions in the table.

Alternatively, we could just use Fibonacci hashing as a finisher for mapping the results of the additions. For a hash table, we could generate the lookup table of 255 elements at random every time.