Hashing


Definitions:

Hashing: The process of calculating a unique index into an array of structures using the key associated with the data in a record.

Hash Table: An array of data structures or a large number of records stored in a file on disk. Each structure or record has associated with it a unique index in the array or file and a key.

Hash Function: The technique used to calculate a unique index into the hash table from a key. This may be a mathematical calculation based on some value of the key or it may be some data manipulation technique.

Collision: When two different keys hash to the same index in the hash table.

Collision Resolution Techniques: Ways of handling hashing collisions. These include:
  1. Open Addressing with linear probing -- when a collision occurs the calculated hash index is incremented by one.
  2. Open Addressing with double hashing -- when a collision occurs the original key is used to calculated a unique value (the double hash) to be used to increment the hash index. Even if two keys hash to the same index it is unlikely they will also double hash to the same increment value.
  3. Chaining -- when collisions occur a linked list of all records is hashing to the same index are created from the first record in that index.
  4. Buckets -- the hash table is divided up into equal sized groups of locations (buckets) and hash indices are calculated for the number of buckets, thus each index in the hash table can hold several records.

Mapping: A term referring to how the hash function generates or "maps" a key to an index.

Perfect (ideal) hash function: A hash function that maps keys uniformly and randomly into the table without collisions. This ideal is impossible to reach in reality.

Load factor: The ratio of the number of filled entries (N) in a hash table to the number of entries in the table (M): α = N / M. If you have a table that has 500 locations and 250 of them are filled the table has a load factor of 250 / 500 = 0.5

Cluster: A series of adjacent filled entries in a hash table.


Examples of Hash Functions:

  1. Base-26
    Treat each letter in the key as if it was a digit in a base-26 numbering system. Thus a key of char key[] = "LMNO"; would be hashed using the formula:
    long index = ((key[0]-'A')*pow(26,3)) + ((key[1]-'A')*pow(26,2)) + ((key[2]-'A')*26) + ((key[3]-'A')) % TABLESIZE;
  2. Folding
    Treat each letter in the key as if it were a one-bit integer. Multiply the first half of the key letters together then add this to the product of the last half of the key letters. For example: ('L' * 'M') + ('N' * 'O'). The code for this would be
    long index = ((key[0] * key[1]) + (key[2] * key[3])) % TABLESIZE.
  3. Middle Squaring
    Treat each letter in the key as if it were a one-bit integer. Take half of the key characters from the center of the key. Add these together then square the results. For example: ('M' + 'N')2. The code for this would be
    long index = pow((key[1] + key[2]), 2) % TABLESIZE.
  4. Truncation
    Treat each letter in the key as if it were a one-bit integer. Simply delete part of the key and multiply the remaining characters together. For example: ('L' * 'M'). The code for this would be
    long index = (key[0] * key[1]) % TABLESIZE.
  5. Division Method
    1. Select a table size that is a prime number, but not a Fermat prime.
    2. Translate the key into an integer using one of the above techniques.
    3. Divide the numberic key by the table size to give a quotient and a remainder.
    4. Use the remainder as the hash index.
    5. Use max(1, quotient) as the probe increment (double hash).