Hashing
Definitions:
Hashing: The process of calculating a unique index into
an array of structures using the key associated with the data in a record.
Hash Table: An array of data structures or a large number of
records stored in a file on disk. Each structure or record has associated with
it a unique index in the array or file and a key.
Hash Function: The technique used to calculate a unique index
into the hash table from a key. This may be a mathematical calculation based on
some value of the key or it may be some data manipulation technique.
Collision: When two different keys hash to the same index in
the hash table.
Collision Resolution Techniques: Ways of handling hashing collisions.
These include:
-
Open Addressing with linear probing -- when a collision occurs the calculated hash
index is incremented by one.
-
Open Addressing with double hashing -- when a collision occurs the original key
is used to calculated a unique value (the double hash) to be used to increment
the hash index. Even if two keys hash to the same index it is unlikely they
will also double hash to the same increment value.
-
Chaining -- when collisions occur a linked list of all records is hashing to
the same index are created from the first record in that index.
-
Buckets -- the hash table is divided up into equal sized groups of locations
(buckets) and hash indices are calculated for the number of buckets, thus each
index in the hash table can hold several records.
Mapping: A term referring to how the hash function generates
or "maps" a key to an index.
Perfect (ideal) hash function: A hash function that maps keys
uniformly and randomly into the table without collisions. This ideal is impossible
to reach in reality.
Load factor: The ratio of the number of filled entries (N) in a
hash table to the number of entries in the table (M): α = N / M. If you have
a table that has 500 locations and 250 of them are filled the table has a load
factor of 250 / 500 = 0.5
Cluster: A series of adjacent filled entries in a hash table.
Examples of Hash Functions:
-
Base-26
Treat each letter in the key as if it was a digit in a base-26 numbering system.
Thus a key of char key[] = "LMNO"; would be hashed using the formula:
long index = ((key[0]-'A')*pow(26,3)) + ((key[1]-'A')*pow(26,2)) + ((key[2]-'A')*26) +
((key[3]-'A')) % TABLESIZE;
-
Folding
Treat each letter in the key as if it were a one-bit integer. Multiply the first
half of the key letters together then add this to the product of the last half of
the key letters. For example: ('L' * 'M') + ('N' * 'O'). The code for this would
be
long index = ((key[0] * key[1]) + (key[2] * key[3])) % TABLESIZE.
-
Middle Squaring
Treat each letter in the key as if it were a one-bit integer. Take half of the
key characters from the center of the key. Add these together then square the
results. For example: ('M' + 'N')2. The code for this would
be
long index = pow((key[1] + key[2]), 2) % TABLESIZE.
-
Truncation
Treat each letter in the key as if it were a one-bit integer. Simply delete part
of the key and multiply the remaining characters together.
For example: ('L' * 'M'). The code for this would be
long index = (key[0] * key[1]) % TABLESIZE.
-
Division Method
- Select a table size that is a prime number, but not a Fermat prime.
- Translate the key into an integer using one of the above techniques.
- Divide the numberic key by the table size to give a quotient and a remainder.
- Use the remainder as the hash index.
- Use max(1, quotient) as the probe increment (double hash).