A hash table maintains two arrays, one for keys, one for values (or possibly one array of (key, value) pairs - it doesn't really matter). The elements of these arrays are referred to as buckets[?]. When required to find the associated value for a given key, the key is fed through a hash function to yield an integer (called the hash value). This integer is then the index to the associated value.
Hash tables offer insertions, lookups, and deletions in O(1) amortized time, with the caveat that hash collisions need to be dealt with. A collision is when two different keys hash to the same integer (they have the same hash value). Techniques for dealing with collisions include probing, chaining, and rehashing. Note that probing and chaining are generally mutually exclusive.
Probing is, upon finding a collision, moving to the next free bucket in the table (or incrementing by some number other than 1). Chaining involves using each bucket as a pointer to another structure, such as an array, a linked list, or even another hash (preferably with a different size and/or hashing function).
Both probing and chaining can be problematic because, as more and more elements are added to the hash, the O(1) property is lost! Rehashing is a technique to minimize this effect: by increasing the size of the table and recomputing the hash values with respect to the new table size, the O(1) property can be maintained. Hash tables work faster the fewer occupied entries they have because there will be fewer collisions. If the ratio of occupied entries to total entries is kept below some fixed constant then the performance of hash tables will be reasonable.
Widely useful, hash tables are found in a wide variety of programs.
See also:
Search Encyclopedia
|
Featured Article
|