Basic principles of hashCode, hash() functions, and hash collisions
Hash Principle
1. hash functions
Requirement: a. efficiently computable b. Each table index equally likely for each key
hashCode() :
if x.equals(y),then (x.hashCode() == y.hashCode())
*Hash code: * An
int
between -2^31 and 2^31-1*Hash function: * An
int
between 0 and M-1 (for use as array index)private int hash(Key key) {return (key.hashCode() & 0*7fffffff) % M}
2. Collisions
Collisions: Two distinct keys hashing to same index
Separate chaining:
Summary : Hash table with separate chaining cost constant time in search, insert and delete. As a result it is preferred for short keys where the hash function is easy to compute
Linear probing hash table:
Hash. Map key to integer i between 0 and M-1
Insert. Put table index i if free; if not try i+1, i+2, etc.
Search. Search table index i; if occupied but no match, try i+1, i+2, etc.
Note. Array size M must be greater than number of key-value pairs N.
Parameters.
- M too large => too many empty array entries (waste memory)
M too small => search time blows up
Typical choice: a = N/M ~ 1/2
probes for search hit is about 3/2
probes for search miss is about 5/2
3.Context
1. Hash tables
- Simple to code.
- No effective alternative for unordered keys.
- Faster for simple keys (a few arithmetic ops versus logN compares)
- Better system support in Java for strings (e.g., cached hash code)
2.Balanced search trees
- Stronger performance guarantee.
- Support for ordered ST operations.
- Easier to implement compareTo() correctly than equals() and hashCode()
3. Java system includes both
Red-black BSTs: java.util.TreeMap, java.util.TreeSet
Hash tables: java.util.HashMap, java.util.IdentityHashMap.
4. Simbol Table Application
1. Sets
- Usage: Read in a list of words from one file, print out all words from standard input that are {in, not in} the list.