Hash Principle

Basic principles of hashCode, hash() functions, and hash collisions

Hash Principle

1. hash functions

  1. Requirement: a. efficiently computable b. Each table index equally likely for each key

  2. hashCode() :

    if x.equals(y),then (x.hashCode() == y.hashCode())
  1. *Hash code: * An int between -2^31 and 2^31-1

    *Hash function: * An int between 0 and M-1 (for use as array index)

    private int hash(Key key) 
      {return (key.hashCode() & 0*7fffffff) % M}

2. Collisions

  1. Collisions: Two distinct keys hashing to same index

  2. Separate chaining:

Summary : Hash table with separate chaining cost constant time in search, insert and delete. As a result it is preferred for short keys where the hash function is easy to compute

  1. Linear probing hash table:

    Hash. Map key to integer i between 0 and M-1

    Insert. Put table index i if free; if not try i+1, i+2, etc.

    Search. Search table index i; if occupied but no match, try i+1, i+2, etc.

    Note. Array size M must be greater than number of key-value pairs N.

  2. Parameters.

    • M too large => too many empty array entries (waste memory)
    • M too small => search time blows up

    • Typical choice: a = N/M ~ 1/2

      probes for search hit is about 3/2

      probes for search miss is about 5/2

3.Context

1. Hash tables

  • Simple to code.
  • No effective alternative for unordered keys.
  • Faster for simple keys (a few arithmetic ops versus logN compares)
  • Better system support in Java for strings (e.g., cached hash code)

2.Balanced search trees

  • Stronger performance guarantee.
  • Support for ordered ST operations.
  • Easier to implement compareTo() correctly than equals() and hashCode()

3. Java system includes both

  • Red-black BSTs: java.util.TreeMap, java.util.TreeSet

  • Hash tables: java.util.HashMap, java.util.IdentityHashMap.

4. Simbol Table Application

1. Sets

  • Usage: Read in a list of words from one file, print out all words from standard input that are {in, not in} the list.