Hash Tables Flashcards

1
Q

Hashing

A
  • A technique used for performing insertions, deletions, and searches on constant average time
  • Tree operations that require any ordering information among the elements are not supported efficiently
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Hashing Data Structure

A

Hash Table (generalization of ordinary arrays) is used to implement hashing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Hash Table Components

A

Key and entry

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How to find an entry?

A
  • To find an entry in the table, you only need to know the content of one of the fields (not all of them). This field is called key
  • Ideally, a key uniquely identifies an entry
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Direct Adress table

A

-Direct-address tables are ordinary arrays

  • Facilitate direct addressing
  • Element whose key is k is obtained by indexing into the kth position of the array
  • It is applicable when we can afford to allocate an array with one position for every possible key
  • Insert/Delete/Search can be implemented to take O(1) time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Disadvantage of direct-address table

A

The disadvantage is that we could end up with a big array that has a lot of empty space. This wastes lots of memory. The bigger we scale, the worse this problem gets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Cons direct-address table

A
  • If U is large, then a direct-address table T of size |U| is impractical or impossible (considering memory)
  • ->The range of keys determines the size of T

-The set K could be so small relative to U that most of the allocated space is wasted

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Direct-Address table vs. Hash Table

A

-Direct-address table
o Element with key k is stored in slot k

-Hash table
o Use hash function h(k) to compute the slot where k will be inserted

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Hash table big idea

A
  • We can make the array smaller, and use the hash function to select which slot
    we put our values into. For example the hash function = modulus function
  • Basically, we are compressing the domain of our keys so that they fit into the array
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Hash table size

A

-The size of the hash table is proportional to |K|
o We lose the direct-addressing ability
o We need to define functions that map keys to slots of the hash table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Hash function definition

A

-A hash function h transforms a key k into a number that is used as an index in an array to locate the desired location (“bucket” or “slot”)

o An element with key k hashes to slot h(k) 11.2 Hash tables
o h(k) is the hash value of key k
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Hash function pros

A

oSimple to compute
oAny two distinct keys get different cells
oThis is impossible, thus we seek a hash function that distributes the keys evenly among the
cells

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Hash table components

A
  • The ideal hash table is an array of some fixed size m, containing items (key and additional entry)
  • Each key k is mapped (using a hash function) into some number in the range 0 to m – 1 and places in the appropriate cell
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Hash table big O for operations

A

-Insertion of a new entry is O(1)

-Lookup for data is O(1) on average
oStatistically unlikely that O(n) will occur

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Issues with hashing

A
  • Decide what to do when two keys hash to the same value (collision)
  • Choose a good hash function
  • Choose the size of the table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Hashing Division method

A
-Maps key k into one of m slots by taking the remainder of k divided by m 
o h(k) = k % m
17
Q

Collision

A

-A hash function maps most of the keys onto unique integers, but maps some
other keys onto the same integer

  • Multiple keys can hash to the same slot: collisions are unavoidable
  • Design a good hash function will minimize the number of collisions that occur, but they will still happen, so we need to be able to deal with them
18
Q

Basic collision resolution policies

A
  • –Chaining
  • > Store all elements that hash to the same slot in a linked list
  • > Store a pointer to the head of the linked list in the hash table slot
  • –Open Addressing
  • > All elements stored in hash table itself
  • > When collision occur, use a systematic procedure to store elements in free slots of the table
19
Q

Basic collision resolution policies

A
  • –Chaining
  • > Store all elements that hash to the same slot in a linked list
  • > Store a pointer to the head of the linked list in the hash table slot
  • –Open Addressing (linear, quadratic, double hashing)
  • > All elements stored in hash table itself
  • > When collision occur, use a systematic procedure to store elements in free slots of the table (linear, quadratic, double hashing)
20
Q

Chaining Insert Pseudocode and running time

A

CHAINED-HASH-INSERT(T,x)
insert x at the head of list T[h(x.key)]

Worst case: O(1)

21
Q

Chaining search pseudocode and running time

A

CHAINED-HASHSEARCH(T,k)
search for an element with key k in list T[h(k)]

Worst case: proportional to the length of the list

22
Q

Chaining delete pseudocode and running time

A

CHAINED-HASH-DELETE(T,x)
delete x from the list T[h(x.key)]

Worst-case running time for deletion is O(1) if list is doubly linked

23
Q

When is chaining used?

A
  • Chaining is used when we cannot predict the number of records in advance, thus the choice of the size of the table depends on available memory
  • Typically it’s chosen relatively small, so no large area of contiguous memory is used; but large enough so that the lists are short for more efficient search
24
Q

Open Addressing

A
  • Idea: Store all elements in the hash table itself. That is, each slot contains either an element or NIL
  • Advantages: Avoids pointers
25
Q

Open Addressing methods

A
  • linear probing
  • quadratic probing
  • double hashing
26
Q

Open Addressing linear probing hash function

A

If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula:

-> h(k)=(hash(k)+ f (i)) mod m

o h(k) is the index in the table indicating where to insert k
o hash(k) is the auxiliary hash function
o f(i) is the collision resolution function with i = 0 ... m – 1 
o m the size of the hash table
27
Q

Open Addressing linear probing hash function

A

If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula:

-> h(k)=(hash(k)+ f (i)) mod m

o h(k) is the index in the table indicating where to insert k
o hash(k) is the auxiliary hash function
o f(i) is the collision resolution function with i = 0 ... m – 1 
o m the size of the hash table
28
Q

Open addressing insert pseudocode

A

§ Insert: Examine slot h(k)
o If slot is empty, insert element
o If slot is full, try another slot, …, until an open slot is found (probing)
ü Wrapping around to the beginning if we hit the end of the table

Insert(T,k) 
I←0
repeat
     j ← h(k,i)
     if T[j] = NIL then
          T[j] ← k
          return j 
     else 
          i←i+1 
until i = m

error “hash table overflow”

29
Q

Open addressing search pseudocode

A

§ Search: Examine slot h(k)

o If slot h(k) contains key k, search successful
o If the slot h(k) contains NIL, search unsuccessful
o If slot h(k) contains a key that is not k, keep probing (by following the same sequence of probes when inserting the element) until we either find key k or we find a slot holding NIL

Search(T,k) 
i←0 
repeat
     j ← h(k,i)
     if T[j] = k then
            return j
      i←i+1
until T[j] = NIL or i = m
return NIL
30
Q

Open addressing deletion

A

-If we delete a key from slot h(k) we cannot mark the slot NIL.
Why?
o We need to:
-> Mark the slot with a special value
->Modify SEARCH so it keeps on looking when it sees the special value
-> INSERT would treat the slot with a special value as empty slot, so a new value can be added

31
Q

Quadratic probing

A

§ Similar to linear probing
§ … but instead of using a linear function to advance in the table in search of an empty slot, we use a quadratic function to compute the next index in the table to be probed
h(k)=(hash(k)+i^2) mod m
i = 0…m−1

§ The idea is to skip regions in the table with possible clusters

32
Q

Quadratic probing disadvantage

A

Secondary Clustering

33
Q

Linear probing disadvantage

A

Primary Clustering
§ We can see clusters form (blocks of adjacent buckets all occupied)
§ Thus increasing the time it takes to do insert and search for any key that hashes to a buckets in that cluster

34
Q

Double Hashing

A
  • Purpose same as in quadratic probing: to overcome the disadvantage of clustering
  • Instead of examining each successive entry following a collided position, we use a second hash function to get a fixed increment for the “probe” sequence

§ hash1 gives the initial probe. hash2 gives the remaining probes

§ There are a couple of requirements for the second function:
o It must never evaluate to 0
o Must make sure that all buckets can be probed

h(k)=(hash1(k)+i * hash2 (k)) mod m
i = 0…m−1

35
Q

Popular second hash function

A

hash2 (k) = R − (k mod R) , where R is a prime number that is smaller than the size of the table