Hash Tables Flashcards

Question 1

Q

Hashing

Answer

A

A technique used for performing insertions, deletions, and searches on constant average time
Tree operations that require any ordering information among the elements are not supported efficiently

Question 2

Q

Hashing Data Structure

Answer

A

Hash Table (generalization of ordinary arrays) is used to implement hashing.

Question 3

Q

Hash Table Components

Answer

A

Key and entry

Question 4

Q

How to find an entry?

Answer

A

To find an entry in the table, you only need to know the content of one of the fields (not all of them). This field is called key
Ideally, a key uniquely identifies an entry

Question 5

Q

Direct Adress table

Answer

A

-Direct-address tables are ordinary arrays

Facilitate direct addressing
Element whose key is k is obtained by indexing into the kth position of the array
It is applicable when we can afford to allocate an array with one position for every possible key
Insert/Delete/Search can be implemented to take O(1) time

Question 6

Q

Disadvantage of direct-address table

Answer

A

The disadvantage is that we could end up with a big array that has a lot of empty space. This wastes lots of memory. The bigger we scale, the worse this problem gets.

Question 7

Q

Cons direct-address table

Answer

A

If U is large, then a direct-address table T of size |U| is impractical or impossible (considering memory)
->The range of keys determines the size of T

-The set K could be so small relative to U that most of the allocated space is wasted

Question 8

Q

Direct-Address table vs. Hash Table

Answer

A

-Direct-address table
o Element with key k is stored in slot k

-Hash table
o Use hash function h(k) to compute the slot where k will be inserted

Question 9

Q

Hash table big idea

Answer

A

We can make the array smaller, and use the hash function to select which slot
we put our values into. For example the hash function = modulus function
Basically, we are compressing the domain of our keys so that they fit into the array

Question 10

Q

Hash table size

Answer

A

-The size of the hash table is proportional to |K|
o We lose the direct-addressing ability
o We need to define functions that map keys to slots of the hash table

Question 11

Q

Hash function definition

Answer

A

-A hash function h transforms a key k into a number that is used as an index in an array to locate the desired location (“bucket” or “slot”)

o An element with key k hashes to slot h(k) 11.2 Hash tables
o h(k) is the hash value of key k

Question 12

Q

Hash function pros

Answer

A

oSimple to compute
oAny two distinct keys get different cells
oThis is impossible, thus we seek a hash function that distributes the keys evenly among the
cells

Question 13

Q

Hash table components

Answer

A

The ideal hash table is an array of some fixed size m, containing items (key and additional entry)
Each key k is mapped (using a hash function) into some number in the range 0 to m – 1 and places in the appropriate cell

Question 14

Q

Hash table big O for operations

Answer

A

-Insertion of a new entry is O(1)

-Lookup for data is O(1) on average
oStatistically unlikely that O(n) will occur

Question 15

Q

Issues with hashing

Answer

A

Decide what to do when two keys hash to the same value (collision)
Choose a good hash function
Choose the size of the table

Question 16

Q

Hashing Division method

Answer

A

-Maps key k into one of m slots by taking the remainder of k divided by m 
o h(k) = k % m

Question 17

Q

Collision

Answer

A

-A hash function maps most of the keys onto unique integers, but maps some
other keys onto the same integer

Multiple keys can hash to the same slot: collisions are unavoidable
Design a good hash function will minimize the number of collisions that occur, but they will still happen, so we need to be able to deal with them

Question 18

Q

Basic collision resolution policies

Answer

A

–Chaining
> Store all elements that hash to the same slot in a linked list
> Store a pointer to the head of the linked list in the hash table slot
–Open Addressing
> All elements stored in hash table itself
> When collision occur, use a systematic procedure to store elements in free slots of the table

Question 19

Q

Basic collision resolution policies

Answer

A

–Chaining
> Store all elements that hash to the same slot in a linked list
> Store a pointer to the head of the linked list in the hash table slot
–Open Addressing (linear, quadratic, double hashing)
> All elements stored in hash table itself
> When collision occur, use a systematic procedure to store elements in free slots of the table (linear, quadratic, double hashing)

Question 20

Q

Chaining Insert Pseudocode and running time

Answer

A

CHAINED-HASH-INSERT(T,x)
insert x at the head of list T[h(x.key)]

Worst case: O(1)

Question 21

Q

Chaining search pseudocode and running time

Answer

A

CHAINED-HASHSEARCH(T,k)
search for an element with key k in list T[h(k)]

Worst case: proportional to the length of the list

Question 22

Q

Chaining delete pseudocode and running time

Answer

A

CHAINED-HASH-DELETE(T,x)
delete x from the list T[h(x.key)]

Worst-case running time for deletion is O(1) if list is doubly linked

Question 23

Q

When is chaining used?

Answer

A

Chaining is used when we cannot predict the number of records in advance, thus the choice of the size of the table depends on available memory
Typically it’s chosen relatively small, so no large area of contiguous memory is used; but large enough so that the lists are short for more efficient search

Question 24

Q

Open Addressing

Answer

A

Idea: Store all elements in the hash table itself. That is, each slot contains either an element or NIL
Advantages: Avoids pointers

Question 25

Q

Open Addressing methods

Answer

A

linear probing
quadratic probing
double hashing

Question 26

Q

Open Addressing linear probing hash function

Answer

A

If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula:

-> h(k)=(hash(k)+ f (i)) mod m

o h(k) is the index in the table indicating where to insert k
o hash(k) is the auxiliary hash function
o f(i) is the collision resolution function with i = 0 ... m – 1 
o m the size of the hash table

Question 27

Q

Open Addressing linear probing hash function

Answer

A

If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula:

-> h(k)=(hash(k)+ f (i)) mod m

o h(k) is the index in the table indicating where to insert k
o hash(k) is the auxiliary hash function
o f(i) is the collision resolution function with i = 0 ... m – 1 
o m the size of the hash table

Question 28

Q

Open addressing insert pseudocode

Answer

A

§ Insert: Examine slot h(k)
o If slot is empty, insert element
o If slot is full, try another slot, …, until an open slot is found (probing)
ü Wrapping around to the beginning if we hit the end of the table

Insert(T,k) 
I←0
repeat
     j ← h(k,i)
     if T[j] = NIL then
          T[j] ← k
          return j 
     else 
          i←i+1 
until i = m

error “hash table overflow”

Question 29

Q

Open addressing search pseudocode

Answer

A

§ Search: Examine slot h(k)

o If slot h(k) contains key k, search successful
o If the slot h(k) contains NIL, search unsuccessful
o If slot h(k) contains a key that is not k, keep probing (by following the same sequence of probes when inserting the element) until we either find key k or we find a slot holding NIL

Search(T,k) 
i←0 
repeat
     j ← h(k,i)
     if T[j] = k then
            return j
      i←i+1
until T[j] = NIL or i = m
return NIL

Question 30

Q

Open addressing deletion

Answer

A

-If we delete a key from slot h(k) we cannot mark the slot NIL.
Why?
o We need to:
-> Mark the slot with a special value
->Modify SEARCH so it keeps on looking when it sees the special value
-> INSERT would treat the slot with a special value as empty slot, so a new value can be added

Question 31

Q

Quadratic probing

Answer

A

§ Similar to linear probing
§ … but instead of using a linear function to advance in the table in search of an empty slot, we use a quadratic function to compute the next index in the table to be probed
h(k)=(hash(k)+i^2) mod m
i = 0…m−1

§ The idea is to skip regions in the table with possible clusters

Question 32

Q

Quadratic probing disadvantage

Answer

A

Secondary Clustering

Question 33

Q

Linear probing disadvantage

Answer

A

Primary Clustering
§ We can see clusters form (blocks of adjacent buckets all occupied)
§ Thus increasing the time it takes to do insert and search for any key that hashes to a buckets in that cluster

Question 34

Q

Double Hashing

Answer

A

Purpose same as in quadratic probing: to overcome the disadvantage of clustering
Instead of examining each successive entry following a collided position, we use a second hash function to get a fixed increment for the “probe” sequence

§ hash1 gives the initial probe. hash2 gives the remaining probes

§ There are a couple of requirements for the second function:
o It must never evaluate to 0
o Must make sure that all buckets can be probed

h(k)=(hash1(k)+i * hash2 (k)) mod m
i = 0…m−1

Question 35

Q

Popular second hash function

Answer

A

hash2 (k) = R − (k mod R) , where R is a prime number that is smaller than the size of the table