DSA - Hash Tables Flashcards

1
Q

What is a Hash Function?.

A

hash(key) computes the location from a KEY

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Does it matter what Data Type is when we use the hash function?

A

NO

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How Does a Hash Table Work?

A

Take the key calculate some number without referencing the key or record, use the number for a index in the array.

Get Us Immediately in 1 step where we want to be

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Where do you store KEY?

A

Store it in Linked List/ Tree/ Hash Table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Describe assignments in Hash Table ?

A

-It is O(1) time complexity
- Very Memory Inefficient
==> Need an Array of size 10^7, if we only use 500 spaces it
still keeps the rest thus very inefficient.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Define Hash/Data Collison:

A

2 Different Keys with same number of finding remainder to place in that index

  • Increasing array length may cause an decrease in the
    likeness of collisions. (Does Not Fully Remove Them)
  • Require a robust & reliable solution to resolve collisions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What Does Hash Table Consists of:

A
  • Array arr for storing the values
  • hash function hash(KEY) => turns Key into index of array
  • Mechanism for dealing with the collisions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the 2 Types of solutions of hash Collisions?

A

-Chaining Strategy
- Open Addressing Strategy

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What does Chaining Strategy do?

A

Use another data structure on each position of the hash tables
-Only Chaining Strategy is Direct Chaining
- Use Linked List to store the values with the
same hash(code)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What does Addressing Strategy do?
GIVE THE 2 EXAMPLES ?

A

Finding a different address in the same array for the colliding value to be stored
(ADDRESS MUST BE UNUSED or OPEN)
- Two Strategy are Linear Probing & Double Hashing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How to Insert with Direct Chaining?

A
  • Always check if the key which we are inserting is in the LL on position hash(key)
  • If isn’t insert key at BEG of list

// DOES NOT NEED TO INSERT KEY IN BEG, BUT IN PRATICE IT IS IDEAL AS THE KEY IS MORE LIKELY TO BE ACCESSED SOON

TAKE O(n) = need to go through all elements already stored in hash table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How to Delete with Direct Chaining?

A

To delete(key) we delete key from LL stored in Position hash(key)

Takes O(n)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How to do Lookup in Direct Chaining?

A

lookup(key) return TRUE/FALSE, depending if key is stored in the list on position hash(key)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a Bad Hash Function?

A

-All entries have the same hash value
- 1 LARGE Linked List (LL)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Why is a Bad Hash Function BAD?

A

Has LARGE linear search through those links

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What makes a GOOD Hash Function?

A
  • Uniformly distributes keys among positions
  • Given a Random key it has to have the same probability of being stored on every position
  • location given by hash has the expected number of entries stored there equal to n/T.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is the Load Factor?

A

AVG. NO. of entries stored on a Location
n/T
n = total number of store entries
T = size of hash table

REPRESENTS HOW FULL THE HASH TABLE IS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is Unsuccessful lookup of a Key?

A
  • Key is not in the table
  • Location hash(key) stores n/T entries on AVG.
    (Have TRAVERSE them all, LINEAR SEARCH on
    n/T entries )
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is a Successful Lookup for a key?

A
  • Location hash(key) stores k = n/t entries on AVG
  • AVG. a linear search in LL of k elements takes
    1/k(1+2+…+k) = k(k+1)/2k = (k+1)/2 comparisons
  • ASSUME MAX load Factor λ that is, n/T <= λ
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is AVG case time complexity of unsuccessful lookup?

A

n/T <= λ comparisons O(1)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is AVG case time complexity of successful lookup?

A

1/2(1+n/T) <= 1/2(1+λ) Comparisons O(1)

λ is constant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Time Complexity of insert(key)?

A
  • Same as unsuccessful lookup
    O(1)
  • First Check if the key is stored in the table
  • Otherwise insert key at BEG of list stored on hash(key)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Time Complexity of delete(key)?

A

Same as Successful Lookup
O(1)

24
Q

Time Complexity of Insert?

A

O(1)

25
Q

What 2 Assumptions are made for Direct Chaining?

A
  1. We have a good Hash function
    • Expected Length of chains is n/T
  2. We assume a maximal load Factor
    • n/T < λ, constant λ (MAKE SURE LOAD FACTO IT IS BOUNDED BY SOME λ)

Assume the 2 Conditions , computed that the operations of Hash Table are ALL O(1).

Good Hash Function Depends on Distribution of data

  • CONSTANT TIME COMPLEXITY WILL BE AMORTIZED ONLY
26
Q

Disadvantage of Chaining Strategies?

A

1) Lot of hash collisions, therefore lots of unused space
- Good Performance maintain a low Load Factor
{Between 0.25-0.75}
- At 0.25 you are using an array 4x the entries you
are going to store in hash table
(A LOT OF UNUSED SPACE)

2) LL require a lot of allocation (ALLOCATE MEMORY), which is slow
- Every Insert you need to allocate Memory
- Not only using space in the array, but allocating memory is a heavy method to call.

27
Q

What is the Idea of Insertion in LINEAR PROBING?

A

If primary position hash(key) is occupied, search for 1st available position to the right of it.

REACH END WRAP AROUND

28
Q

How to compute the Fallback Positions For LINEAR PROBING?

A
  • Increment the value of hash(key) by 1 MOD the value by size of the array. Repeat until you find FREE SPACE
  • E.g => hash(key) + 1 mod T
    T = size of the array
29
Q

Idea of Deleting using Linear Probing?

A

1- Find the key stored in table:
- Start from primary position hash(key) go right until key or empty position is found.

2- Key is stored in the table, replace it with a marker value (TOMBSTONE = #)

DO Probing until you find # or empty space and replace it with new value

30
Q

Issue With Deleting in LINEAR PROBING?

A

Delete Data Collision value, the one stored in the correct hash position, when searching for element in that hash position will return false as nothing is there for key value.

Untrue as we had to move the data collision to next free space. This leads to error as we don’t find key value from the data collision

31
Q

Linear Probing Searching?

A
  • Start from Primary Position hash(key)
  • Search for key to the right.
  • Skip over # TOMBSTONES
  • If we reach Empty Position, key is not in table
32
Q

Linear Probing Inserting
MORE ACCURATE

A
  • Search for hash(key) like Searching
  • NOTE first Tombstone Position we find
  • Find the Key signal ERROR
  • Reach Empty position, key not in table
  • Insert key into 1st Tombstone, if any
  • Otherwise, insert into Empty position found

CAN’T INSERT TILL WE KNOW THAT THE KEY IS NOT ALREADY IN THE HASH TABLE.

33
Q

Time Complexity of Linear Probing?

A

Insert , Search & Delete ARE ALL O(1)

34
Q

Primary Cluster in LP ?

A

Clusters caused by entries with the same hash code.

35
Q

Secondary Clusters in LP?

A

When the Collision handling strategy causes different entries to check the same sequence of locations when they collide

36
Q

LP Problem with Clustering?

A

Clusters are more likely to get bigger and bigger, even if small load factor,

Lots of Clusters means more probing, thus longer time taken

USE DOUBLE HASHING TO MAKE CLUSTERING LESS LIKELY TO OCCUR

37
Q

What is Double Hashing?

A

Use a Primary & Secondary hash function

Primary Hash: Landing place in our Array(Hash Value we go to in the array)
- SPACE IS EMPTY OR # CAN USE IT

Secondary Hash: add this result to primary key as an offset to find free space, repeat till we find free space.
REDUCE PROBLEMS WITH SECONDARY CLUSTERS, SO FEWER COLLISIONS

Different key values the offset will be different

38
Q

Difference Between Linear Probing and Double Hashing?

A
  • Difference is that for different key values in DOUBLE HASHING the offset will also be different for every key value, due to secondary hash function

ALL OPERATION WORK THE SAME WAY FOR BOTH

39
Q

Insertion Double Hashing?

A

Try Primary Position hash1(key) first and, if it fails, we try fallback positions:

1) hash1(key) + 1hash2(key) MOD T
2) hash1(key) + 2
hash2(key) MOD T
3) hash1(key) + 3*hash2(key) MOD T
REPEAT TILL WE FIND EMPTY SPACE

T = size of the ARRAY

40
Q

Linear Probing Fallback Calculation?

A

(hash(key) + i) mod T
i=1,2,3…

41
Q

Double Hashing Fallback Calculation?

A

(hash1(key) + i*hash2(key)) mod T
i = 1,2,3,…

42
Q

Problem with Double Hashing?

A

Can have Short Cycles

43
Q

What are short cycles in Double Hashing?

A

Short cycle is a loop that occurs where after a few probes you get back to already visited block

CAN’T FIT ANYMORE VALUES, MAYBE Spaces in odd Index Locations

44
Q

How to Fix Short Cycles?

A

Table Size T & hash2(key) are COPRIME

2 SOLUTIONS:
1) T is PRIME
- Array is Prime, Won’t get common Divisor
- But, Difficult when growing array
2) T= 2^k and hash2(key) is ODD
- Test if hash2(key) is ODD, if EVEN +1
- Just Setting Least Significant Bit so it is ODD
- Works as no Common divisor between hash2 & T

45
Q

What is the problems with Large Common Divisor?

A

Worse it gets for missing values

46
Q

What does Coprime mean?

A

If no number other than 1, divides both a & b
Prime numbers are the numbers which are divisible only by 1 and themselves

47
Q

For 2nd Solution for short cycles, why is the array of size 2^k?

A

-2^k therefore doubles it
- Less fragmentation, however make sure Hash key values don’t crash with Array Length

DOUBLING THE SIZE OF THE ARRAY WHEN IT GETS FULL ENSURES T + 2^k

48
Q

What happens when table is Full?

A

Hash Table is full : n/T > λ
If n/T is larger than λ does not meant it is not a full table

49
Q

What does n/T > λ mean?

A

If True allocate a new table twice the size and allocate values into new table

50
Q

The Idea of Rehashing?

A

Table becomes full after an insertion, allocate new table twice the size and INSERT all elements from old table to it

CAN CACHE THE HASH FUNCTION VALUES, SO QUICKER TO PLACE INTO NEWLY ALLOCATED ARRAY

51
Q

Consequence of Insert?

A

The worst case time Complexity is O(n), when REHASHING

AMORTIZED TIME COMPLEXITY is O(1)

52
Q

What do we need to do with the hash function when doubling array size?

A

Need to change hash function

Usually:
-bigHash(key) mod T
(bigHash COMPUTES a Big Hashcode )

AFTER DOUBLE SIZE:
bigHash(key) mod 2*T

53
Q

What is a Hash Table?

A

a ADTs with implementation consisting of a array arr, a primary function hash1(key) {maybe hash2(key)}

54
Q

Comparing Hash Table to Trees?

A

AVL:
- Require keys to be comparable and the operations are in O(logn), BEST,AVG & WORST CASE

HASH:
- Require good Hash Functions, so operations are O(1) AMORTIZED time complexity FOR ALL OPERATIONS

55
Q

Double Hashing Pros over the other 2 Hash Table Methods?

A

Better Performance advantage:
- Clustering in LP is Worse than DH
- DC needs to allocate memory which has a large constant cost

THIS PROS IS IN MULTIPLIER NOT ENTIRE COMPLEXITY

56
Q

What happens if load factor drops below min threshold?

A

RARELY Done as it doesn’t speed up performance

Rehash the into hash table 1/2 size.

SAVE MEMORY

57
Q

What happens if the Number of Tombstone values exceed a threshold value?

A

Rehash without doubling the size
- May get a Hash table 1/2 the size

  • CONSEQUENTLY delete is O(1) Amortized time complexity