HashTableSet Flashcards

1
Q

Map ADT

A

has functions: size(), get(k), put(k,v), remove(k), find(k)

2
Q

Hash Table (unordered_map)

A

The size of a hash table is proportional to the number of keys (not the values of the keys). The key is the input to a hash function, which generates a hash code (output) that assigns the element to an index; the element's key is stored in that slot's data field.

3
Q

Hash Function

A

Takes in a key and outputs a hash code, which generates the index value for insertion into a table/map. Must be fast.

4
Q

Hash Table Concerns

A

Hash codes are not unique: multiple keys can generate the same hash code, resulting in collisions. Handling collisions is the main focus of maintaining the optimal
runtime of the Hash Table data structure.

5
Q

What is a possible ADT using Red-Black Tree DS?

A

Red-Black Tree can support a (C++) Set ADT

6
Q

What is a possible ADT using Linked List DS?

A

Linked List can support a (C++) List ADT

7
Q

What is a possible ADT using ArrayList DS?

A

ArrayList can support a (C++) Vector ADT

8
Q

Set ADT

A

has functions: size(), insert(x) [no duplicates], remove(x), find(x)

9
Q

Lexicon vs. Numbers

A

Strings are a sequence (of characters); numbers are a single entity.

10
Q

Red-Black Tree Advantages

A

Faster insert than AVL: RBT insertion traverses the tree once instead of twice.
While RBTs are taller than AVL trees, find is still guaranteed O(log n).

11
Q

Red-Black Tree Disadvantages

A

Slower to find (than AVL): RBTs are (slightly) taller than AVL trees

12
Q

Red-Black Tree Insert Algorithm

A

While not at a leaf: 1. Move down the tree toward the destination; if a node has two red children, recolor and rotate as necessary to fix it. 2. Insert the node. 3. Do final rotations to fix the tree (no need to go back up the tree).
Runtime: O(log n)

13
Q

Can any node have 2 red children?

A

Black nodes can have 2 red children. Inserting a new (red) node below such a pair would violate the RBT rules, so recoloring (red becomes black and black becomes red) and rotations are applied on the way down; this ensures that inserting in this scenario leaves a valid tree.
A red node can never have red children.

14
Q

What DS can implement the Set ADT?

A

Array List, Linked List, Binary Search Tree, Randomized Search Tree, AVL Tree, Red-Black Tree

15
Q

What DS can implement the Map ADT?

A

Array List, Linked List, Binary Search Tree, Randomized Search Tree, AVL Tree, Red-Black Tree

16
Q

True or False: Set ADT can be modified to relatively simply implement Map ADT

A

True

17
Q

True or False: Map ADT can be modified to relatively simply implement Set ADT

A

True

18
Q

True or False: Implementing Set ADT using RBT has better worst case Big-O runtime complexity for finding and inserting elements than implementing the Set ADT using Linked List

A

True

19
Q

Worst Case Time Complexity of Insert to Hash Table

A

O(n)

20
Q

Average Case Time Complexity of Insert to Hash Table

A

O(1)

21
Q

Average Case Time Complexity of Find to Hash Table

A

O(1)

22
Q

Worst Case Time Complexity of Find to Hash Table

A

O(n)

23
Q

If all keys were even (or all odd), what might happen?

A

Some hash values will never be used, leaving empty indices.

24
Q

Hash Functions Properties

A

Input is an object x
Output is an integer representation of x
(necessary) Property of Equality: if x = y, then h(x) must equal h(y)
(not necessary) Property of Inequality: if x ≠ y, it would (be nice) if h(x) ≠ h(y)
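
A sketch of such a hash function (illustrative only, not a function defined in these cards), in C++:

  #include <cstddef>
  #include <string>

  // Simple polynomial string hash: equal strings always produce equal hash
  // codes (the necessary property); unequal strings usually produce
  // different codes (the desirable, but not guaranteed, property).
  std::size_t simpleStringHash(const std::string& s) {
      std::size_t h = 0;
      for (char c : s)
          h = h * 31 + static_cast<unsigned char>(c);  // 31: arbitrary small prime
      return h;
  }
  // The table index is then simpleStringHash(x) % m for a table of capacity m.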

25
Q

Unsorted Array List and Linked List, what is worst case time complexity of FIND?

A

O(n)

26
Q

Randomized Search Tree and well-structured Skip List, average time complexity of FIND?

A

O(log n)

27
Q

Sorted Array List and Binary Search Tree, what is worst case time complexity of FIND?

A

O(log n)

28
Q

Hashing is the: (definitions)

A

transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.

one way to enable security during the process of message transmission when the message is intended for a particular recipient only… Hashing is also a method of sorting key values in a database table in an efficient manner.

involves applying a hashing algorithm to a data item, known as the hashing key, to create a hash value.

29
Q

Bloom Filter

A

A SPACE-EFFICIENT data structure used to check whether an item is in a set or not.

The trade-off for space efficiency is that the answer is probabilistic rather than deterministic: either a TRUE NEGATIVE ("not in the set") or "might be in the set".

30
Q

Strength of Bloom Filter

A

Space-efficient: O(1) space, regardless of the number of items inserted.
Fast: insert and lookup operations are both O(1) time

31
Q

Weaknesses of Bloom Filter

A

Probabilistic: Bloom filters can only definitively identify true negatives. They cannot identify true positives; a positive only means "might be there".

Limited interface: only supports insert and lookup. Cannot iterate through the items in the set or delete them.

32
Q

Implementation of Bloom Filter

A

Bitmap with all bits defaulting to 0. Insertion sets the corresponding bit(s) to 1.
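
A minimal C++ sketch of this idea (sizes, hashing scheme, and names are illustrative assumptions, not the course's implementation):

  #include <cstddef>
  #include <functional>
  #include <string>
  #include <vector>

  // Bitmap of m bits, all defaulting to 0 (false); insertion sets k bits to 1.
  class BloomFilter {
  public:
      BloomFilter(std::size_t numBits, std::size_t numHashes)
          : bits_(numBits, false), k_(numHashes) {}

      void insert(const std::string& key) {
          for (std::size_t i = 0; i < k_; ++i)
              bits_[indexFor(key, i)] = true;              // flip bits to 1
      }

      bool mightContain(const std::string& key) const {
          for (std::size_t i = 0; i < k_; ++i)
              if (!bits_[indexFor(key, i)]) return false;  // true negative: definitely absent
          return true;                                      // "might be in the set"
      }

  private:
      std::size_t indexFor(const std::string& key, std::size_t i) const {
          // Derive k hash values by combining two base hashes (a common trick,
          // assumed here for brevity).
          std::size_t h1 = std::hash<std::string>{}(key);
          std::size_t h2 = std::hash<std::string>{}(key + "#");
          return (h1 + i * h2) % bits_.size();
      }

      std::vector<bool> bits_;
      std::size_t k_;
  };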

33
Q

General Course Thesis:
Design data structures that
trade per-operation efficiency for
overall efficiency

A

Design data structures that
trade per-operation efficiency for
overall efficiency

34
Q

Collision Resolution Strategy: Linear Probing

A

Collision Strategy: if an object key maps to an index that is already occupied, simply shift over and try the next available index.
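
A small C++ sketch of linear probing insertion (names and types are illustrative assumptions):

  #include <cstddef>
  #include <functional>
  #include <optional>
  #include <string>
  #include <vector>

  // Insert with linear probing: start at the home index and keep shifting
  // over (wrapping around) until an empty slot or the key itself is found.
  bool linearProbeInsert(std::vector<std::optional<std::string>>& table,
                         const std::string& key) {
      const std::size_t m = table.size();
      const std::size_t start = std::hash<std::string>{}(key) % m;
      for (std::size_t step = 0; step < m; ++step) {
          std::size_t idx = (start + step) % m;
          if (!table[idx]) { table[idx] = key; return true; }  // empty: insert here
          if (*table[idx] == key) return false;                // already present
      }
      return false;  // table is full
  }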

35
Q

Closed hashing collision resolution strategy (i.e., we will insert the actual key only at an address within the bounds of our Hash Table)

A

Contrast with Linear Probing, which is considered open addressing: the destination is not entirely deterministic, since if a slot is occupied we shift over some number of slots. The addresses are OPEN to moving.

36
Q

Weakness in linear probing: the "clumping" of inserts around each other as they shift over to the next index. Probabilistically, this increases more than the straightforward M/N probability would suggest.

A

Simply put, the likelihood of collisions is greater with this resolution strategy due to clumping. It is somewhat counterintuitive.

37
Q

With the initial indexing (probe, then shift on collision), all slots are equally likely to be filled, but upon collision we deterministically increase the probability of filling certain slots: the slots that expand a clump.

A

We need a "new" approach that is deterministic in searching but does not increase the probability of collision with each insertion the way linear probing does. We need to distribute the insertion probability over the entire array/table.

39
Q

Double hashing outperforms linear probing

However, while double hashing mitigates the clumping problem when inserting, we still have an issue with remove()

A

We still have to worry about reinserting elements into the Hash Table periodically to clean up the “delete” flags

In all the open-addressing methods discussed so far, the probability of future collisions increases each time an inserted key faces a collision.
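
A hedged C++ sketch of remove() under a double-hashing probe sequence using "delete" flags (tombstones); the hash functions and struct are illustrative assumptions:

  #include <cstddef>
  #include <functional>
  #include <string>
  #include <vector>

  struct Slot {
      std::string key;
      bool occupied = false;
      bool deleted  = false;   // the "delete" flag (tombstone)
  };

  // Double-hashing probe: index_i = (h1(k) + i * h2(k)) % m, with h2(k) != 0.
  std::size_t h1(const std::string& k, std::size_t m) {
      return std::hash<std::string>{}(k) % m;
  }
  std::size_t h2(const std::string& k, std::size_t m) {
      return 1 + std::hash<std::string>{}(k + "#") % (m - 1);
  }

  bool removeKey(std::vector<Slot>& table, const std::string& key) {
      const std::size_t m = table.size();
      for (std::size_t i = 0; i < m; ++i) {
          std::size_t idx = (h1(key, m) + i * h2(key, m)) % m;
          if (!table[idx].occupied && !table[idx].deleted)
              return false;                       // truly empty slot: key not present
          if (table[idx].occupied && table[idx].key == key) {
              table[idx].occupied = false;
              table[idx].deleted  = true;         // flag instead of emptying the slot
              return true;
          }
      }
      return false;
  }
  // Periodically, live keys are reinserted into a fresh table to clean up tombstones.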

40
Q

Separate Chaining: each hash table slot stores a pointer, not the data

Motivated by worst case run time increasing towards O(n)

A

Each index in our Hash Table stores a pointer to a linked list (see the sketch after this list):

  1. Look up the index from hashing H_1(k)
  2. Follow the pointer to the linked list
  3. Check if k is in the linked list.
    3a. if not, insert k into the linked list at arr[index] (this is closed addressing)
    3b. if it is, the element is already in the table.
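
A minimal C++ sketch of these steps (the container choices are illustrative assumptions):

  #include <cstddef>
  #include <functional>
  #include <list>
  #include <string>
  #include <vector>

  // Each array slot holds a linked list; a key always lives in the list at
  // its hashed index (closed addressing).
  bool chainedInsert(std::vector<std::list<std::string>>& table,
                     const std::string& key) {
      std::size_t idx = std::hash<std::string>{}(key) % table.size(); // 1. look up the index
      std::list<std::string>& chain = table[idx];                     // 2. follow pointer to the list
      for (const std::string& k : chain)                              // 3. check the list for k
          if (k == key) return false;                                 // 3b. already in the table
      chain.push_back(key);                                           // 3a. not found: insert into the chain
      return true;
  }
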
41
Q

Separate Chaining is often considered an OPEN HASHING collision resolution strategy

A

Open hashing and closed addressing are often used interchangeably

42
Q

With Separate chaining and a doubly linked list behind each index, what is the expected worst case runtime?

A

O(n)

43
Q

What is the compromise in using Separate Chaining in a Hash Table?

Memory, we are storing pointers

Memory is “disconnected” from original hashTable.

A

The distance causes incremental issues in the form of poor cache performance (the computer requires time to traverse to the data and send it back through a BUS or other system outside of the code).

44
Q

Cuckoo Hashing

Most aggressive Collision Resolution Strategy

Uses two hash functions H_1(k) and H_2(k) , where one key can generate two different locations

A

As both tables begin to fill up, the probability of collisions increases. However, unlike the other collision resolution strategies we discussed, in which a key k could end up trying every single Hash Table slot until the entire Hash Table was full (e.g., Linear Probing, Double Hashing, Random Hashing), a key k in Cuckoo Hashing only has two different locations that it can map to (index1 = H_1(k) and index2 = H_2(k)).
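
A hedged C++ sketch of a Cuckoo Hashing insert with two tables (the displacement limit and hash functions are illustrative assumptions):

  #include <cstddef>
  #include <functional>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  struct CuckooTables {
      std::vector<std::optional<std::string>> t1, t2;
      std::size_t h1(const std::string& k) const { return std::hash<std::string>{}(k) % t1.size(); }
      std::size_t h2(const std::string& k) const { return std::hash<std::string>{}(k + "#") % t2.size(); }
  };

  // Returns false if the displacement limit is hit (a likely cycle);
  // the caller would then rehash everything with two new hash functions.
  bool cuckooInsert(CuckooTables& ht, std::string key, std::size_t maxKicks = 32) {
      for (std::size_t kick = 0; kick < maxKicks; ++kick) {
          std::size_t i1 = ht.h1(key);
          if (!ht.t1[i1]) { ht.t1[i1] = key; return true; }  // slot in table 1 is free
          std::swap(key, *ht.t1[i1]);                         // evict the current occupant

          std::size_t i2 = ht.h2(key);
          if (!ht.t2[i2]) { ht.t2[i2] = key; return true; }  // evicted key fits in table 2
          std::swap(key, *ht.t2[i2]);                         // otherwise evict again and loop
      }
      return false;
  }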

45
Q

Unless Cuckoo Hashing has a limit on how many times it attempts to move a key, the algorithm may never stop (it can cycle forever).

Rehashing is generally done by introducing two new hash functions and reinserting all the elements.

A

It is important to make sure that the second hash function used returns different indices for keys that originally hashed to the same index. This is because, if a key collides with another key in the first Hash Table, we want to make sure that it will not collide with the same key again in the second Hash Table.

46
Q

By making the sacrifice of only allowing each key to hash to strictly two different locations (thereby potentially causing the cycle in the first place), we end up getting a reward of a worst-case constant time complexity for two of our major operations.

A

Find(): O(1)

Delete: O(1)

47
Q

In Cuckoo Hashing, each key is not able to hash to more than 2 locations.

A

KEY aspect/strength of CUCKOO Hashing

If you ever cycle (come back to the starting index in table 1), then you stop (and rehash)

48
Q

MAP ADT allows the mapping of keys to their corresponding values

MAP ADT is considered an associative array. It gives an associative cluster of our data.

A

MAP ADT

put(key, value)
get(key)
remove(key)
size()
isEmpty()
49
Q

MAP ADT can be implemented with binary tree or with a HASHTABLE

Implementing MAP ADT with a Hash Table we call it now a HashMAP

We add two additional functions:
hash_function(key)
key_equality(key1, key2)

A

Essentially, in a hash map, keys must be hashable and have an equality test to check for uniqueness.

If a designer wants to hash a custom object, they need to overload the hash and equality member functions (see the sketch below).
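
A C++ sketch of what that looks like for a made-up custom key type (the Point type and PointHash functor are illustrative assumptions):

  #include <cstddef>
  #include <functional>
  #include <string>
  #include <unordered_map>

  struct Point {
      int x, y;
      bool operator==(const Point& other) const {        // key equality test
          return x == other.x && y == other.y;
      }
  };

  struct PointHash {                                      // hash function for the key
      std::size_t operator()(const Point& p) const {
          return std::hash<int>{}(p.x) * 31 + std::hash<int>{}(p.y);
      }
  };

  // Usage: std::unordered_map<Point, std::string, PointHash> labels;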

50
Q

Bloom Filters address achieving fast runtime in determining true negatives, with the added possibility of false positives, which isn't that "bad"

Uses an underlying probabilistic DATA STRUCTURE

A

Sacrifice precision for memory efficiency

Memory uses boolean instead of integer data, so literally 1 bit per space/index

51
Q

Benefit of a Bloom Filter: as m (the number of buckets/spaces in the table) increases, the probability of a false positive decreases.

A

In creating a Bloom filter we should do our best to guess how many elements the user may need, or how many elements they plan to insert.

We try to pick a relatively small epsilon we can tolerate. A smaller epsilon means we will implement a larger table to minimize false positives, but that means using more memory.

Determine how many hash functions to use: k = -log_2(epsilon)
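
For a quick worked example (the 1% tolerance is an illustrative choice, not from the card): tolerating a false-positive rate of epsilon = 0.01 gives k = -log_2(0.01) = log_2(100) ≈ 6.64, so about 7 hash functions.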

52
Q

Count-Min Sketch can only report the maximum possible count of X (an upper bound). It cannot report an exact value.

A

Benefit is that it provides important valuable data without adding more memory consumption.