Probabilistic Data Structure Flashcards

1
Q

What are Probabilistic Data Structures used for?

A

To process large amounts of data in limited memory by providing approximate answers, with a small and controllable error, that are sufficient for many applications

2
Q

What is hashing?

A

A technique that maps data (keys) to a limited addressing space (slots in a hash table)

3
Q

What happens in a hash collision?

A

Multiple keys are mapped to the same slot in the hash table
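
As a small illustration of hashing and collisions, the sketch below maps string keys to slots. The function `slot_for`, the constant `NUM_SLOTS`, and the byte-sum hash are hypothetical choices made for this example; the byte sum is deliberately weak so that a collision is easy to see.

```python
NUM_SLOTS = 8  # assumed size of the addressing space


def slot_for(key: str) -> int:
    """Map a key to a slot using a toy hash: the sum of its byte values."""
    return sum(key.encode()) % NUM_SLOTS


# "ab" and "ba" contain the same bytes, so the toy hash sends both
# keys to the same slot: a hash collision.
slot_ab = slot_for("ab")
slot_ba = slot_for("ba")
print(slot_ab, slot_ba)
```

Real hash functions spread keys far more evenly, but because the addressing space is limited, collisions can never be ruled out entirely.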

4
Q

What are the three main collision resolution techniques?

A

1. Chaining
2. Open Addressing
3. Cuckoo Hashing

5
Q

How does chaining resolve collisions?

A

Elements mapping to the same slot are stored in a linked list
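
A minimal sketch of chaining, assuming a fixed number of slots and Python's built-in `hash`. The class names `Node` and `ChainedHashTable` are made up for this example and do not come from any library.

```python
class Node:
    """One entry in a slot's singly linked list."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, num_slots=8):
        # Each slot holds the head of a linked list (None when empty).
        self.slots = [None] * num_slots

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        i = self._index(key)
        node = self.slots[i]
        while node is not None:          # update the value if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise prepend a new node to this slot's chain.
        self.slots[i] = Node(key, value, self.slots[i])

    def get(self, key, default=None):
        node = self.slots[self._index(key)]
        while node is not None:          # walk the chain for this slot
            if node.key == key:
                return node.value
            node = node.next
        return default


# With only two slots, six integer keys are forced to share chains.
table = ChainedHashTable(num_slots=2)
for k in range(6):
    table.put(k, k * k)
print(table.get(4), table.get(5), table.get(99))
```

Lookups stay correct no matter how many keys collide; the cost is that a long chain makes `get` and `put` slower for that slot.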

6
Q

How does Cuckoo Hashing work?

A

Uses two hash functions to give each element two possible positions in the table. If a position is occupied, the existing element is evicted and moved to its alternative position, which may in turn displace another element
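
The eviction step can be sketched as follows. This is a simplified illustration, assuming non-negative integer keys, a single table, and two toy hash functions (`key % size` and `(key // size) % size`); the class name `CuckooHashSet` and the `max_kicks` limit are assumptions for the example.

```python
class CuckooHashSet:
    def __init__(self, size=5, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size

    def _h1(self, key):
        return key % self.size

    def _h2(self, key):
        return (key // self.size) % self.size

    def contains(self, key):
        # A key can only ever live in one of its two positions.
        return self.slots[self._h1(key)] == key or self.slots[self._h2(key)] == key

    def insert(self, key):
        if self.contains(key):
            return
        pos = self._h1(key)
        for _ in range(self.max_kicks):
            if self.slots[pos] is None:
                self.slots[pos] = key
                return
            # Position occupied: place the new key and evict the occupant.
            key, self.slots[pos] = self.slots[pos], key
            # The evicted key moves to whichever of its positions it was not in.
            a, b = self._h1(key), self._h2(key)
            pos = b if pos == a else a
        # Simplification: a real table would grow or rehash here. In this
        # sketch the key currently in hand is left out of the table.
        raise RuntimeError("too many evictions; table needs rehashing")


s = CuckooHashSet(size=5)
for k in (3, 8, 13):   # all three keys share the first position, 3
    s.insert(k)
print(s.slots)
```

In the usage above, inserting 8 evicts 3 to its alternative position, and inserting 13 then evicts 8 to its own alternative, so every key remains findable with at most two probes.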

7
Q

What is the main purpose of a Bloom Filter?

A

To test, using very little memory, whether an element is a member of a set, accepting occasional false positives

8
Q

What are the possible responses of a Bloom Filter?

A

‘Definitely no’ if at least one of the bits the element hashes to is not set, ‘probably yes’ if all of them are set (false positives are possible, false negatives are not)
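
A minimal Bloom filter sketch showing where the two responses come from. The parameters `num_bits` and `num_hashes`, and the trick of deriving several hash functions by prefixing a seed before SHA-256, are assumptions made for this example rather than a prescribed design.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # One bit position per (seeded) hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False -> "definitely no": at least one required bit is unset.
        # True  -> "probably yes": all bits are set, possibly by other items.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("apple")
bf.add("banana")
print(bf.might_contain("apple"))   # an added item is always reported as present
```

An item that was added always yields ‘probably yes’, because `add` sets every bit that `might_contain` later checks.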

9
Q

What is the key advantage of Cuckoo Filter over Bloom Filter?

A

Cuckoo Filter allows element deletion
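
Deletion is possible because a Cuckoo Filter stores a short fingerprint per element instead of setting shared bits. The sketch below is a simplified illustration: the class name, the 8-bit fingerprint, the bucket sizes, and the SHA-256 helper `_hash` are all assumptions for this example.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=100):
        # A power-of-two bucket count keeps the XOR trick below in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item):
        return _hash(b"fp:" + str(item).encode()) % 255 + 1   # value in 1..255

    def _index(self, item):
        return _hash(str(item).encode()) % self.num_buckets

    def _alt_index(self, index, fp):
        # XOR with the fingerprint's hash; applying it twice returns to index.
        return (index ^ _hash(bytes([fp]))) % self.num_buckets

    def _candidates(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random fingerprint and move it to its other bucket.
            victim = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False   # filter is considered full

    def might_contain(self, item):
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)   # removes one copy only
                return True
        return False


cf = CuckooFilter()
cf.insert("apple")
print(cf.might_contain("apple"), cf.delete("apple"), cf.might_contain("apple"))
```

Only elements that were actually inserted should be deleted; deleting something that was never added can remove the fingerprint of a different element that happens to match.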

10
Q

What is Count-min Sketch used for?

A

To serve as an approximate frequency table for elements in a data stream

11
Q

How does Count-min Sketch estimate frequency?

A

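Each row uses a different hash function to pick one counter for the element. The estimate is the minimum of those counters across the rows; collisions can only inflate a counter, so the minimum is the least overestimated value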
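
A minimal Count-min Sketch, assuming a small table of `depth` rows by `width` counters and one seeded SHA-256 hash per row. These names and sizes are illustrative choices for this example.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # Seeding the hash with the row number gives one hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only ever increase counters, so the smallest one
        # is the tightest upper bound on the true frequency.
        return min(self.table[row][self._column(row, item)]
                   for row in range(self.depth))


cms = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    cms.add(word)
print(cms.estimate("a"))   # never less than the true count of 3
```

The estimate can exceed the true frequency when other elements collide with the queried one in every row, but it never falls below it.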

12
Q

What is the purpose of HyperLogLog?

A

To estimate the cardinality (number of distinct elements) of a set in a data stream

13
Q

What is the key observation behind HyperLogLog?

A

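The maximum number of leading zeros seen in the binary representations of the elements' hash values correlates with the number of distinct elements: a hash with k leading zeros occurs with probability 2^-k, so observing one suggests roughly 2^k distinct elements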
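
The observation can be tried out with a deliberately crude single-value estimator. This is not HyperLogLog itself, only the underlying idea; the 32-bit hash width and the function names `hash32`, `leading_zeros`, and `crude_estimate` are assumptions for this example.

```python
import hashlib

HASH_BITS = 32  # assumed hash width for this illustration


def hash32(item) -> int:
    """First 32 bits of SHA-256, used as a uniform hash value."""
    return int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:4], "big")


def leading_zeros(value: int, bits: int = HASH_BITS) -> int:
    """Number of zero bits before the first 1 in a fixed-width value."""
    return bits - value.bit_length()


def crude_estimate(items) -> int:
    """Guess the distinct count as 2 ** (longest run of leading zeros seen)."""
    longest = max((leading_zeros(hash32(x)) for x in items), default=0)
    return 2 ** longest


# Duplicates hash to the same value, so they cannot change the maximum.
print(crude_estimate(["a"] * 100) == crude_estimate(["a"]))
```

A single maximum is very noisy, since one unusually lucky hash can inflate the estimate by a large factor.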

14
Q

What is the main advantage of HyperLogLog?

A

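Good accuracy with very low memory usage: with m registers the standard error is about 1.04/sqrt(m), so a few kilobytes are enough for an error of roughly 2%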

15
Q

How does HyperLogLog handle bad hash values?

A

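By dividing the stream into substreams (selected by the first bits of each hash), keeping a separate maximum for each, and combining them with a harmonic mean, which reduces the impact of one bad hash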
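
A compact sketch of the substream idea, assuming a 64-bit hash taken from SHA-256 and `2 ** precision` registers. The class layout and parameter names are illustrative; the bias constant and the small-range fallback follow the commonly published HyperLogLog formulation.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision            # number of substreams (registers)
        self.registers = [0] * self.m

    def add(self, item):
        x = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        index = x >> (64 - self.p)                 # first p bits pick the substream
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining bits
        # Rank = leading zeros of the remaining bits, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128
        # Harmonic-mean style combination of the per-register estimates.
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small cardinalities: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(50000):
    hll.add(i)
print(round(hll.estimate()))
```

Because each register only keeps a maximum, one unusually long run of zeros affects a single register out of thousands, and the harmonic mean further dampens that outlier.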
