ProbabilisticDataStructure Flashcards

Question 1

Q

What are Probabilistic Data Structures used for?

Answer

A

To process large volumes of data in limited memory by returning approximate answers with bounded error, which is sufficient for many applications

Question 2

Q

What is hashing?

Answer

A

A technique that maps data (keys) to a limited addressing space (slots in a hash table)

Question 3

Q

What happens in a hash collision?

Answer

A

Multiple keys are mapped to the same slot in the hash table

Question 4

Q

What are the three main collision resolution techniques?

Answer

A

1. Chaining 2. Open Addressing 3. Cuckoo Hashing

Question 5

Q

How does chaining resolve collisions?

Answer

A

Elements mapping to the same slot are stored in a linked list
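A minimal sketch of chaining, for illustration only. The class name ChainedHashTable and its methods are hypothetical, a Python list stands in for the linked list, and SHA-256 is used only to get a hash that is stable between runs.

```python
import hashlib


class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining (illustrative only)."""

    def __init__(self, num_slots=8):
        # Each slot holds a chain; a Python list stands in for the linked list.
        self.slots = [[] for _ in range(num_slots)]

    def _index(self, key):
        # Map the key to a slot in the limited addressing space.
        digest = hashlib.sha256(str(key).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))  # colliding keys share the same chain

    def get(self, key, default=None):
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        return default


# Three keys in two slots: at least two of them must share a chain.
table = ChainedHashTable(num_slots=2)
for word in ["apple", "banana", "cherry"]:
    table.put(word, len(word))
```

Lookups walk only the chain of the slot the key hashes to, so performance degrades gracefully as chains grow.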

Question 6

Q

How does Cuckoo Hashing work?

Answer

A

Uses two hash functions to give each element two possible positions in the table. If a position is occupied, the existing element is evicted and moved to its alternative position, which may in turn displace another element
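A minimal sketch of the displacement idea, under stated assumptions: the class CuckooHashSet, the max_kicks limit, and the grow-and-reinsert fallback are all hypothetical choices for this example, and the two hash functions are simply two slices of one SHA-256 digest.

```python
import hashlib


class CuckooHashSet:
    """Toy cuckoo hash set with one table and two hash functions (illustrative)."""

    def __init__(self, num_slots=16, max_kicks=32):
        self.slots = [None] * num_slots
        self.max_kicks = max_kicks

    def _positions(self, key):
        # Two hash values taken from different parts of the same digest.
        digest = hashlib.sha256(str(key).encode()).digest()
        n = len(self.slots)
        return (int.from_bytes(digest[:8], "big") % n,
                int.from_bytes(digest[8:16], "big") % n)

    def contains(self, key):
        # A key can only ever live in one of its two positions.
        return any(self.slots[p] == key for p in self._positions(key))

    def insert(self, key):
        if self.contains(key):
            return
        pos = self._positions(key)[0]
        for _ in range(self.max_kicks):
            if self.slots[pos] is None:
                self.slots[pos] = key
                return
            # Evict the occupant and send it to its alternative position.
            key, self.slots[pos] = self.slots[pos], key
            h1, h2 = self._positions(key)
            pos = h2 if pos == h1 else h1
        # Assumed fallback: too many evictions, so grow and reinsert everything.
        pending = [k for k in self.slots if k is not None] + [key]
        self.slots = [None] * (2 * len(self.slots))
        for k in pending:
            self.insert(k)


names = CuckooHashSet(num_slots=16)
for i in range(10):
    names.insert(f"key{i}")
```

Because every key sits in one of exactly two positions, a lookup inspects at most two slots.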

Question 7

Q

What is the main purpose of a Bloom Filter?

Answer

A

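To test whether an element is a member of a set using far less memory than storing the set itself, at the cost of occasional false positives (never false negatives)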

Question 8

Q

What are the possible responses of a Bloom Filter?

Answer

A

‘Definitely no’ if at least one of the bits the element hashes to is not set; ‘probably yes’ if all of those bits are set
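The two responses can be illustrated with a small sketch. The class BloomFilter, its parameters num_bits and num_hashes, and the seeded SHA-256 hashing are assumptions made for this example, not a reference implementation.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _indexes(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i] = 1

    def might_contain(self, item):
        # False: at least one bit is unset, so the item was definitely never added.
        # True: all bits are set, so the item was probably added.
        return all(self.bits[i] for i in self._indexes(item))


seen = BloomFilter()
seen.add("apple")
seen.add("banana")
```

An added element always yields 'probably yes'; an element that was never added usually yields 'definitely no' but can occasionally collide on every bit.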

Question 9

Q

What is the key advantage of Cuckoo Filter over Bloom Filter?

Answer

A

Cuckoo Filter supports deleting elements, which a standard Bloom Filter does not
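A rough sketch of why deletion is possible: a cuckoo filter stores a short fingerprint per element in one of two buckets, so the fingerprint can simply be removed again. Everything here (the class CuckooFilter, one-byte fingerprints, bucket size 4, the XOR rule for the alternative bucket, the fixed random seed) is an assumption chosen for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter storing one-byte fingerprints (illustrative only)."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=100, seed=0):
        # A power-of-two bucket count makes the XOR step below reversible.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt(self, index, fp):
        # Alternative bucket depends only on the current bucket and fingerprint.
        return (index ^ self._hash(bytes([fp]))) % self.num_buckets

    def _candidates(self, item):
        raw = str(item).encode()
        fp = self._hash(b"fp:" + raw) % 255 + 1  # fingerprint in 1..255
        i1 = self._hash(raw) % self.num_buckets
        return fp, i1, self._alt(i1, fp)

    def insert(self, item):
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random fingerprint and move it to its other bucket.
            j = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter treated as full; the last evicted fingerprint is lost

    def contains(self, item):
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        # Only safe for items that were actually inserted.
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

A Bloom filter cannot do this safely, because clearing a bit might also erase evidence of other elements that share it.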

Question 10

Q

What is Count-min Sketch used for?

Answer

A

To serve as an approximate frequency table for elements in a data stream

Question 11

Q

How does Count-min Sketch estimate frequency?

Answer

A

Takes the minimum value among the counters corresponding to the element across different rows
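The minimum rule can be sketched as follows. The class CountMinSketch, its width and depth parameters, and the per-row seeded SHA-256 hashing are assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Toy Count-min Sketch: depth rows of width counters (illustrative only)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One counter per row, chosen by a row-specific hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the least-contaminated estimate and never undercounts.
        return min(self.rows[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for event in ["a", "b", "a", "c", "a", "b"]:
    sketch.add(event)
```

The estimate is always at least the true frequency and at most the total number of items added.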

Question 12

Q

What is the purpose of HyperLogLog?

Answer

A

To estimate the cardinality (number of distinct elements) of a set in a data stream

Question 13

Q

What is the key observation behind HyperLogLog?

Answer

A

The maximum number of leading zeros observed in the binary representation of the hashed elements correlates with the number of distinct elements: seeing a hash with k leading zeros suggests roughly 2^k distinct elements

Question 14

Q

What is the main advantage of HyperLogLog?

Answer

A

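High accuracy with very low memory usage: the standard error is about 1.04/sqrt(m) for m registers, so roughly 1.5 kB of memory gives about 2% error even for cardinalities above 10^9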

Question 15

Q

How does HyperLogLog handle bad hash values?

Answer

A

By dividing the stream into substreams (one register per hash prefix) and combining the per-substream values with a harmonic mean, which reduces the impact of one bad hash
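A simplified sketch of the substream idea, not a production implementation. The class HyperLogLog and parameter p are hypothetical names; the bias constant and small-range correction follow the commonly published formulas, and the combination step uses a harmonic mean as in the standard estimator.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p  # number of substreams (registers)
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        x = int.from_bytes(digest[:8], "big")  # 64-bit hash
        idx = x >> (64 - self.p)  # first p bits pick the substream
        rest = x & ((1 << (64 - self.p)) - 1)
        # Rank = number of leading zeros in the remaining bits, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        # Harmonic mean of 2**register: one unusually large register
        # contributes a tiny term instead of dominating the result.
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


visitors = HyperLogLog(p=10)
for i in range(10000):
    visitors.add(f"user-{i}")
approx_distinct = visitors.estimate()
```

With 1024 registers the expected relative error is around 3%, and adding the same element again never changes a register.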

(15 cards)