13 BLOOM FILTERS Flashcards

1
Q

What is a Bloom filter?

A

A data structure that optimizes memory usage and execution time to track keys and answer if a key has been seen before.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Who invented Bloom filters and when?

A

Burton Bloom in 1970.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What simple question does a Bloom filter answer?

A

Have we previously seen this key?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the main drawback of Bloom filters?

A

They can produce false positives.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is guaranteed about Bloom filters regarding false negatives?

A

They guarantee a lack of false negatives.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How does a Bloom filter improve search efficiency?

A

By first checking the Bloom filter before accessing a larger data structure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What happens when a Bloom filter returns false?

A

The record is not in the data set, and the expensive lookup can be skipped.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What type of data structure is a Bloom filter fundamentally?

A

An array of binary values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

In a Bloom filter, what does a value of 1 indicate?

A

That a given bin has been seen before.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the primary purpose of the Bloom filter?

A

To filter over a large key space with low memory overhead.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How does a Bloom filter handle collisions?

A

By using multiple independent hash functions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the result of using k independent hash functions in a Bloom filter?

A

It reduces the likelihood of false positives.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Fill in the blank: A Bloom filter uses _______ to map keys to indices.

A

k independent hash functions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

True or False: A Bloom filter can guarantee that a key is in the data set if it returns true.

A

False.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the relationship between the size of the Bloom filter and the number of hash functions?

A

Balancing them can reduce the probability of false positives.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What analogy is used to explain how a Bloom filter works?

A

Asking a knowledgeable event organizer about a friend in a crowded ballroom.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is a significant advantage of using a Bloom filter in searching?

A

It can avoid many pointless searches.

18
Q

What happens when too many values are inserted into a simple binary indicator array?

A

It leads to hash collisions and false positives.

19
Q

How does a Bloom filter determine if a key exists?

A

All k array values must be 1 for the key to be considered present.

20
Q

What is an example application of a Bloom filter mentioned in the text?

A

Checking if a password is on a list of known weak passwords.

21
Q

In the context of Bloom filters, what does a false positive mean?

A

Indicating that a key has been inserted when it has not.

22
Q

What is one method suggested to improve a simple single-hash-function filter?

A

Increase the size of the binary array.

23
Q

What concept does the Bloom filter utilize to optimize searching through large data sets?

A

A two-stage lookup process.

24
Q

What is a Bloom filter used for in the context of trying new coffee?

A

To quickly determine if a new coffee should be sampled based on five attributes.

25
What attributes are used to construct a Bloom filter for coffees?
Primary smell and viscosity, among others.
26
What happens if any attribute maps to 0 in a Bloom filter?
The new coffee can be safely consumed, as it is guaranteed not to be on the barista's list.
27
What is the structure of a Bloom filter in code?
BloomFilter { Integer: size, Integer: k, Array of bits: bins, Array of hash functions: h }
28
What are the functions used to manipulate a Bloom filter?
BloomFilterInsertKey and BloomFilterLookup.
29
True or False: The runtime of both insertion and lookup in a Bloom filter is dependent on the number of items inserted.
False.
30
What factors impact the false positive rate of a Bloom filter?
* Size of the array (m) * Number of hash functions (k) * Number of items inserted (n)
31
What is the formula for approximating the false positive rate of a Bloom filter?
FalsePositiveRate = (1 – (1 – 1/m)^(nk))^k
32
What happens to the false positive rate when the size of the array is increased?
It decreases.
33
What is the consequence of increasing the number of items inserted into a Bloom filter?
It increases the false positive rate.
34
How does the number of hash functions (k) affect the false positive rate?
It can increase or decrease the rate depending on other parameters.
35
What is the tradeoff when tuning Bloom filter parameters?
Between false positive rate, computational cost, and memory cost.
36
What is a significant advantage of Bloom filters over hash tables?
Bloom filters use significantly less memory.
37
Why might hash tables be less efficient than Bloom filters in terms of memory?
Hash tables require storing keys and pointers, leading to more memory usage.
38
How can Bloom filters optimize computationally sensitive applications?
By keeping the filter in memory or cache for fast access.
39
What is the key characteristic of the accuracy of a Bloom filter's lookup operation?
It is probabilistic and can return false positives.
40
What does the next chapter focus on after discussing Bloom filters?
Data structures that use randomness to avoid bad worst-case behavior, such as skip lists.