Privacy Budget Flashcards

1
Q

Describe the purpose of Epsilon in the budget

A

Epsilon (ε) quantifies how much the output of a differentially private analysis can deviate from the scenario where an individual's data is excluded ("opt-out")

A smaller ε indicates stronger privacy protection but potentially less accuracy in the analysis

2
Q

How does Epsilon get used up as you run analyses?

A

As you perform more analyses, the total ε consumed increases. Exceeding the predetermined privacy budget might lead to unacceptable privacy risks
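A minimal sketch of tracking this under basic sequential composition, where the ε values of successive analyses on the same data simply add up. The class `PrivacyBudget` and its methods are hypothetical illustrations, not the API of any particular DP library.

```python
class PrivacyBudget:
    """Tracks epsilon spent under basic sequential composition:
    the epsilons of successive analyses on the same data add up."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def remaining(self) -> float:
        return self.total_epsilon - self.spent

    def spend(self, epsilon: float) -> None:
        """Record one analysis, refusing it if it would exceed the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding alone never rejects a query.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError(
                f"privacy budget exceeded: spent {self.spent}, "
                f"requested {epsilon}, total {self.total_epsilon}"
            )
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.5)  # first analysis
budget.spend(0.3)  # second analysis
print(round(budget.remaining(), 10))  # 0.2
```

A third analysis asking for ε = 0.5 would now be refused. A real deployment would also need to track δ and might use tighter composition theorems; this sketch only implements the simple additive rule.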

3
Q

Describe the purpose of Delta in the budget

A

Delta (δ) represents the probability of a significant privacy breach, or the odds of something going wrong with the privacy protection

While ε bounds the typical privacy loss, δ acknowledges that there might be rare events where the privacy loss exceeds ε

4
Q

Is Delta usually set to a high or low value?

A

δ is often set to a very small value, such as 10^-5, to ensure a high probability of maintaining privacy

5
Q

What are 3 main factors to consider when choosing the Epsilon and Delta values?

A

Sensitivity of the data: Highly sensitive data might require smaller values of ε and δ.

Number of analyses: Performing multiple analyses on the same dataset necessitates smaller values of ε and δ for each analysis to maintain the overall privacy budget.

Accuracy requirements: Achieving high accuracy might require a larger ε, potentially weakening privacy protection

6
Q

When analysing multiple statistics that a single individual can influence, which noise function offers better accuracy, Gaussian noise (scaled by the square root of the number of statistics) or Laplace noise (scaled linearly)?

A

Gaussian noise (scaled by the square root of the number of statistics)
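A sketch comparing the per-statistic noise standard deviation of the two options, assuming each statistic is a count that one person can change by at most 1 (so the L1 sensitivity is the number of statistics k, and the L2 sensitivity is √k). For the Gaussian side it uses the classic calibration σ = √(2 ln(1.25/δ)) · Δ₂ / ε, valid for ε < 1; tighter calibrations exist. Note that Laplace gives pure ε-DP, while Gaussian gives (ε, δ)-DP. The function names are hypothetical.

```python
import math


def laplace_noise_std(num_stats: int, epsilon: float) -> float:
    """Std of per-statistic Laplace noise: L1 sensitivity is num_stats,
    so the scale is b = num_stats / epsilon (linear in num_stats)."""
    b = num_stats / epsilon
    return math.sqrt(2) * b  # a Laplace(b) variable has variance 2 * b**2


def gaussian_noise_std(num_stats: int, epsilon: float, delta: float) -> float:
    """Std of per-statistic noise for the classic Gaussian mechanism:
    L2 sensitivity is sqrt(num_stats), so sigma grows with sqrt(num_stats)."""
    if not 0 < epsilon < 1:
        raise ValueError("the classic Gaussian bound assumes 0 < epsilon < 1")
    l2_sensitivity = math.sqrt(num_stats)
    return math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


for k in (1, 10, 100, 1000):
    lap = laplace_noise_std(k, epsilon=0.5)
    gau = gaussian_noise_std(k, epsilon=0.5, delta=1e-5)
    print(f"{k:5d} stats  Laplace std {lap:9.1f}  Gaussian std {gau:8.1f}")
```

With ε = 0.5 and δ = 10^-5, Laplace still wins for a single statistic and is slightly ahead at 10, but Gaussian is clearly better at 100 or more: the square-root scaling pays off as the number of statistics grows.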

7
Q

When releasing multiple statistics, how can you use the Epsilon value?

A

When releasing multiple statistics, you can allocate different portions of your total ε budget to each statistic based on its importance and sensitivity to noise
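A minimal sketch of such an allocation: each statistic gets a share of the total ε proportional to a weight you choose, and by sequential composition the shares add back up to the total. The function `split_budget`, the statistic names and the weights are hypothetical.

```python
def split_budget(total_epsilon: float, weights: dict[str, float]) -> dict[str, float]:
    """Give each statistic a share of total_epsilon proportional to its weight.
    By sequential composition the shares add back up to total_epsilon."""
    if total_epsilon <= 0 or not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("need a positive budget and positive weights")
    weight_sum = sum(weights.values())
    return {name: total_epsilon * w / weight_sum for name, w in weights.items()}


# Hypothetical example: the headline count matters most, so it gets half the budget.
shares = split_budget(1.0, {"total_users": 2, "by_region": 1, "by_age_band": 1})
print(shares)  # {'total_users': 0.5, 'by_region': 0.25, 'by_age_band': 0.25}
```

A statistic with a larger share gets proportionally less noise, so the weights encode which results need to be most accurate.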

8
Q

In relation to the number of users in a dataset (n), how small should delta be?

A

Significantly smaller than 1/n to avoid classifying trivial mechanisms, like releasing a random record, as differentially private

9
Q

What Delta does Facebook use for their URL dataset?

A

10^-5 for Gaussian noise

9
Q

What delta would practically guarantee no catastrophic events?

A

10^-30

10
Q

What Delta does the US Census Bureau use in its 2020 Census data releases?

A

10^-5 Gaussian noise

11
Q

Given a Delta of 10^-5, what is the chance that the privacy protection offered by differential privacy might fail for an individual in the dataset?

A

1 in 100,000

12
Q

In the US Census 2020 Data Release, what epsilon was used for the Detailed DHC datasets, providing fine-grained racial and ethnic information?

A

49.21

13
Q

What is the recommended way to determine δ?

A

Several sources emphasise that δ should ideally be significantly smaller than 1/n, where ‘n’ is the number of individuals in the dataset.

The reasoning here is that each individual, in a worst-case scenario, faces a δ probability of data leakage, so n·δ is an upper bound (by the union bound) on the probability that at least one person's data is compromised. This value should therefore be kept well below 1.
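A quick sketch of the n·δ check for a hypothetical dataset of one million people; the function name `leak_probability_bound` is illustrative only.

```python
def leak_probability_bound(n: int, delta: float) -> float:
    """Union bound: if each of n people faces at most a delta chance of leakage,
    the chance that at least one of them leaks is at most n * delta (capped at 1)."""
    return min(1.0, n * delta)


n = 1_000_000  # hypothetical dataset size
for delta in (1e-5, 1e-7, 1e-9):
    print(f"delta={delta:g}  1/n={1 / n:g}  n*delta bound={leak_probability_bound(n, delta):.3g}")
```

For a million people, δ = 10^-5 gives a bound of 1 (no guarantee at all), whereas δ = 10^-9, far below 1/n, bounds the chance of any leak at about 0.1%.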

14
Q

What is e (as the base of the natural logarithm)?

A

e represents the natural limit for how quickly something can grow when it builds on itself continuously. This applies not just to money but also to things like population growth, cooling rates, and radioactive decay.

e.g. In the context of interest payments, with $1 invested at 100% annual interest:

As the payments become more frequent (every hour, every minute, or even every second), the balance after one year is:

$\left(1 + \frac{1}{n}\right)^n$,

where n is the number of times the interest is paid per year.

As n grows larger, the total balance gets closer and closer to a specific number, but it doesn’t grow forever. This number is approximately 2.718, and it’s what we call e. No matter how much more frequently the interest is paid, your balance will never exceed e dollars for $1 invested at 100% interest.
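To see the convergence numerically, a short sketch computes (1 + 1/n)^n for a few payment schedules; the labels and n values (e.g. 8,760 hours in a non-leap year) are illustrative choices.

```python
import math

# $1 at 100% annual interest, paid n times per year (illustrative schedules).
schedules = {"yearly": 1, "monthly": 12, "daily": 365, "hourly": 8_760, "every minute": 525_600}
balances = {label: (1 + 1 / n) ** n for label, n in schedules.items()}

for label, balance in balances.items():
    print(f"{label:>12}: {balance:.6f}")
print(f"{'limit e':>12}: {math.e:.6f}")
```

Each more frequent schedule gives a slightly larger balance, but every one stays below e.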

15
Q

Explain how ε affects the privacy

A

DP assumes a worst-case attacker who knows every record in the dataset except their target's, so their only uncertainty is about the target. Say they want to know whether their target has green eyes, and you publish the number of people with green eyes. If you output the real number k, they can compare it with the number of people with green eyes among the people they know. If that number is k−1, the target has green eyes. If it is k, the target does not.

The Laplace mechanism achieves ε-differential privacy by drawing a random value from Laplace(1/ε) and adding this noise to the real value.

Say the attacker knows that 1000 of the other people have green eyes, and the real result is k = 1001 (the target has green eyes). After adding noise, the published value is 1003.

The attacker has to judge how likely it is that the real number is 1001 rather than 1000. As the published number is 1003, 1001 is a bit more likely (under Laplace noise, a noise of 2 is more likely than a noise of 3). The privacy of this individual depends on how much more likely.

It turns out that the ratio between these likelihoods is at most e^ε (in this example it is exactly e^ε).

e.g. if ε is 1, the ratio of likelihoods is at most e ≈ 2.718
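A small sketch that evaluates the Laplace densities for the green-eyes example and checks the ratio against e^ε. It only computes likelihoods and does not generate noise; `laplace_pdf` is a helper written here, not a library function.

```python
import math


def laplace_pdf(x: float, mu: float, b: float) -> float:
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)


epsilon = 1.0
scale = 1 / epsilon  # Laplace(1/epsilon): a count has sensitivity 1
published = 1003     # noisy value the attacker sees

# Likelihood of the published value if the target has green eyes (k = 1001)
# versus if they do not (k = 1000).
p_with_target = laplace_pdf(published, mu=1001, b=scale)
p_without_target = laplace_pdf(published, mu=1000, b=scale)

ratio = p_with_target / p_without_target
print(round(ratio, 6), round(math.exp(epsilon), 6))  # 2.718282 2.718282
```

For any other published value the ratio is between e^-ε and e^ε, which is exactly the bound ε-differential privacy promises.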

16
Q

What is e in mathematics?

A

e is the mathematical constant approximately equal to 2.718. It is the base of natural exponential functions, which are commonly used to describe growth, decay, and probabilities:

$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$

As n grows larger, (1 + 1/n)^n gets closer and closer to a specific number, but it doesn't grow forever. This number is approximately 2.718, and it's what we call e. In compound-interest terms, no matter how frequently the interest is paid, $1 invested at 100% annual interest never grows beyond e dollars in a year.

17
Q

What is e^ε?

A

e^ε bounds the ratio of the probabilities that two neighbouring datasets (differing by just one individual) produce the same output after noise is added.

In other words, e^ε bounds how much more likely an outcome is under one dataset than under the other. For instance, if ε = 1, e^ε = e^1 ≈ 2.718, meaning the ratio of probabilities is at most ~2.718.
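For reference, this bound comes from the standard formal definition: a randomised mechanism M is (ε, δ)-differentially private if, for every pair of neighbouring datasets D and D′ and every set of possible outputs S,

```latex
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

With δ = 0 this is pure ε-differential privacy, and e^ε is then exactly the bound on the probability ratio.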

18
Q

If you are using DP, do you need attack modelling?

A

No. DP guarantees:

  1. You protect any kind of information about an individual. It doesn’t matter what the attacker wants to do. Reidentify their target, know if they’re in the dataset, deduce some sensitive attribute… All those things are protected. Thus, you don’t have to think about the goals of your attacker.
  2. It works no matter what the attacker knows about your data. They might already know some people in the database. They might even add some fake users to your system. With differential privacy, it doesn’t matter. The users that the attacker doesn’t know are still protected.
19
Q

With DP, how can you quantify privacy loss?

A

With DP you can quantify the greatest possible information gain by the attacker. The corresponding parameter, named ε, allows you to make formal statements. Suppose ε=1.1. Then, you can say: “an attacker who thinks their target is in the dataset with probability 50% can increase their level of certainty to at most 75%.”

e^1.1 ≈ 3, so prior odds of 1:1 (50%) can grow to posterior odds of at most 3:1, i.e. 75%.
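A sketch of this calculation for pure ε-DP (δ = 0): after seeing the output, the attacker's odds can be multiplied by at most e^ε. The function name `max_posterior` is hypothetical.

```python
import math


def max_posterior(prior: float, epsilon: float) -> float:
    """Highest probability an attacker can reach after seeing an epsilon-DP output,
    starting from `prior`: posterior odds are at most e**epsilon times prior odds."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    posterior_odds = math.exp(epsilon) * prior_odds
    return posterior_odds / (1 + posterior_odds)


print(round(max_posterior(0.5, 1.1), 4))  # 0.7503
```

Since e^1.1 is slightly above 3, the exact bound is a little over 75%.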

20
Q

Can you compose data releases with DP?

A

Yes. For example, if two data releases are each ε-differentially private, their combination is still protected, but at a weaker level of privacy: the combined parameter is 2ε.

21
Q

If you are counting things, how do you adjust DP to account for an individual's contribution if it can be more than 1 thing?

A

If an individual can contribute more than one counted item, their influence on the count is much larger. If they could contribute 5 items, the ratio between the likelihoods of the possible results can now be as large as e^(5ε), so Laplace noise of scale 1/ε only gives 5ε-differential privacy. With ε = 1.1, that is a ratio of e^5.5 ≈ 244.69, i.e. odds of 1:244.69: an attacker starting at 50% could become about 99.6% certain (0.4% : 99.6%).

To counter this we need to add 5 times the amount of noise: Laplace(5/ε) gives ε-differential privacy. A scale of 5/ε makes the distribution much wider and flatter. With ε = 1.1, the scale becomes 5/1.1 ≈ 4.55 (note this is the Laplace scale parameter b, not the sensitivity or a new ε value; ε remains 1.1).

To preserve this level of privacy, you need to clamp each individual's contribution to the chosen maximum. In other words, for an outlier user, you would only count 5 things in the non-noisy sum.
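A minimal sketch of clamping followed by Laplace(5/ε) noise. The helper names and the data are hypothetical, and Laplace noise is generated as the difference of two exponential draws from Python's `random` module, which is fine for illustration but not a secure sampler for real releases.

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials with mean `scale`.
    Illustration only; real DP releases need a vetted, secure sampler."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def clamped_count(items_per_user: dict[str, list[str]], max_items: int) -> int:
    """Count items, but never more than max_items from any single user."""
    return sum(min(len(items), max_items) for items in items_per_user.values())


def dp_count(items_per_user: dict[str, list[str]], max_items: int,
             epsilon: float, rng: random.Random) -> float:
    """Clamp contributions, then add Laplace(max_items / epsilon) noise."""
    return clamped_count(items_per_user, max_items) + laplace_sample(max_items / epsilon, rng)


# Hypothetical data: "carol" is an outlier whose 12 items are clamped to 5.
data = {"alice": ["a", "b"], "bob": ["c"], "carol": [f"x{i}" for i in range(12)]}
print(clamped_count(data, max_items=5))  # 8
print(dp_count(data, max_items=5, epsilon=1.1, rng=random.Random(0)))
```

Clamping is what makes the sensitivity really 5; without it, the Laplace(5/ε) scale would not cover an outlier like carol.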

22
Q

Can you apply a fixed transformation in a post-processing phase of DP without breaking the DP?

A

Yes. If you take differentially private data, and make it go through a fixed transformation, you still get differential privacy.

For example, you could replace negative counts with zero, or round non-integer counts to whole numbers.
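A tiny sketch of such a fixed post-processing step; `post_process` is a hypothetical name. It only looks at the already-noisy value, never at the raw data, which is why the DP guarantee is unaffected.

```python
def post_process(noisy_count: float) -> int:
    """Fixed transformation applied after noise: no negative counts, integers only.
    It never touches the raw data, so the DP guarantee is unchanged."""
    return max(0, round(noisy_count))


print([post_process(x) for x in (-2.3, 0.4, 4.6, 17.2)])  # [0, 0, 5, 17]
```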

23
Q

If you are releasing multiple stats about each user, e.g. age, ethnicity, sexuality, how do you guarantee privacy?

A

You have to consider each count as a separate data release. Thus, if you have C different counts, you have to add Laplace noise of scale C/ε to each of them, so each release is (ε/C)-differentially private. By the composition property of differential privacy, the entire release is then ε-differentially private.
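A minimal sketch, assuming each user adds at most 1 to each of the C counts (the statistic names and values are hypothetical, and the noise sampler is an illustration-only construction from Python's `random` module).

```python
import random


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials (illustration only)."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def release_counts(true_counts: dict[str, int], epsilon: float,
                   rng: random.Random) -> dict[str, float]:
    """Assumes each user adds at most 1 to each of the C counts.
    Laplace(C / epsilon) noise on each count makes it (epsilon / C)-DP,
    so by composition the whole release is epsilon-DP."""
    c = len(true_counts)
    scale = c / epsilon
    return {name: count + laplace_sample(scale, rng) for name, count in true_counts.items()}


# Hypothetical counts: one user can appear in all three at once.
counts = {"over_65": 120, "smoker": 340, "diabetic": 210}
print(release_counts(counts, epsilon=1.0, rng=random.Random(0)))
```

The cost is that every count gets C times more noise than a single release would; that is the price of protecting a person who appears in all of them.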

24
Q

What does the 1/ε mean in Laplace(1/ε)?

A

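1/ε is the scale parameter of the Laplace distribution (b in its density function). The larger b is, the wider the distribution (stronger privacy); the smaller b is, the narrower the distribution (weaker privacy).

Privacy strength decreases as ε grows: the noise scale b = 1/ε is inversely proportional to ε.

By increasing ε you are weakening the privacy but making the results more accurate.

The 1 in the numerator is the sensitivity: the most one individual can change the result (1 for a simple count). For a fixed sensitivity, putting a larger number there increases b, strengthening the privacy but making the results less accurate; when the sensitivity itself is larger (e.g. 5 items per person), the numerator must grow just to keep the same ε.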

25
Q

What is Zero-Concentrated DP (zCDP)?

A

Zero-concentrated differential privacy (zCDP) provides a way to measure privacy loss that considers the average privacy loss, rather than solely focusing on the worst-case scenario.
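A sketch of two standard zCDP facts, written as hypothetical helper functions: the Gaussian mechanism with sensitivity Δ and noise σ satisfies ρ-zCDP with ρ = Δ²/(2σ²), ρ values add up under composition, and ρ-zCDP implies (ε, δ)-DP with ε = ρ + 2√(ρ ln(1/δ)). The σ and δ values below are arbitrary examples.

```python
import math


def gaussian_rho(sensitivity: float, sigma: float) -> float:
    """zCDP parameter of the Gaussian mechanism: rho = sensitivity**2 / (2 * sigma**2)."""
    return sensitivity ** 2 / (2 * sigma ** 2)


def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """Epsilon such that rho-zCDP implies (epsilon, delta)-DP:
    epsilon = rho + 2 * sqrt(rho * ln(1 / delta))."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


# Composition in zCDP: the rho values of successive releases simply add up.
total_rho = gaussian_rho(1.0, sigma=2.0) + gaussian_rho(1.0, sigma=2.0)
print(total_rho, round(zcdp_to_approx_dp(total_rho, delta=1e-5), 3))
```

Accounting in ρ and converting to (ε, δ) only once at the end usually gives a smaller final ε than adding up per-release (ε, δ) values.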