Data Publishing Flashcards
1
Q
Privacy Threats
A
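- Membership disclosure: An attacker can infer that an individual’s data is in a dataset, and the dataset itself is of a sensitive nature
- Attribute disclosure: An individual’s data is in a dataset, and all records in this individual’s anonymity set share the same sensitive attribute value, so the value is revealed even though the exact record is not
- Record disclosure: An individual’s data is in a dataset, and this individual’s anonymity set contains only one record, so that record can be linked to the individual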
2
Q
K-Anonymity
A
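Each person contained in the released database cannot be distinguished, based on their quasi-identifier values, from at least k-1 other individuals whose information also appears in the released database
-> privacy/utility trade-off: a larger k requires more generalization or suppression, which lowers data utility
-> does not provide privacy when the sensitive values within an equivalence class lack diversity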
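A minimal sketch of a k-anonymity check, assuming records are dicts and the quasi-identifiers are already generalized. The function name, column names, and sample data are hypothetical and only illustrate the definition:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    class_sizes = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(size >= k for size in class_sizes.values())

# Hypothetical, already generalized table
records = [
    {"zip": "476**", "age": "20-29", "disease": "flu"},
    {"zip": "476**", "age": "20-29", "disease": "flu"},
    {"zip": "479**", "age": "30-39", "disease": "cancer"},
    {"zip": "479**", "age": "30-39", "disease": "flu"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))  # True
print(is_k_anonymous(records, ["zip", "age"], 3))  # False
```

The table is 2-anonymous, yet both records in the first class have the same disease, which is the missing-diversity weakness noted above.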
3
Q
l-Diversity
A
- An equivalence class has l-diversity if there are at least l well-represented values for the sensitive attribute
-> does not consider semantics of sensitive values
-> does not consider overall distribution of sensitive values
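A minimal sketch of an l-diversity check. It assumes the simplest reading of "well-represented", namely l distinct sensitive values per equivalence class (distinct l-diversity); other readings such as entropy l-diversity are stricter. Names and sample data are hypothetical:

```python
from collections import defaultdict

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    values_per_class = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values_per_class[key].add(r[sensitive])
    return all(len(values) >= l for values in values_per_class.values())

# Hypothetical, already generalized table
records = [
    {"zip": "476**", "age": "20-29", "disease": "flu"},
    {"zip": "476**", "age": "20-29", "disease": "flu"},
    {"zip": "479**", "age": "30-39", "disease": "cancer"},
    {"zip": "479**", "age": "30-39", "disease": "flu"},
]

# The first class only contains "flu", so the table is not 2-diverse.
print(is_distinct_l_diverse(records, ["zip", "age"], "disease", 2))  # False
```

This check treats all values as equally different, so it would accept a class whose values are distinct but semantically similar.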
4
Q
t-Closeness
A
- An equivalence class has t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t
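A minimal sketch of a t-closeness check. The distance measure is an assumption: it uses the total variation distance, which is a simple choice for categorical attributes; the card does not fix a measure, and the Earth Mover's Distance is the usual alternative. Names and sample data are hypothetical:

```python
from collections import Counter, defaultdict

def distribution(values):
    """Relative frequency of each value."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def max_class_distance(records, quasi_identifiers, sensitive):
    """Largest distance between any class distribution and the table distribution."""
    overall = distribution([r[sensitive] for r in records])
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return max(
        total_variation(distribution(vals), overall) for vals in classes.values()
    )

def is_t_close(records, quasi_identifiers, sensitive, t):
    return max_class_distance(records, quasi_identifiers, sensitive) <= t

# Hypothetical, already generalized table
records = [
    {"zip": "476**", "disease": "flu"},
    {"zip": "476**", "disease": "flu"},
    {"zip": "479**", "disease": "cancer"},
    {"zip": "479**", "disease": "flu"},
]

print(max_class_distance(records, ["zip"], "disease"))  # 0.25
print(is_t_close(records, ["zip"], "disease", 0.2))     # False
```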
5
Q
How to achieve Differential Privacy
A
- Input perturbation:
-> Add noise directly to the database
-> Pro: independent of the algorithm, easy to reproduce
-> Con: determining the amount of required noise is difficult
- Output perturbation:
-> Add noise to the function (statistic) output
-> Pro: easier to control privacy, better guarantees than input perturbation
-> Con: results cannot be reproduced
- Algorithm perturbation:
-> Add noise inside the algorithm itself
-> Pro: the algorithm can be optimized together with the noise addition
-> Con: difficult to generalize, depends on the inputs
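A minimal sketch of output perturbation, assuming the Laplace mechanism on a counting query; the card names neither the mechanism nor the privacy parameter epsilon, so both are assumptions made here. A count has sensitivity 1, so the noise scale is 1/epsilon. Function names and data are hypothetical, and the sampler is illustrative rather than hardened for production use:

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng=None):
    """Return the count of matching records plus Laplace noise of scale 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: how many people are 40 or older?
records = [{"age": a} for a in (23, 35, 41, 52, 67, 38)]
print(noisy_count(records, lambda r: r["age"] >= 40, epsilon=0.5))
```

Each call returns a different value, which matches the reproducibility drawback listed for output perturbation. A smaller epsilon gives more noise and stronger privacy.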