Big Data Privacy Flashcards
1
Q
Steps: Operationalizing data anonymization
A
- Consider a taxonomy / classification of personal data
- Reflect on the underlying data disclosure / sharing scenarios
- Define your attacker model
- Apply appropriate data anonymization models and techniques
- Assess utility (loss) & residual risks
2
Q
Big Data Privacy: Possible Attacks and Attacker Models
A
- Membership disclosure: Attacker can(not) tell whether a given person is in the dataset
- Sensitive attribute disclosure: Attacker can(not) tell whether a given person has a certain sensitive attribute value
- Identity disclosure: Attacker can(not) tell which record corresponds to a given person
3
Q
Data Anonymization Models
A
- Data Suppression-based models: Focus on preserving the privacy of respondents (i.e., those individuals who are included in the dataset)
- Data Perturbation-based models: Focus on protecting the privacy of both respondents and people not included in the dataset
4
Q
Data Suppression-based models: Approach
A
- Remove/suppress explicit and/or quasi-identifiers before releasing the (micro)data (see the sketch below)
- Sensitive attributes are left unmodified, which preserves high data utility
- Each anonymized entry/dataset should be indistinguishably related to no fewer than a certain number of individuals in the population
- Typically no mathematically provable guarantee of anonymity
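A minimal sketch, assuming a hypothetical medical microdata table with "name" as explicit identifier, "zip" and "age" as quasi-identifiers, and "diagnosis" as sensitive attribute; the generalization rules are illustrative only, not a complete anonymization pipeline:

```python
# Hypothetical microdata records (illustrative example).
records = [
    {"name": "Alice", "zip": "13057", "age": 29, "diagnosis": "flu"},
    {"name": "Bob",   "zip": "13055", "age": 31, "diagnosis": "asthma"},
    {"name": "Carol", "zip": "13053", "age": 34, "diagnosis": "flu"},
]

def suppress_and_generalize(record):
    """Drop the explicit identifier and coarsen the quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        # explicit identifier ("name") is suppressed entirely
        "zip": record["zip"][:3] + "**",     # generalize ZIP code to a prefix
        "age": f"{decade}-{decade + 9}",     # generalize age to a 10-year band
        "diagnosis": record["diagnosis"],    # sensitive attribute left unmodified
    }

released = [suppress_and_generalize(r) for r in records]
for row in released:
    print(row)
```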
5
Q
Data Perturbation-based models: Approach
A
- The anonymized data(set) must be insensitive to the insertion or deletion of a single entry/tuple in the dataset
- Noise is typically added to the dataset (see the sketch below)
- This often results in lower data utility
- Typically provides a provable guarantee of anonymity
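A minimal sketch of the perturbation idea using the Laplace mechanism on a simple counting query; the dataset, ε value, and function names are hypothetical examples:

```python
import numpy as np

def noisy_count(values, predicate, epsilon):
    """Answer a counting query with Laplace noise calibrated to epsilon.
    A counting query has sensitivity 1: inserting or deleting one tuple
    changes the true count by at most 1."""
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon                 # noise scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)

ages = [29, 31, 34, 42, 58, 61]                   # hypothetical dataset
print(noisy_count(ages, lambda a: a >= 40, 0.5))  # true answer is 3, plus noise
```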
6
Q
Data Perturbation-based models: Metrics
A
- ε-Differential Privacy
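For reference, the standard definition (Dwork et al.): a randomized mechanism M is ε-differentially private if, for all datasets D and D' differing in a single tuple and for all output sets S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```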
7
Q
Data Suppression-based models: Metrics
A
- k-anonymity
- l-diversity
- t-closeness
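A minimal sketch of how the first two metrics might be checked on released microdata; column names and records are hypothetical examples (t-closeness additionally compares each class's sensitive-value distribution against the overall distribution and is omitted here):

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes.values()

def is_k_anonymous(rows, quasi_identifiers, k):
    """k-anonymity: every equivalence class has at least k records."""
    return all(len(c) >= k for c in equivalence_classes(rows, quasi_identifiers))

def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Distinct l-diversity: every class has at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in c}) >= l
               for c in equivalence_classes(rows, quasi_identifiers))

released = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "130**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "130**", "age": "30-39", "diagnosis": "flu"},
]
print(is_k_anonymous(released, ["zip", "age"], k=2))                      # True
print(is_distinct_l_diverse(released, ["zip", "age"], "diagnosis", l=2))  # False
```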
8
Q
Limitations of k-anonymity
A
- Homogeneity attack: When the sensitive attribute values within an equivalence class have little diversity, the sensitive value may be leaked (see the sketch below)
- Background knowledge attack: In the presence of side information, any attribute may become a quasi-identifier
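A minimal illustration of the homogeneity attack with hypothetical records: the release is 2-anonymous, yet everyone in the target's equivalence class shares the same sensitive value, so the attacker learns it anyway:

```python
release = [
    {"zip": "130**", "age": "30-39", "diagnosis": "cancer"},
    {"zip": "130**", "age": "30-39", "diagnosis": "cancer"},
    {"zip": "479**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "479**", "age": "20-29", "diagnosis": "asthma"},
]
# The attacker knows the target is a ~35-year-old living in ZIP 130xx.
target_class = [r for r in release if r["zip"] == "130**" and r["age"] == "30-39"]
print({r["diagnosis"] for r in target_class})  # {'cancer'}: sensitive value leaked
```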
9
Q
Limitations of l-diversity
A
- Similarity attack: When the values in a q*-block (equivalence class) are distinct but semantically similar (see the sketch below)
-> l-diversity does not consider the semantic closeness of attribute values
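A minimal illustration of the similarity attack with hypothetical records: the q*-block satisfies distinct 3-diversity, but all values are semantically close, so the attacker still learns the target's disease category:

```python
block = [
    {"zip": "130**", "age": "30-39", "diagnosis": "gastric ulcer"},
    {"zip": "130**", "age": "30-39", "diagnosis": "gastritis"},
    {"zip": "130**", "age": "30-39", "diagnosis": "stomach cancer"},
]
print(len({r["diagnosis"] for r in block}))  # 3 distinct values -> distinct 3-diverse
# ...yet every value is a stomach disease, so the semantic category is disclosed.
```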
10
Q
Types of attributes in structured data
A
- Identifiers
- Quasi Identifiers
- Sensitive Information
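A minimal illustration, assuming a hypothetical medical microdata table, of how its attributes might be classified into these three types:

```python
attribute_types = {
    "name":      "identifier",        # directly identifies the respondent
    "ssn":       "identifier",
    "zip":       "quasi-identifier",  # identifying only in combination with others
    "age":       "quasi-identifier",
    "gender":    "quasi-identifier",
    "diagnosis": "sensitive",         # the information to be protected
}
```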