Big Data Privacy Flashcards

1
Q

Steps: Operationalizing data anonymization

A
  1. Consider a taxonomy / classification of personal data
  2. Reflect on the underlying data disclosure / sharing scenarios
  3. Define your attacker model
  4. Apply appropriate data anonymization models and techniques
  5. Assess utility (loss) & residual risks
2
Q

Big Data Privacy: Possible Attacks and Attacker Models

A
  • Membership disclosure: Attacker can (or cannot) tell whether a given person is in the dataset
  • Sensitive attribute disclosure: Attacker can (or cannot) tell whether a given person has a certain sensitive attribute
  • Identity disclosure: Attacker can (or cannot) tell which record corresponds to a given person
3
Q

Data Anonymization Models

A
  • Data Suppression-based models: Focus on preserving the privacy of respondents (i.e., those individuals who are included in the dataset)
  • Data Perturbation-based models: Focus on protecting the privacy of both respondents and people not included in the dataset
4
Q

Data Suppression-based models: Approach

A
  • Remove/suppress explicit and/or quasi-identifiers before releasing the (micro)data (sketched below)
  • Sensitive attributes are not modified, which preserves high data utility
  • Each anonymized entry/dataset should be indistinguishably related to no fewer than a certain number of individuals in the population
  • Typically no mathematically provable guarantee of anonymity
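A minimal sketch of this approach with pandas: the explicit identifier is dropped, and the quasi-identifiers are coarsened (generalization, a standard companion technique added here for illustration). The table and column names are illustrative assumptions, not from the flashcards.

```python
import pandas as pd

# Toy microdata: 'name' is an explicit identifier, 'zip_code' and
# 'birth_year' are quasi-identifiers, 'diagnosis' is sensitive.
df = pd.DataFrame({
    "name":       ["Alice", "Bob", "Carol", "Dave"],
    "zip_code":   ["13353", "13359", "10115", "10117"],
    "birth_year": [1980, 1983, 1990, 1991],
    "diagnosis":  ["Flu", "Cancer", "Flu", "Asthma"],
})

anonymized = df.drop(columns=["name"])                             # suppress explicit identifier
anonymized["zip_code"] = anonymized["zip_code"].str[:3] + "**"     # generalize ZIP to a prefix
anonymized["birth_year"] = (anonymized["birth_year"] // 10) * 10   # generalize year to decade

# The sensitive attribute 'diagnosis' is released unmodified.
print(anonymized)
```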
5
Q

Data Perturbation-based models: Approach

A
  • The anonymized data(set) must be insensitive to the insertion or deletion of a single entry / tuple in the dataset
  • Noise is typically added to the dataset (or to query results over it)
  • This often results in sub-optimal / poorer data utility
  • Typically provides a provable guarantee of anonymity (see the sketch below)
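A minimal sketch of perturbation via the Laplace mechanism on a counting query; the data, the predicate, and the epsilon values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def noisy_count(records, predicate, epsilon=1.0):
    """Count matching records, perturbed with Laplace noise.

    A count changes by at most 1 when a single record is inserted
    or deleted (sensitivity 1), so noise with scale 1/epsilon makes
    the released value insensitive to any one tuple.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 45, 29, 61, 50, 38]
print(noisy_count(ages, lambda a: a >= 40, epsilon=0.5))  # true count 3, plus noise
```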
6
Q

Data Perturbation-based models: Metrics

A
  • ε-Differential Privacy
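For reference, the standard definition: a randomized mechanism M is ε-differentially private if its output distribution changes by at most a factor of e^ε when a single record is inserted into or deleted from the input.

```latex
% epsilon-differential privacy: for all neighboring datasets D, D'
% (differing in one record) and all measurable output sets S
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
```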
7
Q

Data Suppression-based models: Metrics

A
  • k-anonymity
  • l-diversity
  • t-closeness
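To make the first two metrics above concrete, a minimal sketch of checking a released table with pandas; the column names and their roles are illustrative assumptions.

```python
import pandas as pd

def k_anonymity(df, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifiers."""
    return df.groupby(quasi_identifiers).size().min()

def l_diversity(df, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any class."""
    return df.groupby(quasi_identifiers)[sensitive].nunique().min()

released = pd.DataFrame({
    "zip_prefix": ["133**", "133**", "101**", "101**"],
    "age_decade": [30, 30, 40, 40],
    "diagnosis":  ["Flu", "Cancer", "Flu", "Flu"],
})

print(k_anonymity(released, ["zip_prefix", "age_decade"]))               # 2 -> 2-anonymous
print(l_diversity(released, ["zip_prefix", "age_decade"], "diagnosis"))  # 1 -> homogeneity risk
```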
8
Q

Limitations of k-anonymity

A
  • Homogeneity attack: When the sensitive attribute values within an equivalence class have little diversity, information may be leaked (illustrated below)
  • Background knowledge attack: In the presence of side information, any attribute may become a quasi-identifier
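A toy illustration of the homogeneity attack; the records below are made up. The table is 2-anonymous over (zip, age), yet every record in one equivalence class shares the same diagnosis, so k-anonymity leaks it anyway.

```python
released = [
    {"zip": "476**", "age": "30-39", "diagnosis": "Cancer"},
    {"zip": "476**", "age": "30-39", "diagnosis": "Cancer"},
    {"zip": "479**", "age": "40-49", "diagnosis": "Flu"},
    {"zip": "479**", "age": "40-49", "diagnosis": "Heart disease"},
]

# An attacker who merely knows the target is a 30-something living in
# 476** recovers the diagnosis with certainty, despite 2-anonymity.
candidates = {r["diagnosis"] for r in released
              if r["zip"] == "476**" and r["age"] == "30-39"}
print(candidates)  # {'Cancer'} -> sensitive attribute disclosed
```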
9
Q

Limitations of l-diversity

A
  • Similarity attack: When the values in a q*-block are distinct but semantically similar (e.g., three distinct salaries of 30k, 31k, and 32k all reveal a low income; see the sketch below)
    -> l-diversity does not consider the semantic closeness of attribute values
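A toy illustration of the similarity attack; the salary values are made up. The q*-block is 3-diverse (three distinct values), yet the values are so close that the block still discloses a narrow income range.

```python
block_salaries = [30_000, 31_000, 32_000]

print(len(set(block_salaries)) >= 3)             # True  -> satisfies 3-diversity
print(min(block_salaries), max(block_salaries))  # 30000 32000 -> narrow range leaks
```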
10
Q

Types of structured data

A
  • Identifiers (e.g., name, social security number)
  • Quasi-identifiers (e.g., ZIP code, birth date, gender)
  • Sensitive information (e.g., diagnosis, salary)