Big Data Privacy Flashcards
1
Q
Steps: Operationalizing data anonymization
A
- Consider a taxonomy / classification of personal data
- Reflect on the underlying data disclosure / sharing scenarios
- Define your attacker model
- Apply appropriate data anonymization models and techniques
- Assess utility (loss) & residual risks
2
Q
Big Data Privacy: Possible Attacks and Attacker Models
A
- Membership disclosure: Attacker can(not) tell whether a given person is in the dataset
- Sensitive attribute disclosure: Attacker can(not) tell whether a given person has a certain sensitive attribute value
- Identity disclosure: Attacker can(not) tell which record corresponds to a given person
3
Q
Data Anonymization Models
A
- Data Suppression-based models: Focus on preserving the privacy of respondents (i.e., those individuals who are included in the dataset)
- Data Perturbation-based models: Focus on protecting the privacy of both respondents and people not included in the dataset
4
Q
Data Suppression-based models: Approach
A
- Remove/suppress explicit and/or quasi-identifiers before releasing the (micro)data (see the sketch below)
- Sensitive attributes are left unmodified, which preserves high data utility
- Each anonymized entry/dataset should be indistinguishably related to no fewer than a certain number of individuals in the population
- Typically no mathematically provable guarantee of anonymity
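A minimal sketch, assuming a hypothetical medical microdata table with "name" as explicit identifier, "zip" and "age" as quasi-identifiers, and "diagnosis" as sensitive attribute; the generalization rules are illustrative only, not a complete anonymization pipeline:

```python
# Hypothetical microdata records (illustrative example).
records = [
    {"name": "Alice", "zip": "13057", "age": 29, "diagnosis": "flu"},
    {"name": "Bob",   "zip": "13055", "age": 31, "diagnosis": "asthma"},
    {"name": "Carol", "zip": "13053", "age": 34, "diagnosis": "flu"},
]

def suppress_and_generalize(record):
    """Drop the explicit identifier and coarsen the quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        # explicit identifier ("name") is suppressed entirely
        "zip": record["zip"][:3] + "**",     # generalize ZIP code to a prefix
        "age": f"{decade}-{decade + 9}",     # generalize age to a 10-year band
        "diagnosis": record["diagnosis"],    # sensitive attribute left unmodified
    }

released = [suppress_and_generalize(r) for r in records]
for row in released:
    print(row)
```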
5
Q
Data Perturbation-based models: Approach
A
- The anonymized data(set) must be insensitive to the insertion or deletion of a single entry/tuple in the dataset
- Noise is typically added to the dataset (see the sketch below)
- This often results in lower data utility
- Typically provides a provable guarantee of anonymity
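A minimal sketch of the perturbation idea using the Laplace mechanism on a simple counting query; the dataset, ε value, and function names are hypothetical examples:

```python
import numpy as np

def noisy_count(values, predicate, epsilon):
    """Answer a counting query with Laplace noise calibrated to epsilon.
    A counting query has sensitivity 1: inserting or deleting one tuple
    changes the true count by at most 1."""
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon                 # noise scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)

ages = [29, 31, 34, 42, 58, 61]                   # hypothetical dataset
print(noisy_count(ages, lambda a: a >= 40, 0.5))  # true answer is 3, plus noise
```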
6
Q
Data Perturbation-based models: Metrics
A
- ε-Differential Privacy
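For reference, the standard definition (Dwork et al.): a randomized mechanism M is ε-differentially private if, for all datasets D and D' differing in a single tuple and for all output sets S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```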
7
Q
Data Suppression-based models: Metrics
A
- k-anonymity
- l-diversity
- t-closeness
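A minimal sketch of how the first two metrics might be checked on released microdata; column names and records are hypothetical examples (t-closeness additionally compares each class's sensitive-value distribution against the overall distribution and is omitted here):

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes.values()

def is_k_anonymous(rows, quasi_identifiers, k):
    """k-anonymity: every equivalence class has at least k records."""
    return all(len(c) >= k for c in equivalence_classes(rows, quasi_identifiers))

def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Distinct l-diversity: every class has at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in c}) >= l
               for c in equivalence_classes(rows, quasi_identifiers))

released = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "130**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "130**", "age": "30-39", "diagnosis": "flu"},
]
print(is_k_anonymous(released, ["zip", "age"], k=2))                      # True
print(is_distinct_l_diverse(released, ["zip", "age"], "diagnosis", l=2))  # False
```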
8
Q
Limitations of k-anonymity
A
- Homogeneity attack: When the sensitive attribute values within an equivalence class have little diversity, the sensitive value may be leaked (see the sketch below)
- Background knowledge attack: In the presence of side information, any attribute may become a quasi-identifier
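A minimal illustration of the homogeneity attack with hypothetical records: the release is 2-anonymous, yet everyone in the target's equivalence class shares the same sensitive value, so the attacker learns it anyway:

```python
release = [
    {"zip": "130**", "age": "30-39", "diagnosis": "cancer"},
    {"zip": "130**", "age": "30-39", "diagnosis": "cancer"},
    {"zip": "479**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "479**", "age": "20-29", "diagnosis": "asthma"},
]
# The attacker knows the target is a ~35-year-old living in ZIP 130xx.
target_class = [r for r in release if r["zip"] == "130**" and r["age"] == "30-39"]
print({r["diagnosis"] for r in target_class})  # {'cancer'}: sensitive value leaked
```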
9
Q
Limitations of l-diversity
A
- Similarity attack: When the values in a q*-block (equivalence class) are distinct but semantically similar (see the sketch below)
-> l-diversity does not consider the semantic closeness of attribute values
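A minimal illustration of the similarity attack with hypothetical records: the q*-block satisfies distinct 3-diversity, but all values are semantically close, so the attacker still learns the target's disease category:

```python
block = [
    {"zip": "130**", "age": "30-39", "diagnosis": "gastric ulcer"},
    {"zip": "130**", "age": "30-39", "diagnosis": "gastritis"},
    {"zip": "130**", "age": "30-39", "diagnosis": "stomach cancer"},
]
print(len({r["diagnosis"] for r in block}))  # 3 distinct values -> distinct 3-diverse
# ...yet every value is a stomach disease, so the semantic category is disclosed.
```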
10
Q
Types of attributes in structured data
A
- Identifiers
- Quasi Identifiers
- Sensitive Information
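A minimal illustration, assuming a hypothetical medical microdata table, of how its attributes might be classified into these three types:

```python
attribute_types = {
    "name":      "identifier",        # directly identifies the respondent
    "ssn":       "identifier",
    "zip":       "quasi-identifier",  # identifying only in combination with others
    "age":       "quasi-identifier",
    "gender":    "quasi-identifier",
    "diagnosis": "sensitive",         # the information to be protected
}
```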