Privacy + Data Sci/Machine Learning Basics Flashcards

Question

What is the target?

Answer 1

The feature (attribute) we are trying to predict.

Answer 2

A single attribute: a particular value of the attribute occurs with low frequency. More than one attribute: combinations of attribute values can occur even less frequently than any single value.
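The frequency argument above can be checked by counting. This is a minimal sketch on made-up toy records; the column layout (zip code, birth year, sex) and the helper names `value_counts` and `unique_fraction` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birth_year, sex)
records = [
    ("02139", 1985, "F"),
    ("02139", 1985, "M"),
    ("02139", 1990, "F"),
    ("02141", 1985, "F"),
    ("02141", 1990, "M"),
    ("02139", 1985, "F"),
]

def value_counts(rows, columns):
    """Count how often each combination of the given column indices occurs."""
    return Counter(tuple(row[i] for i in columns) for row in rows)

def unique_fraction(rows, columns):
    """Fraction of rows whose value combination appears exactly once."""
    counts = value_counts(rows, columns)
    unique = sum(1 for row in rows if counts[tuple(row[i] for i in columns)] == 1)
    return unique / len(rows)

# No zip code is unique on its own, but most full combinations are.
print(unique_fraction(records, [0]))
print(unique_fraction(records, [0, 1, 2]))
```

In this toy data no record is unique on zip code alone, while four of the six records are unique once all three attributes are combined.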

Answer 3

Re-identification of individuals is possible when only a single dataset is shared, and also when multiple datasets are shared, one of which is anonymized.

Answer 4

A record by itself cannot be linked to an individual.

Answer 5

The person can be found directly, with no additional information (e.g., {name, address} or {name, phone}).

Answer 6

All explicit identifiers (name, address, phone) are removed, generalized, or replaced.

Answer 7

A set of data elements that are not explicit identifiers but that, in combination, are associated uniquely or almost uniquely with an individual.

Answer 8

If an anonymized dataset is released publicly, individuals can be re-identified by exploiting unique combinations of attributes and linking them with another public dataset that contains identifiable information.

Answer 9

Direct linking is the process of identifying an individual by correlating anonymized data with another dataset that contains identifying information. It relies on explicit attributes.
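Direct linking can be sketched as an exact-match join on shared attributes. The two toy tables, the attribute names, and the function `direct_link` below are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical anonymized release (no names) and a public list with names.
medical = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02141", "birth_year": 1985, "sex": "F", "diagnosis": "diabetes"},
]
voters = [
    {"name": "Alice", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carol", "zip": "02142", "birth_year": 1985, "sex": "F"},
]
QI = ("zip", "birth_year", "sex")  # assumed shared attributes

def direct_link(anonymized, identified, quasi_identifiers, sensitive):
    """Exact-match join on the shared attributes; keep unambiguous matches only."""
    index = {}
    for person in identified:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])
    linked = []
    for row in anonymized:
        key = tuple(row[q] for q in quasi_identifiers)
        names = index.get(key, [])
        if len(names) == 1:  # exactly one candidate: re-identified
            linked.append((names[0], row[sensitive]))
    return linked

print(direct_link(medical, voters, QI, "diagnosis"))
```

Here two of the three anonymized rows match exactly one named person, so their diagnoses are exposed; the third row has no exact match and stays unlinked.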

Answer 10

Linking through similarity is the process of re-identifying individuals in an anonymized dataset by correlating patterns or attributes with another dataset that contains identifiable information. It relies on statistical similarity.
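One way to picture similarity-based linking is a nearest-match search. The sketch below assumes each record is a set of items (for example, rated movies) and uses Jaccard similarity with an arbitrary threshold; the card does not name a similarity measure, so both choices, along with the names `jaccard` and `best_match`, are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_match(target, candidates, threshold=0.5):
    """Return (name, score) of the most similar candidate, or (None, score)
    when no candidate reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, items in candidates.items():
        score = jaccard(target, items)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score

# Hypothetical data: one anonymized profile and two public, named profiles.
anonymized_profile = {"m1", "m2", "m3", "m4"}
public_profiles = {"alice": {"m1", "m2", "m3"}, "bob": {"m5", "m6"}}

print(best_match(anonymized_profile, public_profiles))
```

Unlike direct linking, no attribute has to match exactly; a sufficiently high overlap is treated as evidence that two profiles belong to the same person.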

Answer 11

The information (identifier or quasi-identifier) contained for each individual in the released dataset cannot be distinguished from that of at least k − 1 other individuals whose information is also in the released dataset. Any combination of quasi-identifier values present in the released table must appear in at least k records.
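The "at least k records" condition can be checked by grouping rows on their quasi-identifier values. This is a minimal sketch; the generalized toy table and the names `k_of` and `is_k_anonymous` are hypothetical.

```python
from collections import Counter

def k_of(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    return k_of(rows, quasi_identifiers) >= k

# Hypothetical released table with generalized zip codes and age ranges.
table = [
    {"zip": "0213*", "age": "30-39", "disease": "flu"},
    {"zip": "0213*", "age": "30-39", "disease": "cancer"},
    {"zip": "0214*", "age": "40-49", "disease": "flu"},
    {"zip": "0214*", "age": "40-49", "disease": "flu"},
    {"zip": "0214*", "age": "40-49", "disease": "asthma"},
]

print(k_of(table, ("zip", "age")))
```

The smallest group in this table has two rows, so the table is 2-anonymous but not 3-anonymous with respect to (zip, age).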

Answer 12

Real-world datasets are very sparse. If we project them into low dimensions, we lose information. These datasets provide low utility. Independent releases can be linked to infer information. Difficult to achieve (NP-hard).

Answer 13

Membership disclosure is protected. Sensitive attribute is protected. Identity disclosure is protected.

Answer 14

k-anonymity can create groups that leak information due to lack of diversity in the sensitive attribute.

Answer 15

k-anonymity does not protect against attacks based on background knowledge.

Answer 16

Let a q∗-block be a set of tuples such that its non-sensitive values generalize to q∗. A q∗-block is ℓ-diverse if it contains ℓ "well represented" values for the sensitive attribute S. A table is ℓ-diverse if every q∗-block in it is ℓ-diverse. An equivalence class is said to have ℓ-diversity if there are at least ℓ well-represented values for the sensitive attribute. A table is said to have ℓ-diversity if every equivalence class of the table has ℓ-diversity.
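The definition leaves "well represented" open. The sketch below assumes the simplest reading, distinct ℓ-diversity (each equivalence class contains at least ℓ different sensitive values); stricter readings such as entropy-based ones are not covered. The toy tables and the name `distinct_l` are hypothetical.

```python
from collections import defaultdict

def distinct_l(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class.

    Assumption: "well represented" is read as "distinct"."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0

# Hypothetical 2-anonymous table in which one class has a single disease.
homogeneous = [
    {"zip": "0213*", "age": "30-39", "disease": "cancer"},
    {"zip": "0213*", "age": "30-39", "disease": "cancer"},
    {"zip": "0214*", "age": "40-49", "disease": "flu"},
    {"zip": "0214*", "age": "40-49", "disease": "asthma"},
]
# Same quasi-identifiers, but every class holds two different diseases.
diverse = [
    {"zip": "0213*", "age": "30-39", "disease": "cancer"},
    {"zip": "0213*", "age": "30-39", "disease": "flu"},
    {"zip": "0214*", "age": "40-49", "disease": "flu"},
    {"zip": "0214*", "age": "40-49", "disease": "asthma"},
]

print(distinct_l(homogeneous, ("zip", "age"), "disease"))
print(distinct_l(diverse, ("zip", "age"), "disease"))
```

Both tables are 2-anonymous, but in the first one anyone known to be in the (0213*, 30-39) class is revealed to have cancer, so it is only 1-diverse under this reading.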

Answer 17

May be difficult and unnecessary to achieve. Insufficient to prevent attribute disclosure, as shown by two potential attacks: the similarity attack and the skewness attack.

Answer 18

An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t. A table is said to have t-closeness if all equivalence classes have t-closeness.
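The card does not say which distance is used. As a sketch, the code below assumes variational distance (half the sum of absolute differences between two distributions), which suits a categorical sensitive attribute; other distances, such as Earth Mover's Distance, would need a different function. The helper names and toy tables are hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Empirical distribution of a list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def variational_distance(p, q):
    """Half the L1 distance between two distributions (assumed metric)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def max_class_distance(rows, quasi_identifiers, sensitive):
    """Largest distance between any class distribution and the whole table's."""
    overall = distribution([row[sensitive] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    return max(
        variational_distance(distribution(values), overall)
        for values in groups.values()
    )

def has_t_closeness(rows, quasi_identifiers, sensitive, t):
    """True if every equivalence class is within distance t of the table."""
    return max_class_distance(rows, quasi_identifiers, sensitive) <= t

# Hypothetical tables: one with skewed classes, one with balanced classes.
skewed = [
    {"zip": "0213*", "disease": "flu"},
    {"zip": "0213*", "disease": "flu"},
    {"zip": "0214*", "disease": "cancer"},
    {"zip": "0214*", "disease": "cancer"},
]
balanced = [
    {"zip": "0213*", "disease": "flu"},
    {"zip": "0213*", "disease": "cancer"},
    {"zip": "0214*", "disease": "flu"},
    {"zip": "0214*", "disease": "cancer"},
]

print(max_class_distance(skewed, ("zip",), "disease"))
print(max_class_distance(balanced, ("zip",), "disease"))
```

In the skewed table each class differs from the overall distribution by 0.5, so it satisfies t-closeness only for t ≥ 0.5; in the balanced table every class mirrors the overall distribution and the distance is 0.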

Test 1 Prep (42 cards)