chapter 2 Flashcards
what are the roles in data collection and data publishing
Data Recipient
Data Publisher
Record Owners
an example of data recipient
Medical Center
(data mining)
an example of data Publisher
Hospital
(data anonymization)
an example of record owners
patients
what are the data attributes
Explicit Identifier
Quasi Identifier (QID)
Sensitive Attributes
Non-Sensitive Attributes
what are explicit identifiers
Data attributes that explicitly identify record owners, e.g., name, identity card number, mobile phone number.
what are Quasi Identifier (QID)
Data attributes that could potentially identify record owners, e.g., postal code, age, gender.
what are sensitive attributes
Data attributes that are sensitive person-specific information, e.g., salary, disease, disability status
what are non sensitive attributes
Data attributes that do not fall into any of the other categories.
what are the roles responsible for data collection
data publisher
record owners
what are the roles responsible for data publishing
data recipient
what are the privacy attacks
record linkage
attribute linkage
table linkage
probabilistic
what is the record linkage model
- Similar Quasi Identifier (QID) values are grouped into a small number of records
- Victim’s QID matches and is linked to this group
- Smaller number of possibilities in identifying the victim’s record
- Identifying the victim in this group, with additional information
example of record linkage model
Example: Hospital wants to publish the patient records in Table 1 to a research center
- Research center has access to the external table, Table 2
- Research center knows that every person with a record in Table 2 has a record in Table 1
- Joining the two tables on the common attributes Job, Sex, and Age may link the identity of a person to his/her Disease
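The join described in this example can be sketched in a few lines of Python. This is only an illustration: the rows below are hypothetical stand-ins (Table 1 and Table 2 are not reproduced in these notes), and the names `published`, `external`, and `link_records` are made up for the sketch.

```python
# Hypothetical rows: stand-ins for Table 1 (published, no names)
# and Table 2 (external, names plus QID attributes).
published = [
    {"Job": "Engineer", "Sex": "Male", "Age": 35, "Disease": "Hepatitis"},
    {"Job": "Lawyer", "Sex": "Male", "Age": 38, "Disease": "HIV"},
    {"Job": "Dancer", "Sex": "Female", "Age": 30, "Disease": "Flu"},
    {"Job": "Dancer", "Sex": "Female", "Age": 30, "Disease": "HIV"},
]
external = [
    {"Name": "Bob", "Job": "Lawyer", "Sex": "Male", "Age": 38},
    {"Name": "Cathy", "Job": "Dancer", "Sex": "Female", "Age": 30},
]

QID = ("Job", "Sex", "Age")  # the common attributes used for the join


def link_records(external, published, qid=QID):
    """Join the two tables on the QID; map each name to candidate diseases."""
    linked = {}
    for person in external:
        key = tuple(person[a] for a in qid)
        linked[person["Name"]] = [
            row["Disease"]
            for row in published
            if tuple(row[a] for a in qid) == key
        ]
    return linked


links = link_records(external, published)
```

In this made-up data, Bob's QID matches a single published record, so his Disease is revealed outright, while Cathy's QID matches two records, which leaves the adversary with two candidates.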
what is the attribute linkage model
- Adversary may not precisely identify the record of the target victim
- Victim belongs to a group, based on a set of Sensitive Attributes
- Adversary could infer victim’s sensitive values from the published data
example of attribute linkage model
Example: Hospital generalizes the Job and Age attributes into broader categories and ranges, to reduce record linkage
- Adversary knows that the target victim, Emily, is a dancer, is 30 years old, and has a record in the published data
- Adversary may infer that Emily has HIV with 75% confidence, because 3 out of 4 artists at age 30-35 have HIV
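The 75% figure is just the share of HIV records within the group that Emily falls into. A minimal sketch of that arithmetic, assuming a hypothetical generalized table built so that 3 of the 4 (Artist, [30-35]) records have HIV (the actual published table is not shown in these notes; `confidence` is a made-up helper name):

```python
from collections import Counter

# Hypothetical generalized table: 3 of the 4 records in the
# (Artist, [30-35]) group carry HIV, as in the example.
published = [
    {"Job": "Professional", "Age": "[35-40]", "Disease": "Hepatitis"},
    {"Job": "Artist", "Age": "[30-35]", "Disease": "HIV"},
    {"Job": "Artist", "Age": "[30-35]", "Disease": "HIV"},
    {"Job": "Artist", "Age": "[30-35]", "Disease": "Flu"},
    {"Job": "Artist", "Age": "[30-35]", "Disease": "HIV"},
]


def confidence(table, group, sensitive_value):
    """Fraction of records in the QID group that carry the sensitive value."""
    matching = [r for r in table if all(r[k] == v for k, v in group.items())]
    if not matching:
        return 0.0
    counts = Counter(r["Disease"] for r in matching)
    return counts[sensitive_value] / len(matching)


# A dancer aged 30 generalizes to (Artist, [30-35]).
emily_group = {"Job": "Artist", "Age": "[30-35]"}
conf = confidence(published, emily_group, "HIV")  # 3 / 4
```

The adversary never singles out Emily's record; the inference comes entirely from how skewed the sensitive values are inside her group.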
what is the table linkage model
- Adversary can confidently infer the presence, or the absence, of the victim’s record in the published data
- If a hospital publishes data with a particular type of disease, inferring presence of the victim’s record in the table is already damaging
example of table linkage model
Example: Hospital publishes patient data in Table 3: table linkage attack on target victim, Alice
* Adversary is presumed to also have access to external public data in Table 4
* 4 records in Table 3 and 5 records in Table 4 contain (Artist, Female, [30−35])
* So there is a 4/5, or 80%, probability that Alice has a record in Table 3, and hence that she has HIV
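The 4/5 ratio can be reproduced by counting matching records in each table. The sketch below uses hypothetical rows (Table 3 and Table 4 are not reproduced in these notes) and assumes every published record also appears in the external table; `presence_probability` is a made-up helper name.

```python
def count_matching(table, group):
    """Number of records whose attributes equal the given QID group."""
    return sum(all(r.get(k) == v for k, v in group.items()) for r in table)


def presence_probability(published, external, group):
    """Matching published records divided by matching external records."""
    in_external = count_matching(external, group)
    if in_external == 0:
        return 0.0
    return count_matching(published, group) / in_external


# Hypothetical data: 4 matching records published, 5 in the external table.
alice_group = {"Job": "Artist", "Sex": "Female", "Age": "[30-35]"}
other = {"Job": "Professional", "Sex": "Male", "Age": "[35-40]"}
published = [dict(alice_group) for _ in range(4)] + [dict(other)]
external = [dict(alice_group) for _ in range(5)] + [dict(other)]

p = presence_probability(published, external, alice_group)  # 4 / 5
```

A ratio close to 1 means almost everyone in the external group also appears in the published table, so the adversary can infer presence with high confidence.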
what is Probabilistic Model
- Does not focus on records, attributes, or tables that can be linked to a target victim
- Compares the adversary’s probabilistic belief before and after accessing the published data
- Adversary believes that the probability of identifying the target victim’s sensitive information increases after accessing the published data, compared to the probability before
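One way to picture the before/after comparison is as a gap between a prior and a posterior belief. The sketch below is only an illustration: the numbers are hypothetical (a prior of 1 in 10 from some general statistic, a posterior of 3 in 4 from a published group), and `fraction_with` is a made-up helper.

```python
def fraction_with(values, target):
    """Share of the observed sensitive values equal to the target."""
    return sum(v == target for v in values) / len(values)


# Hypothetical prior view: before the release, the adversary only has
# general statistics in which 1 of 10 people has the sensitive value.
prior_view = ["HIV"] + ["Other"] * 9

# Hypothetical posterior view: the sensitive values seen in the
# victim's group in the published data.
posterior_view = ["HIV", "HIV", "HIV", "Flu"]

prior = fraction_with(prior_view, "HIV")
posterior = fraction_with(posterior_view, "HIV")

# The larger this gap, the more the release has taught the adversary.
belief_gain = posterior - prior
```

Under this view, a release is a concern whenever the gain is large, whether or not any specific record, attribute, or table can be linked to the victim.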
is the adversary’s knowledge limited to quasi identifiers?
No,
- Privacy-preserving data publishing has to take additional Background Knowledge into consideration
- Includes public statistical data, social networks like Facebook and LinkedIn, common sense, etc.