Tracking/Ethics Flashcards
How do you approach anonymization in general?
- Remove data that would identify the user: hostnames, user IDs, IP addresses
- If devices must remain distinguishable: replace identifiers with pseudonyms (random strings) or one-way hashes
- Individual fields may be obfuscated, but combinations of fields might still reveal personal information
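A minimal pseudonymization sketch, assuming a keyed hash (HMAC-SHA256) rather than a plain hash. A plain hash of an IPv4 address can be reversed by hashing all 2^32 candidates, whereas a secret key held by the data owner blocks that. The key, the record fields, and the function names are hypothetical illustrations:

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key held only by the data owner; without it,
# nobody can recompute pseudonyms by hashing guessed identifiers.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier (e.g. an IP address) to a stable pseudonym via HMAC-SHA256."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated only for readability

def random_pseudonym() -> str:
    """Alternative: a fresh random string (keep a mapping table if linkage is needed)."""
    return secrets.token_hex(8)

record = {"hostname": "alice-laptop", "ip": "192.0.2.17", "bytes_sent": 5120}
anonymized = {
    "device": pseudonymize(record["ip"]),  # same IP -> same pseudonym, device stays linkable
    "bytes_sent": record["bytes_sent"],    # hostname and raw IP are dropped entirely
}
```

The same input always yields the same pseudonym, so devices remain distinguishable across records while the raw identifier is not published.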
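Explain what device manufacturers do to hinder tracking
- MAC address randomization: a new random MAC address for every probe request
- Probe requests without SSID: devices no longer broadcast the names of networks they joined before
- Result: passive Wi-Fi tracking systems can no longer easily follow your movement
However: once connected to a network, the address is stable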
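A sketch of how a randomized MAC address can be formed, assuming the common convention of marking it as locally administered (bit 0x02 of the first octet set) and unicast (bit 0x01 cleared). Real operating systems differ in when they rotate the address (per probe, per scan, or per network); `random_mac` is a hypothetical helper:

```python
import secrets

def random_mac() -> str:
    """Generate a random, locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # First octet: set bit 1 (locally administered, so no vendor OUI is revealed)
    # and clear bit 0 (unicast rather than multicast).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)

# Scanning phase: each probe request may carry a fresh address...
probe_macs = [random_mac() for _ in range(3)]
# ...while after association the device keeps one address for the connection.
```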
Name and explain 5 ethics theories and practices
- Consequentialism: actions are judged by their outcomes; the end justifies the means
- Deontology: the morality of the action itself (duties, rules) is the only measure, regardless of outcome
- Virtue ethics: focuses on how we should be (character) rather than what we should do
- Principlism: weighs the moral principles of autonomy, beneficence, nonmaleficence, and justice
- Pluralism and casuistry: appeal to common sense when weighing reasons and balancing risks against benefits
Name 3 models of data publication
- Interactive model: the data owner is the gatekeeper; others submit queries, and the data owner returns anonymized answers
- “Send me your code”: the data owner executes the analyst's code on their own system and reports the result (problems: is the code malicious? is it error-free?)
- Offline, aka “publish and be damned”: the data owner publishes an anonymized dataset
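What categories of attributes does a record contain?
- Identifiers (personally identifiable information, PII): e.g. full name
- Quasi-identifiers: harmless alone but identifying in combination, e.g. the triple (ZIP code, date of birth, gender)
- Sensitive attributes: should not be associated with individuals, e.g. a medical diagnosis
- Other: attributes that are neither identifying nor sensitive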
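To see why removing identifiers is not enough, the sketch below drops the name column and then finds records whose quasi-identifier triple is unique in the release; such rows could be re-identified by joining with an external source that lists names (e.g. a voter list). The toy records and helper names are hypothetical:

```python
from collections import Counter

# Hypothetical toy records: name is the identifier, (zip, dob, gender) the
# quasi-identifier triple, diagnosis the sensitive attribute.
records = [
    {"name": "Alice", "zip": "13053", "dob": "1990-04-01", "gender": "F", "diagnosis": "flu"},
    {"name": "Beth",  "zip": "13053", "dob": "1990-04-01", "gender": "F", "diagnosis": "cancer"},
    {"name": "Carl",  "zip": "13068", "dob": "1985-11-23", "gender": "M", "diagnosis": "flu"},
    {"name": "Dave",  "zip": "13068", "dob": "1972-02-14", "gender": "M", "diagnosis": "hepatitis"},
]
QI = ("zip", "dob", "gender")

def unique_on_qi(rows, qi=QI):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(tuple(r[a] for a in qi) for r in rows)
    return [r for r in rows if counts[tuple(r[a] for a in qi)] == 1]

# Remove the identifier column, then check which rows are still unique on the QIs.
released = [{k: v for k, v in r.items() if k != "name"} for r in records]
at_risk = unique_on_qi(released)
```

Here the two rows sharing a triple hide each other, while the two rows with distinct dates of birth stay unique and therefore linkable.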
Name 3 anonymization objectives
Protect individuals against:
- Membership disclosure
- Sensitive attribute disclosure
- Identity disclosure
Be aware that attackers can combine the data with additional information from other sources
Name 4 anonymization approaches
- Suppression: remove (parts of) attributes
- Generalization: limit granularity, e.g. age 21 -> age 20-30
- Perturbation: add noise to the data while preserving its general properties
- Permutation: swap the association of attributes across records
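The four approaches can be sketched on a toy table as follows. The columns, the three-digit ZIP prefix, the 10-year age bins (written as non-overlapping "20-29"), and the Gaussian noise scale are all hypothetical choices for illustration:

```python
import random

rows = [
    {"zip": "13053", "age": 21, "salary": 41000, "disease": "flu"},
    {"zip": "13068", "age": 28, "salary": 52000, "disease": "cancer"},
    {"zip": "13071", "age": 35, "salary": 61000, "disease": "flu"},
]

def suppress_zip(row, keep_digits=3):
    """Suppression: replace the trailing ZIP digits with '*'."""
    z = row["zip"]
    return {**row, "zip": z[:keep_digits] + "*" * (len(z) - keep_digits)}

def generalize_age(row, width=10):
    """Generalization: replace the exact age by a range, e.g. 21 -> '20-29'."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}

def perturb_salary(row, rng, scale=1000.0):
    """Perturbation: add zero-mean Gaussian noise so aggregates stay roughly intact."""
    return {**row, "salary": row["salary"] + rng.gauss(0.0, scale)}

def permute_column(table, column, rng):
    """Permutation: shuffle one column across records, breaking its link to the rest."""
    values = [r[column] for r in table]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(table, values)]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
step1 = [generalize_age(suppress_zip(r)) for r in rows]
step2 = [perturb_salary(r, rng) for r in step1]
anonymized = permute_column(step2, "disease", rng)
```

Permutation keeps the column's overall value distribution intact, which is why it preserves some utility while hiding which record a value belongs to.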
What is k-Anonymity and how does it work?
Each record should be indistinguishable from at least k-1 others on its quasi-identifier (QI) attributes
or: the cardinality of any non-empty query result over the QI attributes should be at least k
We have to consider all other data sources an attacker could link against and try to prevent such linking
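A minimal k-anonymity check, assuming the table is a list of dicts whose QI columns are already generalized. The table and helper names are hypothetical:

```python
from collections import Counter

def qi_counts(rows, qi):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in qi) for r in rows)

def is_k_anonymous(rows, qi, k):
    """True if every QI combination that occurs is shared by at least k rows."""
    return all(c >= k for c in qi_counts(rows, qi).values())

def anonymity_level(rows, qi):
    """Largest k for which the table is k-anonymous (0 for an empty table)."""
    counts = qi_counts(rows, qi)
    return min(counts.values()) if counts else 0

# Hypothetical generalized table: ZIP and age already coarsened.
table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
]
QI = ("zip", "age")
```

Each QI group has two rows, so `anonymity_level(table, QI)` is 2: the table is 2-anonymous but not 3-anonymous.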
Name some problems with k-anonymity
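Efficiency problem: finding the k-anonymous transformation with maximum utility is expensive (NP-hard in general)
Security problems (e.g. k=4): k-anonymity only protects the identifier and quasi-identifier columns, not the other attributes
1: The other attributes within a group might have little diversity, so information may still leak (homogeneity)
2: In the presence of side information (background knowledge), any attribute can become a quasi-identifier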
What is l-diversity?
Build q-blocks: groups of records that share the same (generalized) quasi-identifier values.
A q-block is l-diverse if it contains at least l “well-represented” values for the sensitive attribute S.
A table is l-diverse if every q-block is l-diverse.
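A sketch of two common readings of “well-represented”: distinct l-diversity (at least l different sensitive values per q-block) and entropy l-diversity (entropy of each block's sensitive values at least log l). The toy table is hypothetical and is 2-anonymous, yet one block holds only “flu”, which is exactly the homogeneity problem of k-anonymity:

```python
import math
from collections import Counter, defaultdict

def q_blocks(rows, qi):
    """Group rows by their (generalized) quasi-identifier values."""
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[a] for a in qi)].append(r)
    return blocks

def is_distinct_l_diverse(rows, qi, sensitive, l):
    """Distinct l-diversity: every q-block has at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in block}) >= l
               for block in q_blocks(rows, qi).values())

def is_entropy_l_diverse(rows, qi, sensitive, l):
    """Entropy l-diversity: each q-block's sensitive-value entropy is >= log(l)."""
    for block in q_blocks(rows, qi).values():
        counts = Counter(r[sensitive] for r in block)
        n = len(block)
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        if entropy < math.log(l) - 1e-12:  # small tolerance for float rounding
            return False
    return True

table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
]
QI = ("zip", "age")
```

The first block passes 2-diversity under both readings, but the whole table fails because the second block reveals the diagnosis of anyone known to be in it.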
What is t-closeness?
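An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is not more than t (e.g. measured with the Earth Mover's Distance). A table has t-closeness if all its equivalence classes do.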
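A sketch of a t-closeness check for a categorical sensitive attribute, assuming every pair of categories is at distance 1. Under that assumption the Earth Mover's Distance reduces to half the L1 distance between the two distributions; ordered or numeric attributes would need a different ground distance. Table and helper names are hypothetical:

```python
from collections import Counter, defaultdict

def distribution(values):
    """Relative frequency of each value."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def equal_distance_emd(p, q):
    """EMD when all categories are at distance 1: half the L1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def max_class_distance(rows, qi, sensitive):
    """Largest distance between any equivalence class and the whole table."""
    overall = distribution([r[sensitive] for r in rows])
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[a] for a in qi)].append(r[sensitive])
    return max(equal_distance_emd(distribution(vals), overall)
               for vals in classes.values())

def is_t_close(rows, qi, sensitive, t):
    return max_class_distance(rows, qi, sensitive) <= t

table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
    {"zip": "130**", "age": "30-39", "disease": "flu"},
]
QI = ("zip", "age")
```

Overall the table is 3/4 flu and 1/4 cancer; both classes deviate by 0.25, so the table satisfies t-closeness for t = 0.25 but not for t = 0.2.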
How can we practically deal with anonymity vs utility?
Sanitized data sets rarely preserve both privacy and utility, so in practice the rich data is released only under license to designated trusted parties