6 - Big data's end run - Nissenbaum Flashcards
What is big data and how is it used?
Paradigm that treats knowledge as data + framework for decision-making
Exploits the power of patterns hidden in massive datasets
Used to get analytic insights
What problems are associated with big data and how are they dealt with?
Creates new classes of goods & services => requires legislative principles, BUT without killing development
Solution: anonymization (solves identification, NOT reachability) + informed consent (CANNOT fully specify terms of interaction)
Cons of solution: perceived as the best & only option, yet it is elusive & difficult to implement
What is privacy for Nissenbaum?
Contextual integrity = right to control the flow of information among social contexts, with respect to the roles & relations in each specific context (~Rachels' definition)
E.g.: medical records shared with a doctor preserve contextual integrity
When are privacy concerns raised?
When expected and actual info flows differ
E.g.: race/sex info should not influence hiring decisions (expected), but sometimes it does (actual) => privacy concern
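A minimal sketch of the "expected vs actual flows" check, assuming a flow can be reduced to a (sender, recipient, info type, context) tuple and that a context's norms are simply a set of expected flows. The class `Flow`, the set `EXPECTED_FLOWS` and the function `privacy_concerns` are hypothetical names for illustration, not part of Nissenbaum's framework; race info reaching an employer stands in for "race influences the hiring decision".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One information flow: who sends which type of info to whom, in which context."""
    sender: str
    recipient: str
    info_type: str
    context: str


# Hypothetical contextual norms: the flows people EXPECT within each context.
EXPECTED_FLOWS = {
    Flow("patient", "doctor", "medical_record", "healthcare"),
    Flow("applicant", "employer", "work_experience", "hiring"),
}


def privacy_concerns(actual_flows):
    """Return every actual flow that matches no expected flow."""
    return [flow for flow in actual_flows if flow not in EXPECTED_FLOWS]


actual_flows = [
    Flow("patient", "doctor", "medical_record", "healthcare"),  # expected: integrity preserved
    Flow("applicant", "employer", "work_experience", "hiring"),  # expected
    Flow("applicant", "employer", "race", "hiring"),  # unexpected: privacy concern
]

for flow in privacy_concerns(actual_flows):
    print("Privacy concern:", flow)
```

Real contextual norms are richer than set membership (they depend on transmission principles, purposes, etc.); the set lookup only captures the idea that a concern arises exactly where actual and expected flows diverge.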
What is anonymity and what are its problems?
Removal of the link between data and their owner
Problems:
1. Impossible when the data itself is unique (identifying) info
2. Prone to re-identification attacks (linkage = overlap the anonymous dataset with non-anonymous ones sharing some attributes, differencing = combine the answers to multiple queries to isolate the attributes of a single record)
3. May not prevent reachability (= ability to contact someone without knowing their identity)
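The two re-identification attacks above can be sketched on toy data. Everything here is hypothetical: the datasets, the choice of zip/birth date/sex as the shared attributes, and the helpers `linkage_attack`, `count_diabetes` and `differencing_attack`. Linkage joins the "anonymous" table with a public one on shared attributes; differencing subtracts two harmless-looking aggregate answers that differ only by the target's record.

```python
# Hypothetical "anonymized" medical dataset: names removed, other attributes kept.
anonymized = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth": "1960-01-12", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02138", "birth": "1971-03-05", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public, non-anonymous dataset sharing some of the same attributes.
public = [
    {"name": "Alice", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth": "1960-01-12", "sex": "M"},
]

SHARED_ATTRS = ("zip", "birth", "sex")


def linkage_attack(anon_rows, public_rows, keys=SHARED_ATTRS):
    """Join both datasets on shared attributes to re-attach names to records.

    Assumes each combination of shared attributes is unique in the public data.
    """
    index = {tuple(r[k] for k in keys): r["name"] for r in public_rows}
    linked = {}
    for row in anon_rows:
        name = index.get(tuple(row[k] for k in keys))
        if name is not None:
            linked[name] = row["diagnosis"]
    return linked


def count_diabetes(rows):
    """A seemingly harmless aggregate query: how many records have diabetes?"""
    return sum(1 for r in rows if r["diagnosis"] == "diabetes")


def differencing_attack(rows, is_target):
    """Subtract two aggregate answers that differ only by the target's record."""
    with_target = count_diabetes(rows)
    without_target = count_diabetes([r for r in rows if not is_target(r)])
    return with_target - without_target  # 1 => the target has diabetes


print(linkage_attack(anonymized, public))  # → {'Alice': 'hypertension', 'Bob': 'asthma'}
# The attacker only needs to know the target's birth date, not their name in the data.
print(differencing_attack(anonymized, lambda r: r["birth"] == "1971-03-05"))  # → 1
```

Neither attack needs the removed names: overlapping attributes (linkage) or two aggregate queries (differencing) are enough.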
How is anonymity implemented and what are the problems with each approach?
Implemented with:
1. Anonymous identifiers (pseudonyms): identifiers that simply differ from the commonly used ones
Cons: reuse over time turns them into actual identifiers; if generated following a pattern, they can still identify someone
2. Differential privacy: research field aiming at useful analysis while preserving anonymity
Cons: still breakable
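A sketch of both implementations and of their weak points, under stated assumptions. Pseudonyms are derived here with a keyed hash (`hmac` + SHA-256), which is one common choice but an assumption; `SECRET_KEY`, `pseudonym`, `patterned_pseudonym` and `dp_count` are hypothetical names. For differential privacy the sketch uses the standard Laplace mechanism for a counting query (sensitivity 1, noise of scale 1/epsilon, sampled as the difference of two exponentials). The "still breakable" part is illustrated by one possible failure: if nobody caps the total privacy budget, averaging many noisy answers recovers the true count.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"hypothetical-secret"  # assumption: key held only by the data collector


def pseudonym(user_id: str) -> str:
    """Stable pseudonym: the same user always gets the same value."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:12]


def patterned_pseudonym(first: str, last: str, birth_year: int) -> str:
    """Pseudonym built from a visible pattern (initials + year): easy to reverse."""
    return f"{first[0]}{last[0]}{birth_year}"


# Reuse: two datasets keyed by the same pseudonym can be joined into one profile.
purchases = {pseudonym("alice@example.com"): ["pregnancy test"]}
app_visits = {pseudonym("alice@example.com"): ["prenatal clinic"]}
profile = {p: purchases[p] + app_visits[p] for p in purchases.keys() & app_visits.keys()}


def dp_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1)."""
    true_count = sum(1 for r in rows if predicate(r))
    # Difference of two Exp(rate=epsilon) samples ~ Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def has_diabetes(row) -> bool:
    return row["diagnosis"] == "diabetes"


rng = random.Random(0)
records = [{"diagnosis": "diabetes"}] * 3 + [{"diagnosis": "asthma"}] * 7
one_answer = dp_count(records, has_diabetes, epsilon=0.5, rng=rng)  # noisy, protects individuals
# Without a privacy budget, repeating the query and averaging removes the noise.
answers = [dp_count(records, has_diabetes, epsilon=0.5, rng=rng) for _ in range(2000)]
averaged = sum(answers) / len(answers)  # converges towards the true count (3)
print(profile, round(one_answer, 2), round(averaged, 2))
```

In the sketch, the pseudonym hides the e-mail address but not the person: each new dataset keyed by it enriches the same profile.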
Anonymity can be broken by:
1. Comprehensiveness: rich datasets allow identification through combinations of attributes => identification without knowing any common identifier
2. Inference: big data itself is used to extract hidden knowledge (common identifiers become just noise)
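Comprehensiveness can be made concrete by measuring how many records become unique as more attributes are combined. The toy dataset and the helper `fraction_unique` are hypothetical; the point is only that no single attribute identifies anyone, while their combination identifies everyone.

```python
from collections import Counter


def fraction_unique(rows, attrs):
    """Share of records whose combination of `attrs` appears exactly once."""
    combos = Counter(tuple(r[a] for a in attrs) for r in rows)
    unique = sum(1 for r in rows if combos[tuple(r[a] for a in attrs)] == 1)
    return unique / len(rows)


# Hypothetical dataset with no names and no common identifier.
people = [
    {"sex": "F", "zip": "10115", "birth_year": 1980},
    {"sex": "F", "zip": "10115", "birth_year": 1991},
    {"sex": "M", "zip": "10115", "birth_year": 1980},
    {"sex": "M", "zip": "20095", "birth_year": 1980},
]

print(fraction_unique(people, ["sex"]))  # → 0.0
print(fraction_unique(people, ["sex", "zip"]))  # → 0.5
print(fraction_unique(people, ["sex", "zip", "birth_year"]))  # → 1.0
```

The richer the dataset, the closer this fraction gets to 1, which is why removing the usual identifiers does not guarantee anonymity.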
What is informed consent and what are its problems?
Corollary of privacy as control over info flow
Aims to inform users about who collects data, which data are collected, and how they will be used/shared
Problems:
1. Difficult to model: privacy policies are not read/understood, even when made readable or when using an opt-in default (agree to something before enrolling, instead of having to disagree and exit after enrollment)
2. Transparency paradox: simplicity & clarity result in loss of fidelity, while fully detailed policies in plain language (if even possible) are too long + disrupt the flow of the user experience
3. Unpredictability: future uses of data are unpredictable because of the big data paradigm (hidden patterns => hidden purposes) & a (potentially) infinite chain of collectors
4. Tyranny of the minority: info volunteered by a few (~20%) can unlock the same info about the rest (through big data inference), with no explicit connection between them required => very powerful
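The tyranny of the minority can be sketched with a deliberately tiny inference model: 2 out of 10 people (20%) volunteer a sensitive trait, and it is guessed for the others from harmless "likes" they share with the volunteers. The data, the trait labels "A"/"B" and the co-occurrence voting in `learn_from_minority` / `infer_trait` are hypothetical stand-ins for real machine-learning models.

```python
from collections import Counter, defaultdict

# Hypothetical volunteers (20% of the population): harmless likes + disclosed sensitive trait.
volunteers = [
    (frozenset({"jazz", "hiking"}), "A"),
    (frozenset({"metal", "gaming"}), "B"),
]

# The other 80%: they never disclosed the trait and have no link to the volunteers.
others = {
    "p3": frozenset({"jazz"}),
    "p4": frozenset({"hiking", "cooking"}),
    "p5": frozenset({"gaming"}),
    "p6": frozenset({"metal", "cooking"}),
    "p7": frozenset({"jazz", "hiking", "cooking"}),
    "p8": frozenset({"gaming", "metal"}),
    "p9": frozenset({"hiking"}),
    "p10": frozenset({"cooking"}),
}


def learn_from_minority(labeled):
    """For each like, count how often it co-occurs with each trait among volunteers."""
    votes = defaultdict(Counter)
    for likes, trait in labeled:
        for like in likes:
            votes[like][trait] += 1
    return votes


def infer_trait(votes, likes):
    """Guess a trait by summing the votes attached to a person's likes (None if no evidence)."""
    total = Counter()
    for like in likes:
        total.update(votes.get(like, Counter()))
    return total.most_common(1)[0][0] if total else None


votes = learn_from_minority(volunteers)
inferred = {person: infer_trait(votes, likes) for person, likes in others.items()}
print(inferred)
```

Here the two volunteers' disclosures are enough to attribute a trait to 7 of the 8 people who never consented; only "p10", whose likes match no volunteer, escapes.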
Are anonymity and informed consent sufficient to deal with privacy concerns and, if not, how can they be improved?
Not sufficient (probably a dead end), BUT there is no real alternative => still meaningful
Informed consent needs contextualization: agreements should cover only departures from the EXPECTED info flow
ALSO, the burden of proving the legitimacy of actions on data should be shifted from users to collectors
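A minimal sketch of contextualized consent, assuming a context's norms can be listed as expected (data, purpose) pairs: consent is requested only for the proposed uses that depart from those norms, instead of for everything. The context name, the norms in `EXPECTED_USES` and the helper `uses_needing_consent` are hypothetical.

```python
# Hypothetical norms of a fitness-app context: uses that users already expect.
EXPECTED_USES = {
    "fitness_app": {
        ("heart_rate", "show_to_user"),
        ("heart_rate", "compute_training_plan"),
    },
}


def uses_needing_consent(context, proposed_uses):
    """Return only the (data, purpose) uses that depart from the context's norms."""
    expected = EXPECTED_USES.get(context, set())  # unknown context => nothing is expected
    return [use for use in proposed_uses if use not in expected]


proposed = [
    ("heart_rate", "show_to_user"),
    ("heart_rate", "compute_training_plan"),
    ("heart_rate", "sell_to_insurer"),
]

print(uses_needing_consent("fitness_app", proposed))  # → [('heart_rate', 'sell_to_insurer')]
```

This keeps the consent request short (easing the transparency paradox) because expected flows need no agreement; only the unexpected ones do.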