IV Techniques Flashcards
Data Aggregation
Data expressed in summary form. Summarizing reduces the granularity and value of the data, but also weakens the connection between the data and individuals.
Frequency vs magnitude data
With frequency data, every individual contributes equally (each one counts once). With magnitude data, individuals contribute unequal amounts to the total.
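A minimal sketch of the difference, assuming hypothetical (department, salary) records with made-up values. The names `records`, `frequency` and `magnitude` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical records: (department, salary). All values are made up.
records = [
    ("sales", 40_000),
    ("sales", 60_000),
    ("engineering", 90_000),
]

# Frequency data: each record adds exactly 1 to its group's count.
frequency = Counter(dept for dept, _ in records)

# Magnitude data: each record adds its own value, so contributions differ.
magnitude = defaultdict(int)
for dept, salary in records:
    magnitude[dept] += salary
```

In the count, both sales employees carry the same weight. In the salary total, the higher earner contributes more.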
Differential privacy
Ensures aggregated data stays useful while being nonspecific enough that no individual can be singled out. Achieved with an algorithm that adds a controlled amount of random noise to the results.
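One commonly used algorithm for this is the Laplace mechanism, which adds random noise to a query result. The sketch below is an illustration under assumptions, not necessarily the algorithm these cards have in mind: `noisy_count` is a hypothetical helper, the query is assumed to be a simple count (so one person changes the result by at most 1), and `epsilon` is the privacy parameter (smaller means more noise).

```python
import random

def noisy_count(true_count, epsilon, rng=random):
    """Return true_count plus Laplace noise with scale 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Assumption: a counting query, so adding or removing one person
    # changes the true answer by at most 1 (sensitivity 1).
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centered on 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

# Usage: release a noisy version of a count of 100.
released = noisy_count(100, epsilon=0.5)
```

Each released value is off by a random amount, but averages over many releases stay close to the true count, which is the sense in which the data remains useful.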
Differential identifiability
Variation on differential privacy that sets the parameters of the noise-generating algorithm according to a limit on the risk that an individual can be identified.
Deidentification
Technique for preventing someone’s identity from being connected to their personal information.
Anonymization (deidentification)
Direct and indirect identifiers are removed, and mechanisms are put in place to prevent reidentification.
Pseudonymization (deidentification)
Replaces individual identifiers (like names) with numbers, letters, symbols or some combination thereof. This ensures data points aren’t directly associated with a specific individual.
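A minimal sketch of one way to generate such replacements, assuming a keyed hash (HMAC-SHA256) so the same name always maps to the same pseudonym. The helper `pseudonymize`, the key and the sample record are all hypothetical.

```python
import hashlib
import hmac

def pseudonymize(identifier, secret_key):
    """Map an identifier to a stable pseudonym using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability

# Assumption: the key is stored separately from the data set.
key = b"hypothetical-secret-key"
records = [{"name": "Alice Example", "diagnosis": "flu"}]

# Replace the name with its pseudonym and keep the other fields.
pseudonymized = [
    {"pseudonym": pseudonymize(r["name"], key), "diagnosis": r["diagnosis"]}
    for r in records
]
```

Anyone holding the key can recompute the mapping, which is why the key needs to be kept apart from the pseudonymized data.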
k-anonymity, l-diversity, t-closeness
Three techniques developed to reduce the risk that someone compromises the anonymity of data by combining it with information they already know to draw inferences about individuals.
k-anonymity
Generalizes, truncates or redacts “quasi-identifiers” (attributes like ZIP code or birth date that can identify someone when combined) after direct identifiers (like names) are removed. At least “k” individuals in the data set will share the same combination of quasi-identifier values.
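A minimal sketch of measuring “k” for a data set, assuming the records have already been generalized. The helper `k_of` and the columns `zip`, `age` and `condition` are hypothetical stand-ins.

```python
from collections import Counter

def k_of(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for every quasi-identifier column."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)

# Hypothetical, already-generalized records.
records = [
    {"zip": "021**", "age": "30-39", "condition": "flu"},
    {"zip": "021**", "age": "30-39", "condition": "asthma"},
    {"zip": "946**", "age": "40-49", "condition": "flu"},
    {"zip": "946**", "age": "40-49", "condition": "flu"},
]

k = k_of(records, ["zip", "age"])
```

Here every combination of `zip` and `age` is shared by two records, so this sample is 2-anonymous with respect to those two columns.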
l-diversity
Builds on k-anonymity by requiring at least “l” distinct values of each sensitive attribute within every group of k records.
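A minimal sketch of measuring “l”, assuming a single sensitive column. The helper `l_of` and the sample columns are hypothetical.

```python
from collections import defaultdict

def l_of(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found
    in any group of records sharing the same quasi-identifier values."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return min((len(values) for values in groups.values()), default=0)

# Hypothetical, already-generalized records.
records = [
    {"zip": "021**", "age": "30-39", "condition": "flu"},
    {"zip": "021**", "age": "30-39", "condition": "asthma"},
    {"zip": "946**", "age": "40-49", "condition": "flu"},
    {"zip": "946**", "age": "40-49", "condition": "flu"},
]

l = l_of(records, ["zip", "age"], "condition")
```

The second group contains only "flu", so l is 1 here: knowing that someone falls in that group reveals their condition even though each group holds two records.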
t-closeness
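Extends l-diversity by requiring that the distribution of sensitive values in each group stay within a threshold “t” of their distribution in the whole data set, typically achieved by reducing the granularity of the data.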
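t-closeness is usually stated as a limit “t” on how far the distribution of a sensitive attribute within any group may differ from its distribution in the whole data set. A minimal sketch of measuring that gap, assuming total variation distance as the distance measure (other measures are possible). The helpers `distribution` and `t_of` and the sample columns are hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Return the share of each distinct value in a list."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def t_of(records, quasi_identifiers, sensitive):
    """Return the largest distance between any group's distribution of
    the sensitive attribute and the distribution in the full data set."""
    overall = distribution([r[sensitive] for r in records])
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    worst = 0.0
    for values in groups.values():
        local = distribution(values)
        # Assumption: total variation distance between the two distributions.
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, distance)
    return worst

# Hypothetical, already-generalized records.
records = [
    {"zip": "021**", "age": "30-39", "condition": "flu"},
    {"zip": "021**", "age": "30-39", "condition": "asthma"},
    {"zip": "946**", "age": "40-49", "condition": "flu"},
    {"zip": "946**", "age": "40-49", "condition": "flu"},
]

t = t_of(records, ["zip", "age"], "condition")
```

A data set would satisfy t-closeness under this measure for any threshold at or above the returned value.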
Tokenization
System of deidentifying data through the use of random tokens as stand-ins for meaningful data.
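A minimal sketch of the idea, assuming an in-memory lookup table that links each random token to the value it stands in for. The class `TokenVault` and its methods are hypothetical, and a real system would need to store and protect this table securely.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory table linking random tokens to real values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so a value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)  # random, carries no meaning
        while token in self._token_to_value:  # guard against a rare collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only a holder of the table can recover the original value.
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
```

Because the token is random rather than derived from the value, it reveals nothing about the original data without access to the table.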