Tabular And Queryable Data Protection Flashcards
What are some methods for protecting privacy in queryable databases?
Some methods for protecting privacy in queryable databases include query perturbation (adding noise to the input query or the output result), query restriction (refusing to answer certain queries), and camouflage (answering a more general query instead of the specific query).
These methods can help prevent re-identification and disclosure of sensitive information, while still enabling meaningful analysis to be performed on the data.
What is differential privacy?
Differential privacy is a framework for measuring the privacy guarantees of a data analysis or data release.
It provides a probabilistic guarantee that adding or removing any single record changes the distribution of query responses only slightly, so no individual's presence in the data can be inferred with high confidence.
Differential privacy is often achieved by adding random noise to the output of a query, and is widely used as a standard for privacy-preserving data analysis.
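As an illustration of adding noise to a query output, here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names (laplace_noise, noisy_count), the record layout, and the epsilon value are assumptions made for this example and do not come from the flashcards. A counting query changes by at most 1 when one record is added or removed, so the noise scale is taken as 1/epsilon.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng=None):
    # Hypothetical helper: answer "how many records satisfy predicate?"
    # with Laplace noise calibrated to sensitivity 1.
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    people = [{"age": a} for a in range(100)]
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    print(noisy_count(people, lambda r: r["age"] >= 50, epsilon=0.5))
```

Each call returns a different value near the true count of 50. Repeated queries consume additional privacy budget, which this sketch does not track.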
Main protection methods
Query perturbation
Noise is added to the input query or the output result
Query restriction
Certain queries are not answered
Camouflage
The system answers a more general query instead of the specific one asked
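Query restriction can be sketched with one common rule, query-set-size control: a count is refused when the matching set, or its complement, is smaller than a threshold. The function name restricted_count and the threshold of 5 are assumptions chosen for illustration, not values given in the flashcards.

```python
def restricted_count(records, predicate, min_size=5):
    # Hypothetical helper: return the count, or None when the query is refused.
    n = len(records)
    count = sum(1 for r in records if predicate(r))
    # Refuse if the query set or its complement is too small, since either
    # one could isolate a handful of individuals.
    if count < min_size or n - count < min_size:
        return None
    return count

if __name__ == "__main__":
    people = [{"age": a} for a in range(20)]
    print(restricted_count(people, lambda r: r["age"] >= 10))  # answered
    print(restricted_count(people, lambda r: r["age"] == 3))   # refused
```

Size control alone does not stop an attacker from combining several allowed queries, so it is usually paired with other protections.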
Types of disclosure in tabular data
External attacks occur when an adversary who is not part of the dataset can use the released data to identify specific individuals.
Internal attacks occur when an adversary who is part of the dataset can use the released data to infer sensitive information about other individuals in the dataset.
Dominance attacks occur when an individual in the dataset is the dominant contributor to a specific value or cell. By subtracting their own contribution from the published total, they can closely estimate the remaining contributions and so infer sensitive information about others in the dataset.
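One way to screen for dominance risk before release is an (n, k) dominance rule: a cell is flagged when its n largest contributors account for more than a share k of the cell total. The sketch below assumes this rule. The function name violates_dominance and the defaults n=1, k=0.75 are illustrative choices, not thresholds taken from the flashcards.

```python
def violates_dominance(contributions, n=1, k=0.75):
    # Hypothetical helper: True if the top n contributors exceed share k
    # of the cell total. Assumes non-negative contributions.
    total = sum(contributions)
    if total <= 0:
        return False
    top = sum(sorted(contributions, reverse=True)[:n])
    return top / total > k

if __name__ == "__main__":
    # One firm contributes 900 of 1000: the others are easy to estimate.
    print(violates_dominance([900, 50, 50]))
    # Contributions are balanced: no single dominant contributor.
    print(violates_dominance([400, 300, 300]))
```

A flagged cell would then be suppressed, merged with neighbouring cells, or perturbed before the table is published.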
Methods for protection
Non-perturbative methods (such as data masking, suppression, and local or global recoding)
Perturbative methods (such as adding random noise or query-specific noise to the data)
Differential privacy is also a widely used framework for measuring privacy guarantees of a data analysis or release.
Additionally, query restriction and camouflage can be used to limit the disclosure of sensitive information in response to specific queries.
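As an example of a non-perturbative method, global recoding replaces a detailed value with a coarser category using the same mapping for every record. The sketch below recodes exact ages into fixed-width bands. The names recode_age and globally_recode, the "age" field, and the band width of 10 are assumptions for illustration.

```python
def recode_age(age, width=10):
    # Map an exact age to a band label such as "30-39".
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def globally_recode(records, field="age", width=10):
    # Apply the same recoding to every record (this is what makes it
    # "global") and return copies so the input is left unchanged.
    return [{**r, field: recode_age(r[field], width)} for r in records]

if __name__ == "__main__":
    people = [{"age": 37, "zip": "12345"}, {"age": 41, "zip": "12345"}]
    print(globally_recode(people))
```

No value is falsified here. Detail is only reduced, which is the difference from perturbative methods such as noise addition.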
What are some risks associated with frequency tables?
Frequency tables can pose privacy risks, particularly when they contain small cell sizes or rare events, or when they include personally identifiable information. Tables built from only a few records can lead to re-identification of individuals, while cells with small counts or rare events may reveal characteristics unique to an individual.
Frequency tables with personally identifiable information can enable linkage to other datasets and increase the risk of re-identification and sensitive information disclosure.
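The small-cell risk can be reduced by suppressing low counts before a table is released. The following is a minimal sketch of such primary suppression. The function name frequency_table, the field names, and the threshold of 3 are assumptions for illustration.

```python
from collections import Counter

def frequency_table(records, keys, threshold=3):
    # Count records per combination of the chosen fields.
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    # Replace counts below the threshold with None (suppressed).
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

if __name__ == "__main__":
    people = [{"sex": "F", "region": "N"}] * 4 + [{"sex": "M", "region": "S"}]
    for cell, n in frequency_table(people, ["sex", "region"]).items():
        print(cell, "suppressed" if n is None else n)
```

If row or column totals are published alongside the table, a suppressed cell may still be recoverable by subtraction, so additional (complementary) suppression can be needed.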