Statistical Disclosure Control Concepts Flashcards
Explain attribute versus identity disclosure.
Attribute disclosure: Reveals a sensitive attribute of an individual without necessarily identifying their record (e.g., a medical study reports that every patient in a small, recognizable group has a certain condition, so anyone known to be in that group is known to have it).
Identity disclosure: Reveals an individual’s identity directly or indirectly by linking a record to a specific person (e.g., a personal information database is hacked and names and contact details are published online).
What are three types of data commonly used in statistical analysis?
Tabular data: Aggregated data organized in tables with rows and columns, including frequency tables (counts of observations in each cell) and magnitude tables (aggregate values, such as totals or means, of a numeric variable in each cell).
Queryable databases: Stored in databases that can be queried with SQL to extract specific information or relationships among variables.
Microdata: Individual-level data, with each row representing an individual.
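The relationship between microdata and the two kinds of tables can be illustrated with a minimal sketch. The records, field names, and values below are invented for illustration and are not taken from any real dataset.

```python
from collections import Counter, defaultdict

# Hypothetical microdata: one row per individual (all values invented).
microdata = [
    {"region": "North", "sex": "F", "income": 42000},
    {"region": "North", "sex": "M", "income": 38000},
    {"region": "South", "sex": "F", "income": 51000},
    {"region": "South", "sex": "F", "income": 47000},
    {"region": "South", "sex": "M", "income": 45000},
]

# Frequency table: number of individuals in each (region, sex) cell.
frequency_table = Counter((r["region"], r["sex"]) for r in microdata)

# Magnitude table: total income in each (region, sex) cell.
magnitude_table = defaultdict(int)
for r in microdata:
    magnitude_table[(r["region"], r["sex"])] += r["income"]

for cell in sorted(frequency_table):
    print(cell, frequency_table[cell], magnitude_table[cell])
```

In this toy data, any cell with a count of 1 publishes that individual's income exactly in the magnitude table, which is why small cells are a common concern when releasing tabular data.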
What is meant by risk-utility trade-offs?
Risk-utility trade-offs refer to the balance between the risk of disclosing sensitive information and the utility of the released data.
The goal is to find a level of protection that keeps the possibility of identifying individuals acceptably low while preserving as much of the data's usefulness for research and analysis as possible.
An analogy is healthcare, where a medical treatment may carry potential risks but is still used when its benefits outweigh those risks.
How can you determine if a dataset satisfies k-anonymity?
Group the records by their quasi-identifiers (e.g., age, gender, zip code) and count how many records share each combination of values.
The dataset satisfies k-anonymity if every combination of quasi-identifier values that appears in it is shared by at least k records; the largest k for which this holds is the size of the smallest group. Equivalently, a re-identification attack that uses only the quasi-identifiers cannot narrow a target down to fewer than k records.
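The check can be sketched in a few lines of Python. This is a minimal illustration that assumes a non-empty dataset; the function names, field names, and records are hypothetical, and the quasi-identifiers are assumed to be already generalized (age ranges, truncated zip codes).

```python
from collections import Counter

def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same quasi-identifier values (assumes records is non-empty)."""
    group_sizes = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(group_sizes.values())

def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return k_anonymity_level(records, quasi_identifiers) >= k

# Hypothetical records (all values invented).
records = [
    {"age": "30-39", "gender": "F", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "gender": "F", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "gender": "M", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "gender": "M", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "gender": "M", "zip": "021**", "diagnosis": "flu"},
]
quasi_identifiers = ["age", "gender", "zip"]

level = k_anonymity_level(records, quasi_identifiers)
print(level)                                                # 2
print(satisfies_k_anonymity(records, quasi_identifiers, 2)) # True
print(satisfies_k_anonymity(records, quasi_identifiers, 3)) # False
```

The smallest group here has two records, so the data is 2-anonymous but not 3-anonymous with respect to the chosen quasi-identifiers. The result depends entirely on which columns are treated as quasi-identifiers.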
Why is k-anonymity insufficient?
In some cases k-anonymity is insufficient for protecting privacy because it only addresses the risk of re-identification (identity disclosure). Attribute disclosure may still be possible, for example when all records that share a combination of quasi-identifiers have the same sensitive value, or when an attacker brings background knowledge to the attack.
In addition, k-anonymity does not provide any guarantees about the quality or usefulness of the data once it has been anonymized, which may limit its utility for certain applications.
Other privacy models, such as differential privacy, have been developed to address these limitations.
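The attribute disclosure risk that remains under k-anonymity can be seen in a small sketch. The helper name and the records are hypothetical; the table is 2-anonymous on its quasi-identifiers, yet one group has only a single sensitive value.

```python
from collections import defaultdict

def sensitive_values_per_group(records, quasi_identifiers, sensitive):
    """Map each quasi-identifier combination to the set of sensitive
    values found among the records that share it."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return dict(groups)

# Hypothetical 2-anonymous table (all values invented).
records = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
]

values = sensitive_values_per_group(records, ["age", "zip"], "diagnosis")

# Groups where every record has the same sensitive value: anyone known
# to be in such a group has that value, even though no single record
# can be linked to them.
homogeneous = [key for key, vals in values.items() if len(vals) == 1]
print(homogeneous)  # [('40-49', '021**')]
```

Anyone known to be aged 40-49 in this zip area is revealed to have diabetes, although each quasi-identifier combination is shared by two records.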