DPP Topic 3 - Data Anonymization Flashcards
Data Anonymization
The process of removing or modifying personally identifiable information from a dataset to prevent the identification of individuals.
Original Database
The initial dataset containing personal information.
Published Database
The anonymized dataset that is made available for use.
Anonymized Data
The modified data that retains the usefulness of the original data while protecting individual privacy.
Original Data
The personal information in the initial dataset.
Balancing Data Privacy and Data Utility
The goal of anonymization is to make data less specific, so individuals are harder to identify, while retaining as much of its usefulness as possible.
- Determine release model (Data Preparation)
Deciding whether the anonymized dataset will be made public or kept non-public.
- Determine re-identification risk threshold (Data Preparation)
Setting the level of re-identification risk that is acceptable. A stricter threshold (less re-identification risk tolerated) increases data anonymity but decreases data utility.
- Classify data attributes (Data Preparation)
Identifying explicit identifiers, quasi-identifiers, and sensitive data.
- Remove unused data attributes (Data Preparation)
Suppressing attributes that are not required in the anonymized dataset.
- Anonymize identifiers (Data Execution)
Applying relevant anonymization techniques to different types of identifiers.
- Evaluate the solution (Data Execution)
Assessing the anonymized dataset for sufficient data anonymity and utility.
- Determine controls required (Data Execution)
Implementing technical and non-technical controls to protect the anonymized data.
- Document anonymization process (Data Execution)
Recording the details of the anonymization process for future reference.
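Example (sketch of the steps above)
A minimal sketch of how classifying attributes, removing unused ones, and evaluating the result might look on a toy dataset. The column names, the records, and the role labels are all hypothetical. The risk measure used here (1 divided by the size of the smallest group of records sharing the same quasi-identifier values) is an assumption chosen for illustration; the flashcards do not specify how risk is measured. Dropping explicit identifiers outright is likewise just the simplest option, not the only technique.

```python
from collections import Counter

# Hypothetical toy records; every name and value here is made up.
records = [
    {"name": "P1", "age_band": "30-39", "postal_sector": "12", "diagnosis": "flu", "internal_note": "a"},
    {"name": "P2", "age_band": "30-39", "postal_sector": "12", "diagnosis": "asthma", "internal_note": "b"},
    {"name": "P3", "age_band": "40-49", "postal_sector": "34", "diagnosis": "flu", "internal_note": "c"},
    {"name": "P4", "age_band": "40-49", "postal_sector": "34", "diagnosis": "diabetes", "internal_note": "d"},
]

# Classify data attributes (an assumed classification for this toy dataset).
ROLES = {
    "name": "explicit_identifier",
    "age_band": "quasi_identifier",
    "postal_sector": "quasi_identifier",
    "diagnosis": "sensitive",
    "internal_note": "unused",
}

def prepare(rows, roles):
    """Keep only quasi-identifiers and sensitive data; drop everything else."""
    keep = [col for col, role in roles.items()
            if role in ("quasi_identifier", "sensitive")]
    return [{col: row[col] for col in keep} for row in rows]

def smallest_group_size(rows, roles):
    """Size of the smallest group sharing the same quasi-identifier values."""
    qi = [col for col, role in roles.items() if role == "quasi_identifier"]
    counts = Counter(tuple(row[col] for col in qi) for row in rows)
    return min(counts.values())

def meets_threshold(rows, roles, max_risk):
    """Assumed risk measure: 1 / smallest group size must not exceed max_risk."""
    return 1 / smallest_group_size(rows, roles) <= max_risk

published = prepare(records, ROLES)
print(smallest_group_size(published, ROLES))   # 2
print(meets_threshold(published, ROLES, 0.5))  # True
print(meets_threshold(published, ROLES, 0.2))  # False
```

With a stricter threshold (0.2 instead of 0.5) the same dataset fails the check, so the quasi-identifiers would need to be made less specific, which costs utility.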
1) Attribute Suppression (Techniques)
Removal of an entire attribute (column) from the dataset. (Strongest Technique)
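Example (attribute suppression)
A minimal sketch of attribute suppression on a small list of records. The helper name `suppress_attribute` and the sample data are hypothetical, chosen only to illustrate removing one column from every record.

```python
def suppress_attribute(rows, attribute):
    """Return copies of the rows with one attribute (column) removed entirely."""
    return [{key: value for key, value in row.items() if key != attribute}
            for row in rows]

# Hypothetical original database.
original = [
    {"name": "Alice", "age": 34, "diagnosis": "flu"},
    {"name": "Bob", "age": 41, "diagnosis": "asthma"},
]

# The published database no longer contains the "name" column at all.
published = suppress_attribute(original, "name")
print(published)
# [{'age': 34, 'diagnosis': 'flu'}, {'age': 41, 'diagnosis': 'asthma'}]
```

The original records are left unchanged, so the original database and the published database stay separate.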