Chapter 4: Identity and Anonymity Flashcards
What is the strongest form of identity?
Identified individual
What is a form of identity that is weaker than identified individual?
Pseudonymous
What can be done with a pseudonym?
Link different data items about the same individual without knowing the actual person the data is about
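Sketch: one possible way to produce pseudonyms is a keyed hash over a direct identifier. The key, the pseudonymize helper and the sample records below are assumptions for illustration; the chapter does not prescribe a method.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (assumption for illustration).
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(identifier: str) -> str:
    """Map a direct identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up sample records.
records = [
    {"email": "alice@example.com", "purchase": "book"},
    {"email": "bob@example.com", "purchase": "lamp"},
    {"email": "alice@example.com", "purchase": "pen"},
]

# Drop the identifier and keep only the pseudonym; items about the same
# person still share a pseudonym, so they remain linkable.
pseudonymized = [
    {"pseudonym": pseudonymize(r["email"]), "purchase": r["purchase"]}
    for r in records
]
```

The first and third records end up with the same pseudonym, so they can be linked without the email address being stored alongside them.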
What is the weakest form of identity?
Anonymity
Describe truly anonymous data
We not only do not know the individual the data is about, we cannot even tell if two data items are about the same individual
Why are roles important in privacy?
Often, it is not important who an individual is, only that the person is authorized to perform an action
Provide an example of how a role can be used to reduce the need to identify an individual
A credit card account may have several authorized users, and the merchant only needs to know that one of the authorized users is making a purchase (a role), not which one (an identity)
How can an individual increase their level of privacy when sending an email or when accessing services on the web?
Use a hash function to prove their identity without revealing it
Use a third party to validate their identity
Why would a system need to know a person’s identity?
- Access control
- Attribution - the ability to prove who performed an action
- Enhanced user experience and personalization
How can you increase privacy but still personalize a website to each user?
Use a pseudonym
What do you need to represent identity?
- A combination of information that is unique (name + DoB) - this typically results in an identified individual
- User-specified identifier (user ID)
- System-generated user IDs
- Externally created unique IDs (for example an email)
- Identity systems (google wallet, PKI)
- Biometrics
What are the advantages of using user IDs?
- The system can guarantee uniqueness
- Provides pseudonymity
What are the disadvantages of using user-specified user IDs?
- Users may want the same user ID
- Users who forget their user ID may try something generic like their last name and end up locking someone else out of their account after multiple tries
What are the advantages of system-generated user IDs over user-specified user IDs?
- Provides greater privacy - a user-specified ID may include personal information like their name
What are the advantages of using externally created unique IDs?
- User friendly (user can reuse another identifier)
- Reduces the number of identifiers a user needs to remember
- The burden of providing uniqueness is outsourced
- Information can be linked across systems
- Easier to detect fraud or identity theft
What are the dangers of using some biometrics as an identifier?
- Face recognition is not accurate enough in large groups of people - you might end up with false positives
- Using it for both identification and authentication could provide someone with inappropriate access
What is the purpose of authentication?
Used to ensure that an individual performing an action matches the expected identity
What are the 4 main categories of authentication mechanisms?
- What you know - passwords or personal information
- What you have - requiring an object
- Where you are - location
- What you are - biometrics
What needs to be considered when deciding which authentication mechanism to use?
Challenges of creating and revoking the chosen credentials
What is the advantage of using passwords?
High level of assurance that the correct individual is being identified
What is the disadvantage of using passwords?
They can be easily broken
List the 2 categories of password-based authentication attacks
- Attacks on the password itself
- Attacks performed directly through the system
How can you avoid password guessing attacks and how could it negatively affect users?
Apply a limit on failed password attempts - places a burden on legitimate users who incorrectly enter their password
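Sketch: a toy version of a failed-attempt limit. The limit of 3, the LoginGuard class and the sample account are hypothetical; passwords are held in plain text only to keep the example short.

```python
import hmac

MAX_FAILED_ATTEMPTS = 3  # hypothetical policy value, not from the chapter


class LoginGuard:
    """Toy login check that locks an account after repeated failures."""

    def __init__(self, passwords):
        # Demo only: a real system would store salted password hashes.
        self._passwords = dict(passwords)
        self._failures = {}

    def login(self, user_id, password):
        if self._failures.get(user_id, 0) >= MAX_FAILED_ATTEMPTS:
            return "locked"
        expected = self._passwords.get(user_id)
        if expected is not None and hmac.compare_digest(
            expected.encode("utf-8"), password.encode("utf-8")
        ):
            self._failures[user_id] = 0  # a success resets the counter
            return "ok"
        self._failures[user_id] = self._failures.get(user_id, 0) + 1
        return "failed"


guard = LoginGuard({"smith": "correct horse"})
attempts = [guard.login("smith", guess) for guess in ("a", "b", "c")]
after_lockout = guard.login("smith", "correct horse")
```

After three wrong guesses the account stays locked even when the right password is entered, which is the burden on legitimate users that the answer describes.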
Provide an example of a password guessing attack method
Dictionary attack
Provide two examples of password-based attacks performed directly through the system
- Man-in-the-middle attack
- Replay attack
How can man-in-the-middle attacks be combated?
Encrypting the password
How do replay attacks work?
- Usually combined with a man-in-the-middle attack when the password is encrypted
- The captured encrypted or hashed password is replayed as-is to gain access; the attacker never needs to recover the original password
How can you combat a replay attack?
Issue a unique challenge for each authentication
For example, using a different encryption key for each authentication attempt
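Sketch: one way to issue a unique challenge per authentication is a random nonce that the client must answer with a keyed hash. This uses HMAC rather than a per-attempt encryption key, and the Server class, respond helper and shared secret are hypothetical.

```python
import hashlib
import hmac
import secrets


class Server:
    """Toy challenge-response verifier with single-use challenges."""

    def __init__(self, shared_secret: bytes):
        self._secret = shared_secret
        self._pending = set()

    def issue_challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)  # fresh random challenge
        self._pending.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self._pending:
            return False  # unknown or already used challenge
        self._pending.discard(nonce)  # each challenge is valid only once
        expected = hmac.new(self._secret, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(shared_secret: bytes, nonce: bytes) -> bytes:
    """Client side: prove knowledge of the secret for this challenge only."""
    return hmac.new(shared_secret, nonce, hashlib.sha256).digest()


secret = b"shared-secret-for-demo"  # assumption for illustration
server = Server(secret)
nonce = server.issue_challenge()
response = respond(secret, nonce)
first_attempt = server.verify(nonce, response)     # genuine login
replayed_attempt = server.verify(nonce, response)  # captured and replayed
```

The genuine response is accepted once; replaying the same captured response fails because its challenge has already been used.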
How are security questions typically implemented in today’s systems?
Normally as a secondary form of authentication - for example when a user needs to reset a forgotten password
Devices fall into which category of authentication?
What you have
Provide examples of devices used in authentication
- Identification badges or smart cards
- Radio Frequency Identification (RFID)
- Small devices with changing PINs
- Using the computer itself (e.g., MAC address)
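Sketch: devices with changing PINs often compute a time-based one-time password. The chapter does not say which scheme is meant; this example assumes HOTP/TOTP as described in RFC 4226 and RFC 6238, with the RFC's published test secret.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now=None, step: int = 30) -> str:
    """Time-based variant (RFC 6238): the counter is the current time step."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step))


rfc_secret = b"12345678901234567890"  # test secret from RFC 4226
first_code = hotp(rfc_secret, 0)
code_at_59s = totp(rfc_secret, now=59)
```

The device and the server share the secret, so both can compute the same PIN for the current 30-second window without sending the secret itself.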
What is the risk of using devices for authentication?
They can be lost or stolen
How do you mitigate the risk of an individual losing their authentication device?
Combine authentication with another means such as a password
Why is location a good method of authentication when used for corporate offices?
Requires an attacker to gain physical access as well as defeat other authentication (such as passwords)
How should location be implemented in authentication?
Should almost always be viewed as a secondary form of authentication, used to provide stronger evidence that the primary form of authentication is correct
Provide 3 examples of biometrics
- Fingerprints
- Face recognition
- Voice recognition
What more advanced forms of facial recognition exist?
Expression and gaze - these prevent the use of static images to spoof the system
What can you do to improve the security of voice recognition?
- Make it text-dependent - a user enrolls a specific passphrase (this also provides the ability to change it if it is compromised)
- Another method is having the system ask the user to read something out - this mitigates the risk of replay attacks if someone recorded the individual
Why does the use of biometric data raise inherent privacy concerns?
- A fingerprint or picture is individually identifiable
- Some individuals may also not want to show their face
What options exist for providing continuous authentication?
Based on behaviour such as typing rate or mouse movement
Provides a degree of confidence that the user hasn’t walked away and someone else has stepped into the account
What is multifactor authentication?
Systems that require 2 or more different mechanisms to authenticate
How can multifactor authentication be transparent to the user?
Using cookies to confirm that the user has previously logged in using that device
Provide an example of authentication being separate from the system requiring authenticity
Single sign-on
What is the scope of application for GDPR?
Personal data
What is the scope of the US Health Insurance Portability and Accountability Act (HIPAA)?
Protected health information
What is browser fingerprinting?
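Identifying and tracking a user's browser through its distinguishing characteristics (for example, user agent, installed fonts, screen size and time zone) rather than through cookies; like cookies, it may be used to personalize or customize the user experience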
Why are IP addresses considered PII?
Evidence has shown that even dynamically assigned IP addresses do not change frequently, allowing a client computer to repeatedly be linked to the same IP address for long periods of time
IPv6 addresses are 128 bits long, and the lower 64 bits can be derived from a computer’s hardware (MAC) address
Provide examples of strong identifiers
National identification
Passport or credit card number
Names can be, but common names may not be uniquely identifying
What are weak identifiers?
Identifiers that must be used in combination with other information to determine identity
What are quasi-identifiers?
Data that can be combined with external knowledge to link data to an individual
List 3 approaches to anonymization
- Suppression
- Generalization
- Noise addition
Describe suppression as an anonymization approach
Removing identifying values from a record
Names and identifying numbers are typically handled through suppression
Describe generalization as an anonymization approach
Replacing a data element with a more general element; for example, by removing the day and month from a birth date
Describe noise addition as an anonymization approach
Replacing actual data values with other values that are selected from the same class of data
What is a microdata set?
A microdata set contains the original records, but the data values have been suppressed or generalized, or noise has been added to protect privacy
What is data imputation?
Replacing suppressed values with plausible data to mimic the actual dataset
What is value swapping?
Switching values between records in ways that preserve most statistics but no longer give correct information about individuals
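Sketch: a very simple form of value swapping that shuffles one field across records. The swap_values helper and the sample records are hypothetical; real swapping schemes are usually more selective about which records exchange values.

```python
import random
from statistics import mean


def swap_values(records, field, rng):
    """Return copies of the records with one field shuffled across them."""
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


# Made-up sample records.
records = [
    {"id": 1, "zip": "15213", "income": 30_000},
    {"id": 2, "zip": "15213", "income": 52_000},
    {"id": 3, "zip": "15217", "income": 75_000},
    {"id": 4, "zip": "15217", "income": 41_000},
]

swapped = swap_values(records, "income", random.Random(7))

original_mean = mean(r["income"] for r in records)
swapped_mean = mean(r["income"] for r in swapped)
```

The mean and the overall distribution of income are unchanged, but an income in the swapped data can no longer be trusted to belong to the record it sits in. Statistics that relate income to other fields (for example, average income per zip) are generally not preserved by a plain shuffle.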
Provide examples of generalization in anonymization
- Generalizing values to ranges (e.g., birth decade rather than birth year)
- Values are often top- and bottom-coded (e.g., reporting all ages over 80 as “>80” as opposed to reporting decade)
- Rounding can also be used as a form of generalization (e.g., to the nearest integer, or nearest 10)
- Removing the last three digits of the postal code, or generalizing further if this does not yield a region containing at least 20,000 people
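Sketch: the generalization examples above written as small helper functions. The function names, the cap of 80 and the masking character are illustrative assumptions.

```python
def birth_decade(year: int) -> str:
    """Generalize a birth year to its decade, e.g. 1987 -> '1980s'."""
    return f"{year // 10 * 10}s"


def top_code_age(age: int, cap: int = 80) -> str:
    """Report every age above the cap as a single '>cap' category."""
    return f">{cap}" if age > cap else str(age)


def round_to_nearest(value: float, base: int = 10) -> int:
    """Round a non-negative value to the nearest multiple of base (halves go up)."""
    return int((value + base / 2) // base) * base


def generalize_postal_code(code: str, drop: int = 3) -> str:
    """Mask the last `drop` characters of a postal code."""
    return code[:-drop] + "*" * drop


decade = birth_decade(1987)
oldest = top_code_age(93)
rounded = round_to_nearest(47)
region = generalize_postal_code("15213")
```

Whether the masked postal code is general enough still depends on the population of the resulting region, which code like this cannot check without external population data.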
What is k-anonymity?
Requires that every record in the microdata set must be part of a group of at least k records having identical quasi-identifying information
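Sketch: a direct check of the definition, grouping records by their quasi-identifier values and confirming every group has at least k members. The is_k_anonymous helper and the sample microdata are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())


# Made-up microdata, already generalized to birth decade and masked postal code.
microdata = [
    {"birth": "1980s", "zip": "15***", "diagnosis": "flu"},
    {"birth": "1980s", "zip": "15***", "diagnosis": "asthma"},
    {"birth": "1990s", "zip": "15***", "diagnosis": "flu"},
    {"birth": "1990s", "zip": "15***", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(microdata, ["birth", "zip"], k=2)
three_anonymous = is_k_anonymous(microdata, ["birth", "zip"], k=3)
```

Each quasi-identifier combination appears twice, so the sample is 2-anonymous but not 3-anonymous.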
What is l-diversity?
Extends k-anonymity by further requiring that there be at least l distinct values of the sensitive attribute in each group of k records
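Sketch: a check for the distinct-values form of l-diversity, assuming the values being counted are those of a sensitive field. The is_l_diverse helper and the sample records are hypothetical.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group has >= l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())


# Made-up microdata: two groups of two records each (so it is 2-anonymous).
microdata = [
    {"birth": "1980s", "zip": "15***", "diagnosis": "flu"},
    {"birth": "1980s", "zip": "15***", "diagnosis": "asthma"},
    {"birth": "1990s", "zip": "15***", "diagnosis": "flu"},
    {"birth": "1990s", "zip": "15***", "diagnosis": "flu"},
]

two_diverse = is_l_diverse(microdata, ["birth", "zip"], "diagnosis", l=2)
```

The 1990s group contains only one diagnosis, so anyone known to be in that group is known to have flu. The data is 2-anonymous but fails 2-diversity.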
What is t-closeness?
Ensures that the distribution of sensitive values in each group of k records is sufficiently close (within a threshold t) to the distribution in the overall dataset
What is the key issue when releasing aggregated data?
Determining whether the data is frequency or magnitude data
How do you determine whether aggregated data is frequency or magnitude?
Determine whether individuals contribute equally or unequally to the value released
For example, a count of the number of individuals at a given income and age is frequency data: Each individual contributes one to the cell they are in
A table giving average income by age is magnitude data: Someone with a high income will affect the average much more than an individual whose income is close to the average
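Sketch: the difference in contribution can be seen by removing one high earner from a small cell. The income figures are made up for illustration.

```python
from statistics import mean

# Made-up incomes for one age group; the last person is a high earner.
incomes = [40_000, 42_000, 45_000, 43_000, 2_000_000]
without_high_earner = incomes[:-1]

# Frequency data: every individual changes the count by exactly one.
count_change = len(incomes) - len(without_high_earner)

# Magnitude data: one individual can dominate the released value.
average_with = mean(incomes)
average_without = mean(without_high_earner)
```

The count moves from 5 to 4, while the average drops from 434,000 to 42,500, so the released average reveals a great deal about that one individual.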
How should you anonymize magnitude data?
Noise addition or entire suppression of the cell is typically needed to ensure privacy
How should you anonymize frequency data?
Rounding techniques may well be sufficient
What is database reconstruction?
Builds a dataset that would generate the aggregate statistics
In many cases, it can be shown that such a reconstructed database is unique, or at least that many of the individual records in the dataset are unique
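Sketch: a brute-force reconstruction for a tiny cell, assuming the published statistics are a count, minimum, maximum and mean of ages. The statistics, the age range and the reconstruct helper are hypothetical.

```python
from itertools import combinations_with_replacement


def reconstruct(count, minimum, maximum, total, domain):
    """List every sorted dataset over `domain` consistent with the statistics."""
    return [
        candidate
        for candidate in combinations_with_replacement(domain, count)
        # Candidates come out in non-decreasing order, so the first and
        # last elements are the minimum and maximum.
        if candidate[0] == minimum
        and candidate[-1] == maximum
        and sum(candidate) == total
    ]


# Published (made-up) statistics: 3 people, youngest 20, oldest 40, mean age 30.
count, mean_age = 3, 30
candidates = reconstruct(
    count, minimum=20, maximum=40, total=count * mean_age, domain=range(18, 46)
)
```

Only one dataset, (20, 30, 40), is consistent with these statistics, so the aggregates reveal every individual age in the cell.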
How can you mitigate the risk of database reconstruction?
Techniques such as top- and bottom-coding, rounding and suppression do not guarantee protection against database reconstruction
The only way to provide guaranteed limits on the risk of database reconstruction is noise addition
What is a differentially private algorithm?
Its behavior hardly changes when a single individual joins or leaves the dataset – anything the algorithm might output on a database containing some individual’s information is almost as likely to have come from a database without that individual’s information
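Sketch: one standard way to make a counting query differentially private is to add Laplace noise scaled to 1/epsilon. The chapter does not name a specific mechanism; the helper names, the epsilon value and the sample ages are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count of matching records using the Laplace mechanism."""
    true_count = sum(1 for r in records if predicate(r))
    # One person joining or leaving changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


def over_40(age):
    return age > 40


rng = random.Random(0)  # seeded only to make the demo repeatable
ages = [23, 35, 41, 52, 67, 29, 38]  # made-up data
noisy = private_count(ages, over_40, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy. The true count here is 3, and any single released answer would look almost equally plausible if one person were removed from the data.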
What client-side techniques can be used to increase anonymity?
- Proxy servers
- Onion routing and Crowds
- Tools that generate “cover queries” - fake query traffic that disguises the actual request
- Noise addition
How does a proxy server provide anonymity?
It hides the IP address of a request by replacing it with that of the proxy server
What is Tor?
Tor is a peer-to-peer network where each request is routed to another peer, which routes it to another peer, and so on until a final peer makes the actual request
Encryption is used to ensure that only the first peer knows where the request came from, and only the last peer knows the server to which the request is being routed