Chapter 4: Identity and Anonymity Flashcards

1
Q

What is the strongest form of identity?

A

Identified individual

2
Q

What is a form of identity that is weaker than identified individual?

A

Pseudonymous

3
Q

What can be done with a pseudonym?

A

Link different data items about the same individual without knowing the actual person the data is about
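
One way to obtain this kind of linkability is to derive the pseudonym from the real identifier with a keyed hash. The sketch below is only an illustration: the function name `pseudonymize`, the key, and the sample records are assumptions, not part of the source material.

```python
import hashlib
import hmac

# Hypothetical secret held by the data custodian. Without it, a pseudonym
# cannot be recomputed from a guessed identifier.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(identifier: str) -> str:
    """Map a real identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up records: two about the same person, one about someone else.
records = [
    {"email": "alice@example.com", "item": "book"},
    {"email": "bob@example.com", "item": "lamp"},
    {"email": "alice@example.com", "item": "pen"},
]

# Replace the identifying field with the pseudonym.
pseudonymous = [
    {"pid": pseudonymize(r["email"]), "item": r["item"]} for r in records
]
```

Records 0 and 2 share a `pid`, so they can be linked, yet the output no longer carries the email address.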

4
Q

What is the weakest form of identity?

A

Anonymity

5
Q

Describe truly anonymous data

A

We not only do not know the individual the data is about, we cannot even tell if two data items are about the same individual

6
Q

Why are roles important in privacy?

A

Often, it is not important who an individual is, only that the person is authorized to perform an action

7
Q

Provide an example of how a role can be used to reduce the need to identify an individual

A

A credit card account may have several authorized users, and the merchant only needs to know that one of the authorized users is making a purchase (a role), not which one (an identity)

8
Q

How can an individual increase their level of privacy when sending an email or when accessing services on the web?

A

  • Use a hash function to prove their identity without revealing it
  • Use a third party to validate their identity

9
Q

Why would a system need to know a person’s identity?

A
  • Access control
  • Attribution - the ability to prove who performed an action
  • Enhanced user experience and personalization
10
Q

How can you increase privacy but still personalize a website to each user?

A

Use a pseudonym

11
Q

What do you need to represent identity?

A
  • A combination of information that is unique (name + DoB) - this typically results in an identified individual
  • User-specified identifier (user ID)
  • System-generated user IDs
  • Externally created unique IDs (for example, an email address)
  • Identity systems (Google Wallet, PKI)
  • Biometrics
12
Q

What are the advantages of using user IDs?

A
  • The system can guarantee uniqueness
  • Provides pseudonymity

13
Q

What are the disadvantages of using user-specified user IDs?

A
  • Users may want the same user ID
  • Users who forget their user ID may try something generic like their last name and end up locking someone else out of their account after multiple tries
14
Q

What are the advantages of system-generated user IDs over user-specified user IDs?

A

Provides greater privacy - a user-specified ID may include personal information such as the user’s name

15
Q

What are the advantages of using externally created unique IDs?

A
  • User friendly (user can reuse another identifier)
  • Reduces the number of identifiers a user needs to remember
  • The burden of providing uniqueness is outsourced
  • Information can be linked across systems
  • Easier to detect fraud or identity theft
16
Q

What are the dangers of using some biometrics as an identifier?

A
  • Face recognition is not accurate enough in large groups of people - you might end up with false positives
  • Using it for both identification and authentication could provide someone with inappropriate access
17
Q

What is the purpose of authentication?

A

Used to ensure that an individual performing an action matches the expected identity

18
Q

What are the 4 main categories of authentication mechanisms?

A
  • What you know - passwords or personal information
  • What you have - requiring an object
  • Where you are - location
  • What you are - biometrics
19
Q

What needs to be considered when deciding which authentication mechanism to use?

A

Challenges of creating and revoking the chosen credentials

20
Q

What is the advantage of using passwords?

A

High level of assurance that the correct individual is being identified

21
Q

What is the disadvantage of using passwords?

A

They can be easily broken

22
Q

List the 2 categories of password-based authentication attacks

A
  • Attacks on the password itself
  • Attacks performed directly through the system

23
Q

How can you avoid password guessing attacks and how could it negatively affect users?

A

Apply a limit on failed password attempts - places a burden on legitimate users who incorrectly enter their password
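
A minimal sketch of such a limit, assuming a threshold of three attempts and a hypothetical `LoginGuard` class (neither comes from the source). Passwords are held in plain text only to keep the example short.

```python
import hmac

MAX_ATTEMPTS = 3  # assumed threshold, chosen only for illustration


class LoginGuard:
    """Track failed attempts per user ID and lock the account at the limit."""

    def __init__(self, passwords):
        self._passwords = dict(passwords)  # user ID -> password
        self._failures = {}                # user ID -> consecutive failures

    def login(self, user_id, password):
        if self._failures.get(user_id, 0) >= MAX_ATTEMPTS:
            return "locked"
        expected = self._passwords.get(user_id)
        # compare_digest avoids leaking how many characters matched.
        if expected is not None and hmac.compare_digest(
            expected.encode(), password.encode()
        ):
            self._failures[user_id] = 0
            return "ok"
        self._failures[user_id] = self._failures.get(user_id, 0) + 1
        return "denied"


guard = LoginGuard({"smith": "correct horse"})
outcomes = [guard.login("smith", guess) for guess in ("a", "b", "c")]
after_lockout = guard.login("smith", "correct horse")
```

After three wrong guesses the correct password is refused as well, which is exactly the burden on legitimate users that the card describes.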

24
Q

Provide an example of a password guessing attack method

A

Dictionary attack

25
Q

Provide two examples of password-based attacks performed directly through the system

A
  • Man-in-the-middle attack
  • Replay attack

26
Q

How can man-in-the-middle attacks be combated?

A

Encrypting the password

27
Q

How do replay attacks work?

A
  • Usually combined with a man-in-the-middle attack when the password is encrypted or hashed
  • The attacker captures the protected password in transit and replays it, unchanged, to gain access
28
Q

How can you combat a replay attack?

A

Issue a unique challenge for each authentication

For example, using a different encryption key for each authentication attempt
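
A unique challenge can be sketched as a one-time random nonce that the client must answer with a keyed hash. Note that this swaps in HMAC over a nonce rather than the per-attempt encryption key mentioned in the card; the `Server` class and `client_response` function are hypothetical names.

```python
import hashlib
import hmac
import secrets


class Server:
    """Issue one-time challenges and accept each response at most once."""

    def __init__(self, shared_secret: bytes):
        self._secret = shared_secret
        self._outstanding = set()

    def issue_challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)  # fresh random value per attempt
        self._outstanding.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self._outstanding:
            return False  # unknown or already used challenge
        self._outstanding.discard(nonce)
        expected = hmac.new(self._secret, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def client_response(shared_secret: bytes, nonce: bytes) -> bytes:
    """Prove knowledge of the secret without sending the secret itself."""
    return hmac.new(shared_secret, nonce, hashlib.sha256).digest()


secret = b"secret-shared-by-client-and-server"
server = Server(secret)
nonce = server.issue_challenge()
response = client_response(secret, nonce)
first_try = server.verify(nonce, response)   # legitimate login
replayed = server.verify(nonce, response)    # attacker replays the capture
```

The replayed response fails because the server has already consumed that challenge, and a response recorded for one nonce does not match any later nonce.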

29
Q

How are security questions typically implemented in today’s systems?

A

Normally as a secondary form of authentication - for example when a user needs to reset a forgotten password

30
Q

Devices fall into which category of authentication?

A

What you have

31
Q

Provide examples of devices used in authentication

A
  • Identification badges or smart cards
  • Radio Frequency Identification (RFID)
  • Small devices with changing PINs
  • Using the computer itself (e.g., MAC address)
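
The "changing PIN" devices in the list above are often based on time-based one-time passwords. The following is a minimal sketch in the style of RFC 6238 (HMAC-SHA1, 30-second steps); the card does not name this algorithm, so treat the choice as an assumption.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Derive a short numeric code from a shared secret and the current time."""
    counter = unix_time // step                # changes every `step` seconds
    message = struct.pack(">Q", counter)       # 8-byte big-endian counter
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                 # dynamic truncation offset
    chunk = digest[offset:offset + 4]
    number = struct.unpack(">I", chunk)[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Device and server share the secret; both compute the same code for "now".
shared_secret = b"12345678901234567890"
code_now = totp(shared_secret, 59, digits=8)
code_later = totp(shared_secret, 59 + 30, digits=8)
```

Because the code depends on the time step, a PIN observed once is of little use to an attacker a minute later.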
32
Q

What is the risk of using devices for authentication?

A

They can be lost or stolen

33
Q

How do you mitigate the risk of an individual losing their authentication device?

A

Combine authentication with another means such as a password

34
Q

Why is location a good method of authentication when used for corporate offices?

A

Requires an attacker to gain physical access as well as defeat other authentication (such as passwords)

35
Q

How should location be implemented in authentication?

A

Should almost always be viewed as a secondary form of authentication, used to provide stronger evidence that the primary form of authentication is correct

36
Q

Provide 3 examples of biometrics

A
  • Fingerprints
  • Face recognition
  • Voice recognition
37
Q

What more advanced forms of facial recognition exist?

A

Expression and gaze - these prevent the use of static images to spoof the system

38
Q

What can you do to improve the security of voice recognition?

A
  • Make it text-dependent - a user enrolls a specific passphrase (this also provides the ability to change it if it is compromised)
  • Another method is having the system ask the user to read something out - this mitigates the risk of replay attacks if someone recorded the individual
39
Q

Why does the use of biometric data raise inherent privacy concerns?

A
  • A fingerprint or picture is individually identifiable
  • Some individuals may also not want to show their face

40
Q

What options exist for providing continuous authentication?

A

Based on behaviour such as typing rate or mouse movement

Provides a degree of confidence that the user hasn’t walked away and someone else has stepped into the account

41
Q

What is multifactor authentication?

A

Systems that require 2 or more different mechanisms to authenticate

42
Q

How can multifactor authentication be transparent to the user?

A

Using cookies to confirm that the user has previously logged in using that device

43
Q

Provide an example of authentication being separate from the system requiring authentication

A

Single sign-on

44
Q

What is the scope of application for GDPR?

A

Personal data

45
Q

What is the scope of the US Health Insurance Portability and Accountability Act (HIPAA)?

A

Protected health information

46
Q

What is browser fingerprinting?

A

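Websites can recognize a returning browser without cookies by combining characteristics it reveals (such as user agent, installed fonts and screen resolution) into a near-unique fingerprint; like cookies, this lets them track the user, often in an effort to personalize or customize the user experience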

47
Q

Why are IP addresses considered PII?

A

  • Evidence has shown that even dynamically assigned IP addresses do not change frequently, allowing a client computer to repeatedly be linked to the same IP address for long periods of time
  • An IPv6 address is 128 bits long, and its 64-bit interface identifier can be derived from a computer’s hardware (MAC) address

48
Q

Provide examples of strong identifiers

A

  • National identification number
  • Passport or credit card number
  • Names can be, but common names may not be uniquely identifying

49
Q

What are weak identifiers?

A

Identifiers that must be used in combination with other information to determine identity

50
Q

What are quasi-identifiers?

A

Data that can be combined with external knowledge to link data to an individual

51
Q

List 3 approaches to anonymization

A
  • Suppression
  • Generalization
  • Noise addition
52
Q

Describe suppression as an anonymization approach

A

Removing identifying values from a record

Names and identifying numbers are typically handled through suppression

53
Q

Describe generalization as an anonymization approach

A

Replacing a data element with a more general element; for example, by removing the day and month from a birth date

54
Q

Describe noise addition as an anonymization approach

A

Replacing actual data values with other values that are selected from the same class of data

55
Q

What is a microdata set?

A

A microdata set contains the original records, but the data values have been suppressed or generalized, or noise has been added to protect privacy

56
Q

What is data imputation?

A

Replacing suppressed values with plausible data to mimic the actual dataset

57
Q

What is value swapping?

A

Switching values between records in ways that preserve most statistics but no longer give correct information about individuals

58
Q

Provide examples of generalization in anonymization

A
  • Generalizing values to ranges (e.g., birth decade rather than birth year)
  • Values are often top- and bottom-coded (e.g., reporting all ages over 80 as “>80” as opposed to reporting decade)
  • Rounding can also be used as a form of generalization (e.g., to the nearest integer, or nearest 10)
  • Removing the last three digits of the postal code, or generalizing further if this does not yield a region containing at least 20,000 people
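
The generalizations listed above can be sketched as small helper functions. All names and sample values here are hypothetical, and the postal-code helper does not implement the 20,000-person population check.

```python
def birth_decade(year: int) -> str:
    """Generalize a birth year to its decade, e.g. 1987 -> '1980s'."""
    return f"{year - year % 10}s"


def top_code_age(age: int, cap: int = 80) -> str:
    """Report every age above the cap as a single '>cap' category."""
    return f">{cap}" if age > cap else str(age)


def round_to_nearest(value: float, nearest: int = 10) -> int:
    """Round a value to the nearest multiple of `nearest`."""
    return int(round(value / nearest)) * nearest


def truncate_postal(code: str, drop: int = 3) -> str:
    """Drop the last `drop` characters of a postal code."""
    return code[:-drop]


# Made-up record, before and after generalization.
record = {"birth_year": 1987, "age": 93, "weight_kg": 47, "postal": "K1A0B1"}
generalized = {
    "birth_decade": birth_decade(record["birth_year"]),
    "age": top_code_age(record["age"]),
    "weight_kg": round_to_nearest(record["weight_kg"]),
    "postal": truncate_postal(record["postal"]),
}
```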
59
Q

What is k-anonymity?

A

Requires that every record in the microdata set must be part of a group of at least k records having identical quasi-identifying information
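
A minimal check for this property, assuming records are dictionaries and the caller names the quasi-identifying fields. The function name `is_k_anonymous` and the sample rows are illustrative only.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return all(size >= k for size in groups.values())


# Made-up microdata: two groups of two records each.
rows = [
    {"birth_decade": "1980s", "postal": "K1A", "diagnosis": "flu"},
    {"birth_decade": "1980s", "postal": "K1A", "diagnosis": "asthma"},
    {"birth_decade": "1990s", "postal": "K2B", "diagnosis": "flu"},
    {"birth_decade": "1990s", "postal": "K2B", "diagnosis": "flu"},
]

two_anonymous = is_k_anonymous(rows, ["birth_decade", "postal"], k=2)
three_anonymous = is_k_anonymous(rows, ["birth_decade", "postal"], k=3)
```

The sample satisfies k = 2 but not k = 3, because each group holds exactly two records.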

60
Q

What is l-diversity?

A

Extends k-anonymity by further requiring that there be at least l distinct values of the sensitive attribute in each group of k records
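
A sketch of the corresponding check, counting distinct sensitive values per group. The function name, parameter names and sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive, min_distinct):
    """True if each quasi-identifier group has >= min_distinct sensitive values."""
    values_by_group = defaultdict(set)
    for record in records:
        key = tuple(record[field] for field in quasi_identifiers)
        values_by_group[key].add(record[sensitive])
    return all(len(values) >= min_distinct for values in values_by_group.values())


# Made-up microdata that is 2-anonymous on (birth_decade, postal).
rows = [
    {"birth_decade": "1980s", "postal": "K1A", "diagnosis": "flu"},
    {"birth_decade": "1980s", "postal": "K1A", "diagnosis": "asthma"},
    {"birth_decade": "1990s", "postal": "K2B", "diagnosis": "flu"},
    {"birth_decade": "1990s", "postal": "K2B", "diagnosis": "flu"},
]

diverse = is_l_diverse(rows, ["birth_decade", "postal"], "diagnosis", 2)
```

The second group contains only "flu", so anyone known to be in that group is known to have the flu even though the group has two members. That is the gap l-diversity closes.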

61
Q

What is t-closeness?

A

Ensures that the distribution of sensitive values in each group of k records is sufficiently close (within a threshold t) to the overall distribution

62
Q

What is the key issue when releasing aggregated data?

A

Determining whether the data is frequency or magnitude data

63
Q

How do you determine whether aggregated data is frequency or magnitude?

A

Determine whether individuals contribute equally or unequally to the value released

For example, a count of the number of individuals at a given income and age is frequency data: Each individual contributes one to the cell they are in

A table giving average income by age is magnitude data: Someone with a high income will affect the average much more than an individual whose income is close to the average
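
The difference can be seen with a few made-up incomes: adding any one person moves a count by exactly one, while the effect on an average depends heavily on that person's value.

```python
def mean(values):
    return sum(values) / len(values)


# Made-up incomes for one age cell.
base = [40_000, 42_000, 41_000]

typical = base + [41_500]        # newcomer close to the average
outlier = base + [1_000_000]     # newcomer with a very high income

# Frequency data: both newcomers change the count by the same amount.
count_shift_typical = len(typical) - len(base)
count_shift_outlier = len(outlier) - len(base)

# Magnitude data: the outlier dominates the released average.
mean_shift_typical = mean(typical) - mean(base)
mean_shift_outlier = mean(outlier) - mean(base)
```

Here the average moves by 125 for the typical newcomer and by 239,750 for the high earner, which is why a released average can reveal a great deal about one contributor.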

64
Q

How should you anonymize magnitude data?

A

Noise addition or entire suppression of the cell is typically needed to ensure privacy

65
Q

How should you anonymize frequency data?

A

Rounding techniques may well be sufficient

66
Q

What is database reconstruction?

A

Builds a dataset that would generate the aggregate statistics

In many cases, it can be shown that such a reconstructed database is unique, or at least that many of the individual records in the dataset are unique
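
A toy illustration of the idea, assuming a block of three people for which the count, mean, median and range of ages have been published (all values are made up). The code enumerates every possible set of ages and keeps those consistent with the statistics.

```python
from itertools import combinations_with_replacement

# Hypothetical published aggregates for one small block.
published = {"count": 3, "mean": 44, "median": 30, "range": 86}


def consistent(ages):
    """Check a sorted tuple of ages against every published statistic."""
    return (
        sum(ages) == published["mean"] * published["count"]
        and ages[len(ages) // 2] == published["median"]
        and ages[-1] - ages[0] == published["range"]
    )


# combinations_with_replacement yields sorted tuples, one per multiset of ages.
candidates = [
    ages
    for ages in combinations_with_replacement(range(0, 116), published["count"])
    if consistent(ages)
]
```

Only one candidate survives, so the three "aggregate" numbers pin down every individual's age.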

67
Q

How can you mitigate the risk of database reconstruction?

A

  • Techniques such as top- and bottom-coding, rounding and suppression do not guarantee protection against database reconstruction
  • The only way to provide guaranteed limits on the risk of database reconstruction is noise addition

68
Q

What is a differentially private algorithm?

A

Its behavior hardly changes when a single individual joins or leaves the dataset – anything the algorithm might output on a database containing some individual’s information is almost as likely to have come from a database without that individual’s information
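
One standard way to achieve this for a counting query is the Laplace mechanism, which the card does not name; the sketch below assumes it. The function names, the sample data and the choice of epsilon are illustrative.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of matching records.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)


# Made-up ages; the query asks how many people are 65 or older.
ages = [23, 67, 71, 45, 80, 34, 66]
rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = private_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy. Individual answers are perturbed, but the average of many noisy answers stays close to the true count of 4.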

69
Q

What client-side techniques can be used to increase anonymity?

A
  • Proxy servers
  • Onion routing and Crowds
  • Tools that generate “cover queries” - fake query traffic that disguises the actual request
  • Noise addition
70
Q

How does a proxy server provide anonymity?

A

It hides the source IP address of a request by replacing it with that of the proxy server

71
Q

What is Tor?

A

Tor is a peer-to-peer network where each request is routed to another peer, which routes it to another peer, and so on until a final peer makes the actual request

Encryption is used to ensure that only the first peer knows where the request came from, and only the last peer knows the server to which the request is being routed
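
The layering can be sketched with a toy cipher. This is not Tor's actual protocol or cryptography: `keystream_xor` is an insecure stand-in for real encryption, and the relay names, keys and message format are all assumptions made for illustration.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 based keystream. NOT secure."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + start.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def wrap(route, destination: str, request: bytes) -> bytes:
    """Build the onion. `route` lists (relay_name, key) from first to last."""
    next_hop, payload = destination, request
    for name, key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = keystream_xor(key, layer)  # only this relay can open it
        next_hop = name
    return payload


def peel(key: bytes, message: bytes):
    """Remove one layer: learn the next hop and the still-wrapped payload."""
    layer = json.loads(keystream_xor(key, message))
    return layer["next"], bytes.fromhex(layer["data"])


route = [("relay-a", b"key-a"), ("relay-b", b"key-b"), ("relay-c", b"key-c")]
onion = wrap(route, "example.org", b"GET /")

hop1, inner1 = peel(b"key-a", onion)    # first relay sees only relay-b
hop2, inner2 = peel(b"key-b", inner1)   # middle relay sees only relay-c
hop3, request = peel(b"key-c", inner2)  # last relay sees the server
```

Each relay learns only the next hop. The first relay knows the sender but not the server, and the last relay knows the server but not the sender.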